SPSS, SAS and Stata data in R with haven

In this short article I introduce the haven package for R and show how it can be used to import SPSS data in SAV-file format into R. I already demonstrated this as part of a longer coding example, in which I animated the spatial distribution of specimen detections documented in a SAV-file. In previous posts I also demonstrated how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles covering sqlite3 in R and sqlite3 in Python, including how to connect a sqlite3 database engine to Python.

In this article I simply want to demonstrate the haven package, in case you are wondering how to import SPSS data into R.

The haven package is useful for analysts working with SPSS, SAS and Stata

Analysts can use the “haven” package in R for reading and processing data from SAS, SPSS and Stata. The package provides a separate function for each format:
- read_sas() for reading data in SAS format
- read_sav() for reading data in SAV-file format, from SPSS
- read_dta() for reading data in dta-format, from Stata
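If you do not have a file in each format at hand, the reading functions can still be tried out with a small round trip. The following is only a minimal sketch: the data frame and its column names are made up for illustration, and the files are written to temporary paths with haven's write_sav() and write_dta() before being read back.

```r
library(haven)

# Small illustrative data set (hypothetical values, not from the article's file)
df <- data.frame(
  id    = c(1, 2, 3),
  score = c(4.5, 3.2, 5.0)
)

# Temporary files, so nothing is written into the working directory
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")

# Write the same data once as an SPSS file and once as a Stata file
write_sav(df, sav_path)
write_dta(df, dta_path)

# Read both files back in with the matching read function
from_spss  <- read_sav(sav_path)
from_stata <- read_dta(dta_path)

print(from_spss)
print(from_stata)

# read_sas() is called the same way, e.g. read_sas("data.sas7bdat"),
# but needs an existing SAS file (the file name here is hypothetical).
```

Both objects should contain the same three rows as the original data frame.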

Below I load the haven package (the installation line is commented out and only needs to be run once) and read in an example sav-file using the read_sav() function.

#install.packages("haven")
library(haven)
## Warning: package 'haven' was built under R version 3.6.3
data <- read_sav("data.sav")
head(data)
## # A tibble: 6 x 35
##   CollectionID Country Countrycode Location Region Site  Latitude Longitude
##          <dbl> <chr>   <chr>       <chr>    <chr>  <chr>    <dbl>     <dbl>
## 1        63079 France  FR          Thiverv~ Yveli~ N/A       48.9      1.92
## 2        63081 France  FR          Monchy ~ Somme  N/A       49.8      3.05
## 3        63086 France  FR          Goudelin Côtes~ N/A       48.5     -3.02
## 4        63089 France  FR          Fiefs    Pas-d~ N/A       50.4      2.33
## 5        63090 France  FR          Villene~ Esson~ N/A       48.4      2.25
## 6        63099 France  FR          Tatingh~ Pas-d~ N/A       50.7      2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## #   Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## #   haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## #   Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## #   Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## #   Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## #   YrAmb <dbl>

read_sav() returns the data as a tibble, as the class of the object shows.

class(data)
## [1] "tbl_df"     "tbl"        "data.frame"

The haven package converts value labels – and they can be factorized

When reading in data from e.g. a SAV-file with the haven package, value labels are not discarded: labelled variables are stored as labelled vectors of class “haven_labelled” (visible for the Geneticgroup column in the output above). This ensures that the original semantics are preserved. Labelled vectors can be turned into factors using as_factor().
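To illustrate the conversion without needing a SAV-file, a labelled vector can also be built by hand with haven's labelled() function. This is only a sketch: the codes and the labels "low" and "high" are invented for the example and do not come from the data set used above.

```r
library(haven)

# Hypothetical coded variable: 1 means "low", 2 means "high"
x <- labelled(
  c(1, 2, 2, 1),
  labels = c(low = 1, high = 2)
)

# The vector carries the class "haven_labelled" plus its value labels
class(x)

# as_factor() replaces the numeric codes with their labels
f <- as_factor(x)
levels(f)

# zap_labels() goes the other way and keeps only the plain codes
codes <- zap_labels(x)
codes
```

Whether to work with the factor or with the raw codes depends on the analysis. The factor is usually more convenient for tables and plots.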

The haven package does NOT convert character vectors into factors.

When reading in data using the haven package, dates and times are converted into the corresponding R date and time classes.
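Both points can be checked with a small round trip. The sketch below assumes a made-up data frame with one character column and one date column (the names site and sampled are hypothetical), writes it to a temporary SAV-file and inspects the classes after reading it back.

```r
library(haven)

# Hypothetical example data: one character column and one Date column
df <- data.frame(
  site    = c("A", "B"),
  sampled = as.Date(c("2020-05-01", "2020-06-15")),
  stringsAsFactors = FALSE
)

# Write to a temporary SAV-file and read it back in
path <- tempfile(fileext = ".sav")
write_sav(df, path)
back <- read_sav(path)

# The character column stays character, it is not turned into a factor
class(back$site)

# The date column comes back as an R Date
class(back$sampled)

# If a factor is wanted, the conversion has to be done explicitly
back$site_f <- factor(back$site)
class(back$site_f)
```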
