SPSS, SAS and Stata data in R with haven

In this short article I introduce the haven package for R and show how it can be used to import SPSS data in SAV format. I have already demonstrated this as part of a longer coding example, in which I animated the spatial distribution of specimen detections documented in a SAV file. In previous posts I also showed how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles covering sqlite3 in R and sqlite3 in Python, and I demonstrated how to connect a sqlite3 database engine to Python.

Here I simply want to demonstrate the haven package, in case you are wondering how to import SPSS data into R.

The haven package is useful for analysts working with SPSS, SAS and Stata

Analysts can use the “haven” package in R to read and process data from SAS, SPSS and Stata. The package provides a separate function for each format:
– read_sas() for reading data in SAS format
– read_sav() for reading data in SAV format, from SPSS
– read_dta() for reading data in dta format, from Stata
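
As a minimal sketch of how these readers could be combined, the snippet below picks a reader based on the file extension. The helper read_stat_file() is hypothetical and not part of haven, and the small data frame and temporary files are made up for illustration. It assumes the usual extensions .sav, .dta and .sas7bdat.

```r
library(haven)

# Hypothetical helper (not part of haven): choose a haven reader
# based on the file extension of the given path.
read_stat_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    sav      = read_sav(path),  # SPSS
    dta      = read_dta(path),  # Stata
    sas7bdat = read_sas(path),  # SAS
    stop("Unsupported file extension: ", ext)
  )
}

# Made-up example data, written to temporary files for a round trip
example_df <- data.frame(
  id = c(1, 2, 3),
  country = c("FR", "DE", "UK"),
  stringsAsFactors = FALSE
)
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")
write_sav(example_df, sav_path)
write_dta(example_df, dta_path)

# Both calls go through the same helper and return tibbles
from_spss  <- read_stat_file(sav_path)
from_stata <- read_stat_file(dta_path)
print(from_spss)
print(from_stata)
```

Dispatching on the extension keeps the calling code identical across formats. If a file carries an unusual extension, calling the matching read_*() function directly is the safer choice.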

Below I load the haven package (the installation line is commented out, as it only needs to be run once) and read in an example SAV file using the read_sav() function.

#install.packages("haven")
library(haven)

## Warning: package 'haven' was built under R version 3.6.3

data <- read_sav("data.sav")
head(data)

## # A tibble: 6 x 35
##   CollectionID Country Countrycode Location Region Site  Latitude Longitude
##          <dbl> <chr>   <chr>       <chr>    <chr>  <chr>    <dbl>     <dbl>
## 1        63079 France  FR          Thiverv~ Yveli~ N/A       48.9      1.92
## 2        63081 France  FR          Monchy ~ Somme  N/A       49.8      3.05
## 3        63086 France  FR          Goudelin Côtes~ N/A       48.5     -3.02
## 4        63089 France  FR          Fiefs    Pas-d~ N/A       50.4      2.33
## 5        63090 France  FR          Villene~ Esson~ N/A       48.4      2.25
## 6        63099 France  FR          Tatingh~ Pas-d~ N/A       50.7      2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## #   Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## #   haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## #   Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## #   Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## #   Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## #   YrAmb <dbl>

read_sav() returns the data as a tibble, as the class check below shows.

class(data)

## [1] "tbl_df"     "tbl"        "data.frame"

The haven package converts value labels – and they can be turned into factors

When reading in data from, for example, a SAV file with the haven package, value labels are stored in a labelled vector class (“haven_labelled” in current versions of haven, as in the output above; “labelled” in versions before 2.0.0). This ensures that the original semantics are preserved. Labelled vectors can be turned into factors using as_factor().
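
The conversion can be sketched without a SAV file by building a labelled vector by hand with haven's labelled() constructor. The codes and the labels "Group A" and "Group B" below are made up for illustration and do not come from the data set above.

```r
library(haven)

# Made-up labelled vector: the codes 1 and 2 carry hypothetical value labels
genetic_group <- labelled(
  c(1, 2, 2, 1),
  labels = c("Group A" = 1, "Group B" = 2)
)
class(genetic_group)

# Turn the labelled vector into a factor, using the value labels as levels
group_factor <- as_factor(genetic_group)
levels(group_factor)

# Alternatively, drop the labels and keep the underlying numeric codes
group_codes <- zap_labels(genetic_group)
class(group_codes)
```

as_factor() is the natural choice when the labels carry the meaning, while zap_labels() is useful when only the raw codes are needed for further computation.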

The haven package does NOT convert character vectors into factors.

When reading in data with the haven package, dates and times are converted into R's date and time classes.
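
Both rules, for character vectors and for dates, can be checked with a small round trip. This is only a sketch: the data frame, its column names and the temporary file are made up for illustration, and it assumes that write_sav() stores a Date column as an SPSS date variable.

```r
library(haven)

# Made-up example data with a character column and a Date column
example_df <- data.frame(
  site = c("Goudelin", "Fiefs"),
  sampled_on = as.Date(c("2020-05-01", "2020-06-15")),
  stringsAsFactors = FALSE
)

# Write to a temporary SAV file and read it back
sav_path <- tempfile(fileext = ".sav")
write_sav(example_df, sav_path)
round_trip <- read_sav(sav_path)

# The character column stays a character vector, not a factor
site_is_character <- is.character(round_trip$site)
print(site_is_character)

# The date column comes back as an R Date
date_is_date <- inherits(round_trip$sampled_on, "Date")
print(date_is_date)

# If a factor is wanted, the conversion has to be done explicitly
site_factor <- factor(round_trip$site)
print(site_factor)
```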

Linnart Felkl

Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python
