Importing SAV-file in R with haven

In this short article I introduce the haven package in R and show how it can be used to import SPSS data in SAV-file format into R. In previous posts I demonstrated how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles covering sqlite3 in R and sqlite3 in Python, including how to connect a sqlite3 database engine to Python.

Analysts can use the “haven” package in R to read and process data from SAS, SPSS and Stata. The package provides a separate reader function for each format:
– read_sas() for reading SAS data (sas7bdat files)
– read_sav() for reading SPSS data in SAV-file format
– read_dta() for reading Stata data in dta format
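As a minimal sketch of the readers listed above, the example below writes a small made-up data frame to temporary SPSS and Stata files with haven's write_sav() and write_dta(), then reads them back with read_sav() and read_dta(). The toy data and temporary paths are assumptions for illustration only. read_sas() appears only in a comment because it needs an existing SAS file, and the file name used there is hypothetical.

```r
library(haven)

# Hypothetical toy data, used only to create example files
df <- data.frame(id = 1:3, country = c("France", "Spain", "Italy"),
                 stringsAsFactors = FALSE)

# Write the data to temporary SPSS (.sav) and Stata (.dta) files
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")
write_sav(df, sav_path)
write_dta(df, dta_path)

# Read each file back with the matching haven reader
from_spss  <- read_sav(sav_path)
from_stata <- read_dta(dta_path)

# SAS data would be read the same way from an existing file (hypothetical name):
# from_sas <- read_sas("example.sas7bdat")

head(from_spss)
head(from_stata)
```

All three readers take a file path as their first argument and return a tibble.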

Below I install and load the haven package and read in an example SAV file using the read_sav() function.

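#install.packages("haven")  # only needed once, if haven is not installed yet
library(haven)
## Warning: package 'haven' was built under R version 3.6.3
data <- read_sav("data.sav")  # read the SPSS file into a tibble
head(data)                    # show the first six rows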
## # A tibble: 6 x 35
##   CollectionID Country Countrycode Location Region Site  Latitude Longitude
##          <dbl> <chr>   <chr>       <chr>    <chr>  <chr>    <dbl>     <dbl>
## 1        63079 France  FR          Thiverv~ Yveli~ N/A       48.9      1.92
## 2        63081 France  FR          Monchy ~ Somme  N/A       49.8      3.05
## 3        63086 France  FR          Goudelin Côtes~ N/A       48.5     -3.02
## 4        63089 France  FR          Fiefs    Pas-d~ N/A       50.4      2.33
## 5        63090 France  FR          Villene~ Esson~ N/A       48.4      2.25
## 6        63099 France  FR          Tatingh~ Pas-d~ N/A       50.7      2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## #   Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## #   haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## #   Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## #   Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## #   Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## #   YrAmb <dbl>

read_sav() returns the data as a tibble, as the class check below confirms.

class(data)
## [1] "tbl_df"     "tbl"        "data.frame"

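When data is read from a sav-file with haven, SPSS value labels are kept by storing the affected columns in the haven_labelled class (shown as <S3: haven_labelled> for Geneticgroup in the output above; older haven versions named this class labelled). This preserves the original meaning of the coded values. Labelled vectors can be converted into regular R factors using as_factor().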
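The sketch below builds a labelled vector by hand with haven's labelled() constructor instead of reading one from a file, so the codes 1 to 3 and their label texts are made up for illustration. It shows that the vector carries the haven_labelled class and how as_factor() turns it into an ordinary factor.

```r
library(haven)

# Hypothetical coded variable with SPSS-style value labels
group <- labelled(c(1, 2, 1, 3),
                  labels = c("Group A" = 1, "Group B" = 2, "Group C" = 3))

# The vector keeps its numeric codes and carries the labels as an attribute
inherits(group, "haven_labelled")
print_labels(group)

# as_factor() replaces each code by its label and returns an ordinary factor
group_factor <- as_factor(group)
levels(group_factor)
as.character(group_factor)
```

Applied to the data read in above, the same conversion would be data$Geneticgroup <- as_factor(data$Geneticgroup).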

Note that haven does NOT convert character vectors into factors; they are kept as character vectors.
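A minimal round trip illustrating this: a data frame with a character column (hypothetical values) is written to a temporary sav-file with write_sav() and read back with read_sav(). The column comes back as character, so if a factor is wanted it has to be created explicitly, here with base R's factor().

```r
library(haven)

# Hypothetical data with a plain character column
df <- data.frame(site = c("North", "South", "North"), stringsAsFactors = FALSE)

path <- tempfile(fileext = ".sav")
write_sav(df, path)
back <- read_sav(path)

# haven keeps the column as character rather than turning it into a factor
is.character(back$site)
is.factor(back$site)

# Convert explicitly if a factor is needed for later analysis
back$site_factor <- factor(back$site)
levels(back$site_factor)
```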

When reading data with haven, dates and times are converted into R's date and date-time classes, such as Date and POSIXct.
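The round trip below sketches this behaviour, assuming a made-up data frame with one Date column and one POSIXct column (in UTC) that is written to a temporary sav-file with write_sav(). After reading it back with read_sav(), both columns are again R date and date-time objects rather than the plain numbers SPSS uses internally.

```r
library(haven)

# Hypothetical data with a date and a date-time column
df <- data.frame(
  visit = as.Date(c("2020-01-15", "2020-02-20")),
  stamp = as.POSIXct(c("2020-01-15 08:30:00", "2020-02-20 14:00:00"), tz = "UTC")
)

path <- tempfile(fileext = ".sav")
write_sav(df, path)
back <- read_sav(path)

# Check the classes haven assigned when reading the file back
class(back$visit)
class(back$stamp)
```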
