In this short article I introduce the haven package in R and show how it can be used to import data from SPSS in SAV-file format. In previous posts I demonstrated how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles covering sqlite3 in R and sqlite3 in Python, including how to connect to a sqlite3 database from Python.
Analysts can use the "haven" package in R for reading and processing data from SAS, SPSS and Stata. The package provides a separate function for each format:
- read_sas() for reading data in SAS format
- read_sav() for reading data in SAV-file format, from SPSS
- read_dta() for reading data in dta-format, from Stata
Below I load the haven package (the installation line is commented out; run it once if the package is not yet installed) and read in an example sav-file using the read_sav() function.
#install.packages("haven")
library(haven)
## Warning: package 'haven' was built under R version 3.6.3
data = read_sav("data.sav")
head(data)
## # A tibble: 6 x 35
## CollectionID Country Countrycode Location Region Site Latitude Longitude
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 63079 France FR Thiverv~ Yveli~ N/A 48.9 1.92
## 2 63081 France FR Monchy ~ Somme N/A 49.8 3.05
## 3 63086 France FR Goudelin Côtes~ N/A 48.5 -3.02
## 4 63089 France FR Fiefs Pas-d~ N/A 50.4 2.33
## 5 63090 France FR Villene~ Esson~ N/A 48.4 2.25
## 6 63099 France FR Tatingh~ Pas-d~ N/A 50.7 2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## # Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## # haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## # Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## # Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## # Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## # YrAmb <dbl>
The data is returned as a tibble, as the class of the object shows.
class(data)
## [1] "tbl_df" "tbl" "data.frame"
When reading in data from e.g. a sav-file with the haven package, value labels are preserved by storing the variable as a labelled vector (class haven_labelled, as shown for Geneticgroup in the output above). This ensures that the original semantics are maintained. Labelled vectors can be turned into factors using as_factor().
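To illustrate the conversion, here is a minimal sketch. The vector `group` and its labels are made up for this example and are not taken from data.sav. It is built with haven's labelled() constructor to mimic a variable that carries SPSS value labels.

```r
library(haven)

# Hypothetical labelled vector (values and labels are invented for illustration)
group <- labelled(
  c(1, 2, 2, 1, 3),
  labels = c("Group A" = 1, "Group B" = 2, "Group C" = 3)
)

# The class vector includes "haven_labelled"
class(group)

# The value labels are stored as a named vector in the "labels" attribute
attr(group, "labels")

# Convert to a regular R factor; the value labels become the factor levels
group_factor <- as_factor(group)
levels(group_factor)
```

After the conversion, the variable can be used like any other factor, for example in table() or as a grouping variable in a model.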
The haven package does NOT convert character vectors into factors.
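A small round trip can show this. The sketch below writes an invented one-column data frame to a temporary sav-file with write_sav() and reads it back. The column name `country` and its values are assumptions for the example.

```r
library(haven)

# Write made-up data to a temporary sav-file so the example is self-contained
path <- tempfile(fileext = ".sav")
write_sav(
  data.frame(country = c("France", "Spain", "France"), stringsAsFactors = FALSE),
  path
)

# Read the file back; the text column is still a character vector
raw <- read_sav(path)
is.character(raw$country)

# Convert explicitly where a factor is needed
country_factor <- factor(raw$country)
levels(country_factor)
```

If a factor is required for modelling or plotting, the conversion therefore has to be done explicitly, as in the last step.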
When reading in data using the haven package, dates and times are converted into R's date and time classes (for example Date for dates and POSIXct for date-times).
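The following sketch checks this with a temporary file. The data frame `samples` and its column names are hypothetical. It is written with write_sav() and read back with read_sav(), on the assumption that the date and date-time columns survive the round trip as R date and time classes.

```r
library(haven)

# Made-up data with one date column and one date-time column
samples <- data.frame(
  sampled_on = as.Date(c("2020-03-01", "2020-03-15")),
  sampled_at = as.POSIXct(c("2020-03-01 08:30:00", "2020-03-15 14:00:00"), tz = "UTC")
)

# Round trip through a temporary sav-file
path <- tempfile(fileext = ".sav")
write_sav(samples, path)
back <- read_sav(path)

# Inspect the classes of the columns after reading
class(back$sampled_on)
class(back$sampled_at)
```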
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python