In this short article I introduce the haven package in R and show how it can be used to import SPSS data in SAV-file format into R. I already did this as part of a longer coding example, in which I animated the spatial distribution of specimen detections documented in a SAV-file. In previous posts I also demonstrated how to read data into R from formats such as csv, json, xlsx and xml. I have also written articles covering sqlite3 in R and sqlite3 in Python, and I demonstrated how one can connect a sqlite3 database engine to Python.
In this article I simply want to demonstrate the haven package, in case you are wondering how to import SPSS data into R.
The haven package is useful for analysts working with SPSS, SAS and Stata
Analysts can use the “haven” package in R for reading and processing data from SAS, SPSS and Stata. The package provides a separate function for each format:
- read_sas() for reading data in SAS format
- read_sav() for reading data in SAV-file format, from SPSS
- read_dta() for reading data in dta-format, from Stata
Below I load the haven package (the install line is commented out; uncomment it if the package is not yet installed) and read in an example SAV-file, using the read_sav() function.
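If you have no SPSS or Stata file at hand, a round trip through temporary files is one way to try the readers. The sketch below is only an illustration: the data frame `df` and its columns are made up and not part of this article's data.sav, and it relies on haven's writer functions write_sav() and write_dta(). SAS is left out because it would need an existing SAS file.

```r
library(haven)

# Hypothetical example data, invented for this sketch
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Temporary files, so nothing is written into the working directory
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")

# Write the same data in SPSS and Stata format
write_sav(df, sav_path)
write_dta(df, dta_path)

# Read both files back in with the matching reader function
from_spss  <- read_sav(sav_path)
from_stata <- read_dta(dta_path)

head(from_spss)
head(from_stata)
```

Both readers return a tibble, so the rest of an analysis does not need to care which statistics package the file came from.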
#install.packages("haven")
library(haven)
## Warning: package 'haven' was built under R version 3.6.3
data = read_sav("data.sav")
head(data)
## # A tibble: 6 x 35
## CollectionID Country Countrycode Location Region Site Latitude Longitude
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 63079 France FR Thiverv~ Yveli~ N/A 48.9 1.92
## 2 63081 France FR Monchy ~ Somme N/A 49.8 3.05
## 3 63086 France FR Goudelin Côtes~ N/A 48.5 -3.02
## 4 63089 France FR Fiefs Pas-d~ N/A 50.4 2.33
## 5 63090 France FR Villene~ Esson~ N/A 48.4 2.25
## 6 63099 France FR Tatingh~ Pas-d~ N/A 50.7 2.21
## # ... with 27 more variables: Long_BIN <dbl>, Lat_BIN <dbl>,
## # Long_Lat_Bin <dbl>, Envi_BIN <chr>, Cultivar <chr>, Geneticgroup <S3:
## # haven_labelled>, Racename <chr>, Yr_complexity <dbl>, miss_Yr <dbl>,
## # Yr1 <dbl>, Yr2 <dbl>, Yr3 <dbl>, Yr4 <dbl>, Yr6 <dbl>, Yr7 <dbl>,
## # Yr8 <dbl>, Yr9 <dbl>, Yr10 <dbl>, Yr15 <dbl>, Yr17 <dbl>, Yr24 <dbl>,
## # Yr25 <dbl>, Yr27 <dbl>, Yr32 <dbl>, YrSp <dbl>, YrAvS <dbl>,
## # YrAmb <dbl>
The data is read in as a tibble, as the class() output below shows.
class(data)
## [1] "tbl_df" "tbl" "data.frame"
The haven package converts value labels, and they can be turned into factors
When reading in data from e.g. a SAV-file using the haven R-package, variables with value labels are stored as vectors of class “haven_labelled” (see the Geneticgroup column in the output above). This ensures that the original semantics are maintained. Labelled vectors can be turned into factors using as_factor().
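To see this without an SPSS file, a labelled vector can be built by hand with haven's labelled() function and then converted. This is a minimal sketch; the values and the labels "low", "medium" and "high" are invented for illustration.

```r
library(haven)

# Hypothetical labelled vector: numeric codes with attached value labels
x <- labelled(
  c(1, 2, 1, 3),
  labels = c(low = 1, medium = 2, high = 3)
)

class(x)       # includes "haven_labelled"

# Convert the codes into a factor that uses the value labels as levels
f <- as_factor(x)
levels(f)
f
```

In a data set read with read_sav(), the same call works on a single column, for example as_factor(data$Geneticgroup).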
The haven package does NOT convert character vectors into factors.
When reading in data using the haven package, dates and times are converted into R date and time classes: dates become Date, date-times become POSIXct, and time-of-day values become hms.
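Both points can be checked with a small round trip. The sketch below assumes a made-up data frame `visits` with a character column, a date column and a date-time column; none of these names come from the article's data.sav.

```r
library(haven)

# Hypothetical example data, invented for this sketch
visits <- data.frame(
  site       = c("north", "south"),
  visit_date = as.Date(c("2020-03-01", "2020-03-15")),
  logged_at  = as.POSIXct(c("2020-03-01 14:30:00", "2020-03-15 09:00:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# Write to a temporary SAV-file and read it back in
path <- tempfile(fileext = ".sav")
write_sav(visits, path)
back <- read_sav(path)

# Inspect the class of each column after the round trip
sapply(back, function(col) class(col)[1])
```

The character column should come back as character rather than factor, and the two time columns as date and date-time classes.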
Data scientist focusing on simulation, optimization and modeling in R, SQL, VBA and Python