import_spss() allows importing data from SPSS (.sav and .zsav files) into R by using the R package haven.
This vignette illustrates a typical workflow of importing an SPSS file using import_spss() and extractData2(). For illustrative purposes we use a small example data set from the campus files of the German PISA Plus assessment. The complete campus files and the original data set can be accessed here and here.
library(eatGADS)
We can import a .sav data set via the import_spss() function. Variable names are checked automatically for data base compatibility, and any changes to the variable names are reported to the console. These checks can be switched off by setting checkVarNames = FALSE.
sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)
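To see what the automatic name checks do, the import can be repeated with checkVarNames = FALSE and the two sets of variable names compared. This is only a sketch: the object names gads_unchecked and gads_checked are made up for illustration, and whether any names differ depends on the file.

```r
library(eatGADS)

# Example file shipped with eatGADS (same path as used in this vignette)
sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")

# Import without the automatic variable name checks
gads_unchecked <- import_spss(sav_path, checkVarNames = FALSE)

# Import with the default checks for comparison
gads_checked <- import_spss(sav_path)

# Names that only occur in the unchecked import,
# i.e. names that the checks would have altered
setdiff(namesGADS(gads_unchecked), namesGADS(gads_checked))
```

If the result is an empty character vector, none of the variable names in the file needed to be changed.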
GADSdat objects
The resulting object is of the class GADSdat. It is basically a named list containing the actual data (dat) and the meta data (labels).
class(gads_obj)
names(gads_obj)
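Because a GADSdat object is a named list, its two elements can also be inspected directly. The following sketch assumes the element names dat and labels described above; for regular work the accessor functions of the package are probably the safer choice.

```r
library(eatGADS)

sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)

# The actual data: dimensions and the first rows of the first five variables
dim(gads_obj$dat)
head(gads_obj$dat[, 1:5])

# The meta data that is stored alongside the data
head(gads_obj$labels)
```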
The names of the variables in a GADSdat object can be accessed via the namesGADS() function. The meta data of variables can be accessed via the extractMeta() function.
namesGADS(gads_obj)
extractMeta(gads_obj, vars = c("schtype", "idschool"))
Commonly, the most informative columns are varLabel (containing variable labels), value (referencing labeled values), valLabel (containing value labels) and missings (a missing tag indicating whether a labeled value is a missing value ("miss") or not ("valid")).
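As a small illustration of the missings column, the meta data returned by extractMeta() can be subset with base R to list only the values tagged as missing. The names meta and miss_rows are chosen for this sketch only, and the number of rows returned depends on the data set.

```r
library(eatGADS)

sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)

# Meta data of all variables
meta <- extractMeta(gads_obj)

# Keep only labeled values that carry the missing tag "miss";
# %in% is used so that rows without a missing tag (NA) are dropped
miss_rows <- meta[meta$missings %in% "miss",
                  c("varName", "value", "valLabel", "missings")]
head(miss_rows)
```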
Extracting data from GADSdat
If we want to use the data for analyses in R, we have to extract it from the GADSdat object via the function extractData2().
In doing so, we have to make two important decisions: (a) how should values marked as missing values be treated (convertMiss)? And (b) how should labeled values in general be treated (labels2character, labels2factor, labels2ordered, dropPartialLabels)?
If a variable name is not provided under any of labels2character, labels2factor or labels2ordered, all value labels of the corresponding variable are simply dropped. If a variable name is provided under labels2character, the value labels of the corresponding variable are applied and the resulting variable is a character variable. labels2factor converts variables to factors and labels2ordered converts variables to ordered factors. See ?extractData2 for more details.
## convert all labeled variables to character
dat1 <- extractData2(gads_obj, labels2character = namesGADS(gads_obj))
dat1[1:5, 1:10]

## leave labeled variables as numeric
dat2 <- extractData2(gads_obj)
dat2[1:5, 1:10]

## leave labeled variables as numeric but convert some variables to character and some to factor
dat3 <- extractData2(gads_obj, labels2character = c("gender", "language"),
                     labels2factor = c("schtype", "sameteach"))
dat3[1:5, 1:10]
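The examples above all use the default treatment of missing values. To illustrate decision (a), the following sketch extracts the data once with the default and once with convertMiss = FALSE and counts how many additional NA values the conversion produces per variable. The object names dat_na, dat_raw and na_diff are made up for this illustration.

```r
library(eatGADS)

sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)

# Default: values tagged as missing are converted to NA
dat_na <- extractData2(gads_obj)

# Keep the original missing codes in the data
dat_raw <- extractData2(gads_obj, convertMiss = FALSE)

# Number of additional NAs per variable caused by the conversion
na_diff <- colSums(is.na(dat_na)) - colSums(is.na(dat_raw))
na_diff[na_diff > 0]
```

Keeping the missing codes can be useful for checking the data, but the codes then have to be handled manually before any analysis.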
In general, we recommend leaving labeled variables as numeric and converting values with missing codes to NA. Both are the default behavior of extractData2(). If required, value labels can always be accessed by using extractMeta() on the GADSdat object or the data base.
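One possible way to look up value labels after a numeric extraction is to match the extracted values against the value column of the meta data. This is a base R sketch, not a function of the package; the names dat_num, meta_schtype and schtype_label are chosen for illustration.

```r
library(eatGADS)

sav_path <- system.file("extdata", "pisa.zsav", package = "eatGADS")
gads_obj <- import_spss(sav_path)

# Numeric extraction (the default)
dat_num <- extractData2(gads_obj)

# Meta data of a single labeled variable
meta_schtype <- extractMeta(gads_obj, vars = "schtype")

# Look up the value label for every numeric value;
# values without a matching label (including NA) result in NA
schtype_label <- meta_schtype$valLabel[match(dat_num$schtype, meta_schtype$value)]
table(schtype_label, useNA = "ifany")
```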