knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
When trying to understand data, most often not only the actual data is required, but also so called meta data. Meta data usually includes:
While the data.frame
class in R
supports value labels to a certain degree with the factor
class, its functionality is limited. Other data formats like .xlsx
or .csv
support no meta data at all. Commercial software like SPSS
provides such functionality but can not compete with the various tools for analyzing data that R
provides.
eatGADS
is an R
package that was developed to bridge this gap. Its main purpose is providing a data format in R
specifically designed for storing meta data together with data in one place. Therefore it provides an S3
class called GADSdat
. The following vignette concentrates on how to import data into the GADSdat
format and work with it in the R
environment. In collaboration with the IQB Forschungsdatenzentrum (FDZ)
the package can also be used to distribute data.
Note that eatGADS
also allows the handling of large hierarchical data structures via relational data bases. This functionality is explained in more detail in an additional vignette.
The package can be installed from GitHub. Note that older R versions had issues with installations from online repositories like GitHub. R
version > 3.6.0
should work without any issues.
devtools::install_github("beckerbenj/eatGADS")
# loading the package library(eatGADS)
GADSdat
formatR
offers a variety of tools to import data from all sorts of data formats. SPSS
data (.sav
files) can be imported directly into the GADSdat
format, with haven
used as a backend. Note that this is the easiest way to import data into the GADSdat
format.
# importing an SPSS file gads <- import_spss("path/example.sav")
All other file types should be imported into R
first and then supplied as data.frames
to import_raw
. Below is a small selection of functions that import data as data.frames
. For an extensive overview of importing functions using the package readr
see also this book chapter, while the package readxl
is explained in more detail on this [homepage] (https://readxl.tidyverse.org/). As these files are plain data files, meta data has to be supplied as separate data sheets.
Note that none of the data.frames
can contain variables of the class factor
, as this in itself constitutes meta data. If using base R
to import data make sure to use the argument stringsAsFactors = FALSE
. If necessary, convert factors
to character via as.character
.
# importing text files input_txt <- read.table("path/example.txt", stringsAsFactors = FALSE) # importing German csv files (; separated) input_csv <- read.csv2("path/example.csv", stringsAsFactors = FALSE) # importing Excel files input_xlsx <- readxl::read_excel("path/example.xlsx")
import_raw
takes three separate data.frames
as input. The actual data set (df
), the variable labels (varLabels
) and the value labels (valLabels
). These three objects have to be supplied in a very specific format.
The varLabels
object has to contain two variables: varName
, which should exactly correspond to the variable names in df
and varLabels
which should contain the desired variable labels as strings. Note that this data.frame
should contain as many rows as there are variables in df
.
The optional valLabels
object has to contain four variables: varName
, which should exactly correspond to the variable names in df
; values
, which should correspond to the respective values in df
and has to be a numeric vector (labels for character vectors are currently not supported); valLabels
, which should contain the value labels as strings; and missings
, a column indicating whether the value indicates a missing value. Valid values for missings
are "valid"
= no missing code and "miss"
= missing code. Note that this data.frame
can not contain any varNames
that are not variables in df
. However, not all variables in df
have to occur in valLabels
.
# Example data set df <- data.frame(ID = 1:4, sex = c(0, 0, 1, 1), forename = c("Tim", "Bill", "Ann", "Chris"), stringsAsFactors = FALSE) # Example variable labels varLabels <- data.frame(varName = c("ID", "sex", "forename"), varLabel = c("Person Identifier", "Sex as self reported", "first name as reported by teacher"), stringsAsFactors = FALSE) # Example value labels valLabels <- data.frame(varName = rep("sex", 3), value = c(0, 1, -99), valLabel = c("male", "female", "missing - omission"), missings = c("valid", "valid", "miss"), stringsAsFactors = FALSE) df varLabels valLabels # import gads <- import_raw(df = df, varLabels = varLabels, valLabels = valLabels)
GADSdat
classThe resulting object is of the class GADSdat
and contains a data sheet and a meta data sheet.
# Inpsect resulting object
gads
GADSdat
objectsGADSdat
objects, for example, can be saved as RDS
files. This is also the preferred data format for distributing GADSdat
objects to the FDZ
.
# Inpsect resulting object saveRDS(gads, "path/gads.RDS")
GADSdat
objects in ReatGADS
provides convenient functions for extracting data and meta data from GADSdat
objects. extractMeta
is used to access the meta data for specific variables (or all variables, if no specific variable name is provided).
# Inpsect resulting object extractMeta(gads, vars = c("sex")) extractMeta(gads)
extractData
is used to extract data. With its arguments the structure of the resulting data can be defined. If convertMiss = TRUE
, which is the default, is used, values that are listed as missing codes are recoded to NAs
. With the convertLabels
argument it can be specified how value labels should be used. If set to "character"
all labeled values are recoded to character, the same applies to "factor
". If set to "numeric"
, the value labels are not applied.
# Extract data without applying labels dat1 <- extractData(gads, convertMiss = TRUE, convertLabels = "numeric") dat1 dat2 <- extractData(gads, convertMiss = TRUE, convertLabels = "character") dat2
GADSdat
objectsGADSdat
objects can also be modified even though only a certain amount of operations are supported. For smaller changes to the data and meta data a number of convenience functions exists. These functions allow modifying variable labels (changeVarLabels
), modifying variable names (changeVarNames
) and recoding values (recodeGADS
).
### wrapper functions # Modify variable labels gads2 <- changeVarLabels(gads, varName = c("ID"), varLabel = c("Test taker ID")) extractMeta(gads2, vars = "ID") # Modify variable name gads3 <- changeVarNames(gads, oldNames = c("ID"), newNames = c("idstud")) extractMeta(gads3, vars = "idstud") extractData(gads3) # recode GADS gads4 <- recodeGADS(gads, varName = "sex", oldValues = c(0, 1, -99), newValues = c(1, 2, 99)) extractMeta(gads4, vars = "sex") extractData(gads4, convertLabels = "numeric")
For simultaneous changes to multiple variables a set of functions is implemented that extract a table for changes and applies the changes as written into this change table. To enable an easier work flow the change table could also be saved as an Excel file, modified via Excel and again imported into R
. See the help pages of the respective functions for more details.
# extract changeTable varChanges <- getChangeMeta(gads, level = "variable") # modify changeTable varChanges[varChanges$varName == "ID", "varLabel_new"] <- "Test taker ID" # Apply changes gads5 <- applyChangeMeta(varChanges, gads) extractMeta(gads5, vars = "ID")
Objects of the class GADSdat
can also be exported into the SPSS format, utilizing haven
. Note that this function is slightly experimental and problems with specific character strings might occur.
write_spss(gads, "path/example_out.sav")
If the haven
format is preferred for working in R
, a GADSdat
object can also be transformed to its equivalent tibble
format, as if the data was imported from SPSS via haven
.
haven_dat <- export_tibble(gads) haven_dat lapply(haven_dat, attributes)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.