Handling meta data
In eatGADS: Data Management of Large Hierarchical Data

Understanding a data set usually requires not only the data itself but also so-called meta data. Meta data usually includes:

variable labels: labels describing what a variable in the data represents
value labels: labels describing what a certain value of a certain variable represents

Although R supports value labels to a certain degree through the factor class, this functionality is limited. Other data formats such as .xlsx or .csv support no meta data at all. Commercial software such as SPSS provides such functionality but cannot compete with the variety of data analysis tools that R provides.

eatGADS is an R package developed to bridge this gap. Its main purpose is to provide a data format in R that is specifically designed for storing meta data together with the data in one place. To this end, it provides an S3 class called GADSdat. This vignette concentrates on how to import data into the GADSdat format and how to work with it in R. In collaboration with the IQB Forschungsdatenzentrum (FDZ), the package can also be used to distribute data.

Note that eatGADS also supports handling large hierarchical data structures via relational databases. This functionality is explained in more detail in a separate vignette.

Setup

The package can be installed from GitHub. Note that older R versions had issues with installations from online repositories such as GitHub. R versions newer than 3.6.0 should work without issues.

devtools::install_github("beckerbenj/eatGADS")

# loading the package
library(eatGADS)
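
If it is unclear whether the installed R version is recent enough, it can be checked before installing. The following is a minimal sketch using only base R; the threshold is the version named above, and the object names `min_version` and `version_ok` are chosen for illustration only.

```r
# Compare the running R version against the version mentioned above
min_version <- "3.6.0"
version_ok <- getRversion() > min_version

if (version_ok) {
  message("R ", getRversion(), " is newer than ", min_version)
} else {
  warning("R ", getRversion(), " may have issues installing from GitHub")
}
```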

Importing data into the `GADSdat` format

Importing from SPSS

R offers a variety of tools to import data from all sorts of data formats. SPSS data (.sav files) can be imported directly into the GADSdat format, with haven used as a backend. Note that this is the easiest way to import data into the GADSdat format.

# importing an SPSS file
gads <- import_spss("path/example.sav")

Importing from Excel etc.

All other file types should first be imported into R and then supplied as data.frames to import_raw. Below is a small selection of functions that import data as data.frames. For an extensive overview of the import functions in the package readr, see this book chapter; the package readxl is explained in more detail on this [homepage](https://readxl.tidyverse.org/). Because these files are plain data files, meta data has to be supplied as separate data sheets.

Note that none of the data.frames may contain variables of the class factor, as a factor in itself constitutes meta data. If you use base R to import data, make sure to set the argument stringsAsFactors = FALSE (the default since R 4.0.0). If necessary, convert factors to character via as.character.

# importing text files
input_txt <- read.table("path/example.txt", stringsAsFactors = FALSE)
# importing German csv files (; separated)
input_csv <- read.csv2("path/example.csv", stringsAsFactors = FALSE)
# importing Excel files
input_xlsx <- readxl::read_excel("path/example.xlsx")
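
If a data.frame already contains factors, the conversion via as.character mentioned above can be applied to all factor columns at once. The following is a minimal base R sketch; the data.frame `input_df` and its columns are made up for illustration.

```r
# A small data.frame that deliberately contains a factor column
input_df <- data.frame(id = 1:3,
                       group = factor(c("a", "b", "a")),
                       stringsAsFactors = FALSE)

# Identify all factor columns and convert them to character
is_fac <- vapply(input_df, is.factor, logical(1))
input_df[is_fac] <- lapply(input_df[is_fac], as.character)

# Check the resulting column classes
vapply(input_df, class, character(1))
```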

import_raw takes three separate data.frames as input: the actual data set (df), the variable labels (varLabels) and the value labels (valLabels). These three objects have to be supplied in a very specific format.

The varLabels object has to contain two variables: varName, which should correspond exactly to the variable names in df, and varLabel, which should contain the desired variable labels as strings. Note that this data.frame should contain as many rows as there are variables in df.

The optional valLabels object has to contain four variables: varName, which should correspond exactly to the variable names in df; value, which should correspond to the respective values in df and has to be a numeric vector (labels for character vectors are currently not supported); valLabel, which should contain the value labels as strings; and missings, a column indicating whether the value represents a missing value. Valid entries for missings are "valid" (no missing code) and "miss" (missing code). Note that this data.frame cannot contain any varNames that are not variables in df. However, not all variables in df have to occur in valLabels.

# Example data set
df <- data.frame(ID = 1:4, sex = c(0, 0, 1, 1), 
                 forename = c("Tim", "Bill", "Ann", "Chris"), stringsAsFactors = FALSE)
# Example variable labels
varLabels <- data.frame(varName = c("ID", "sex", "forename"), 
                        varLabel = c("Person Identifier", "Sex as self reported", 
                                     "first name as reported by teacher"), 
                        stringsAsFactors = FALSE)
# Example value labels
valLabels <- data.frame(varName = rep("sex", 3), 
                        value = c(0, 1, -99), 
                        valLabel = c("male", "female", "missing - omission"), 
                        missings = c("valid", "valid", "miss"), stringsAsFactors = FALSE)

df
varLabels
valLabels

# import 
gads <- import_raw(df = df, varLabels = varLabels, valLabels = valLabels)
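
The format requirements described above can also be checked by hand before calling import_raw. The following is a rough base R sketch, not part of eatGADS: the helper `check_labels` and the example objects prefixed with `chk_` are hypothetical, and the helper covers only the rules stated in this section.

```r
# Hypothetical helper (not an eatGADS function): returns a character vector
# of problems; an empty vector means the rules checked here are met.
check_labels <- function(df, varLabels, valLabels = NULL) {
  problems <- character(0)
  if (any(vapply(df, is.factor, logical(1)))) {
    problems <- c(problems, "df contains factor variables")
  }
  if (!setequal(varLabels$varName, names(df)) ||
      nrow(varLabels) != ncol(df)) {
    problems <- c(problems, "varLabels does not list each variable in df once")
  }
  if (!is.null(valLabels)) {
    if (!all(valLabels$varName %in% names(df))) {
      problems <- c(problems, "valLabels refers to variables not in df")
    }
    if (!is.numeric(valLabels$value)) {
      problems <- c(problems, "valLabels$value is not numeric")
    }
    if (!all(valLabels$missings %in% c("valid", "miss"))) {
      problems <- c(problems, "missings contains entries other than valid/miss")
    }
  }
  problems
}

# Illustrative inputs in the format described above
chk_df <- data.frame(ID = 1:2, sex = c(0, 1), stringsAsFactors = FALSE)
chk_varLabels <- data.frame(varName = c("ID", "sex"),
                            varLabel = c("Person Identifier", "Sex"),
                            stringsAsFactors = FALSE)
chk_valLabels <- data.frame(varName = c("sex", "sex"),
                            value = c(0, 1),
                            valLabel = c("male", "female"),
                            missings = c("valid", "valid"),
                            stringsAsFactors = FALSE)

check_labels(chk_df, chk_varLabels, chk_valLabels)
```

An empty result suggests the inputs follow the rules listed here; import_raw may perform additional checks of its own.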

`GADSdat` class

The resulting object is of the class GADSdat and contains a data sheet and a meta data sheet.

# Inspect resulting object
gads

Saving `GADSdat` objects

GADSdat objects can be saved, for example, as RDS files. This is also the preferred data format for distributing GADSdat objects to the FDZ.

# Save the GADSdat object as an RDS file
saveRDS(gads, "path/gads.RDS")
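
A saved object can be read back into R with base R's readRDS. The following minimal sketch shows the round trip with a plain list as a stand-in for a GADSdat object and a temporary file instead of a real path; both are assumptions made so that the example runs on its own.

```r
# Stand-in object; in practice this would be a GADSdat object such as gads
obj <- list(dat = data.frame(ID = 1:2), note = "stand-in")

# Write to a temporary file and read it back
rds_path <- tempfile(fileext = ".RDS")
saveRDS(obj, rds_path)
restored <- readRDS(rds_path)

# Compare the restored object with the original
all.equal(obj, restored)
```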

Using `GADSdat` objects in R

eatGADS provides convenient functions for extracting data and meta data from GADSdat objects. extractMeta is used to access the meta data for specific variables (or all variables, if no specific variable name is provided).

# Extract meta data for one variable and for all variables
extractMeta(gads, vars = c("sex"))
extractMeta(gads)

extractData is used to extract data. Its arguments define the structure of the resulting data. If convertMiss = TRUE (the default), values that are listed as missing codes are recoded to NA. The convertLabels argument specifies how value labels are used: if set to "character", all labeled values are recoded to character strings; if set to "factor", they are recoded to factors; if set to "numeric", the value labels are not applied.

# Extract data without applying labels
dat1 <- extractData(gads, convertMiss = TRUE, convertLabels = "numeric")
dat1

# Extract data with value labels applied as character strings
dat2 <- extractData(gads, convertMiss = TRUE, convertLabels = "character")
dat2
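
To make the two conversion steps more concrete, the following base R sketch performs them by hand on a plain vector. It only illustrates the idea behind convertMiss and convertLabels and is not how eatGADS implements them; the vector `sex` and the table `labels` are made up to mirror the example above, with one missing code added.

```r
# Raw values and a value label table in the format described earlier
sex <- c(0, 0, 1, -99)
labels <- data.frame(value = c(0, 1, -99),
                     valLabel = c("male", "female", "missing - omission"),
                     missings = c("valid", "valid", "miss"),
                     stringsAsFactors = FALSE)

# Step 1 (the idea behind convertMiss = TRUE): missing codes become NA
miss_codes <- labels$value[labels$missings == "miss"]
sex_num <- sex
sex_num[sex %in% miss_codes] <- NA

# Step 2 (the idea behind convertLabels = "character"): look up the labels
valid <- labels[labels$missings == "valid", ]
sex_chr <- valid$valLabel[match(sex_num, valid$value)]

sex_num  # 0 0 1 NA
sex_chr  # "male" "male" "female" NA
```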

Modifying `GADSdat` objects

GADSdat objects can also be modified, although only a limited number of operations is supported. For smaller changes to the data and meta data, a number of convenience functions exist. These functions allow modifying variable labels (changeVarLabels), modifying variable names (changeVarNames) and recoding values (recodeGADS).

### wrapper functions
# Modify variable labels
gads2 <- changeVarLabels(gads, varName = c("ID"), varLabel = c("Test taker ID"))
extractMeta(gads2, vars = "ID")

# Modify variable name
gads3 <- changeVarNames(gads, oldNames = c("ID"), newNames = c("idstud"))
extractMeta(gads3, vars = "idstud")
extractData(gads3)

# Recode values
gads4 <- recodeGADS(gads, varName = "sex", oldValues = c(0, 1, -99), newValues = c(1, 2, 99))
extractMeta(gads4, vars = "sex")
extractData(gads4, convertLabels = "numeric")

For simultaneous changes to multiple variables, a set of functions is implemented that extracts a change table and applies the changes written into this table. To make the workflow easier, the change table can also be saved as an Excel file, modified in Excel and imported into R again. See the help pages of the respective functions for more details.

# extract changeTable
varChanges <- getChangeMeta(gads, level = "variable")
# modify changeTable
varChanges[varChanges$varName == "ID", "varLabel_new"] <- "Test taker ID"
# Apply changes
gads5 <- applyChangeMeta(varChanges, gads)
extractMeta(gads5, vars = "ID")

Writing SPSS files

Objects of the class GADSdat can also be exported into the SPSS format, utilizing haven. Note that this function is slightly experimental and problems with specific character strings might occur.

write_spss(gads, "path/example_out.sav")

If the haven format is preferred for working in R, a GADSdat object can also be transformed to its equivalent tibble format, as if the data was imported from SPSS via haven.

haven_dat <- export_tibble(gads)
haven_dat
lapply(haven_dat, attributes)

eatGADS documentation built on Oct. 9, 2024, 5:09 p.m.