This vignette illustrates how a relational eatGADS data base can be accessed and used. It is therefore aimed at users who work with an existing data base.
For illustrative purposes we use a small example data base based on the campus files of the German PISA Plus assessment. The complete campus files and the original data set can be accessed here and here. The data base is installed alongside eatGADS and the path can be accessed via the system.file() function.
library(eatGADS)
db_path <- system.file("extdata", "pisa.db", package = "eatGADS")
db_path
Relational data bases created by eatGADS provide an alternative way of storing hierarchically structured data (e.g. from educational large-scale assessments). Compared to conventional approaches (one big or multiple .sav/.Rdata files) this yields the following advantages:
.sav files)
eatGADS we can choose which variables to load into R
R
We can inspect the data base structure with the namesGADS() function. The function returns a named list. Every list element represents a hierarchy level. The corresponding character vector contains all variable names on this hierarchy level.
nam <- namesGADS(db_path)
nam
The example data base contains two hierarchy levels: a student level (noImp) and a plausible value level (PVs). On the student level, each row represents an individual student. On the plausible value level, each row represents an imputation number of a specific domain of an individual student.
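To make the structure of such a named list concrete, the following base R sketch shows one way to look up the hierarchy level on which a variable is stored. The list nam_example and the helper find_level() are hypothetical stand-ins written for this illustration; they are not part of eatGADS, and the assignment of variables to levels (including the variable imp) is assumed.

```r
# Hypothetical stand-in for the list returned by namesGADS(); the assignment
# of variables to hierarchy levels is assumed for illustration only.
nam_example <- list(
  noImp = c("idstud", "schtype", "gender", "age"),
  PVs   = c("idstud", "dimension", "imp", "value")
)

# Hypothetical helper: return the name(s) of the hierarchy level(s)
# whose variable vector contains 'var'
find_level <- function(var, nam) {
  names(nam)[vapply(nam, function(vars) var %in% vars, logical(1))]
}

find_level("value", nam_example)   # variable stored on one level only
find_level("idstud", nam_example)  # ID variable present on both levels
```

With a real data base, the same helper could be applied to the list returned by namesGADS(db_path).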
We can access meta information of the variables in the data set using the extractMeta() function.
# Meta data for one variable
extractMeta(db_path, "age")
To supply variable names we can also use the named list nam extracted earlier. This way, we can extract all meta information available for a hierarchy level.
extractMeta(db_path, nam$PVs)
Commonly, the most informative columns are varLabel (containing variable labels), value (referencing labeled values), valLabel (containing value labels) and missings (indicating whether a labeled value is a missing value ("miss") or not ("valid")).
# Meta data for manually chosen multiple variables
extractMeta(db_path, c("idstud", "schtype"))
To extract a data set from the data base, we can use the function getGADS(). If the data base is stored on a server drive, getGADS_fast() provides identical functionality but substantially increases performance. With the vSelect argument we specify our variable selection. It is important to note that getGADS() returns a so-called GADSdat object. This object type contains complex meta information (which is, for example, also available in an SPSS data set), and is therefore not directly usable for data analysis. We can, however, use the extractMeta() function on it to access the meta data.
gads1 <- getGADS(filePath = db_path, vSelect = c("idstud", "schtype", "gender"))
class(gads1)
extractMeta(gads1)
GADSdat
If we want to use the data for analyses in R we have to extract it from the GADSdat object via the function extractData2(). In doing so, we have to make two important decisions: (a) how should values marked as missing values be treated (convertMiss)? And (b) how should labeled values in general be treated (labels2character, labels2factor, labels2ordered, and dropPartialLabels)?
By default, all missing tags are applied, meaning all values tagged as missing are recoded to NA (convertMiss = TRUE). Furthermore, by default, all value labels are dropped (labels2character = NULL, labels2factor = NULL, labels2ordered = NULL). If value labels should be applied to specific variables and the resulting variables should be character variables, this can be specified by setting, for example, labels2character = c("var1", "var2").
## leave all labeled variables as numeric, convert missings to NA
dat1 <- extractData2(gads1)
head(dat1)
## convert selected labeled variable(s) to character, convert missings to NA
dat2 <- extractData2(gads1, labels2character = c("schtype"))
head(dat2)
## convert all labeled variables to character, convert missings to NA
dat3 <- extractData2(gads1, labels2character = namesGADS(gads1))
head(dat3)
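The two decisions, applying missing tags and applying value labels, can be illustrated conceptually in base R. The sketch below is not how extractData2() is implemented. The data, the missing code -99 and the labels are hypothetical, and only the column names value, valLabel and missings follow the meta data columns described in this vignette.

```r
# Hypothetical raw data: schtype is stored as numeric codes
raw <- data.frame(idstud = 1:4, schtype = c(1, 2, -99, 1))

# Hypothetical value-level meta data for schtype (codes and labels are assumed)
meta <- data.frame(
  value    = c(1, 2, -99),
  valLabel = c("Gymnasium", "Realschule", "missing by design"),
  missings = c("valid", "valid", "miss"),
  stringsAsFactors = FALSE
)

# (a) Apply missing tags: every value tagged "miss" is recoded to NA
miss_codes <- meta$value[meta$missings == "miss"]
dat_num <- raw
dat_num$schtype[dat_num$schtype %in% miss_codes] <- NA

# (b) Apply value labels: replace the remaining codes by their labels,
#     which turns the variable into a character variable
dat_chr <- dat_num
dat_chr$schtype <- meta$valLabel[match(dat_num$schtype, meta$value)]

dat_num
dat_chr
```

In dat_num the variable stays numeric with the missing code replaced by NA, which mirrors the default behaviour described above. dat_chr corresponds to what requesting the variable via labels2character would roughly produce.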
In general, we recommend leaving labeled variables as numeric and converting values with missing codes to NA. If required, value labels can always be accessed by using extractMeta() on the GADSdat object or the data base.
An important feature of eatGADS relational data bases is that data sets are automatically returned on the correct hierarchy level. For an overview of different data structures, see "Tidy Data" or this article explaining long and wide format using repeated measures. In educational large-scale assessments, data usually contain multiple imputations or plausible values. Packages that enable us to analyze these types of data (like eatRep) often require these data in the long format.
The function getGADS() extracts data automatically in the appropriate structure, depending on our variable selection. If we select only variables from the student level, the data returned is on the student level. Each student is represented in a single row.
gads1 <- getGADS(db_path, vSelect = c("schtype", "g8g9"))
dat1 <- extractData2(gads1)
dim(dat1)
head(dat1)
If variables from the plausible value data table are additionally extracted, the returned data structure changes. In the PVs data table, data is stored on the "student x dimension x plausible value number" level. The returned data has exactly this structure.
gads2 <- getGADS(db_path, vSelect = c("schtype", "value"))
dat2 <- extractData2(gads2)
dim(dat2)
head(dat2)
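The change in data structure can be reproduced with a small base R sketch. It is a conceptual illustration only, not the mechanism used by getGADS(). The two tables, the dimension codes, the variable imp and the placeholder scores are all assumed for this example.

```r
# Hypothetical student-level table: one row per student
students <- data.frame(idstud = 1:2, schtype = c(1, 2))

# Hypothetical plausible value table: one row per
# student x dimension x plausible value number
pvs <- expand.grid(idstud = 1:2,
                   dimension = c("ma", "rea"),
                   imp = 1:3,
                   stringsAsFactors = FALSE)
pvs$value <- seq_len(nrow(pvs))  # placeholder scores, not real data

# Merging by the ID variable repeats the student-level information
# on every plausible value row (long format)
long <- merge(students, pvs, by = "idstud")
dim(long)
head(long)
```

Each of the two students now appears in 2 x 3 = 6 rows, one per combination of dimension and plausible value number, while schtype is constant within a student.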
These two examples highlight another feature of getGADS(): only variables of substantial interest have to be selected for extraction. The correct ID variables are added automatically.
In educational large-scale assessments, a common challenge is reporting longitudinal developments (trends). getTrendGADS() allows extracting data from multiple data bases that contain identical variables.
trend_path1 <- system.file("extdata", "trend_gads_2020.db", package = "eatGADS")
trend_path2 <- system.file("extdata", "trend_gads_2015.db", package = "eatGADS")
trend_path3 <- system.file("extdata", "trend_gads_2010.db", package = "eatGADS")
eatGADS comes with three small trend data bases which can be used for illustrative purposes.
gads_trend <- getTrendGADS(filePaths = c(trend_path1, trend_path2, trend_path3),
                           vSelect = c("idstud", "dimension", "score"),
                           years = c(2020, 2015, 2010), fast = FALSE)
dat_trend <- extractData2(gads_trend)
head(dat_trend)
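As a rough mental model of what combining several measurement occasions involves, the base R sketch below stacks three hypothetical data sets with identical variables and adds a year indicator. This is an assumption-laden illustration: it does not show how getTrendGADS() works internally, and the data frames, scores and the column name year are invented for this example.

```r
# Hypothetical data sets from three assessment cycles with identical variables
dat_2010 <- data.frame(idstud = 1:2, score = c(480, 510))
dat_2015 <- data.frame(idstud = 3:4, score = c(495, 520))
dat_2020 <- data.frame(idstud = 5:6, score = c(500, 530))

# Stack the data sets and mark the cycle each row belongs to
trend <- rbind(
  cbind(year = 2010, dat_2010),
  cbind(year = 2015, dat_2015),
  cbind(year = 2020, dat_2020)
)

# Mean score per cycle as a simple descriptive trend
trend_means <- aggregate(score ~ year, data = trend, FUN = mean)
trend_means
```

Stacking like this only works when the variables are named and coded identically across cycles, which is why the trend data bases need to contain identical variables.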