library(knitr) library(dplyr) library(tibble) library(scaledic) library(stringr) knitr::opts_chunk$set( message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>" )
When you conduct research based on questionnaires or psychometric tests (and you are working in R), you typically create a data.frame with one column (variable) for each item on that questionnaire and one row for each person who participated. You can only store a limited amount of additional information about each item in that questionnaire within a data.frame (or tibble). You can give a variable a name and define a variable as a factor with appropriate levels. But basically, that is it. You cannot, at least not conveniently, include a longer label for each item, the name of a scale to which that item belongs to, information about reverse coding, etc.
I call the collection of this additional information about items an item dictionary. A dictionary contains a short label, a longer description, scale affiliation, and more for each item.
A dictionary file is a table with one row for each variable and one column for each attribute of those variables. The most convenient way to create a dictionary file is in a spreadsheet program for later use with data sets.
Here is an extract from an example dic-file:
dic_itrf |> select(item_name, item_label, scale, scale_label, subscale, subscale_label, values, value_labels, missing, type) |> slice(2:4) |> kable()
A dictionary file can contain any additional attributes. This means that you can add a column with any name to store relevant information (e.g. the scale and scale label to which an item belongs, a translation of the item name). However, there are some predefined attributes with a specific meaning. The table below shows these attributes:
out <- tribble( ~Parameter, ~Meaning, ~Example, "item_name", "A short item name", "itrf_1", "item_label", "Full text of the item", "Vermeidet die Teilnahme an Diskussionen im Unterricht", #"scale", "Abreviation of the scale the item belongs to", "irtf", #"scale_label", "Name of the scale", "Integrated Teacher Report Form", #"subscale", "Abrevation of the sub scale", "int", #"subscale_label", "Name of the sub scale", "internalizing problems", "values", "Valid response values in an R manner", "1:5 (for integers 1 to 5) 1,2,3 (for integers 1, 2, 3)", "value_labels", "Labels for each response value", "0 = nicht; 1 = leicht; 2 = mäßig; 3 = stark", "missing", "Missing values", "-888, -999", "type", "Data type (factor, integer, float, real)", "integer", "weight", "Reversion of item and its weight", "1 (positive), -1 (reverse), 1.5 (positive, weights 1.5 times)", ) kable(out, caption = "Basic columns of a dictionary file")
When you combine a dataset with a dictionary file, each variable in the dataset that corresponds to a variable described in the dictionary is completed with the given dictionary information.\
The resulting dataset is now ready for use with all other scaledic
functions.
The apply_dic
function takes the name of the dataset and the dictionary file and combines them. Missing values are replaced by NAs:
# Here we use the example dataset "dat_itrf" and the example dic file "dic_itrf" dat <- apply_dic(dat_itrf, dic_itrf)
Let us take a look at all the scales in the dataset:
list_scales(dat, paste0(c("scale", "subscale", "subscale_2"), "_label")) %>% kable()
Firstly, we check for invalid values in the dataset (e.g., typos) and replace them with NA:
dat <- check_values(dat, replace = NA)
Now we impute missing values:
# Imputation for items of the subscale Ext dat <- impute_missing(dat, subscale == "Ext") # Imputation for items of the subscale Int dat <- impute_missing(dat, subscale == "Int")
dat <- impute_missing(dat, subscale == "Ext") dat <- impute_missing(dat, subscale == "Int")
Let us look at the descriptive statistics for the internalising subscale:
dat %>% select_items(subscale == "Int") %>% descriptives(round = 1)
It is more convenient to see the original items rather than the short labels:
dat %>% select_items(subscale == "Int") %>% rename_items() %>% descriptives(round = 1) %>% kable()
And then we analyse the factor structure. Here we use the rename_item()
function to get a more convenient description.
dat %>% select_items(scale == "ITRF") %>% rename_items(pattern = "({reverse}){subscale}_{subscale_2}: {label}", max_chars = 70) %>% exploratory_fa(nfactors = 4, cut = 0.4) %>% kable()
and provide item analyses
scales <- ex_itrf %>% get_scales( 'APD' = subscale_2 == "APD", 'OPP' = subscale_2 == "OPP", "SW" = subscale_2 == "SW", "AD" = subscale_2 == "AD" ) alpha_table(dat, scales = scales) %>% kable()
Now we will create scores for the internalizing and externalizing scales.
dat$itrf_ext <- score_scale(dat, scale == "ITRF" & subscale == "Ext", label = "Externalizing") dat$itrf_int <- score_scale(dat, scale == "ITRF" & subscale == "Int", label = "Internalizing")
and get descriptives for those scores
dat %>% select_scores() %>% rename_items() %>% descriptives(round = 1)
Many scales come with norm tables to convert raw scores to t-scores, percentile ranks, etc.
The lookup_norms
function helps with this conversion.
Firstly, you need a data frame (or Excel table etc) which includes raw-scores and corresponding norm-scores.
Here is an example of such a table:
ex_normtable_int %>% slice(1:10) %>% kable()
Then we need raw-scores from a scale. If they do not exist, you may use the score_scales
function to add sum scores. Therefore set the sum argument to TRUE
. By setting max_na = 0
, we do not allow missing values in any scale item:
dat$raw_int <- score_scale(dat, subscale == "Int", sum = TRUE, max_na = 0) dat$raw_ext <- score_scale(dat, subscale == "Ext", sum = TRUE, max_na = 0)
By default, lookup_norms
looks for T values:
dat$T_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int) dat$T_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext)
But this can easily be changed to percentile ranks, if included:
dat$PR_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int, to = "PR") dat$PR_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext, to = "PR")
dat[1:10, c("T_int", "T_ext", "PR_int", "PR_ext")] %>% kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.