library(knitr)
library(dplyr)
library(tibble)
library(scaledic)
library(stringr)

knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)

What is a dictionary file?

When you conduct research based on questionnaires or psychometric tests (and you are working in R), you typically create a data.frame with one column (variable) for each item on that questionnaire and one row for each person who participated. You can only store a limited amount of additional information about each item within a data.frame (or tibble). You can give a variable a name and define a variable as a factor with appropriate levels. But basically, that is it. You cannot, at least not conveniently, include a longer label for each item, the name of the scale to which that item belongs, information about reverse coding, etc.

I call the collection of this additional information about items an item dictionary. A dictionary contains a short label, a longer description, scale affiliation, and more for each item.

A dictionary file

A dictionary file is a table with one row for each variable and one column for each attribute of those variables. The most convenient way to create a dictionary file is in a spreadsheet program; the saved file can then be applied to data sets later.

Here is an extract from an example dic-file:

dic_itrf  |>  
  select(item_name, item_label, scale, scale_label, subscale, subscale_label, values, value_labels, missing, type) |> 
  slice(2:4)  |> 
  kable()

A dictionary file can contain any additional attributes. This means that you can add a column with any name to store relevant information (e.g. the scale and scale label to which an item belongs, a translation of the item name). However, there are some predefined attributes with a specific meaning. The table below shows these attributes:

out <- tribble(
  ~Parameter, ~Meaning, ~Example,
  "item_name", "A short item name", "itrf_1",
  "item_label", "Full text of the item", "Vermeidet die Teilnahme an Diskussionen im Unterricht",
  #"scale", "Abbreviation of the scale the item belongs to", "ITRF",
  #"scale_label", "Name of the scale", "Integrated Teacher Report Form",
  #"subscale", "Abbreviation of the subscale", "Int",
  #"subscale_label", "Name of the subscale", "internalizing problems",
  "values", "Valid response values in R syntax", "1:5 (for integers 1 to 5); 1,2,3 (for integers 1, 2, 3)",
  "value_labels", "Labels for each response value", "0 = nicht; 1 = leicht; 2 = mäßig; 3 = stark",
  "missing", "Missing values", "-888, -999",
  "type", "Data type (factor, integer, float, real)", "integer",
  "weight", "Reverse coding of the item and its weight", "1 (positive), -1 (reverse), 1.5 (positive, weights 1.5 times)",
)
kable(out, caption = "Basic columns of a dictionary file")
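As a minimal sketch of the table layout described above, the following base R code builds a tiny dictionary, writes it to a CSV file that could be edited in a spreadsheet program, and reads it back. The item names, labels, and codes are invented for illustration and are not taken from dic_itrf.

```r
# Hypothetical two-item dictionary (all entries invented for illustration)
dic_sketch <- data.frame(
  item_name    = c("item_1", "item_2"),
  item_label   = c("Avoids class discussions", "Interrupts other students"),
  values       = c("0:3", "0:3"),
  value_labels = c("0 = not; 1 = mild; 2 = moderate; 3 = strong",
                   "0 = not; 1 = mild; 2 = moderate; 3 = strong"),
  missing      = c("-888, -999", "-888, -999"),
  type         = c("integer", "integer"),
  weight       = c(1, -1),          # -1 marks a reverse-coded item
  stringsAsFactors = FALSE
)

# Write the table to a CSV file that a spreadsheet program can open
dic_path <- tempfile(fileext = ".csv")
write.csv(dic_sketch, dic_path, row.names = FALSE)

# Read it back: one row per item, one column per attribute
dic_read <- read.csv(dic_path, stringsAsFactors = FALSE)
```

Storing values and missing as text keeps the file easy to edit by hand; how scaledic parses these columns internally is not shown here.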

Apply a dictionary file

When you combine a dataset with a dictionary file, each variable in the dataset that corresponds to a variable described in the dictionary is completed with the given dictionary information. The resulting dataset is now ready for use with all other scaledic functions.

The apply_dic function takes the dataset and the dictionary file and combines them. Values that the dictionary declares as missing are replaced by NA:

# Here we use the example dataset "dat_itrf" and the example dic file "dic_itrf"
dat <- apply_dic(dat_itrf, dic_itrf)
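To make the idea concrete, here is a rough base R sketch of what combining data with a dictionary could look like: missing codes are recoded to NA and the long label is stored as a column attribute. This is an illustration under my own assumptions, not the way apply_dic is implemented; the helper attach_dic and the toy data are hypothetical.

```r
# Hypothetical raw data: -999 is the code for a missing response
raw <- data.frame(item_1 = c(0, 3, -999), item_2 = c(2, -999, 1))

# Hypothetical dictionary with one row per item
dic <- data.frame(
  item_name  = c("item_1", "item_2"),
  item_label = c("Avoids class discussions", "Interrupts other students"),
  missing    = c(-999, -999),
  stringsAsFactors = FALSE
)

# attach_dic() is a made-up helper, not a scaledic function
attach_dic <- function(data, dic) {
  for (i in seq_len(nrow(dic))) {
    name <- dic$item_name[i]
    if (!name %in% names(data)) next       # skip items that are not in the data
    x <- data[[name]]
    x[x %in% dic$missing[i]] <- NA         # recode the missing code to NA
    attr(x, "label") <- dic$item_label[i]  # keep the long label with the column
    data[[name]] <- x
  }
  data
}

labelled <- attach_dic(raw, dic)
attr(labelled$item_1, "label")
```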

Let us take a look at all the scales in the dataset:

list_scales(dat, paste0(c("scale", "subscale", "subscale_2"), "_label")) %>% kable()

Clean raw data

Firstly, we check for invalid values in the dataset (e.g., typos) and replace them with NA:

dat <- check_values(dat, replace = NA)
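The underlying idea can be sketched in base R for a single item: every response that is not among the valid values is replaced by NA. The response vector and the valid range below are invented; check_values itself reads the valid values from the dictionary information.

```r
# Hypothetical responses to one item; 33 and 4 are outside the valid range
responses <- c(0, 2, 33, 1, 4, 3)
valid <- 0:3

# Replace every value that is not a valid response with NA
cleaned <- replace(responses, !responses %in% valid, NA)
cleaned
```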

Now we impute missing values:

# Imputation for items of the subscale Ext
dat <- impute_missing(dat, subscale == "Ext")

# Imputation for items of the subscale Int
dat <- impute_missing(dat, subscale == "Int")

Select scales for analysis

Let us look at the descriptive statistics for the internalizing subscale:

dat %>% 
  select_items(subscale == "Int") %>%
  descriptives(round = 1)

See items instead of labels

It is more convenient to see the full item text rather than the short item names:

dat %>% 
  select_items(subscale == "Int") %>%
  rename_items() %>%
  descriptives(round = 1)  %>% 
  kable()

And then we analyse the factor structure. Here we use the rename_items() function to get a more convenient description.

dat %>%
  select_items(scale == "ITRF") %>%
  rename_items(pattern = "({reverse}){subscale}_{subscale_2}: {label}", max_chars = 70) %>%
  exploratory_fa(nfactors = 4, cut = 0.4) %>% kable()

Next, we run item analyses:

scales <- ex_itrf %>% get_scales(
  "APD" = subscale_2 == "APD",
  "OPP" = subscale_2 == "OPP",
  "SW" = subscale_2 == "SW",
  "AD" = subscale_2 == "AD"
)
alpha_table(dat, scales = scales) %>% kable()
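For readers who want to see the core reliability statistic behind such a table, the following sketch computes Cronbach's alpha by hand for a toy data frame. I am assuming that alpha is one of the statistics alpha_table reports; the helper cronbach_alpha and the toy data are hypothetical and not part of scaledic.

```r
# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of the sum score)
# cronbach_alpha() is a made-up helper for illustration
cronbach_alpha <- function(items) {
  items <- na.omit(items)               # use complete cases only
  k <- ncol(items)
  item_var <- sum(apply(items, 2, var)) # sum of the item variances
  total_var <- var(rowSums(items))      # variance of the sum score
  (k / (k - 1)) * (1 - item_var / total_var)
}

# Hypothetical responses of five persons to three items
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 2, 3, 5, 5),
  c = c(1, 3, 3, 4, 4)
)
alpha_toy <- cronbach_alpha(toy)
alpha_toy
```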

Build scale scores

Now we will create scores for the internalizing and externalizing scales.

dat$itrf_ext <- score_scale(dat, scale == "ITRF" & subscale == "Ext", label = "Externalizing")
dat$itrf_int <- score_scale(dat, scale == "ITRF" & subscale == "Int", label = "Internalizing")
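The dictionary's weight attribute marks reverse-coded items with -1. As a sketch of what a mean score with reverse coding amounts to, the base R code below flips a reverse-coded item on a 0 to 3 response scale and then averages the items per person. The toy data, the weights, and the response range are assumptions, and whether score_scale applies weights in exactly this way is not confirmed here.

```r
# Hypothetical responses of three persons to two items (response scale 0 to 3)
items <- data.frame(item_1 = c(0, 3, 2), item_2 = c(3, 0, 1))
weights <- c(item_1 = 1, item_2 = -1)   # -1 marks a reverse-coded item
value_min <- 0
value_max <- 3

# Reverse-code items with a negative weight: 0 <-> 3, 1 <-> 2
recoded <- items
for (name in names(weights)) {
  if (weights[[name]] < 0) {
    recoded[[name]] <- value_min + value_max - items[[name]]
  }
}

# Mean score per person across the recoded items
mean_score <- rowMeans(recoded)
mean_score
```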

Next, we get descriptives for those scores:

dat %>%
  select_scores() %>%
  rename_items() %>%
  descriptives(round = 1)

Look up norms from a norm table

Many scales come with norm tables to convert raw scores to t-scores, percentile ranks, etc.

The lookup_norms function helps with this conversion.

Firstly, you need a data frame (or an Excel table, etc.) that includes raw scores and the corresponding norm scores.

Here is an example of such a table:

ex_normtable_int %>% slice(1:10) %>% kable()

Then we need raw scores from a scale. If they do not exist, you may use the score_scale function to add sum scores. To do so, set the sum argument to TRUE. By setting max_na = 0, we do not allow missing values in any scale item:

dat$raw_int <- score_scale(dat, subscale == "Int", sum = TRUE, max_na = 0)
dat$raw_ext <- score_scale(dat, subscale == "Ext", sum = TRUE, max_na = 0)
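The effect of a limit on missing values can be sketched in base R: a person's sum score is set to NA as soon as more items are missing than the limit allows. The helper sum_score and the toy data are hypothetical, and the exact rules score_scale applies may differ.

```r
# sum_score() is a made-up helper: sum per person, NA if too many items are missing
sum_score <- function(items, max_na = 0) {
  n_missing <- rowSums(is.na(items))     # number of missing items per person
  score <- rowSums(items, na.rm = TRUE)  # sum over the available items
  score[n_missing > max_na] <- NA        # too many missing items: no score
  score
}

# Hypothetical responses of three persons to three items
items <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA), c = c(2, 2, 1))

strict  <- sum_score(items, max_na = 0)  # only complete cases get a score
lenient <- sum_score(items, max_na = 1)  # one missing item is tolerated
```

With max_na = 0 only the first person receives a score; with max_na = 1 the second person does as well, although that sum is based on two items only.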

By default, lookup_norms returns T-scores:

dat$T_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int)
dat$T_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext)

But this can easily be changed to percentile ranks, if included:

dat$PR_int <- lookup_norms(dat$raw_int, normtable = ex_normtable_int, to = "PR")
dat$PR_ext <- lookup_norms(dat$raw_ext, normtable = ex_normtable_ext, to = "PR")
dat[1:10, c("T_int", "T_ext", "PR_int", "PR_ext")] %>% kable()
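Conceptually, a norm lookup is a table match: each raw score is located in the norm table and the value from the requested column is returned. The sketch below shows this with base R's match(). The norm table, its column names (raw, T, PR), and the helper lookup are invented for illustration and do not necessarily mirror ex_normtable_int or the internals of lookup_norms.

```r
# Hypothetical norm table: raw scores with corresponding T-scores and percentile ranks
normtable <- data.frame(
  raw = 0:5,
  T   = c(35, 40, 45, 50, 55, 60),
  PR  = c(7, 16, 31, 50, 69, 84)
)

# lookup() is a made-up helper: find each raw score and return the requested norm column
lookup <- function(raw, normtable, to = "T") {
  normtable[[to]][match(raw, normtable$raw)]
}

raw_scores <- c(3, 0, NA, 9)               # 9 does not occur in the norm table
t_scores  <- lookup(raw_scores, normtable)
pr_scores <- lookup(raw_scores, normtable, to = "PR")
```

Raw scores that are missing or not listed in the norm table come back as NA in this sketch.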


jazznbass/scaledic documentation built on July 19, 2023, 12:50 a.m.