library(knitr)
library(dplyr)
library(tibble)
library(psych)
library(scaledic)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ""
)

In the following examples I will lay out the construction of a dictionary file.

First, we need an example data.frame to work with (i.e. the ex_scaledic_data example dataset):

ex_scaledic_data %>% kable(caption = "ex_scaledic_data")

Second, we need a dictionary file that is also a data.frame. Each row of a dictionary file addresses a variable (also called item in this context) and each column holds a specific attribute. These attributes can be divided in two classes. Firstly, predefined attributes with a specific name and additional attributed that can have any name. The predefined attributes are identified by their specific names. Theses are: item_name, item_label, values, value_labels, missing, weight, and type[^1].

[^1]: Actually there are three more predefined attributes class, scale_function, and scale_filter that are not explained in this tutorial.

Additional attributes can have any name and are mainly used for selecting items or displaying additional information of an item (e.g., the scale and subscale an item belongs to; the reference and the author of an item; the translation of an item).

Building a dictionary step by step

We start the example with a very simple dictionary file containing only item labels and the corresponding item names (i.e, the corresponding variable names in our data frame)[^2]. The dictionary file has two columns, item_name and item_label:

[^2]: The easiest way to create a dictionary file is to use a spreadsheet program and to import the file.

dic_file <- ex_scaledic_dic[, 1:2]
dic_file %>% kable(caption = "Dictionary file")

We combine the data file and the dic file with the apply_dic() function:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

We get the message that the dic file does not contain information of the datatype (e.g. integer, factor, character). The type attribute is used in various scaledic functions. For example when the data is checked for invalid values, missing values are imputed, or scales are scored. A variable can be one of the following types: integer for numbers without decimals, float for numbers with decimals, character for variables with text, and factor for variables with text or numbers that are levels of a factor. Scaledic will estimate the data type from the given data when the type attribute is not provided in the dic file.

Let us add type information to the dic file:

dic_file <- ex_scaledic_dic %>% select(1,2, type)
dic_file %>% kable(caption = "Dictionary file")

Most item values are of type integer (the item has only whole numbers) with the exception of gender which is of type factor (i.e. it has a nominal scale with several levels).

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we can add weight information to the dic file. The weight attribute tells various scaledic functions a) whether the item is inverted or not, and b) how the values of an item are weighted. If a weight value is unsigned (e.g. 1), the item is not inverted. If a weight value has a negative sign (e.g. -1), the item is inverted. If the (absolute) value is not 1, the item will be weighted when calculating the item scores (e.g. 1.5 will give an item a weight of 1.5).

dic_file <- ex_scaledic_dic %>% select(1,2, weight, type)
dic_file %>% kable(caption = "Dictionary file")

In this example, all weights are 1 and no item is inverted.

Again, we join dic and data file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Now we add the values attribute to the dic file. values defines the valid values that a variable can take. Explicitly defining the values makes it possible to automatically identify invalid values in a data frame (e.g. typos). values are also necessary to reverse score an inverted item. For coding the values, you provide the possible values separated with a comma (e.g., 1, 2, 3, 4 for the integers 1 to 4). It is also possible to use a colon to define a range of integers (e.g., 5:11 indicated all integers starting with 5 and ending with 11). For type float, values represent the maximum and the minimum of valid values (e.g. 5, 11 for all values within 5 and 11 including decimal numbers). For variables of type factor or character you may want to provide text values. in that case, put these values within quotes (e.g., 'm', 'f', 'd' indicates three valid text entries).

Here is the dic file with added values:

dic_file <- ex_scaledic_dic %>% select(1,2, "values", "weight", "type")
dic_file %>% kable(caption = "Dictionary file")
dat_dic <- apply_dic(ex_scaledic_data, dic_file)

Invalid values can be replaced automatically with the check_values() function.

dat_dic <- check_values(dat_dic, report = TRUE, replace = NA) 
dat_dic %>% kable(caption = "data frame with replaced invalid values")

Next, we add value_labels. Value labels give longer labels for all or some of the values. Value labels are coded in the form value = label with a semicolon separating each entry. m = male; f = female; d = diverse codes three value label (quotes are not necessary).

Here is the dic file with added value_labels:

dic_file <- ex_scaledic_dic %>% select(1,2, "values", "value_labels", "weight", "type")
dic_file %>% kable(caption = "Dictionary file")
dat_dic <- apply_dic(ex_scaledic_data, dic_file) %>% check_values(replace = NA)

Now lets see the coding for some of the variables:

dat_dic$rel_1

Valid values are all integers from 1 to 6 and each value has a label.

dat_dic$sui_1

Here, valid values are 0, 1, 2, 3, 4 (short 0:4) but only the poles (0 and 4) have labels.

dat_dic$gender

gender is of type factor. The valid values 'm', 'f', 'd' have been turned into three factor levels with the corresponding labels.

Caution: The labels provided in the value_labels attribute are not automtically turned into the levels of a factor. This is because R internally would turn the values (here 'm', 'f', 'd') into integers (here 1, 2, 3)

dat_dic$age

age can take all numbers from 5 to 11 (minimum is 5 and maximum is 11).

It is a common practice to code missing values with a specific number (rather than just leaving an empty entry on a datasheet). In the example dataset used here, -999 is used as a missing value. To account for this, we add the missing attribute to the dic file containing the missing number (e.g., -999). Multiple missing values are separated by commas (e.g. -99, -77).

dic_file <- ex_scaledic_dic %>% select(1,2, values, value_labels, weight, type, missing)
dic_file %>% kable(caption = "Dictionary file")

The values within the missing attributes are automatically replaced with NA when joining a data with a dic file:

dat_dic <- apply_dic(ex_scaledic_data, dic_file)

To turn off this behavior, set the argument replace_missing = FALSE. Default, values are not checked for invalid entries. To do this, either set check_values = TRUE or apply the check_values() function.

dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)
dat_dic %>%
  slice(9:20) %>% 
  kable(caption = "Extract from the data frame with replaced missing values and checked values")

In the last step of this tutorial we will add information about the scales that the items belong to. Therefore, we add a new attribute called scale to the dic file and another attribute scale_label for a longer description (we could have named these attributes in any other way as they are not predefined attributes):

dic_file <- ex_scaledic_dic
dic_file %>% kable(caption = "Dictionary file")
dat_dic <- apply_dic(ex_scaledic_data, dic_file, check_values = TRUE)

When can use the scale attribute to select items:

dat_dic %>% 
  select_items(scale == "rel") %>% 
  descriptives()

and the rename_items() function to get the item labels:

dat_dic %>% 
  select_items(scale == "rel") %>% 
  rename_items() %>%
  descriptives() %>%
  kable()


jazznbass/scaledic documentation built on July 19, 2023, 12:50 a.m.