
1. Aim

This vignette outlines how PsychPower can be used to determine how many and which unique symptom phenotypes are present in a sample. A detailed motivation for this analysis as well as the technicalities of the procedure are outlined in Spiller et al. (2021).

2. Installation

This package is currently hosted only on GitHub. Installation requires the devtools package to be installed.

library("devtools")
devtools::install_github("orduek/PsychPower")

Once PsychPower is installed, it can be loaded in the usual way.

library("PsychPower")

3. Example

This example outlines the analysis performed in Spiller et al. (2021) using sample data from that publication.

The sample data consist of 39,700 complete answers to the 13-item anxiety subscale of the Depression Anxiety Stress Scales (DASS; Nieuwenhuijsen et al., 2003). All 13 items were rated on a 4-point Likert scale ranging from 1 to 4. The raw data were collected and provided by the Open-Source Psychometrics Project.

3.1 Load Data

First, we load the sample data.

data("data_test")

Next, we inspect the first 10 rows (participants) and 5 of the total 13 columns (questionnaire items) of the sample data.

data_test[1:10, 1:5]

3.2 Define symptom combinations

As outlined in the publication, a phenotype is defined as a unique combination of a finite set of items (in this example symptoms) used to describe or define a psychological construct (in this example a mental disorder).
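To give a sense of scale for this definition, the following minimal sketch counts how many presence/absence patterns are theoretically possible for the 13 items in this example. It only illustrates the arithmetic and does not use any PsychPower function; the variable names are chosen here for illustration.

```r
# Number of symptoms (items) in the example questionnaire
n_items <- 13

# Each symptom is either absent (0) or present (1), so the number of
# possible presence/absence patterns is 2^n_items
n_possible <- 2^n_items
n_possible
#> [1] 8192
```

The number of phenotypes actually observed in a sample can be at most this value, and at most the number of participants.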

3.2.1 Binarize variables

Here, we are interested only in the presence or absence of a symptom, not in its severity. Therefore, we need to dichotomize (binarize) the participants' answers. To do so, we have to choose a cut-off that defines which ratings indicate a symptom's presence and which indicate its absence.

In this example, a cut-off of 2 is chosen based on the literature.

data_binarized <- binarize(data_test, cut_off = 2)

NOTE: binarize() sets any value less than or equal to the cut-off to 0 and any value greater than the cut-off to 1. It names the binarized variables "v_binN", with N ascending from 1. The binarized variables are added to the input data.
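The thresholding rule described in the note can be illustrated with a few lines of base R. This is a minimal sketch on made-up toy data, not the internal implementation of binarize(); the data frame `toy` and its column names are hypothetical.

```r
# Toy ratings on a 1-4 scale (hypothetical data, not taken from data_test)
toy <- data.frame(
  item1 = c(1, 2, 3, 4),
  item2 = c(2, 2, 4, 1)
)
cut_off <- 2

# Ratings greater than the cut-off become 1, all other ratings become 0
toy_bin <- as.data.frame(lapply(toy, function(x) as.integer(x > cut_off)))

# Mirror the "v_binN" naming and append the new columns to the input data
names(toy_bin) <- paste0("v_bin", seq_along(toy_bin))
toy_binarized <- cbind(toy, toy_bin)

# v_bin1 is 0, 0, 1, 1 and v_bin2 is 0, 0, 1, 0
toy_binarized
```

With a cut-off of 2, a rating of exactly 2 counts as "symptom absent", because only ratings strictly greater than the cut-off are coded as 1.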

Inspecting the first 5 binarized variables confirms that the ratings now take only the values 0 and 1.

data_binarized[, 14:18]

3.2.2 Determine Symptom Combination Frequency

Next, we determine the number of unique symptom combinations in the sample and count how many times each unique phenotype was reported.

data_frequency <- pheno_frequency(
  data_binarized, target_columns = tidyselect::starts_with("v_bin"))

NOTE: By default, pheno_frequency() uses target_columns = tidyselect::starts_with("v_bin"), expecting that the variables were dichotomized with binarize(). However, the columns used to define the phenotype of interest can also be selected manually using target_columns = [j:k] (e.g., if the ratings were already collected as binary responses). The output of pheno_frequency() is a data frame in which every row represents one unique phenotype. The frequency of each phenotype is given in the column "freq". Hence, whenever at least one phenotype is reported more than once, the output has fewer rows than the input data.
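The counting step described in the note can be sketched in base R as follows. This is an illustration on hypothetical toy data and does not reproduce the internals of pheno_frequency(); only the output column name "freq" is taken from the text above.

```r
# Toy binarized data (hypothetical): 5 participants, 2 symptoms
toy <- data.frame(
  v_bin1 = c(1, 0, 1, 1, 0),
  v_bin2 = c(0, 0, 0, 1, 0)
)

# Add a helper column of ones and sum it within each unique combination
toy$freq <- 1
toy_frequency <- aggregate(freq ~ v_bin1 + v_bin2, data = toy, FUN = sum)

# Sort so that the most frequent phenotypes come first
toy_frequency <- toy_frequency[order(-toy_frequency$freq), ]
toy_frequency
```

The five participants collapse into three rows, one per unique combination, and the "freq" column sums to the original number of participants.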

3.3 Describe symptom combinations

Next, we explore the characteristics of the different symptom combinations present in the sample.

3.3.1 Most common symptom combinations

First, we inspect the most common symptom combinations (in this example, the 5 most common).

common_pheno(data_frequency, frequency = "freq", n_phenotypes = 5)

NOTE: common_pheno() needs to know the frequency of each phenotype. By default, it assumes that this information is stored in the "freq" column, as is the case when the frequency was calculated using pheno_frequency(). However, this can be overridden using the argument frequency = "j", with j specifying the name of the column that holds the frequency of each phenotype.
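Conceptually, selecting the most common phenotypes amounts to sorting by the frequency column and keeping the first rows. The following base-R sketch shows this idea on a hypothetical frequency table; it is an assumption about the general approach, not the actual code of common_pheno().

```r
# Toy frequency table (hypothetical), one row per unique phenotype
toy_frequency <- data.frame(
  v_bin1 = c(0, 1, 1, 0),
  v_bin2 = c(0, 0, 1, 1),
  freq   = c(12, 30, 5, 8)
)
n_phenotypes <- 2

# Sort by frequency in decreasing order and keep the first n_phenotypes rows
toy_sorted <- toy_frequency[order(toy_frequency$freq, decreasing = TRUE), ]
toy_top <- head(toy_sorted, n_phenotypes)

# The two most common phenotypes have frequencies 30 and 12
toy_top
```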

3.3.2 Characteristics

Second, we describe the number of unique phenotypes, the frequency of the most common phenotype, and the median frequency of all phenotypes.

desc_pheno <- describe_pheno(data_frequency, frequency = "freq")
desc_pheno
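The three summary statistics named above can be computed directly from a frequency column. The sketch below uses hypothetical toy frequencies and hypothetical output names (`n_unique`, `max_freq`, `median_freq`); the actual output format of describe_pheno() may differ.

```r
# Toy frequency table (hypothetical), one row per unique phenotype
toy_frequency <- data.frame(freq = c(30, 12, 8, 5, 1, 1, 1))

# Summary statistics described in the text
toy_desc <- data.frame(
  n_unique    = nrow(toy_frequency),          # number of unique phenotypes
  max_freq    = max(toy_frequency$freq),      # frequency of the most common one
  median_freq = median(toy_frequency$freq)    # median frequency across phenotypes
)

# Here: 7 unique phenotypes, maximum frequency 30, median frequency 5
toy_desc
```

A median far below the maximum, as in this toy example, indicates that a few phenotypes are common while most are rare.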

3.3.3 Plot

Third, we plot the frequency of the most common phenotypes. In this plot, each bar represents a unique phenotype, and its height corresponds to the frequency indicated on the y-axis.

fig1 <- plot_pheno(data_frequency, frequency = "freq", 
                   n_phenotypes = 50, color = "grey26")
fig1
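A comparable figure can be sketched with base graphics by ranking the frequencies and drawing one bar per phenotype. This is only an illustration on hypothetical toy data; the plotting backend and styling used by plot_pheno() are not assumed here, and the axis labels are chosen for this example.

```r
# Toy frequency table (hypothetical), one row per unique phenotype
toy_frequency <- data.frame(freq = c(5, 30, 1, 12, 8))
n_phenotypes <- 3

# Rank phenotypes by frequency and keep the n_phenotypes most common
heights <- head(sort(toy_frequency$freq, decreasing = TRUE), n_phenotypes)

# One bar per phenotype; bar height equals the phenotype's frequency
barplot(heights, col = "grey26",
        xlab = "Phenotype (ranked by frequency)", ylab = "Frequency")
```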

4. Citation

To cite PsychPower in publications, please use:

Spiller, T. R., Duek, O., Helmer, M., Murray, J. D., von Känel, R., & Harpaz-Rotem, I. (2021, October 8). The uncommon is common: Structural similarities of symptom heterogeneity across mental disorders. DOI: 10.31234/osf.io/g4kf8



orduek/PsychPower documentation built on Oct. 25, 2023, 7:36 a.m.