This vignette outlines how PsychPower can be used to determine how many and which unique symptom phenotypes are present in a sample. A detailed motivation for this analysis, as well as the technicalities of the procedure, are outlined in Spiller et al. (2021).
This package is currently only hosted on GitHub. Installation requires the devtools package to be installed and loaded.
library("devtools")
devtools::install_github("orduek/PsychPower")
Once the PsychPower package is installed, it can be loaded the usual way.
library("PsychPower")
This example outlines the analysis performed in Spiller et al. (2021) using sample data from that publication.
The sample data consist of 39,700 complete answers to the 13-item anxiety subscale of the Depression Anxiety Stress Scales (DASS; Nieuwenhuijsen et al., 2003). All 13 items were rated on a 4-point Likert scale ranging from 1 to 4. The raw data were collected and provided by the Open-Source Psychometrics Project.
First, we load the sample data.
data("data_test")
Next, we inspect the first 10 rows (participants) and 5 of the total 13 columns (questionnaire items) of the sample data.
head(data_test[, 1:5], n = 10)
As outlined in the publication, a phenotype is defined as a unique combination of a finite set of items (in this example symptoms) used to describe or define a psychological construct (in this example a mental disorder).
Here, we are only interested in the presence or absence of a symptom, not in its severity. Therefore, we need to dichotomize (binarize) the participants' answers. To do so, we choose a cut-off that defines which ratings indicate a symptom's presence and which indicate its absence.
In this example, a cut-off of 2 is chosen based on the literature.
data_binarized <- binarize(data_test, cut_off = 2)
NOTE: binarize() handles the cut-off by setting any value smaller than or equal to the cut-off to 0 and any value greater than the cut-off to 1. It names the binarized variables "v_binN", with N ascending from 1, and appends them to the input data.
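The cut-off rule described in the note can be illustrated with a few lines of base R. This is a minimal sketch on a small made-up data frame (the names ratings, binarized and result are hypothetical), meant to show the documented behavior of binarize(), not the package's own implementation.

```r
# Made-up ratings on a 1-4 scale (2 participants' worth of items, 4 rows)
ratings <- data.frame(item1 = c(1, 2, 3, 4), item2 = c(4, 3, 2, 1))
cut_off <- 2

# Rule from the note: values <= cut_off become 0, values > cut_off become 1
binarized <- as.data.frame(lapply(ratings, function(x) as.integer(x > cut_off)))

# Name the new columns v_bin1, v_bin2, ... and append them to the input data
names(binarized) <- paste0("v_bin", seq_along(binarized))
result <- cbind(ratings, binarized)
print(result)
```

With a cut-off of 2, ratings of 1 and 2 count as "symptom absent" and ratings of 3 and 4 as "symptom present".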
Inspecting the binarized versions of the first 5 items (columns 14 to 18, named v_bin1 to v_bin5) confirms that every rating is now either 0 or 1.
head(data_binarized[, 14:18], n = 10)
Next, we determine the number of unique symptom combinations in the sample and count how many times each unique phenotype was reported.
data_frequency <- pheno_frequency( data_binarized, target_columns = tidyselect::starts_with("v_bin"))
NOTE: The function pheno_frequency() uses target_columns = tidyselect::starts_with("v_bin") as default, expecting that the variables were dichotomized with binarize(). However, the columns used to define the phenotype of interest can also be selected manually, for example using target_columns = j:k, where j and k are the positions of the first and last column of interest (e.g., if the ratings were already collected as binary responses).
The output of pheno_frequency() is a data frame in which every row represents one unique phenotype. The frequency of each phenotype is stored in the column "freq". Hence, whenever at least one phenotype is reported more than once, the output has fewer rows than the input data.
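Counting phenotypes amounts to grouping participants by their full pattern of binarized symptoms. The base R sketch below does this with aggregate() on a small made-up data set (the objects bin and freq_table are hypothetical); it mirrors the idea behind pheno_frequency(), but the package's actual output columns and row order may differ.

```r
# Made-up binarized answers: 5 participants, 3 symptoms
bin <- data.frame(
  v_bin1 = c(1, 0, 1, 1, 0),
  v_bin2 = c(0, 0, 0, 1, 0),
  v_bin3 = c(1, 0, 1, 1, 0)
)

# Give every participant a count of 1, then sum the counts within each
# unique combination of all symptom columns ("." = every other column)
bin$freq <- 1
freq_table <- aggregate(freq ~ ., data = bin, FUN = sum)

# Most common phenotypes first
freq_table <- freq_table[order(freq_table$freq, decreasing = TRUE), ]
rownames(freq_table) <- NULL
print(freq_table)
```

The 5 participants collapse into 3 unique phenotypes. With the 13 binarized items of the sample data, there are 2^13 = 8,192 possible phenotypes, so the frequency table can have at most 8,192 rows, far fewer than the 39,700 participants.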
Next, we explore the characteristics of the different symptom combinations present in the sample.
First, we inspect the symptom profiles of the most common phenotypes (in this example, the 5 most common).
common_pheno(data_frequency, frequency = "freq", n_phenotypes = 5)
NOTE: common_pheno() needs to identify the frequency of each phenotype. By default, it assumes that this information is stored in the "freq" column, as produced by pheno_frequency(). However, this can be overridden using the argument frequency = "j", with j specifying the name of the column that holds the frequency of each phenotype.
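The role of the frequency argument can be illustrated with a small hypothetical helper, top_phenotypes(), which is not part of PsychPower. It is a base R sketch, assuming only that the frequency column is passed by name as a string, of selecting the n most common phenotypes from a frequency table whose count column is not called "freq".

```r
# Hypothetical helper (not part of PsychPower): return the n most frequent
# phenotypes, reading frequencies from the column named in `frequency`
top_phenotypes <- function(df, frequency = "freq", n_phenotypes = 5) {
  ordered <- df[order(df[[frequency]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n_phenotypes)
}

# Made-up frequency table whose count column is called "count"
toy <- data.frame(
  v_bin1 = c(0, 1, 1, 0),
  v_bin2 = c(0, 0, 1, 1),
  count  = c(10, 4, 7, 1)
)

top2 <- top_phenotypes(toy, frequency = "count", n_phenotypes = 2)
print(top2)
```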
Second, we describe the number of unique phenotypes, the frequency of the most common phenotype, and the median frequency across all phenotypes.
desc_pheno <- describe_pheno(data_frequency, frequency = "freq")
desc_pheno
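All three statistics are simple summaries of the frequency column. The base R sketch below computes them for a made-up vector of phenotype frequencies; the column names n_phenotypes, max_freq and median_freq are our own and are not necessarily the names used in the output of describe_pheno().

```r
# Made-up phenotype frequencies (one value per unique phenotype)
freq <- c(12, 5, 3, 1, 1, 1, 1)

desc <- data.frame(
  n_phenotypes = length(freq),  # number of unique phenotypes
  max_freq     = max(freq),     # frequency of the most common phenotype
  median_freq  = median(freq)   # median frequency across all phenotypes
)
print(desc)
```

Because every phenotype occurs at least once, a median frequency of 1 means that at least half of the phenotypes were reported by a single participant only.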
Third, we plot the frequency of the most common phenotypes (here, the 50 most common). In this plot, each bar represents a unique phenotype, with its height corresponding to the frequency indicated on the Y axis.
fig1 <- plot_pheno(data_frequency, frequency = "freq", n_phenotypes = 50, color = "grey26")
fig1
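The idea behind this plot (sort phenotypes by frequency, keep the top n, draw one bar each) can be sketched with base R graphics. This is only an illustration on made-up frequencies using barplot(); plot_pheno() itself may be built differently and its appearance will not match exactly.

```r
# Made-up phenotype frequencies (for illustration only)
freq <- c(40, 3, 25, 1, 12, 1, 7, 2)
n_phenotypes <- 5

# Sort in decreasing order and keep the n most common phenotypes
top <- head(sort(freq, decreasing = TRUE), n_phenotypes)

# One bar per phenotype; bar height = frequency on the Y axis
mids <- barplot(top, col = "grey26",
                xlab = "Phenotype (ranked by frequency)",
                ylab = "Frequency")
```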
To cite PsychPower in publications, please use:
Spiller, T. R., Duek, O., Helmer, M., Murray, J. D., von Känel, R., & Harpaz-Rotem, I. (2021, October 8). The uncommon is common: Structural similarities of symptom heterogeneity across mental disorders. DOI: 10.31234/osf.io/g4kf8