knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
After installing the package, you can load it with:
library(ABCDscores)
Alternatively, you can call functions directly, without loading the package, using ::, e.g., ABCDscores::name_of_function(...).
To compute summary scores, you'll need to have downloaded data from the ABCD Study®. To request access to the data, visit the NIH Brain Development Cohorts (NBDC) Data Hub. Once you have access, you can use different tools to access and download the data; they are described in more detail in the ABCD data documentation.
Here we assume that you used DEAP to create a dataset containing the variables you want to summarize and downloaded it in the rds format. Afterwards, unzip the dataset.rds.zip file into the working directory (or move the zip file to the working directory and use utils::unzip("dataset.rds.zip") to extract all files). The unzipped files should consist of a dataset.rds file and an Excel file with the data dictionary and categorical levels.
Load the data into R using the following command:
data <- readRDS("dataset.rds")
Before computing summary scores, it is important to understand the structure and nomenclature of the functions in the package:

- Each summary score is computed by a function named compute_<score_name>(). For example, the function to compute the score fc_p_psb_mean is named compute_fc_p_psb_mean().^[There are a few exceptions to this rule: the summary scores in the tables su_y_sui and su_y_tlfb are computed using higher-level functions, as explained in the SUI and TLFB vignettes.]
- Each measure/table has a compute_<table_name>_all() function that computes all scores for that measure/table. For example, the function to compute all scores for the fc_p_psb measure/table is named compute_fc_p_psb_all().
- The columns summarized by a score are stored in a vector named vars_<measure_name>. For example, the vector with the columns that are summarized by the compute_fc_p_psb_mean() function is named vars_fc_p_psb.

The references page provides a list of all available functions and their parameters.
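The naming rules above are regular enough to build the names programmatically. The following is a minimal sketch in base R; the three helper functions are hypothetical and are not part of ABCDscores. The optional lookup at the end runs only if the package is installed and assumes that the score functions are exported.

```r
# Hypothetical helpers (not part of ABCDscores): they only paste names
# together according to the conventions described above.
score_function_name <- function(score_name) paste0("compute_", score_name)
table_function_name <- function(table_name) paste0("compute_", table_name, "_all")
vars_vector_name <- function(measure_name) paste0("vars_", measure_name)

score_function_name("fc_p_psb_mean")  # "compute_fc_p_psb_mean"
table_function_name("fc_p_psb")       # "compute_fc_p_psb_all"
vars_vector_name("fc_p_psb")          # "vars_fc_p_psb"

# Optional: check whether the constructed name is exported by the package.
# This part is skipped when ABCDscores is not installed.
if (requireNamespace("ABCDscores", quietly = TRUE)) {
  fn_name <- score_function_name("fc_p_psb_mean")
  fn_name %in% getNamespaceExports("ABCDscores")
}
```

Keep in mind that the su_y_sui and su_y_tlfb scores are exceptions to the first rule, so a constructed name may not exist for every score.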
After reading in the data, we can start to compute summary scores. As an example, we will demonstrate how to compute the two summary scores for the fc_p_psb measure/table (fc_p_psb_mean and fc_p_psb_nm) in two different ways:

- Using the individual compute_<score_name>() functions, one score at a time.
- Using the _all() function to compute all scores for the measure/table at once.

When we refer to the documentation for compute_fc_p_psb_mean(), we see that it requires the following variables: fc_p_psb_001, fc_p_psb_002, and fc_p_psb_003. If these variables are part of the dataset created in and downloaded from DEAP, they should be present in the data after reading in dataset.rds as demonstrated above.
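Before calling a score function, it can help to confirm that the required columns are actually present in the loaded data. The sketch below uses only base R. To stay runnable without the real download, it writes a small stand-in file to a temporary path (an assumption for illustration); with real data you would read dataset.rds instead.

```r
# Stand-in for the DEAP download: a tiny data frame saved to a temporary
# .rds file, so the sketch runs without the real dataset.rds.
path <- tempfile(fileext = ".rds")
saveRDS(
  data.frame(fc_p_psb_001 = c("1", "2"), fc_p_psb_002 = c("1", NA)),
  path
)

loaded <- readRDS(path)

# Columns that compute_fc_p_psb_mean() is documented to require.
required <- c("fc_p_psb_001", "fc_p_psb_002", "fc_p_psb_003")

# Required columns that are absent from the loaded data.
missing_cols <- setdiff(required, names(loaded))
missing_cols
```

An empty result means all required columns are available; otherwise the listed variables would need to be added to the dataset in DEAP and downloaded again.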
Here, for demonstration purposes, we will create a dummy data frame with these columns:
data <- tibble::tibble(
  fc_p_psb_001 = c("1", "2", "3", "4", "5"),
  fc_p_psb_002 = c("1", NA, "3", "4", NA),
  fc_p_psb_003 = c("1", "2", "2", "4", NA)
)
data
For most summary score functions, only the data argument (input data frame) is required, i.e., we can just use the function like this:
compute_fc_p_psb_mean(data)
We can do the same using compute_fc_p_psb_nm():
compute_fc_p_psb_nm(data)
We can also compute both scores at the same time by chaining the function calls using the pipe operator:
data |> compute_fc_p_psb_mean() |> compute_fc_p_psb_nm()
Lastly, if we want to compute all scores for the measure with one function call, we can use the compute_<table_name>_all() function for the fc_p_psb table:
compute_fc_p_psb_all(data)
data
The data argument is the input data frame that contains the columns required to compute the score. The required columns are listed in the function documentation for each score.
name
The name argument is used to specify the name of the output score. The default value for this parameter is the official name of the column in the released data, but users can override it with a custom name.
compute_fc_p_psb_mean(data, name = "my_custom_name")
This is useful, for example, when the data frame specified in data already contains the official summary score that one is trying to reproduce. In this case, the user is required to specify a different name; otherwise the function will return an error.
combine
The combine argument is used to specify whether to combine the output score with the input data frame. The default value is TRUE, which means the output score is appended as a new column on the right-hand side of the input data frame. If the argument is set to FALSE, the output score is returned as a single-column data frame:
compute_fc_p_psb_mean(data, combine = FALSE)
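To make the behavior of name and combine concrete, here is a minimal base R sketch. The function add_row_mean() and its columns a and b are hypothetical and only mimic the behavior described above; this is not the package's implementation.

```r
# Hypothetical stand-in for a score function (not part of ABCDscores).
# It averages the columns a and b and mimics the `name` and `combine`
# arguments as described in the text.
add_row_mean <- function(data, name = "row_mean", combine = TRUE) {
  # Refuse to overwrite an existing column, as described for `name`.
  if (name %in% names(data)) {
    stop("Column '", name, "' already exists; choose a different `name`.")
  }
  score <- data.frame(rowMeans(data[c("a", "b")], na.rm = TRUE))
  names(score) <- name
  # combine = TRUE appends the score on the right; FALSE returns it alone.
  if (combine) cbind(data, score) else score
}

toy <- data.frame(a = c(1, 2), b = c(3, 4))

names(add_row_mean(toy))                   # "a" "b" "row_mean"
names(add_row_mean(toy, combine = FALSE))  # "row_mean"

# Calling the function again on its own output fails unless a new name
# is supplied, because the default column already exists.
with_score <- add_row_mean(toy)
msg <- tryCatch(add_row_mean(with_score), error = function(e) conditionMessage(e))
names(add_row_mean(with_score, name = "my_custom_name"))
```

Because combine = TRUE returns the full data frame, the result can be passed directly to another score function, which is what makes chaining calls possible.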
max_na
The max_na argument is used to specify the maximum number of missing values across all summarized variables that a given row (or participant/event) can have for the summary score to still be computed. If the number of missing values in a row exceeds the specified value, the score for that row is set to NA. Depending on the summary score, the number of missing values allowed may vary, and not all summary score functions have this argument.

- NULL: No limit on missing values.
- 0: No missing values allowed.
- 1: At most one missing value allowed.

For most summary scores in the ABCD data resource, max_na is set to a number that ensures that >=80% of the variables that the given score summarizes have a non-missing value. Users can set the max_na argument if they want to compute the summary score in a more lenient or more restrictive manner.
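The arithmetic behind the 80% rule can be sketched as follows. The helper is hypothetical and shows only one way to arrive at such a threshold; the actual default for each function is the one stated in its documentation.

```r
# Hypothetical helper (not part of ABCDscores): the largest number of
# missing items that still leaves at least 80% of n_vars items
# non-missing. Integer division avoids floating-point rounding issues.
max_na_for_80_percent <- function(n_vars) n_vars %/% 5

n_vars <- c(3, 5, 10, 12)
thresholds <- data.frame(
  n_vars = n_vars,
  max_na = max_na_for_80_percent(n_vars),
  min_share_non_missing = (n_vars - max_na_for_80_percent(n_vars)) / n_vars
)
thresholds
```

Under this rule, a score that summarizes three items allows no missing values, while a score that summarizes five items allows one.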
As an example, let's explore how the summary score changes when we set the max_na argument to 1 (above we used the default, which in the case of compute_fc_p_psb_mean() is 0). Now a score is computed for the second row, which has one missing value, but not for the last row, which has two missing values:
compute_fc_p_psb_mean(data, max_na = 1)
When we change max_na to 2, a score is also computed for the last row:
compute_fc_p_psb_mean(data, max_na = 2)
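The max_na rule itself can be illustrated without the package. The sketch below uses a hypothetical base R helper and assumes that the mean is taken over the non-missing items of each row; it is meant to show the rule, not to reproduce the package's implementation.

```r
# Same dummy values as above (items stored as character strings).
items <- data.frame(
  fc_p_psb_001 = c("1", "2", "3", "4", "5"),
  fc_p_psb_002 = c("1", NA, "3", "4", NA),
  fc_p_psb_003 = c("1", "2", "2", "4", NA)
)

# Hypothetical helper (not part of ABCDscores): row mean over the
# non-missing items, set to NA when a row has more than max_na missing
# values. max_na = NULL means no limit.
row_mean_max_na <- function(df, max_na = 0) {
  m <- do.call(cbind, lapply(df, as.numeric))  # rows x items matrix
  n_missing <- rowSums(is.na(m))
  score <- rowMeans(m, na.rm = TRUE)
  score[is.nan(score)] <- NA                   # rows with no valid item
  if (!is.null(max_na)) score[n_missing > max_na] <- NA
  score
}

row_mean_max_na(items, max_na = 0)  # scores for rows 1, 3, and 4 only
row_mean_max_na(items, max_na = 1)  # row 2 now gets a score as well
row_mean_max_na(items, max_na = 2)  # all five rows get a score
```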
exclude
The exclude argument is used to specify values that should be excluded from the computation of the score. Some specific values in the data might be considered missing values, e.g., coded non-responses like "Don't know" (999), "Decline to answer" (777), etc. This argument allows the user to specify these values so that they are treated as missing values during the computation of the score. Importantly, the max_na argument applies to all values that are either missing (NA) or specified as values to be excluded using the exclude argument. Not all score functions have this argument.
In this example, we use another score function, compute_mh_p_abcl__afs__frnd_sum(), which has the exclude argument. We first construct a dummy data frame:
data <- tibble::tibble(
  mh_p_abcl__frnd_001 = c(1, 2, 3, 4, 5),
  mh_p_abcl__frnd_002 = c(1, 777, 3, 4, 777),
  mh_p_abcl__frnd_003 = c(1, 2, NA, 4, 777),
  mh_p_abcl__frnd_004 = c(1, 2, 3, 4, 999)
)
data
When we compute the score, a value is returned only for rows 1 and 4, because the other rows contain 777, 999, or NA values.
compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999"))
We can also exclude custom values. For example, if we additionally exclude 4, a score is computed only for the first row.
compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999", "4"))
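The interaction between exclude and max_na can be sketched in base R. The helper below is hypothetical and assumes that excluded codes are recoded to NA before the missing-value limit is applied; it illustrates the described behavior and is not the package's implementation.

```r
# Same dummy values as above.
items <- data.frame(
  mh_p_abcl__frnd_001 = c(1, 2, 3, 4, 5),
  mh_p_abcl__frnd_002 = c(1, 777, 3, 4, 777),
  mh_p_abcl__frnd_003 = c(1, 2, NA, 4, 777),
  mh_p_abcl__frnd_004 = c(1, 2, 3, 4, 999)
)

# Hypothetical helper (not part of ABCDscores): recode excluded values
# to NA, then sum each row unless it has more than max_na missing values.
row_sum_exclude <- function(df, exclude = NULL, max_na = 0) {
  m <- do.call(cbind, lapply(df, as.numeric))  # rows x items matrix
  # Compare as text so that exclude = 777 and exclude = "777" both match.
  m[as.character(m) %in% as.character(exclude)] <- NA
  n_missing <- rowSums(is.na(m))
  score <- rowSums(m, na.rm = TRUE)
  score[n_missing > max_na] <- NA
  score
}

row_sum_exclude(items, exclude = c("777", "999"))       # rows 1 and 4 only
row_sum_exclude(items, exclude = c("777", "999", "4"))  # row 1 only
```

Excluding 4 turns every item in the fourth row into a missing value, which is why that row no longer receives a score.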
The compute_<score_name>() functions are the main functions for computing summary scores, with one function for each score (apart from a few exceptions that are documented in the other vignettes). To keep their code concise, the main functions often rely on utility functions. These utility functions are not necessarily meant to be used directly by users of this package, but they are documented and exported for transparency and reproducibility. For the documentation of these functions, see the reference page.