View source: R/cb_add_summary_stats.R
cb_add_summary_stats | R Documentation |
The input to cb_add_summary_stats is a data frame and a column from that
data frame in the format cb_add_summary_stats(study, "id"). The column name
is a character string because it is passed from a for loop in the codebook
function. The purpose of cb_add_summary_stats is to attempt to figure out
whether the column is:
Numeric (e.g., height)
Categorical - many categories (e.g. participant id)
Categorical - few categories (e.g. gender)
Time - including dates
This matters because the table of summary stats shown in the codebook document depends on the value cb_add_summary_stats chooses.
cb_add_summary_stats(
df,
.x,
many_cats = 10,
num_to_cat = 4,
digits = 2,
n_extreme_cats = 5
)
df |
Data frame of interest |
.x |
Column of interest |
many_cats |
The many_cats argument sets the cutoff value that partially (i.e., along with the col_type attribute) determines whether cb_add_summary_stats will categorize the variable as categorical with few categories or categorical with many categories. The number of categories that constitutes "many" is defined by the value passed to the many_cats argument. The default is 10. |
num_to_cat |
The num_to_cat argument sets the cutoff value that partially (i.e., along with the col_type attribute) determines whether cb_add_summary_stats will categorize a numeric as categorical. If the col_type attribute is not set for a column AND the number of unique non-missing values is <= num_to_cat, then cb_add_summary_stats will guess that the variable is categorical. The default value for num_to_cat is 4. |
digits |
Number of digits after the decimal to display |
n_extreme_cats |
Number of extreme values to display when the column is
classified as |
The user can tell the cb_add_summary_stats function what to choose explicitly by giving the column a col_type attribute set to one of the following values:
Numeric. For example, height and/or weight.
study <- cb_add_col_attributes(study, height, col_type = "numeric")
Categorical. We describe how many categories vs few categories is determined below.
study <- cb_add_col_attributes(study, id, col_type = "categorical")
Time. Dates, times, and datetimes.
cb_add_col_attributes(study, date, col_type = "time")
If the user does not explicitly set the col_type attribute to one of these values, then cb_add_summary_stats will guess which col_type attribute to assign to each column based on the column's class and the number of unique non-missing values the it has.
However, the number of unique non-missing values isn't used in an absolute way (e.g., 10 or more unique values is ALWAYS many_cats). Instead, the number of unique non-missing values used relative to the values passed to the many_cats parameter and/or the num_to_cat parameter – depending on the class of the column.
A tibble of results
Other add_summary_stats:
cb_summary_stats_few_cats()
,
cb_summary_stats_many_cats()
,
cb_summary_stats_numeric()
,
cb_summary_stats_time()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.