R/data.R

#' Variable table
#'
#' Variable names, labels, the role each variable plays in the analyses, and
#' comments. For RITA items, the comment contains the full wording of the item,
#' translated from English by the study authors.
#'
#' @format A data frame with 114 rows and 5 variables:
#' \describe{
#'   \item{Variable}{Variable name as used in R scripts}
#'   \item{Label}{Short descriptive label}
#'   \item{Role}{The role the variable plays in the analyses.
#'   Can be outcome, predictor_static, predictor_dynamic, predictor_term, or descriptive}
#'   \item{Descriptives}{The type of descriptive statistics to produce for the variable}
#'   \item{Comment}{Additional comments}
#' }
#' @source variable_table.csv compiled by Benny Salo
"variable_table"



#' Model performances in the training set based on 1000 resamples
#'
#' This is a data object of class \code{resamples} generated by
#' \code{caret::resamples} that contains the performance of all models in
#' the training set, based on 250 repeats of 4-fold cross-validation (1000
#' resamples in total). Model names can be looked up in \code{model_grid}.
#'
#' The performance metrics are:
#' \itemize{
#'   \item d_AUC: Cohen's d based on a naive transformation from Area Under Curve
#'   \item E_O_ratio
#'   \item logLoss
#'   \item McF_R2: McFadden's Pseudo R^2
#'   \item ROC: Area Under Curve from a receiver operating characteristic curve
#' }
#'
#' @source The data object is created in
#'   "data-raw/06_model_perfs_training_set.Rmd". See also
#'   "data-raw/00_About_data_raw.Rmd"
"model_perfs_training_set1000"


#' Descriptive statistics for numerical variables
#'
#' @format A data frame with 20 rows and 7 variables: \describe{
#' \item{variable}{Variable name as used in R scripts}
#' \item{no_reoffence_mean}{Mean for the group of individuals that did not
#' reoffend}
#' \item{reoffence_nonviolent_mean}{Mean for the group of individuals
#' that reoffended with a non-violent crime}
#' \item{reoffence_violent_mean}{Mean for the group of individuals
#' that reoffended with assault or homicide}
#' \item{no_reoffence_sd}{Standard deviation for the group of individuals that
#' did not reoffend}
#' \item{reoffence_nonviolent_sd}{SD for the group of individuals
#' that reoffended with a non-violent crime}
#' \item{reoffence_violent_sd}{SD for the group of individuals
#' that reoffended with assault or homicide} }
#'
#' @source data-raw/descriptives.Rmd
"descriptive_stats_num"

#' Descriptive statistics for categorical variables
#'
#' @format A data frame with 233 rows and 8 variables: \describe{
#' \item{variable}{Variable name as used in R scripts}
#' \item{var_level}{Level on the variable to which the frequencies and percentages refer}
#' \item{no_reoffence_freq}{Frequency in the group of individuals that did not
#' reoffend}
#' \item{reoffence_nonviolent_freq}{Frequency in the group of individuals
#' that reoffended with a non-violent crime}
#' \item{reoffence_violent_freq}{Frequency in the group of individuals
#' that reoffended with assault or homicide}
#' \item{no_reoffence_perc}{Percentage in the group of individuals that
#' did not reoffend}
#' \item{reoffence_nonviolent_perc}{Percentage in the group of individuals
#' that reoffended with a non-violent crime}
#' \item{reoffence_violent_perc}{Percentage in the group of individuals
#' that reoffended with assault or homicide} }
#'
#' @source data-raw/descriptives.Rmd
"descriptive_stats_cat"
bennysalo/predict-recidivism documentation built on May 29, 2019, 10:34 a.m.