placement-and-recidivism:

Lasso regression for selecting covariates

Benny Salo 2018-06-14

We try to select the covariates that have a unique effect on recidivism. Excluding covariates that are not needed should reduce variance of the analyses and make the presentation clearer.

To do this we use lasso regression.

devtools::wd()
analyzed_data_plac <- readRDS("not_public/analyzed_data_plac.rds")

devtools::load_all(".")
library(tidyverse)

We define a formula for predicting reoffence. We consider all predictors previously defined. Supervision of parole is not included. We will do exact matching on this variable.

pred_reci_formula <- write_formula("reoffenceThisTerm", all_predictors)

Do lasso regression with ten fold cross-validation.

set.seed(120618)
# Create folds
ten_folds_reoffence <- 
  caret::createFolds(y = analyzed_data_plac$reoffenceThisTerm, k = 10)

# Define control function to be used with caret::train
ctrl_fun_reoffence <- caret::trainControl(
  method = "cv",
  number = 10,
  summaryFunction = caret::twoClassSummary,
  classProbs = TRUE,
  verboseIter = TRUE,
  savePredictions = "final",
  index = ten_folds_reoffence,
  returnData = FALSE
  )

# Train model
trained_lasso <- 
  caret::train(
      form      = pred_reci_formula,
      data      = analyzed_data_plac,
      method    = "glmnet",
      family    = "binomial", # passed to glmnet, define as logistic regression
      standardize = TRUE,     # passed to glmnet, explicitly standardize  
      metric    = "ROC",
      trControl = ctrl_fun_reoffence,
      tuneGrid  = expand.grid(alpha  = 1,
                              lambda = seq(0, 0.3, 0.001)))

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

## + Fold01: alpha=1, lambda=0.3 
## - Fold01: alpha=1, lambda=0.3 
## + Fold02: alpha=1, lambda=0.3 
## - Fold02: alpha=1, lambda=0.3 
## + Fold03: alpha=1, lambda=0.3 
## - Fold03: alpha=1, lambda=0.3 
## + Fold04: alpha=1, lambda=0.3 
## - Fold04: alpha=1, lambda=0.3 
## + Fold05: alpha=1, lambda=0.3 
## - Fold05: alpha=1, lambda=0.3 
## + Fold06: alpha=1, lambda=0.3 
## - Fold06: alpha=1, lambda=0.3 
## + Fold07: alpha=1, lambda=0.3 
## - Fold07: alpha=1, lambda=0.3 
## + Fold08: alpha=1, lambda=0.3 
## - Fold08: alpha=1, lambda=0.3 
## + Fold09: alpha=1, lambda=0.3 
## - Fold09: alpha=1, lambda=0.3 
## + Fold10: alpha=1, lambda=0.3 
## - Fold10: alpha=1, lambda=0.3 
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 0.04 on full training set

We can look at the relative importance of the predictors in the trained model with the chosen level of regularization.

caret::varImp(trained_lasso)

## glmnet variable importance
## 
##   only 20 most important variables shown (out of 30)
## 
##                                                  Overall
## drug_related_probl                               100.000
## o_traffic                                         99.538
## economy_problems                                  45.337
## o_sexual                                          36.443
## ps_remandTerms_mr                                 33.127
## o_assault                                         16.250
## o_homicide                                        13.879
## ps_prisonTerms_mr                                 11.882
## ageAtRelease                                       5.493
## ageFirstSentence_mr                                2.645
## ageFirst_missing                                   0.000
## alcohol_problems                                   0.000
## ps_escapeHistoryUnauthorized absence of any kind   0.000
## o_whitecollar                                      0.000
## employment_probl                                   0.000
## resistance_change                                  0.000
## o_againstOfficer                                   0.000
## o_damage                                           0.000
## o_otherProperty                                    0.000
## ps_defaultTerms_mr                                 0.000

Predictors with non-zero coefficients will be treated as potential confounders. We extract this character vector and save it for later use.

potential_confounders <- varImp(trained_lasso)$importance %>% 
  rownames_to_column(var = "predictor") %>% 
  filter(Overall > 0) %>% 
  select(predictor) %>% 
  simplify()

potential_confounders

##            predictor1            predictor2            predictor3 
## "ageFirstSentence_mr"        "ageAtRelease"   "ps_prisonTerms_mr" 
##            predictor4            predictor5            predictor6 
##   "ps_remandTerms_mr"           "o_traffic"           "o_assault" 
##            predictor7            predictor8            predictor9 
##          "o_homicide"            "o_sexual"    "economy_problems" 
##           predictor10 
##  "drug_related_probl"

Save for later use.

devtools::use_data(potential_confounders, overwrite = TRUE)

bennysalo/placement-and-recidivism documentation built on May 21, 2019, 9:47 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bennysalo/placement-and-recidivism

data-raw/02_selecting_covariates_though_lass_regression.md
In bennysalo/placement-and-recidivism:

Lasso regression for selecting covariates

Load data and setup.

R Package Documentation

Browse R Packages

We want your feedback!

bennysalo/placement-and-recidivism

data-raw/02_selecting_covariates_though_lass_regression.md In bennysalo/placement-and-recidivism:

Lasso regression for selecting covariates

Load data and setup.

R Package Documentation

Browse R Packages

We want your feedback!

data-raw/02_selecting_covariates_though_lass_regression.md
In bennysalo/placement-and-recidivism: