dlsr: Double Lasso Selection


The doubleLassoSelect function is an alternative implementation of Double Lasso Selection for OLS. It is based primarily on the glmnet package and on a working paper by O. Urminsky, C. Hansen, and V. Chernozhukov, listed in the reference section.

For the original implementation, see the hdm package by Chernozhukov, Hansen, and Spindler. For the mathematical details of the method, refer to the original papers listed in the reference section.


Compared to the rlassoEffects function in the original hdm package, the doubleLassoSelect function in dlsr provides an alternative implementation of this specific method with the following benefits:

1. The doubleLassoSelect function accepts character vectors as variable input, instead of matrix indices or logical vectors. This improves code readability and facilitates batch implementation with an external data source such as a csv file.
2. It supports interaction terms as variable input. The doubleLassoSelect function handles the matrix expansion for you.
3. Instead of the result of a linear model, the doubleLassoSelect function outputs a data frame (data.table) with the selected variables. This gives users more flexibility in subsequent operations, for example, applying the selected result further in a latent class model.
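To illustrate what "matrix expansion" of an interaction term means, the following is a minimal sketch in base R. It is not dlsr's internal code; it only assumes that a term such as "cyl:hp" is treated like an ordinary R formula term, and it borrows the variable names from the package's own mtcars example.

```r
# Illustration only: base R, not the dlsr implementation.
# Variable names follow the mtcars example used in this document.
test <- c("drat", "disp", "vs", "cyl:hp")

# Build a one-sided formula from the character vector (no intercept column).
f <- reformulate(test, intercept = FALSE)

# model.matrix() expands the interaction term into its own column,
# here the element-wise product of cyl and hp.
X <- model.matrix(f, data = mtcars)

colnames(X)
```

With numeric variables such as these, the interaction column is simply the product of the two variables, so the expanded matrix has one column per entry of the character vector.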


Maintainer: Chih-Yu Chiang





doubleLassoSelect(df, outcome, treatment, test, k=15)


This function implements Double Lasso Selection on a specified data frame, with specified treatment variables to be included in the final model and covariates to be tested via the selection process.


Argument | Description
------- | ------
df | Accepts a data.frame or a data.table. The data frame must contain all the variables specified in outcome, treatment, and test.
outcome | Accepts a single character value. It cannot be an empty character. The value specifies the outcome variable's name, which is looked up in the column names of the provided data frame.
treatment | Accepts a single character value or a character vector. It specifies the treatment variable name(s), which are looked up in the column names of the provided data frame. The treatment variables do NOT go through the selection and are always included in the final output data set. This parameter accepts an empty character, which means no treatment variable is included in the process.
test | Accepts a single empty character or a character vector of length >= 2 (a restriction of the glmnet package). It specifies the test variable name(s), which are looked up in the column names of the provided data frame. The test variables are the covariates that go through the selection and may or may not be included in the final data set. This parameter accepts an empty character, which means selection is performed on all variables except the outcome and treatment variables.
k | Accepts a numeric value. This is the number of times lambda is updated. Lambda is the parameter that controls the degree of regularization in lasso regression. You do not have to adjust this value in most situations. The default value is the one suggested by the paper specified in the package reference.


This function returns a data frame (data.table) with selected variables.
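The general idea of double lasso selection can be sketched as follows. This is a simplified illustration, not the code inside doubleLassoSelect: it assumes the glmnet package is installed, it picks lambda by cross-validation with cv.glmnet rather than by the iteratively updated lambda that the k argument controls, and the helper name lasso_selected is hypothetical.

```r
library(glmnet)

set.seed(1)  # cv.glmnet assigns cross-validation folds at random

outcome   <- "mpg"
treatment <- "hp"
test      <- c("cyl", "drat", "disp", "wt", "qsec", "vs")

X <- as.matrix(mtcars[, test])

# Hypothetical helper: names of covariates with non-zero lasso coefficients.
# Assumption: lambda is chosen by cross-validation, unlike in dlsr.
lasso_selected <- function(x, y) {
  fit  <- cv.glmnet(x, y, nfolds = 5)
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop the intercept
  names(beta)[beta != 0]
}

# Step 1: covariates that help predict the outcome.
sel_outcome <- lasso_selected(X, mtcars[[outcome]])

# Step 2: covariates that help predict the treatment.
sel_treatment <- lasso_selected(X, mtcars[[treatment]])

# Step 3: keep the union of both sets, plus the outcome and the treatment.
selected  <- union(sel_outcome, sel_treatment)
DT_sketch <- mtcars[, c(outcome, treatment, selected), drop = FALSE]

# An ordinary linear model can then be fitted on the reduced data frame.
model_sketch <- lm(reformulate(c(treatment, selected), response = outcome),
                   data = DT_sketch)
```

Which covariates end up in selected depends on the cross-validation folds, so the result of this sketch may differ from what doubleLassoSelect returns for the same inputs.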


#Fetch data for demonstration (mtcars ships with base R)
data(mtcars)

#Input example 1:
#Character vectors as `treatment` and `test` input with an interaction term
outcome <- "mpg"
treatment <- c("cyl", "hp")
test <- c("drat", "disp", "vs", "cyl:hp")

#Input example 2 (overwrites the values from example 1 if both are run):
#Empty character as `treatment` and `test` input
outcome <- "mpg"
treatment <- ""
test <- ""

#Acquire the selected data frame
DT_select <- doubleLassoSelect(df=mtcars, outcome=outcome, treatment=treatment, test=test)

#Implement a linear model after the selection
model_lm <- lm(as.formula(sprintf("`%s` ~ .", outcome)), data=DT_select)

ChihYuChiang/dlsr documentation built on May 30, 2019, 10:45 p.m.