The `doubleLassoSelect` function is an alternative implementation of Double Lasso Selection on OLS. It is based primarily on the `glmnet` package and on a paper by O. Urminsky, C. Hansen, and V. Chernozhukov (working draft), listed in the references.
For the original implementation, see the `hdm` package by Chernozhukov, Hansen, and Spindler. For the mathematical details of the method, refer to the original papers listed in the reference section.
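As a rough orientation, the general double lasso idea can be sketched directly with `glmnet`: run one lasso of the outcome on the candidate covariates, one lasso of each treatment variable on the same covariates, and keep the union of the covariates selected in any of these fits. The sketch below is an assumption-laden illustration, not the code used inside `dlsr`: it picks lambda by cross-validation with `cv.glmnet`, which differs from the iterative lambda update described for the `k` argument, and the helper `lasso_selected` is hypothetical.

```r
library(glmnet)

data(mtcars)
outcome   <- "mpg"
treatment <- c("cyl", "hp")
test      <- c("drat", "disp", "vs", "wt")

# Candidate covariates as a numeric matrix, as glmnet requires
X <- as.matrix(mtcars[, test])

# Hypothetical helper: names of covariates with a non-zero lasso coefficient
lasso_selected <- function(X, y) {
  fit  <- cv.glmnet(X, y, alpha = 1, nfolds = 5)   # alpha = 1 is the lasso penalty
  beta <- as.matrix(coef(fit, s = "lambda.min"))   # first row is the intercept
  setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")
}

set.seed(1)  # cross-validation folds are random
# Step 1: covariates that predict the outcome
keep_y <- lasso_selected(X, mtcars[[outcome]])
# Step 2: covariates that predict each treatment variable
keep_d <- unlist(lapply(treatment, function(d) lasso_selected(X, mtcars[[d]])))
# Step 3: union of both selections
selected <- union(keep_y, keep_d)

# Final OLS: treatments are always kept, covariates only if selected
model_lm <- lm(reformulate(c(treatment, selected), response = outcome),
               data = mtcars)
```

Taking the union in step 3 is what distinguishes this from a single lasso on the outcome: a covariate that is strongly related to a treatment stays in the final model even if its direct relation to the outcome looks weak.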
Compared to the `rlassoEffects` function in the original `hdm`, the `doubleLassoSelect` function in `dlsr` provides an alternative implementation of this specific method with the following benefits:
1. The `doubleLassoSelect` function accepts character vectors as variable input, instead of matrix indices or logical vectors. This improves code readability and facilitates batch runs driven by an external data source such as a csv file.
2. It supports interaction terms as variable input. `doubleLassoSelect` handles the matrix expansion for you.
3. Instead of the result of a linear model, `doubleLassoSelect` outputs a data frame (`data.table`) with the selected variables. This gives users more flexibility in subsequent operations, for example, applying the selected variables in a latent class model.
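To make the "matrix expansion" in the second point concrete, here is a minimal base R sketch of how a character vector containing an interaction term can be turned into a numeric design matrix. It uses `reformulate` and `model.matrix` from the `stats` package; whether `dlsr` performs the expansion this way internally is an assumption.

```r
data(mtcars)
test <- c("drat", "disp", "vs", "cyl:hp")

# Build a one-sided formula from the names; intercept = FALSE drops the
# constant column so only the listed terms remain
f <- reformulate(test, intercept = FALSE)

# model.matrix expands "cyl:hp" into the elementwise product of cyl and hp
X <- model.matrix(f, data = mtcars)
colnames(X)
# "drat" "disp" "vs" "cyl:hp"
```

The interaction column equals `mtcars$cyl * mtcars$hp`, so the caller never has to add a product column to the data frame by hand.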
Maintainer: Chih-Yu Chiang chihyuchiang@uchicago.edu
install.packages("devtools")
devtools::install_github("ChihYuChiang/dlsr")
doubleLassoSelect(df, outcome, treatment, test, k=15)
This function implements Double Lasso Selection on a specified data frame, with specified treatment variables to be included in the final model and covariates to be tested via the selection process.
Argument | Description
------- | ------
`df` | Accepts a `data.frame` or a `data.table`. The data frame must contain all the variables specified in `outcome`, `treatment`, and `test`.
`outcome` | Accepts a single `character` value, which cannot be an empty `character`. It specifies the outcome variable's name, which is looked up in the column names of the provided data frame.
`treatment` | Accepts a single `character` value or a `character vector`. It specifies the treatment variable name(s), which are looked up in the column names of the provided data frame. Treatment variables do NOT go through the selection and are always included in the final output data set. This parameter accepts an empty `character`, which means no treatment variable is included in the process.
`test` | Accepts a single empty `character` or a `character vector` with a length >= 2 (a restriction of the `glmnet` package). It specifies the test variable name(s), which are looked up in the column names of the provided data frame. Test variables are the covariates that go through the selection and may or may not be included in the final data set. An empty `character` means the selection is performed on all variables except the outcome and treatment variables.
`k` | Accepts a `numeric` value. This is the number of times `lambda` is updated. Here `lambda` is the lasso regression parameter that sets the degree of regularization. You do not have to adjust this value in most situations. The default value is the one suggested by the paper listed in the package reference.
This function returns a data frame (`data.table`) with selected variables.
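The argument rules in the table can be summarized as a small pre-check. The helper below, `resolve_inputs`, is hypothetical and not part of `dlsr`; it only illustrates the documented behavior (non-empty `outcome`, empty strings meaning "none" for `treatment` and "all remaining columns" for `test`, at least two test variables, and interaction terms checked through their component names).

```r
# Hypothetical helper illustrating the documented argument rules
resolve_inputs <- function(df, outcome, treatment = "", test = "") {
  # outcome must be one non-empty string
  stopifnot(is.character(outcome), length(outcome) == 1, nzchar(outcome))

  # An empty string means "nothing specified"
  treatment <- treatment[nzchar(treatment)]
  test      <- test[nzchar(test)]

  # Empty test: select among every column except outcome and treatment
  if (length(test) == 0) {
    test <- setdiff(names(df), c(outcome, treatment))
  }
  if (length(test) < 2) {
    stop("`test` needs at least 2 variables")
  }

  # Split interaction terms such as "cyl:hp" into their component names
  needed <- unique(unlist(strsplit(c(outcome, treatment, test), ":", fixed = TRUE)))
  absent <- setdiff(needed, names(df))
  if (length(absent) > 0) {
    stop("Not found in df: ", paste(absent, collapse = ", "))
  }

  list(outcome = outcome, treatment = treatment, test = test)
}

# With both arguments left empty, all 10 non-outcome columns of mtcars are tested
str(resolve_inputs(mtcars, "mpg"))
```

Running a check like this before the selection step surfaces misspelled column names early, with a message that names the offending entries.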
#Fetch data for demonstration
data(mtcars)
#Input example 1:
#Character vectors as `treatment` and `test` input with an interaction term
outcome <- "mpg"
treatment <- c("cyl", "hp")
test <- c("drat", "disp", "vs", "cyl:hp")
#Input example 2:
#Empty character as `treatment` and `test` input
outcome <- "mpg"
treatment <- ""
test <- ""
#Acquire the selected data frame
DT_select <- doubleLassoSelect(df=mtcars, outcome=outcome, treatment=treatment, test=test)
#Implement a linear model after the selection
model_lm <- lm(as.formula(sprintf("`%s` ~ .", outcome)), data=DT_select)
summary(model_lm)
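Because the inputs are plain character vectors, model specifications can be kept in an external csv file and looped over. The sketch below assumes a hypothetical file layout (columns `outcome`, `treatment`, `test`, with names separated by `;`) and a hypothetical helper `split_names`; the csv content is inlined as a string so the example is self-contained, and the call to `doubleLassoSelect` is left as a comment.

```r
# Hypothetical spec file: one row per model, variable names separated by ";"
spec_csv <- "outcome,treatment,test
mpg,cyl;hp,drat;disp;vs;cyl:hp
mpg,,"

# Read every column as character so empty cells stay as ""
specs <- read.csv(text = spec_csv, colClasses = "character")

# Hypothetical helper: turn "a;b;c" into c("a", "b", "c"), keep "" as ""
split_names <- function(x) {
  if (is.na(x) || !nzchar(x)) return("")
  strsplit(x, ";", fixed = TRUE)[[1]]
}

# One argument list per row of the spec file
runs <- lapply(seq_len(nrow(specs)), function(i) {
  list(outcome   = specs$outcome[i],
       treatment = split_names(specs$treatment[i]),
       test      = split_names(specs$test[i]))
})

# Each element can then be passed on, for example:
# DT_select <- doubleLassoSelect(df = mtcars, outcome = runs[[1]]$outcome,
#                                treatment = runs[[1]]$treatment,
#                                test = runs[[1]]$test)
str(runs)
```

The second row of the spec reproduces input example 2 above: empty `treatment` and `test` values are passed through as empty strings.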