dml: Double Machine Learning Estimates

View source: R/dml.R

Description

Implements the Double Machine Learning approach (Chernozhukov et al., 2018), which constructs estimates for low-dimensional target parameters in the presence of high-dimensional nuisance parameters.
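The idea can be sketched for the partial linear model in base R. This is a minimal illustration on simulated data, not the package's implementation: ordinary least squares stands in for the machine-learning nuisance learners, and all variable names (treat, fold, y_res, d_res) are hypothetical.

```r
set.seed(1)
n_obs <- 500

# Simulated data: three controls, one treatment, true effect theta = 0.5
x <- matrix(rnorm(n_obs * 3), ncol = 3)
colnames(x) <- c("x1", "x2", "x3")
treat <- as.numeric(x %*% c(1, -1, 0.5) + rnorm(n_obs))
y <- as.numeric(0.5 * treat + x %*% c(2, 1, -1) + rnorm(n_obs))
dat <- data.frame(y = y, treat = treat, x)

# Randomly assign every observation to one of k folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = n_obs))

# Cross-fitting: fit the nuisance regressions on the other folds,
# then residualize the held-out fold
y_res <- numeric(n_obs)
d_res <- numeric(n_obs)
for (j in seq_len(k)) {
  train <- dat[fold != j, ]
  test <- dat[fold == j, ]
  fit_y <- lm(y ~ x1 + x2 + x3, data = train)      # assumption: OLS instead of lasso
  fit_d <- lm(treat ~ x1 + x2 + x3, data = train)
  y_res[fold == j] <- test$y - predict(fit_y, newdata = test)
  d_res[fold == j] <- test$treat - predict(fit_d, newdata = test)
}

# Target parameter: regress outcome residuals on treatment residuals
theta_hat <- sum(d_res * y_res) / sum(d_res^2)
theta_hat
```

Because each observation is residualized with models that never saw it, overfitting in the nuisance step does not leak into the final estimate of theta.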

Usage

dml(
  f,
  d,
  model = "linear",
  ml = "lasso",
  n = 101,
  k = 5,
  score = "concentrate",
  workers = 1,
  drop_na = FALSE,
  family = NULL,
  poly_degree = 1,
  lambda = NULL,
  args = NULL
)

Arguments

f

an object of class formula representing the model to be fitted.

d

a dataframe containing the variables in f.

model

model type, or a list of user-created moment functions. Two model types are built in: linear for a partial linear model and poisson for a partial linear Poisson model. If the argument is a list, it must contain the following three functions, which are used to estimate theta, the coefficient of interest.

  1. psi: function that gives the value of the Neyman-Orthogonal moment at a given value of theta

  2. psi_grad: function that returns the gradient of psi with respect to theta

  3. psi_plr_op: function that gives the variance estimator at a given value of theta.

The default is model = "linear".
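To make the three roles concrete, here is a sketch of what such functions could compute for a partial linear model with a single treatment. The function names and argument lists are hypothetical and are not the interface dml requires; the inputs are assumed to be outcome and treatment residuals that have already been cross-fitted.

```r
set.seed(2)
n_obs <- 400

# Simulated residuals (assumption: nuisance functions already partialled out)
d_res <- rnorm(n_obs)
y_res <- 0.5 * d_res + rnorm(n_obs)

# Sample average of the orthogonal moment (y_res - theta * d_res) * d_res
psi_sketch <- function(theta, y_res, d_res) {
  mean((y_res - theta * d_res) * d_res)
}

# Derivative of the averaged moment with respect to theta
psi_grad_sketch <- function(theta, y_res, d_res) {
  -mean(d_res^2)
}

# Sandwich variance estimate of theta_hat: E[psi^2] / J^2 / n
psi_var_sketch <- function(theta, y_res, d_res) {
  j0 <- psi_grad_sketch(theta, y_res, d_res)
  s <- (y_res - theta * d_res) * d_res
  mean(s^2) / j0^2 / length(y_res)
}

# The moment is linear in theta, so its root has a closed form
theta_hat <- sum(d_res * y_res) / sum(d_res^2)
se_hat <- sqrt(psi_var_sketch(theta_hat, y_res, d_res))
c(theta = theta_hat, se = se_hat)
```

For moments that are not linear in theta, the root would typically be found numerically, which is where a gradient function becomes useful.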

n

number of times to repeat the sample splitting. The reported result is the median of the estimates over the n repetitions. Default is n = 101.
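The repetition step can be sketched as follows. This is an illustration under stated assumptions, not the package's code: crossfit_once is a hypothetical helper, OLS stands in for the nuisance learner, and the data are simulated.

```r
set.seed(3)
n_obs <- 300
x <- rnorm(n_obs)
treat <- x + rnorm(n_obs)
y <- 0.5 * treat + 2 * x + rnorm(n_obs)   # true effect is 0.5
dat <- data.frame(y = y, treat = treat, x = x)

# Hypothetical helper: one cross-fitted estimate for one random partition
crossfit_once <- function(dat, k = 5) {
  fold <- sample(rep(seq_len(k), length.out = nrow(dat)))
  y_res <- numeric(nrow(dat))
  d_res <- numeric(nrow(dat))
  for (j in seq_len(k)) {
    hold <- fold == j
    fit_y <- lm(y ~ x, data = dat[!hold, ])
    fit_d <- lm(treat ~ x, data = dat[!hold, ])
    y_res[hold] <- dat$y[hold] - predict(fit_y, newdata = dat[hold, ])
    d_res[hold] <- dat$treat[hold] - predict(fit_d, newdata = dat[hold, ])
  }
  sum(d_res * y_res) / sum(d_res^2)
}

# Repeat the random split several times and report the median estimate
n_rep <- 11
estimates <- replicate(n_rep, crossfit_once(dat))
theta_median <- median(estimates)
range(estimates)
theta_median
```

Taking the median makes the reported estimate less sensitive to any single random partition of the data.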

k

number of folds for cross-fitting. Default is k = 5.

score

takes either value finite or concentrate. finite refers to using the finite nuisance parameter orthogonal score construction, and concentrate refers to using the concentrating out approach. Default is score = "concentrate".

workers

number of workers to use in running the n dml calculations in parallel. Default is workers = 1, in which case the process is sequential.

drop_na

if TRUE, any row with an NA value is dropped. Default is drop_na = FALSE.

family

if ml = "lasso", this is passed on to cv.glmnet to describe the response variable type. Default is family = NULL.

poly_degree

degree of polynomial for the nuisance parameters, to be used when ml = "lasso". Default is poly_degree = 1.
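As an illustration of what a degree-3 expansion of a single control looks like, the sketch below uses stats::poly with raw powers. How dml builds its polynomial terms internally is not documented here, so treat this only as an example of the concept.

```r
x <- c(1, 2, 3, 4)

# Raw polynomial terms x, x^2, x^3 as columns of a matrix
x_poly <- poly(x, degree = 3, raw = TRUE)
unclass(x_poly)[, 1:3]
```

A higher degree gives the lasso more flexible terms to select from, at the cost of a wider design matrix.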

lambda

user supplied regularization parameter used when ml = "lasso". The default is NULL, in which case a lambda value is computed using cv.glmnet.

args

list of additional arguments to be passed to cv.glmnet or regression_forest (depending on the value of ml), e.g. list(trace.it = 1).

Value

dml returns an object of class "dml" with the following components:

coefficients

a named vector of coefficients.

vcov

variance-covariance matrix of the main parameters.

nobs

number of observations used

call

original function call with given arguments

References

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.

Examples

# Effect of temperature and precipitation on corn yield in the presence of
# time and locational effects

data(corn_yield)
library(magrittr)

dml_yield <-
  "logcornyield ~ lower + higher + prec_lo + prec_hi | year + fips" %>%
  as.formula() %>%
  dml(corn_yield, "linear", n = 5, ml = "lasso", poly_degree = 3, score = "finite")

# use the modelsummary package to export regression tables
library(modelsummary)
modelsummary(list("Lasso" = dml_yield), fmt = 5)

yixinsun1216/crossfit documentation built on June 8, 2021, 8:29 p.m.