cv_dr_transform: Cross-Validated Doubly Robust Pseudo-Outcome Transformation
In Netflix/sherlock: Causal Machine Learning for Segment Discovery and Analysis

Generates the doubly robust pseudo-outcome required for estimating the conditional average treatment effect (CATE), using a transformation based on the form of the efficient influence function (EIF; a key quantity in semiparametric statistics) of the average treatment effect. The pseudo-outcome is always created within a particular cross-validation fold. Depending on the value of the argument `split_type`, however, either the CATE is estimated by a call to `cv_fit_cate` or a placeholder column for the CATE estimate is generated. In the latter case, actual estimation of the CATE is deferred to an optional step in `est_cate`, in which the CATE is estimated on the full data.
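For orientation, the transformation can be written out in its standard textbook (augmented inverse-probability-weighted) form. This page does not spell out the formula, so the expression below is an assumption about the usual EIF-based construction rather than a transcription of the package source. The notation is also assumed: W denotes covariates, A the binary treatment, Y the response, g(W) the propensity score, and \bar{Q}(A, W) the outcome regression.

```latex
% Nuisance parameters (assumed notation):
%   g(W)         = P(A = 1 \mid W)              (propensity score)
%   \bar{Q}(A,W) = E[Y \mid A, W]               (outcome regression)
\[
  \tilde{Y}
  = \frac{A - g(W)}{g(W)\,\{1 - g(W)\}}\,\bigl\{Y - \bar{Q}(A, W)\bigr\}
  + \bar{Q}(1, W) - \bar{Q}(0, W)
\]
% If either g or \bar{Q} is correctly specified, then
%   E[\tilde{Y} \mid V] = E[Y(1) - Y(0) \mid V]
% for segmentation covariates V \subseteq W, which is why regressing
% \tilde{Y} on V targets the CATE.
```

The averaged pseudo-outcome recovers the average treatment effect, and its regression on the segmentation covariates recovers the CATE, which is the role `cate_learner` plays.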

cv_dr_transform(fold, data_est_spec, ps_learner, or_learner, cate_learner, split_type)

`fold`	An object specifying the cross-validation folds into which the observations fall, as generated by `make_folds`.
`data_est_spec`	An input `data.table` object created from the input data by `set_est_data`. Note that this data container object has specialized attributes appended to it, so it must be created by that internal utility function.
`ps_learner`	An instantiated learner object, with class inheriting from `Lrnr_base`, from sl3, to be used for estimation of the propensity score (the probability of receiving treatment, conditional on covariates). Note that the outcome of this estimation task is strictly binary and that algorithms or ensembles should be set up accordingly.
`or_learner`	An instantiated learner object, with class inheriting from `Lrnr_base`, from sl3, to be used for estimation of the outcome regression (the mean of the response variable, conditional on exposure and covariates).
`cate_learner`	An instantiated learner object, with class inheriting from `Lrnr_base`, from sl3, to be used to estimate the CATE, based on a regression of a doubly robust pseudo-outcome on the specified segmentation covariates. Note that the outcome of this estimation task is derived from the other nuisance parameter estimates and should be expected to always be continuous-valued, so algorithms or ensembles should be set up accordingly.
`split_type`	A `character` string (of length one) indicating the sample-splitting "level" at which estimation of the CATE is performed. The choices are "inner", for estimation of the CATE within folds (i.e., at the same level at which nuisance parameters are estimated), and "outer", in which case the CATE is estimated at the "full-sample" level.

A list (as required by cross_validate) containing a data.table of the validation sample for the given cross-validation fold, augmented with additional columns that specify the nuisance parameter estimates, the doubly robust pseudo-outcome, and (possibly) the estimated CATE.
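The per-fold logic described above can be illustrated with a minimal base R sketch on simulated data. It does not call sherlock or sl3: `glm` and `lm` stand in for the `ps_learner`, `or_learner`, and `cate_learner` arguments, and the helper `dr_fold`, the column names (`ps_est`, `or_est`, `dr_pseudo`, `cate_est`), and the choice to fit the "inner" CATE regression on training-sample pseudo-outcomes are all hypothetical, chosen for illustration only.

```r
# Simulated data: W covariate, A binary treatment, Y response (true CATE = 2 + W).
set.seed(1)
n <- 2000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- 1 + W + A * (2 + W) + rnorm(n)
dat <- data.frame(W, A, Y)
fold_id <- sample(rep(1:5, length.out = n))  # 5-fold assignment

# Hypothetical helper (not part of sherlock): DR transform for validation fold v.
dr_fold <- function(v, dat, fold_id, split_type = c("inner", "outer")) {
  split_type <- match.arg(split_type)
  train <- dat[fold_id != v, ]
  valid <- dat[fold_id == v, ]

  # Nuisance estimates are fit on the training sample only.
  ps_fit <- glm(A ~ W, family = binomial(), data = train)  # propensity score
  or_fit <- lm(Y ~ A * W, data = train)                    # outcome regression

  add_dr <- function(d) {
    g  <- predict(ps_fit, newdata = d, type = "response")
    q1 <- predict(or_fit, newdata = transform(d, A = 1))
    q0 <- predict(or_fit, newdata = transform(d, A = 0))
    qa <- ifelse(d$A == 1, q1, q0)
    d$ps_est <- g
    d$or_est <- qa
    # Standard AIPW-style pseudo-outcome (assumed form):
    # (A - g) / (g (1 - g)) * (Y - Q(A, W)) + Q(1, W) - Q(0, W)
    d$dr_pseudo <- (d$A - g) / (g * (1 - g)) * (d$Y - qa) + q1 - q0
    d
  }
  valid <- add_dr(valid)

  if (split_type == "inner") {
    # Assumption: CATE regression fit within the fold, predicted on validation.
    cate_fit <- lm(dr_pseudo ~ W, data = add_dr(train))
    valid$cate_est <- predict(cate_fit, newdata = valid)
  } else {
    # "outer": placeholder column; CATE estimation is deferred to a later step.
    valid$cate_est <- NA_real_
  }
  valid
}

out_inner <- do.call(rbind, lapply(1:5, dr_fold, dat = dat,
                                   fold_id = fold_id, split_type = "inner"))
out_outer <- do.call(rbind, lapply(1:5, dr_fold, dat = dat,
                                   fold_id = fold_id, split_type = "outer"))
```

In this sketch each call returns only the validation rows for its fold, so stacking the five results covers every observation exactly once. With `split_type = "outer"` the `cate_est` column stays `NA` until a full-sample regression of `dr_pseudo` on the segmentation covariates fills it in.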

Netflix/sherlock documentation built on Dec. 17, 2021, 5:22 a.m.