cov.sel.high: Model-Free Covariate Selection in High Dimensions

In CovSelHigh: Model-Free Covariate Selection in High Dimensions


Model-free selection of covariates in high dimensions under unconfoundedness, for situations where the parameter of interest is an average causal effect. This package is based on the model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) and VanderWeele and Shpitser (2011). Confounder selection can be performed via Markov/Bayesian networks, random forests or LASSO.

cov.sel.high(T=NULL, Y=NULL, X=NULL,type=c("mmpc","mmhc","rf","lasso"), 
                    betahat=TRUE, parallel=FALSE, Simulate=TRUE,N=NULL, Setting=1,
                    rep=1, Models=c("Linear", "Nonlinear", "Binary"), 
                    alpha=0.05, mmhc_score=c("aic","bic"))

`T`	A vector, containing `0` and `1`, indicating a binary treatment variable.
`Y`	A vector of observed outcomes.
`X`	A matrix or data frame containing columns of covariates. The covariates may be a mix of continuous, unordered discrete (to be specified in the data frame using `factor`), and ordered discrete (to be specified in the data frame using `ordered`).
`type`	The type of method used for selection. The network algorithms are `"mmpc"` for max-min parents and children (Markov network) and `"mmhc"` for max-min hill climbing (Bayesian network). Other available methods are random forests, `"rf"`, and LASSO, `"lasso"`.
`betahat`	If `betahat=TRUE` the average treatment effect for each selected subset and the full covariate vector is estimated using propensity score matching (PSM) via the function `Match` and using targeted maximum likelihood estimation (TMLE) via the function `tmle`.
`parallel`	If `parallel=TRUE` and there is a registered parallel backend then the computation will be parallelized. Default is `parallel=FALSE`.
`Simulate`	If data is to be simulated according to one of the designs in Häggström (2017) then `Simulate` should be set to `TRUE`.
`N`	If `Simulate=TRUE`, `N` is the number of observations to be simulated.
`Setting`	If `Simulate=TRUE`, `Setting` is the simulation setting to be used. With `Setting=1`, unconfoundedness holds given `X`. With `Setting=2`, there is M-bias given `X`.
`rep`	If `Simulate=TRUE`, `rep` is the number of replications to be simulated.
`Models`	If `Simulate=TRUE`, `Models` is the type of outcome model to be used. Options are `"Linear"`, `"Nonlinear"` and `"Binary"`.
`alpha`	A numeric value, the target nominal type I error rate (tuning parameter) for `"mmpc"` and `"mmhc"`.
`mmhc_score`	The score to use for `"mmhc"`.
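
As a minimal sketch of calling the function on user-supplied data instead of simulated data: the usage line shows `Simulate=TRUE` as the default, so the call below sets `Simulate=FALSE` explicitly. The data frame, its column names and the toy data-generating steps are invented for illustration, and the choice of `type="rf"` is arbitrary. The call is guarded so that it only runs when CovSelHigh is installed.

```r
set.seed(1)
n <- 200

# Hypothetical covariates: two continuous, one unordered factor, one ordered factor
X <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  bmi    = rnorm(n, mean = 25, sd = 4),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE)),
  educ   = ordered(sample(c("low", "mid", "high"), n, replace = TRUE),
                   levels = c("low", "mid", "high"))
)

# Binary treatment coded 0/1 and a numeric outcome (toy data-generating process).
# The treatment is stored as `treat` so that R's `T` shorthand is not masked.
treat <- rbinom(n, size = 1, prob = plogis(0.03 * (X$age - 50)))
Y <- 2 * treat + 0.1 * X$age + rnorm(n)

# Only attempt the call when the package is available
if (requireNamespace("CovSelHigh", quietly = TRUE)) {
  ans <- CovSelHigh::cov.sel.high(T = treat, Y = Y, X = X, type = "rf",
                                  betahat = FALSE, Simulate = FALSE)
}
```

Setting `betahat=FALSE` skips the PSM and TMLE estimation step, so only the covariate subsets are selected.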

See Häggström (2017).

cov.sel.high returns a list with the following content:

`X.T`	The set of covariates targeting the subset containing all causes of `T`.
`Q.0`	The set of covariates targeting the subset of `X.T` which is also associated with `Y` given `T`=0, the response in the control group.
`Q.1`	The set of covariates targeting the subset of `X.T` which is also associated with `Y` given `T`=1, the response in the treatment group.
`Q`	Union of Q.0 and Q.1.
`X.0`	The set of covariates targeting the subset containing all causes of `Y` given `T`=0.
`X.1`	The set of covariates targeting the subset containing all causes of `Y` given `T`=1.
`X.Y`	Union of X.0 and X.1.
`Z.0`	The set of covariates targeting the subset of `X.0` which is also associated with `T`.
`Z.1`	The set of covariates targeting the subset of `X.1` which is also associated with `T`.
`Z`	Union of Z.0 and Z.1.
`X.TY`	Union of X.T and X.Y, the set of covariates targeting the subset containing all causes of `T` and `Y`.
`cardinalities`	The cardinalities of each selected subset.
`est_psm`	The PSM estimate of the average causal effect, for the full covariate vector and each selected subset.
`se_psm`	The Abadie-Imbens standard error for the PSM estimate of the average causal effect, for the full covariate vector and each selected subset.
`est_tmle`	The TMLE estimate of the average causal effect, for the full covariate vector and each selected subset.
`se_tmle`	The influence-curve based standard error for the TMLE estimate of the average causal effect, for the full covariate vector and each selected subset.
`N`	The number of observations.
`Setting`	The Setting used.
`rep`	The number of replications.
`Models`	Models used.
`type`	type used.
`alpha`	alpha used.
`mmhc_score`	score used.
`varnames`	Variable names of the used data.
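
The set relations among the returned subsets can be illustrated with base R set operations. This is only a toy sketch: the covariate names are invented, and it assumes each subset can be written as a character vector of variable names, which is not confirmed here as the format `cov.sel.high` actually returns.

```r
# Hypothetical selected subsets, written as character vectors of covariate names
X.T <- c("x1", "x2", "x3", "x5")   # targets the causes of T
Q.0 <- c("x1", "x2")               # subset of X.T associated with Y given T = 0
Q.1 <- c("x2", "x3")               # subset of X.T associated with Y given T = 1
X.0 <- c("x1", "x2", "x7")         # targets the causes of Y given T = 0
X.1 <- c("x2", "x3", "x8")         # targets the causes of Y given T = 1

# Unions as described in the value list
Q    <- union(Q.0, Q.1)
X.Y  <- union(X.0, X.1)
X.TY <- union(X.T, X.Y)

# Sizes of the subsets, analogous in spirit to the `cardinalities` element
sizes <- sapply(list(X.T = X.T, Q = Q, X.Y = X.Y, X.TY = X.TY), length)
print(sizes)
```

By construction `Q` is contained in `X.T`, and `X.TY` contains both `X.T` and `X.Y`.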

Depending on the `type` specified, cov.sel.high calls one of the functions mmpc, mmhc, randomForest or cv.glmnet and, if betahat=TRUE, also Match and tmle. The packages bnlearn, randomForest, glmnet, Matching and tmle are therefore required.
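
One way to check for the required packages before calling the function is sketched below. It uses only base R; the variable names are illustrative, and the installation step is left commented out.

```r
# Packages named in the note above
required <- c("bnlearn", "randomForest", "glmnet", "Matching", "tmle")

# TRUE/FALSE per package, without attaching anything to the search path
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```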

Jenny Häggström, <jenny.haggstrom@umu.se>

de Luna, X., I. Waernbaum, and T. S. Richardson (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, 98, 861-875.

Häggström, J. (2017). Data-Driven Confounder Selection via Markov and Bayesian Networks. ArXiv e-prints.

Nagarajan, R., M. Scutari and S. Lebre. (2013) Bayesian Networks in R with Applications in Systems Biology. Springer, New York. ISBN 978-1461464457.

Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35, 1-22. URL http://www.jstatsoft.org/v35/i03/.

Sekhon, J.S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 42, 1-52. URL http://www.jstatsoft.org/v42/i07/.

bnlearn-package, randomForest, cv.glmnet, Match and tmle

##Use simulated data, select subsets using mmpc 
ans<-cov.sel.high(type="mmpc",N=1000, rep=2, Models="Linear", betahat=FALSE, mmhc_score="aic")


##Use simulated data, select subsets using mmpc and estimate ACEs, parallel version
#library(doParallel)
#library(doRNG)
#cl <- makeCluster(4)
#registerDoParallel(cl)
#ans<-cov.sel.high(type="mmpc", parallel=TRUE,  N=500, rep=10, Models="Linear", mmhc_score="aic")
#stopCluster(cl)

CovSelHigh documentation built on May 2, 2019, 3:25 a.m.
