regImputation: Linear regression and lasso-based imputation

Description

Creates a data frame with imputed values using either linear regression or lasso-based models. For each variable in the given data frame, the function finds the most strongly correlated predictors (how many is set by top_predictors) and uses them to fit models that predict the variable's missing values.
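The per-variable procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name `impute_one_variable` is hypothetical, predictors are ranked by absolute pairwise correlation, the `threshold` cutoff is left out, and only the "lm" method is shown.

```r
# Hypothetical helper (not part of the package): impute the missing values of
# one column with OLS on its most strongly correlated other columns.
impute_one_variable <- function(df, target, top_predictors = 3) {
  others <- setdiff(names(df), target)
  # Correlation of the target with every other column, using pairwise-complete rows
  r <- sapply(others, function(v)
    cor(df[[target]], df[[v]], use = "pairwise.complete.obs"))
  # Rank by absolute correlation (sort() drops NA correlations) and keep the top ones
  ranked <- names(sort(abs(r), decreasing = TRUE))
  preds <- ranked[seq_len(min(top_predictors, length(ranked)))]
  # lm() fits on complete cases only under the default na.action (na.omit)
  fit <- lm(reformulate(preds, response = target), data = df)
  x <- df[[target]]
  miss <- is.na(x)
  # predict() yields NA for rows where any chosen predictor is itself missing
  x[miss] <- predict(fit, newdata = df[miss, preds, drop = FALSE])
  x  # observed values are left untouched
}

# Example: b depends strongly on a; c is pure noise
set.seed(1)
n <- 50
df <- data.frame(a = rnorm(n), c = rnorm(n))
df$b <- 2 * df$a + rnorm(n, sd = 0.1)
df$b[c(3, 7)] <- NA
imputed_b <- impute_one_variable(df, "b", top_predictors = 1)
```

Only the missing entries of `b` are replaced, which mirrors the behaviour described under Value.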

Usage

regImputation(dataframe, matrix, continuous = "", categorical = "",
  method = "lm", parallel = 0, threshold = 0.4, top_predictors = 3,
  debug = 0, degree = 1, test = 0, failmode = "skip")

Arguments

method

Imputation method: "lm" for ordinary least squares linear regression or "lasso" for lasso-regularized regression.

parallel

Whether to run the imputation in parallel processes (currently supported on Mac OS X only).

threshold

Correlation cutoff used when selecting predictors; a value between 0 and 1.

top_predictors

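Number of predictors to use in each imputation model. More predictors can improve prediction quality, but they make predictions available for fewer cases, since a missing value can typically be predicted only when all of its predictors are observed.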

debug

Debug mode: reports which models are running, the quality of predictions relative to the original data, and any model errors. Use 1 for progress, errors and warnings, or 2 for progress, errors, warnings and prediction quality.

degree

Degree of the polynomial terms to estimate (1 = main effects only, 2 = quadratic, 3 = cubic, etc.).

test

Test mode: runs on only the first 4 variables, which is helpful for trying out function options before running the full imputation.

failmode

What to do if prediction fails for any reason. The default, failmode = "skip", returns the original variable vector unchanged; failmode = "impute" fills its missing values with a measure of central tendency instead.
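To illustrate the trade-off described for `threshold` and `top_predictors`, here is a hedged base-R sketch. It assumes, without confirmation from this page, that a candidate must reach an absolute correlation of at least `threshold` with the target and that at most `top_predictors` of the strongest candidates are kept. `select_predictors` and `predictable_share` are hypothetical names.

```r
# Hypothetical: keep columns whose |correlation| with the target reaches
# `threshold`, then retain at most `top_predictors` of the strongest.
select_predictors <- function(df, target, threshold = 0.4, top_predictors = 3) {
  others <- setdiff(names(df), target)
  r <- sapply(others, function(v)
    cor(df[[target]], df[[v]], use = "pairwise.complete.obs"))
  r <- r[!is.na(r) & abs(r) >= threshold]
  ranked <- names(sort(abs(r), decreasing = TRUE))
  ranked[seq_len(min(top_predictors, length(ranked)))]
}

# Share of rows whose chosen predictors are all observed, i.e. rows that can
# receive a regression prediction at all.
predictable_share <- function(df, preds) {
  mean(complete.cases(df[, preds, drop = FALSE]))
}

set.seed(2)
n <- 200
df <- data.frame(y = rnorm(n))
df$x1 <- df$y + rnorm(n, sd = 0.5)   # strongly correlated with y
df$x2 <- df$y + rnorm(n, sd = 0.8)   # moderately correlated with y
df$x3 <- rnorm(n)                    # unrelated; should fail the threshold
df$x1[sample(n, 40)] <- NA
df$x2[sample(n, 40)] <- NA

chosen <- select_predictors(df, "y", threshold = 0.4, top_predictors = 3)
predictable_share(df, chosen[1])   # one predictor
predictable_share(df, chosen)      # two predictors: usually fewer usable rows
```

Adding the second predictor can only shrink the set of rows with every predictor observed, which is the "more sparsely available predictions" effect.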
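This page does not say how `debug = 2` measures prediction quality relative to the original data. One common approach, sketched here under that assumption, is to score the fitted model on rows where the target was observed, since the true values are known there. `prediction_quality` is a hypothetical name, and correlation plus RMSE are illustrative metric choices.

```r
# Hypothetical check: compare model predictions with the observed target values.
prediction_quality <- function(fit, df, target) {
  observed <- !is.na(df[[target]])
  pred <- predict(fit, newdata = df[observed, , drop = FALSE])
  ok <- !is.na(pred)                       # skip rows with missing predictors
  actual <- df[[target]][observed][ok]
  c(r = cor(pred[ok], actual),
    rmse = sqrt(mean((pred[ok] - actual)^2)))
}

set.seed(3)
df <- data.frame(x = rnorm(100))
df$y <- 1 + 2 * df$x + rnorm(100, sd = 0.3)
df$y[1:10] <- NA
fit <- lm(y ~ x, data = df)
q <- prediction_quality(fit, df, "y")
q
```

Because these rows were also used to fit the model, such in-sample scores are optimistic; holding out a random subset of observed values would give a fairer estimate.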
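A minimal sketch of how the `degree` argument could translate into a model formula, assuming raw polynomial terms built with base R's `poly(..., raw = TRUE)`. Each predictor receives its own powers up to `degree` and no cross-product (interaction) terms are added, which may differ from what the package does. `poly_formula` is hypothetical.

```r
# Hypothetical: build a regression formula with polynomial terms up to `degree`.
poly_formula <- function(target, preds, degree = 1) {
  terms <- if (degree == 1) preds else
    sprintf("poly(%s, degree = %d, raw = TRUE)", preds, degree)
  reformulate(terms, response = target)
}

poly_formula("y", c("x1", "x2"), degree = 1)  # main effects only
poly_formula("y", c("x1", "x2"), degree = 2)  # adds squared terms

set.seed(4)
df <- data.frame(x1 = runif(50, -2, 2), x2 = rnorm(50))
df$y <- df$x1^2 + df$x2 + rnorm(50, sd = 0.1)
fit <- lm(poly_formula("y", c("x1", "x2"), degree = 2), data = df)
coef(fit)   # intercept, then x1, x1^2, x2, x2^2
```

Raw powers are used because `poly()` only accepts missing values when `raw = TRUE`, which matters for data sets that still contain NAs.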
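The two `failmode` behaviours can be sketched with `tryCatch()`. The wrapper name `with_failmode` is hypothetical, and so is the choice of central tendency: this sketch assumes the mean for numeric variables and the most frequent value otherwise.

```r
# Hypothetical wrapper: run an imputation function on one variable and, if it
# throws an error, fall back according to `failmode`.
with_failmode <- function(x, impute_fun, failmode = "skip") {
  tryCatch(impute_fun(x), error = function(e) {
    if (failmode == "impute") {
      # Assumed central tendency: mean for numbers, most frequent value otherwise
      fill <- if (is.numeric(x)) mean(x, na.rm = TRUE) else names(which.max(table(x)))
      x[is.na(x)] <- fill
    }
    x  # with failmode = "skip", the original vector is returned unchanged
  })
}

failing_model <- function(x) stop("model could not be fit")
v <- c(1, NA, 3)
with_failmode(v, failing_model, failmode = "skip")    # returns c(1, NA, 3)
with_failmode(v, failing_model, failmode = "impute")  # returns c(1, 2, 3)
```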

Value

A data frame containing the imputed variables. Only missing values are imputed; observed values are retained as they are.

Examples

## Not run: regImputation(dataframe, matrix, method='polywog', parallel=1, debug=1, test=1)

annafil/FFCRegressionImputation documentation built on May 12, 2019, 1:59 p.m.