emil: Introduction to the emil package

Description

The emil package implements a framework for working with predictive modeling problems without information leakage. For an overview of its functionality please read the original publication included as the package's vignette (to be added).

Central topics and functions

Setting up modeling problems

resample

Functions for generating resampling schemes, and information on how to implement custom resampling methods.

pre_process

Data pre-processing functions.

modeling_procedure

Manages algorithms used for fitting models, making predictions, and extracting feature importance scores.

error_fun

Performance estimation functions used to tune parameters and evaluate performance of modeling procedures.

Solving modeling problems

fit

Fit a model (according to a procedure).

tune

Tune parameters of a procedure.

predict

Use a fitted model to predict the response of observations.

evaluate

Evaluate the performance of a procedure using resampling.

learning_curve

Learning curve analysis.
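
The way these steps fit together can be sketched in base R as a hand-written resampling loop. This is only an illustration of the idea behind evaluate, not emil's implementation or interface: the data set (mtcars), the model (lm), the fold count, and all variable names are assumptions made for the example.

```r
# Illustration only: a hand-written resampling loop, not emil's evaluate().
set.seed(1)
nfold <- 4
fold <- sample(rep(seq_len(nfold), length.out = nrow(mtcars)))

fold_rmse <- sapply(seq_len(nfold), function(i) {
  train <- mtcars[fold != i, ]
  test <- mtcars[fold == i, ]
  model <- lm(mpg ~ wt + hp, data = train)      # fit on training data only
  prediction <- predict(model, newdata = test)  # predict held-out data
  sqrt(mean((test$mpg - prediction)^2))         # error on held-out data
})

fold_rmse        # one error estimate per fold
mean(fold_rmse)  # overall performance estimate
```

Because the model never sees the held-out rows before predicting them, each per-fold error is an estimate of performance on new data.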

Managing the results of modeling problems

get_prediction

Extract predictions from resampled modeling results.

get_tuning

Extract parameter tuning statistics from resampled modeling results.

get_importance

Extract feature importance scores of a fitted model or resampled modeling results.

subtree

Extracts results from the output of evaluate. It is essentially a recursive version of lapply and sapply.

select

Interface between emil and the dplyr package for data manipulation. Can be used to subset modeling results, reorganize or summarize to help interpretation or prepare for plotting.
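
Resampled results are naturally nested (folds containing models, errors, and so on), which is why a helper for pulling the same element out of every fold is useful. The base R sketch below mimics that idea only; `extract_path` is a hypothetical helper and the structure of `result` is invented for the example, so neither reflects emil's actual subtree interface or output format.

```r
# Hypothetical helper, not emil's subtree(): apply over the top level of a
# nested list and descend along `path` within each element.
extract_path <- function(x, path) {
  sapply(x, function(element) {
    for (name in path) element <- element[[name]]
    element
  })
}

# Invented result structure with one entry per fold.
result <- list(
  fold1 = list(error = 0.10, model = list(parameter = list(k = 3))),
  fold2 = list(error = 0.20, model = list(parameter = list(k = 5)))
)

extract_path(result, "error")                       # error of each fold
extract_path(result, c("model", "parameter", "k"))  # tuned k of each fold
```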

Methods included in the package

Resampling methods

See resample for information on usage and implementation of custom methods.

resample_holdout

Repeated holdout.

resample_crossvalidation

Cross validation.
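
To make the difference between the two schemes concrete, the following base R sketch generates fold assignments by hand. It does not call emil's resampling functions; the helper names `holdout_folds` and `cv_folds`, their arguments, and the logical-matrix representation are assumptions chosen for illustration.

```r
# Hypothetical helpers, not part of emil: each returns a logical matrix with
# one row per observation and one column per fold, TRUE marking test rows.

# Repeated holdout: each fold holds out a fresh random subset of observations.
holdout_folds <- function(n, nfold = 5, test_fraction = 0.25) {
  n_test <- max(1, round(n * test_fraction))
  sapply(seq_len(nfold), function(i) seq_len(n) %in% sample(n, n_test))
}

# Cross validation: every observation is used for testing exactly once.
cv_folds <- function(n, nfold = 5) {
  assignment <- sample(rep(seq_len(nfold), length.out = n))
  sapply(seq_len(nfold), function(i) assignment == i)
}

set.seed(1)
ho <- holdout_folds(20, nfold = 3, test_fraction = 0.25)
cv <- cv_folds(20, nfold = 4)
colSums(ho)  # number of test observations in each holdout fold
rowSums(cv)  # number of times each observation is tested
```

In repeated holdout an observation may be tested several times or not at all, whereas in cross validation the test sets partition the data.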

Data pre-processing methods

See pre_process for information on usage and implementation of custom methods. The imputation functions can also be used outside of the resampling scheme; see impute.

pre_split

Only split, no transformation.

pre_center

Center each feature to have mean 0.

pre_scale

Center and scale data to have mean 0 and standard deviation 1.

pre_impute_median

Impute missing values with feature medians.

pre_impute_knn

Impute missing values with k-NN; see pre_impute_knn for details on how to set parameters.
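
What makes pre-processing free of information leakage is that any parameters (means, standard deviations, medians) are estimated from the training observations only and then applied unchanged to the test observations. The base R sketch below shows this for centering and scaling. `scale_without_leakage` is a hypothetical helper written for this example, not one of emil's pre-processing functions, and the train/test split is arbitrary.

```r
# Hypothetical helper, not emil's pre_scale: estimate centering and scaling
# parameters on the training rows only, then apply them to both sets.
scale_without_leakage <- function(x, test) {
  train_x <- x[!test, , drop = FALSE]
  test_x <- x[test, , drop = FALSE]
  center <- colMeans(train_x)
  spread <- apply(train_x, 2, sd)
  list(
    train = scale(train_x, center = center, scale = spread),
    test = scale(test_x, center = center, scale = spread)
  )
}

x <- as.matrix(iris[, 1:4])
set.seed(1)
test <- seq_len(nrow(x)) %in% sample(nrow(x), 30)
sets <- scale_without_leakage(x, test)

round(colMeans(sets$train), 10)  # training features are centered at 0
colMeans(sets$test)              # test features generally are not
```

The same principle applies to imputation: a median used to fill in missing test values would be computed from the training rows.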

Modeling methods

The following modeling methods are included in the emil package. For a complete list of available methods in both the emil package and other loaded packages, please use list_method. See modeling_procedure for information on usage and extension for information on implementation of custom methods.

cforest

Conditional inference forest.

coxph

Cox proportional hazards model.

glmnet

Elastic net.

lasso

LASSO.

lda

Linear discriminant.

lm

Linear model.

pamr

Nearest shrunken centroids.

qda

Quadratic discriminant.

randomForest

Random forest.

ridge_regression

Ridge regression.

rpart

Decision trees.

It is also possible to incorporate any method from the ‘caret’ package by using the function fit_caret.

To search for emil-compatible methods in all attached packages, use the list_method function.

Performance estimation methods

See error_fun for information on usage and implementation of custom methods. Since the framework is designed to minimize the error when tuning parameters, some measures are negated, e.g. neg_auc.

For classification problems:

error_rate

Fraction of predictions that were incorrect.

weighted_error_rate

See its own documentation.

neg_auc

Negative area under ROC curve. To plot the ROC curves see roc_curve.

neg_gmpa

Negative geometric mean of class-specific prediction accuracy. Good for problems with imbalanced class sizes.
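
As a small worked example of why the class-specific measure suits imbalanced problems, the sketch below re-implements two of the measures in base R. The `sketch_` functions are hypothetical stand-ins written for this example; emil's own functions may differ in interface and edge-case handling. The second measure is negated so that, like the error rate, lower values are better.

```r
# Hypothetical re-implementations for illustration, not emil's functions.

# Fraction of predictions that differ from the true class.
sketch_error_rate <- function(truth, prediction) {
  mean(as.character(truth) != as.character(prediction))
}

# Negated geometric mean of the per-class accuracies.
sketch_neg_gmpa <- function(truth, prediction) {
  correct <- as.character(truth) == as.character(prediction)
  accuracy <- tapply(correct, as.character(truth), mean)
  -exp(mean(log(accuracy)))
}

# Imbalanced toy data: eight observations of class "a", two of class "b".
truth      <- c("a", "a", "a", "a", "a", "a", "a", "a", "b", "b")
prediction <- c("a", "a", "a", "a", "a", "a", "a", "a", "a", "b")

sketch_error_rate(truth, prediction)  # 0.1
sketch_neg_gmpa(truth, prediction)    # -sqrt(1 * 0.5), about -0.707
```

The error rate looks good because the majority class dominates it, while the geometric mean exposes that only half of class "b" is predicted correctly.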

For regression problems:

mse

Mean square error.

rmse

Root mean square error.
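
The two regression measures are related by a square root, as this base R sketch shows. The `sketch_` functions are hypothetical and written only for this example; they are not emil's implementations.

```r
# Hypothetical re-implementations for illustration, not emil's functions.
sketch_mse <- function(truth, prediction) mean((truth - prediction)^2)
sketch_rmse <- function(truth, prediction) sqrt(sketch_mse(truth, prediction))

truth <- c(1, 2, 3, 4)
prediction <- c(1, 2, 3, 8)

sketch_mse(truth, prediction)   # 4
sketch_rmse(truth, prediction)  # 2
```

The root mean square error is expressed in the same unit as the response, which often makes it easier to interpret.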

For survival analysis problems:

neg_harrell_c

Negative Harrell's concordance index.

Plotting

Plotting is not one of the main aims of the package, and the methods that do exist mainly serve as examples of how to write your own. These exist for:

Author(s)

Christofer Bäcklin


Molmed/emil documentation built on May 7, 2019, 4:58 p.m.