Description Central topics and functions Methods included in the package Author(s)
The emil package implements a framework for working with predictive modeling problems without information leakage. For an overview of its functionality please read the original publication included as the package's vignette (to be added).
resample: Functions for generating resampling schemes, and information on how to implement custom resampling methods.
pre_process: Data pre-processing functions.
modeling_procedure: Manages algorithms used for fitting models, making predictions, and extracting feature importance scores.
error_fun: Performance estimation functions used to tune parameters and evaluate performance of modeling procedures.
fit: Fit a model (according to a procedure).
tune: Tune parameters of a procedure.
predict: Use a fitted model to predict the response of observations.
evaluate: Evaluate the performance of a procedure using resampling.
learning_curve: Learning curve analysis.
get_prediction: Extract predictions from resampled modeling results.
get_tuning: Extract parameter tuning statistics from resampled modeling results.
get_importance: Extract feature importance scores of a fitted model or resampled modeling results.
subtree: Extracts results from the output of evaluate. It is essentially a recursive version of lapply and sapply.
select: Interface between emil and the dplyr package for data manipulation. Can be used to subset modeling results, or to reorganize or summarize them to help interpretation or prepare for plotting.
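The idea behind subtree, a recursive lapply over nested results, can be illustrated with a small base R sketch. The helper extract_nested and the toy result list are hypothetical and are not part of emil; in particular, using TRUE to mean "every element at this level" is an assumption of this sketch, so consult the documentation of subtree for its actual interface.

```r
# Hypothetical helper (not emil's subtree): walk a nested list along `path`.
# A path element of TRUE means "apply to every element at this level";
# any other element is used as a name or index with `[[`.
extract_nested <- function(x, path) {
  if (length(path) == 0) return(x)
  key <- path[[1]]
  rest <- path[-1]
  if (identical(key, TRUE)) {
    lapply(x, extract_nested, path = rest)
  } else {
    extract_nested(x[[key]], rest)
  }
}

# Toy stand-in for resampled modeling results: one entry per fold.
result <- list(
  fold1 = list(error = 0.10, model = list(ntree = 100)),
  fold2 = list(error = 0.20, model = list(ntree = 200))
)

# Pull the error of every fold, then simplify to a named vector.
errors <- unlist(extract_nested(result, list(TRUE, "error")))
ntrees <- unlist(extract_nested(result, list(TRUE, "model", "ntree")))
```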
See resample for information on usage and implementation
of custom methods.
resample_holdout: Repeated holdout.
resample_crossvalidation: Cross validation.
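To make the two schemes concrete, here is a minimal base R sketch of what a repeated holdout and a cross-validation scheme encode. It is an illustration only and does not use or reproduce emil's own data structures; the variable names, the 1/3 test fraction, and the choice of 5 folds are assumptions made for the example.

```r
set.seed(1)
y <- iris$Species
n <- length(y)

# Repeated holdout: every repeat draws a new random test set of fixed size.
# The result is an n x 3 logical matrix where TRUE marks a test observation.
holdout <- replicate(3, {
  test <- rep(FALSE, n)
  test[sample(n, size = round(n / 3))] <- TRUE
  test
})

# Cross-validation: every observation is assigned to exactly one of k folds,
# so each observation is used for testing exactly once.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))
cv <- sapply(seq_len(k), function(i) fold_id == i)
```

In both matrices a column describes one fold: models are fitted on the FALSE rows and tested on the TRUE rows.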
See pre_process for information on usage and
implementation of custom methods. The imputation functions
can also be used outside of the resampling scheme, see
impute.
pre_split: Only split, no transformation.
pre_center: Center each feature to have mean 0.
pre_scale: Center and scale data to have mean 0 and standard deviation 1.
pre_impute_median: Impute missing values with feature medians.
pre_impute_knn: Impute missing values with k-NN. See pre_impute_knn for details on how to set parameters.
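The reason pre-processing is tied to the resampling scheme is that transformation parameters must be estimated on the training observations only and then applied unchanged to the test observations. The following base R sketch shows that principle for centering and scaling. It is not emil's implementation, and the variable names and the random split are assumptions made for the example.

```r
set.seed(1)
x <- as.matrix(iris[, 1:4])
test <- seq_len(nrow(x)) %in% sample(nrow(x), 50)

# Estimate the parameters on the training set only.
mu <- colMeans(x[!test, ])
sigma <- apply(x[!test, ], 2, sd)

# Apply the same transformation to both sets. The test set is never used
# to estimate mu or sigma, so no information leaks into the fitted model.
fit_x <- scale(x[!test, ], center = mu, scale = sigma)
test_x <- scale(x[test, ], center = mu, scale = sigma)
```

The training set ends up with mean 0 and standard deviation 1 per feature, while the test set will in general deviate slightly from both, which is expected.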
The following modeling methods are included in the emil package.
For a complete list of available methods in both the emil package and
other loaded packages, please use list_method.
See modeling_procedure for information on usage
and extension for information on
implementation of custom methods.
cforest: Conditional inference forest.
coxph: Cox proportional hazards model.
glmnet: Elastic net.
lasso: LASSO.
lda: Linear discriminant.
lm: Linear model.
pamr: Nearest shrunken centroids.
qda: Quadratic discriminant.
randomForest: Random forest.
ridge_regression: Ridge regression.
rpart: Decision trees.
It is also possible to incorporate any method from the ‘caret’
package by using the function fit_caret.
To search for emil compatible methods in all attached packages use the
list_method function.
See error_fun for information on usage and implementation
of custom methods. Since the framework is designed to minimize the error
when tuning parameters, some measures are negated, e.g. neg_auc.
For classification problems:
error_rate: Fraction of predictions that were incorrect.
weighted_error_rate: See its own documentation.
neg_auc: Negative area under ROC curve. To plot the ROC curves, see roc_curve.
neg_gmpa: Negative geometric mean of class-specific prediction accuracy. Good for problems with imbalanced class sizes.
For regression problems:
mse: Mean square error.
rmse: Root mean square error.
For survival analysis problems:
neg_harrell_c: Negative Harrell's concordance index.
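As an illustration of why some measures are negated, the base R sketch below defines three stand-in error functions. The names ending in _sketch and their signatures are hypothetical and differ from emil's own error functions (see error_fun for the actual interface). The AUC is computed from the Mann-Whitney rank statistic and then negated, so that a better classifier gets a lower value, like any other error.

```r
# Fraction of incorrect class predictions.
error_rate_sketch <- function(truth, prediction) {
  mean(truth != prediction)
}

# Root mean square error for regression.
rmse_sketch <- function(truth, prediction) {
  sqrt(mean((truth - prediction)^2))
}

# Negated area under the ROC curve, via the Mann-Whitney rank statistic.
# `score` is higher for observations believed to belong to `positive`.
neg_auc_sketch <- function(truth, score, positive) {
  pos <- truth == positive
  r <- rank(score)
  auc <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
    (sum(pos) * sum(!pos))
  -auc
}

truth <- c("a", "a", "b", "b")
score <- c(0.9, 0.8, 0.3, 0.1)
perfect <- neg_auc_sketch(truth, score, positive = "a")
reversed <- neg_auc_sketch(truth, rev(score), positive = "a")
```

Here perfect is lower than reversed, so a tuning routine that minimizes the error selects the better ranking.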
Plotting is not one of the main aims of the package, and the methods that do exist mainly serve as examples of how to write your own. These exist for:
Learning curve analyses.
Resampling schemes.
ROC-curves.
Christofer Bäcklin