easy_analysis: The core recipe of easyml.
In easyml: Easily Build and Evaluate Machine Learning Models


This recipe is the workhorse behind all of the easy_* functions.

easy_analysis(.data, dependent_variable, algorithm, family = "gaussian",
  resample = NULL, preprocess = NULL, measure = NULL,
  exclude_variables = NULL, categorical_variables = NULL,
  train_size = 0.667, foldid = NULL, survival_rate_cutoff = 0.05,
  n_samples = 1000, n_divisions = 1000, n_iterations = 10,
  random_state = NULL, progress_bar = TRUE, n_core = 1,
  coefficients = NULL, variable_importances = NULL, predictions = NULL,
  model_performance = NULL, model_args = list())
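
A minimal sketch of a call, assuming the easyml package is installed and that the installed version exports `easy_analysis`. The data set (R's built-in `mtcars`), the choice of `"mpg"` as the dependent variable, and the reduced replicate counts are illustrative assumptions, not values from this documentation. The call is skipped when the package is unavailable.

```r
# Hypothetical example: mtcars and the small replicate counts are
# illustrative choices, not recommendations from the package authors.
if (requireNamespace("easyml", quietly = TRUE)) {
  results <- easyml::easy_analysis(
    mtcars,
    dependent_variable = "mpg",
    algorithm = "glmnet",
    family = "gaussian",
    n_samples = 10,        # far fewer than the default 1000, to keep the run short
    n_divisions = 10,
    n_iterations = 2,
    random_state = 12345,  # fixed seed so the run can be repeated
    progress_bar = FALSE,
    n_core = 1
  )
  class(results)
}
```

Every argument left out of the call keeps the default shown in the usage above.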

`.data`	A data.frame; the data to be analyzed.
`dependent_variable`	A character vector of length one; the dependent variable for this analysis.
`algorithm`	A character vector of length one; the algorithm to run on the data. Choices are currently one of c("deep_neural_network", "glinternet", "glmnet", "neural_network", "random_forest", "support_vector_machine").
`family`	A character vector of length one; the type of regression to run on the data. Choices are one of c("gaussian", "binomial"). Defaults to "gaussian".
`resample`	A function; the function for resampling the data. Defaults to NULL.
`preprocess`	A function; the function for preprocessing the data. Defaults to NULL.
`measure`	A function; the function for measuring the results. Defaults to NULL.
`exclude_variables`	A character vector; the variables from the data set to exclude. Defaults to NULL.
`categorical_variables`	A character vector; the variables that are categorical. Defaults to NULL.
`train_size`	A numeric vector of length one; specifies what proportion of the data should be used for the training data set. Defaults to 0.667.
`foldid`	A vector with length equal to `length(y)`; identifies the cases that belong to the same fold. Defaults to NULL.
`survival_rate_cutoff`	A numeric vector of length one; for `easy_glmnet`, specifies the minimum fraction of the n_samples replicates in which a coefficient must appear, expressed as a proportion (0.05 means 5%). Defaults to 0.05.
`n_samples`	An integer vector of length one; specifies the number of times the coefficients and predictions should be generated. Defaults to 1000.
`n_divisions`	An integer vector of length one; specifies the number of times the data should be divided when replicating the measures of model performance. Defaults to 1000.
`n_iterations`	An integer vector of length one; during each division, specifies the number of times the predictions should be generated. Defaults to 10.
`random_state`	An integer vector of length one; specifies the seed to be used for the analysis. Defaults to NULL.
`progress_bar`	A logical vector of length one; specifies whether to display a progress bar during calculations. Defaults to TRUE.
`n_core`	An integer vector of length one; specifies the number of cores to use for this analysis. Currently only works on macOS and Unix/Linux systems. Defaults to 1.
`coefficients`	A logical vector of length one; whether or not to generate coefficients for this analysis. Defaults to NULL.
`variable_importances`	A logical vector of length one; whether or not to generate variable importances for this analysis. Defaults to NULL.
`predictions`	A logical vector of length one; whether or not to generate predictions for this analysis. Defaults to NULL.
`model_performance`	A logical vector of length one; whether or not to generate measures of model performance for this analysis. Defaults to NULL.
`model_args`	A list; the arguments to be passed to the algorithm specified. Defaults to list().
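
The interplay of `train_size`, `n_divisions`, `n_iterations`, and `random_state` can be sketched in base R. This illustrates the nesting that the argument descriptions imply; it is not easyml's implementation. The helper names `split_once` and `mse`, the use of `lm()` as a stand-in model, `mtcars` as data, and the averaging of predictions within a division are all assumptions.

```r
set.seed(12345)                  # plays the role of random_state
train_size   <- 0.667
n_divisions  <- 5                # kept small for illustration
n_iterations <- 2

# Hypothetical helper: one random train/test division of the row indices.
split_once <- function(n, train_size) {
  train <- sample(n, size = floor(train_size * n))
  list(train = train, test = setdiff(seq_len(n), train))
}

# Hypothetical measure: mean squared error.
mse <- function(y_true, y_pred) mean((y_true - y_pred)^2)

model_performance_train <- numeric(n_divisions)
model_performance_test  <- numeric(n_divisions)

for (d in seq_len(n_divisions)) {
  idx   <- split_once(nrow(mtcars), train_size)
  train <- mtcars[idx$train, ]
  test  <- mtcars[idx$test, ]

  # Generate predictions n_iterations times within this division.
  # lm() is deterministic, so the repeats are identical here; with a
  # stochastic learner they would differ.
  pred_train <- matrix(NA_real_, nrow(train), n_iterations)
  pred_test  <- matrix(NA_real_, nrow(test), n_iterations)
  for (i in seq_len(n_iterations)) {
    fit <- lm(mpg ~ wt + hp, data = train)
    pred_train[, i] <- predict(fit, newdata = train)
    pred_test[, i]  <- predict(fit, newdata = test)
  }

  # One performance value per division, from the averaged predictions.
  model_performance_train[d] <- mse(train$mpg, rowMeans(pred_train))
  model_performance_test[d]  <- mse(test$mpg, rowMeans(pred_test))
}
```

Under these assumptions each division contributes exactly one value to each performance vector, so both end up with length `n_divisions`.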

A list of class easy_*, where * is the name of the algorithm, with the following elements:

call: An object of class call; the original function call.
data: A data.frame; the original data.
dependent_variable: A character vector of length one; the dependent variable for this analysis.
algorithm: A character vector of length one; the algorithm to run on the data.
class: A character vector of length one; the class of the object.
family: A character vector of length one; the type of regression to run on the data. Choices are one of c("gaussian", "binomial"). Defaults to "gaussian".
resample: A function; the function for resampling the data.
preprocess: A function; the function for preprocessing the data.
measure: A function; the function for measuring the results.
exclude_variables: A character vector; the variables from the data set to exclude.
train_size: A numeric vector of length one; specifies what proportion of the data should be used for the training data set.
survival_rate_cutoff: A numeric vector of length one; for easy_glmnet, specifies the minimum fraction of the n_samples replicates in which a coefficient must appear, expressed as a proportion.
n_samples: An integer vector of length one; specifies the number of times the coefficients and predictions should be generated.
n_divisions: An integer vector of length one; specifies the number of times the data should be divided when generating measures of model performance.
n_iterations: An integer vector of length one; during each division, specifies the number of times the predictions should be generated.
random_state: An integer vector of length one; specifies the seed to be used for the analysis.
progress_bar: A logical vector of length one; specifies whether to display a progress bar during calculations.
n_core: An integer vector of length one; specifies the number of cores to use for this analysis.
generate_coefficients: A logical vector of length one; whether or not to generate coefficients for this analysis.
generate_variable_importances: A logical vector of length one; whether or not to generate variable importances for this analysis.
generate_predictions: A logical vector of length one; whether or not to generate predictions for this analysis.
generate_model_performance: A logical vector of length one; whether or not to generate measures of model performance for this analysis.
model_args: A list; the arguments to be passed to the algorithm specified.
column_names: A character vector; the column names.
categorical_variables: A logical vector; the variables that are categorical.
X: A data.frame; the full dataset to be used for modeling.
y: A vector; the full response variable to be used for modeling.
coefficients: A (n_variables, n_samples) matrix; the generated coefficients.
coefficients_processed: A data.frame; the coefficients after being processed.
plot_coefficients_processed: A ggplot object; the plot of the processed coefficients.
X_train: A data.frame; the train dataset to be used for modeling.
X_test: A data.frame; the test dataset to be used for modeling.
y_train: A vector; the train response variable to be used for modeling.
y_test: A vector; the test response variable to be used for modeling.
predictions_train: A (nrow(X_train), n_samples) matrix; the train predictions.
predictions_test: A (nrow(X_test), n_samples) matrix; the test predictions.
predictions_train_mean: A vector; the mean train predictions.
predictions_test_mean: A vector; the mean test predictions.
plot_predictions: A function; the function for plotting predictions generated by the model.
plot_predictions_train_mean: A ggplot object; the plot of the mean train predictions.
plot_predictions_test_mean: A ggplot object; the plot of the mean test predictions.
model_performance_train: A vector of length n_divisions; the measures of model performance on the train datasets.
model_performance_test: A vector of length n_divisions; the measures of model performance on the test datasets.
plot_model_performance: A function; the function for plotting the measures of model performance.
plot_model_performance_train: A ggplot object; the plot of the measures of model performance on the train datasets.
plot_model_performance_test: A ggplot object; the plot of the measures of model performance on the test datasets.
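
How `survival_rate_cutoff` might relate the `coefficients` matrix to a filtered set of variables can be sketched as follows. This is a hedged illustration on simulated data, not the package's processing code: reading "appear" as "is nonzero", the strict `>` comparison, and the row names are all assumptions.

```r
set.seed(1)
n_samples <- 100
survival_rate_cutoff <- 0.05

# Simulated (n_variables, n_samples) coefficient matrix: a penalized model
# can set a coefficient to exactly zero in some or all replicates.
coefficients <- rbind(
  always    = rnorm(n_samples, mean = 2),
  sometimes = ifelse(runif(n_samples) < 0.5, rnorm(n_samples), 0),
  never     = rep(0, n_samples)
)

# Fraction of replicates in which each coefficient is nonzero.
survival_rate <- rowMeans(coefficients != 0)

# Keep only variables whose survival rate exceeds the cutoff
# (assumed strict comparison).
survived <- coefficients[survival_rate > survival_rate_cutoff, , drop = FALSE]
rownames(survived)
```

With the default cutoff of 0.05 and `n_samples = 1000`, this reading would require a coefficient to be nonzero in more than 50 replicates to be kept.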