k-fold cross-validation. Except for pTraining and the validation split (both replaced by k_folds), all inputs are the same as for kms(). See ?kms.
kms_kcv(input_formula, data, keras_model_seq = NULL, N_layers = 3,
  units = c(256, 128), activation = c("relu", "relu", "softmax"),
  dropout = 0.4, use_bias = TRUE, kernel_initializer = NULL,
  kernel_regularizer = "regularizer_l1",
  bias_regularizer = "regularizer_l1",
  activity_regularizer = "regularizer_l1", embedding = FALSE,
  k_folds = 5, Nepochs = 15, batch_size = NULL, loss = NULL,
  metrics = NULL, optimizer = "optimizer_adam",
  scale_continuous = "zero_one", drop_intercept = TRUE,
  sparse_data = FALSE, seed = list(seed = NULL, disable_gpu = FALSE,
  disable_parallel_cpu = FALSE), verbose = 1, ...)
input_formula
an object of class "formula" (or one coercible to a formula): a symbolic description of the keras inputs, e.g. "mpg ~ cylinders". kms treats numeric data with more than two distinct values as a continuous outcome, for which a regression-style model is fit. Factors and character variables are classified; to force classification, use "as.factor(cyl) ~ .".
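As a brief sketch of how the formula determines the model type (an illustration using mtcars; assumes kerasformula and a working keras/TensorFlow installation; Nepochs is kept small only so the sketch runs quickly):

library(kerasformula)

# mpg is numeric with many distinct values, so a regression-style model is fit
reg_fit <- kms_kcv(mpg ~ cyl + wt + hp, data = mtcars, Nepochs = 2, verbose = 0)

# wrapping the outcome in as.factor() forces classification
cls_fit <- kms_kcv(as.factor(cyl) ~ mpg + wt + hp, data = mtcars, Nepochs = 2, verbose = 0)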
data
a data.frame.
keras_model_seq
A compiled Keras sequential model. If non-NULL (NULL is the default), the following kms parameters are bypassed: N_layers, units, activation, dropout, use_bias, kernel_initializer, kernel_regularizer, bias_regularizer, activity_regularizer, loss, metrics, and optimizer.
N_layers
How many layers in the model? Default == 3. Subsequent parameters (units, activation, dropout, use_bias, kernel_initializer, kernel_regularizer, bias_regularizer, and activity_regularizer) may be provided as vectors of length N_layers (or N_layers - 1 for units and dropout). Those vectors may also have length 1, or any length that can be repeated to form a length N_layers vector (or N_layers - 1 for units and dropout).
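To illustrate the length rules, a four-layer sketch (same assumptions as the sketch above):

# N_layers = 4: units and dropout describe the first N_layers - 1 layers;
# activation covers all N_layers, ending with the output layer's activation
fit4 <- kms_kcv(as.factor(cyl) ~ ., data = mtcars,
                N_layers   = 4,
                units      = c(256, 128, 64),                        # length N_layers - 1
                dropout    = c(0.4, 0.3, 0.2),                       # length N_layers - 1
                activation = c("relu", "relu", "relu", "softmax"),   # length N_layers
                Nepochs    = 2, verbose = 0)
# length-1 inputs (e.g. use_bias = TRUE) are recycled to the required length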
units
How many units in each layer? The final number of units will be added based on whether regression or classification is being done. Should be length 1, length N_layers - 1, or something that can be repeated to form a length N_layers - 1 vector. Default is c(256, 128).
activation
Activation function for each layer, starting with the input. Default: c("relu", "relu", "softmax"). Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
dropout
Dropout rate for each layer, starting with the input. Not applicable to the final layer. Default: c(0.4, 0.3). Should be length 1, length N_layers - 1, or something that can be repeated to form a length N_layers - 1 vector.
use_bias
See ?keras::use_bias. Default: TRUE. Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
kernel_initializer
Defaults to "glorot_uniform" for classification and "glorot_normal" for regression (but either can be inputted). Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
kernel_regularizer
Must be precisely "regularizer_l1", "regularizer_l2", or "regularizer_l1_l2". Default: "regularizer_l1". Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
bias_regularizer
Must be precisely "regularizer_l1", "regularizer_l2", or "regularizer_l1_l2". Default: "regularizer_l1". Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
activity_regularizer
Must be precisely "regularizer_l1", "regularizer_l2", or "regularizer_l1_l2". Default: "regularizer_l1". Should be length 1, length N_layers, or something that can be repeated to form a length N_layers vector.
embedding
If TRUE, the first layer will be an embedding layer with the number of output dimensions determined by 'units' (so, in effect, there will really be N_layers + 1 layers). Note the input 'kernel_regularizer' is passed on as the 'embedding_regularizer'. pad_sequences() may be used as part of the input_formula, and you may wish to set scale_continuous to NULL. See ?layer_embedding.
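A hedged sketch of the embedding option; the data frame 'reviews' and its column 'liked' are hypothetical (not from this documentation), with the remaining columns assumed to hold integer word indices:

# embedding = TRUE makes the first layer an embedding whose output dimension
# comes from 'units'; kernel_regularizer is passed on as the embedding_regularizer
emb_fit <- kms_kcv(liked ~ ., data = reviews,         # 'reviews' is hypothetical
                   embedding = TRUE,
                   units = c(64, 32),
                   scale_continuous = NULL,            # leave word indices unscaled
                   Nepochs = 2, verbose = 0)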
k_folds
Number of folds. For example, if k_folds == 5 (default), the data are split into 80% training and 20% testing (five times).
Nepochs
Number of epochs; default == 15. To be passed to keras::fit.
batch_size
Default batch size is 32 unless embedding == TRUE, in which case the batch size is 1. (Smaller batches ease memory issues but may affect the optimizer's ability to find the global minimum.) To be passed to several library(keras) functions such as fit(), predict_classes(), and layer_embedding(). If embedding == TRUE, the number of training observations must be a multiple of batch_size.
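Because the number of training observations must be a multiple of batch_size when embedding = TRUE, one rough way to pick a batch size is from the approximate fold size. The 4/5 factor below assumes the default k_folds = 5 and may be off by a row or two after rounding; 'reviews' is again hypothetical:

n_train  <- floor(nrow(reviews) * 4 / 5)              # approximate training size per fold
divisors <- which(n_train %% seq_len(n_train) == 0)   # all divisors of n_train
batch    <- max(divisors[divisors <= 32])             # largest divisor no bigger than 32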
loss
To be passed to keras::compile. Defaults to "binary_crossentropy", "categorical_crossentropy", or "mean_squared_error" based on input_formula and data.
metrics
Additional metric(s) beyond the loss function to be passed to keras::compile. Defaults to "mean_absolute_error" and "mean_absolute_percentage_error" for continuous outcomes and c("accuracy") for binary/categorical outcomes (plus, when the number of categories K > 20, whether examples are correctly classified into one of the five most popular categories).
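Additional metrics can be requested with standard keras metric strings; a brief sketch (same assumptions as the earlier sketches):

metrics_fit <- kms_kcv(mpg ~ ., data = mtcars,
                       metrics = c("mean_absolute_error", "mean_squared_error"),
                       Nepochs = 2, verbose = 0)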
optimizer
To be passed to keras::compile. Defaults to "optimizer_adam", an algorithm for first-order gradient-based optimization of stochastic objective functions introduced by Kingma and Ba (2015): https://arxiv.org/pdf/1412.6980v8.pdf.
scale_continuous
How to scale each non-binary column of the training data (and, if y is continuous, the outcome). The default scale_continuous = 'zero_one' places each non-binary column of the training model matrix on [0, 1]; scale_continuous = 'z' standardizes; scale_continuous = NULL leaves the data on its original scale.
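The three scaling options side by side (same assumptions as the earlier sketches):

fit_01  <- kms_kcv(mpg ~ ., data = mtcars, scale_continuous = "zero_one", Nepochs = 2, verbose = 0)  # default: [0, 1]
fit_z   <- kms_kcv(mpg ~ ., data = mtcars, scale_continuous = "z",        Nepochs = 2, verbose = 0)  # standardize
fit_raw <- kms_kcv(mpg ~ ., data = mtcars, scale_continuous = NULL,       Nepochs = 2, verbose = 0)  # original scale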
drop_intercept
TRUE by default.
sparse_data
Default == FALSE. If TRUE, X is constructed by sparse.model.matrix() instead of model.matrix(). Recommended to improve memory usage if there are a large number of categorical variables or a few categorical variables with a large number of levels. May compromise speed, particularly if X is mostly numeric.
seed
Integer vector of length k_folds, or a list containing a k_folds-length seed vector, to be passed to the sources of variation: R, Python's Numpy, and Tensorflow. If seed is NULL, one is automatically generated. Note that setting the seed ensures the data will be partitioned in the same way, but to ensure identical results also set disable_gpu = TRUE and disable_parallel_cpu = TRUE. This is a wrapper for use_session_with_seed(), which should be called before compiling by the user if a compiled Keras model is passed into kms. See also https://stackoverflow.com/questions/42022950/.
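A sketch of fully reproducible folds using the list form (slower, since the GPU and CPU parallelism are disabled; the seed vector has length k_folds):

seeded_fit <- kms_kcv(as.factor(cyl) ~ ., data = mtcars, k_folds = 5,
                      seed = list(seed = c(101, 102, 103, 104, 105),
                                  disable_gpu = TRUE,
                                  disable_parallel_cpu = TRUE),
                      Nepochs = 2, verbose = 0)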
verbose
Default == 1. Setting to 0 disables the progress bar and epoch-by-epoch plots (disabling them is recommended when knitting RMarkdown documents if X11 is not installed).
...
Additional parameters to be passed to Matrix::sparse.model.matrix.
A kms_kcv_fit object: a nested list containing the train and test estimates produced by kms() and predict.kms(), respectively.
Pete Mohanty
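A minimal end-to-end sketch (not drawn from the package's shipped examples; assumes kerasformula with a working keras/TensorFlow installation; the element names of the returned list are not documented here, so the result is only inspected with str()):

library(kerasformula)

kcv_out <- kms_kcv(as.factor(cyl) ~ mpg + wt + hp + am, data = mtcars,
                   k_folds = 5, Nepochs = 10, verbose = 0)

str(kcv_out, max.level = 2)   # nested list of per-fold train (kms) and test (predict.kms) results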