View source: R/new_workflows.R
A learning and prediction workflow that can handle NAs and uses internal validation to parametrize a re-sampling technique for balancing an imbalanced regression problem.
internal_workflow(
train,
test,
form,
model,
time,
site_id,
resample.grid,
resample.pars = NULL,
internal.est = NULL,
internal.est.pars = NULL,
internal.evaluator = "int_util_evaluate",
internal.eval.pars = NULL,
metrics = c("F1.u", "rmse_phi"),
metrics.max = c(TRUE, FALSE),
stat = "MED",
handleNAs = "centralImputNAs",
min_train = 2,
nORp = 0.2,
.int_parallel = FALSE,
.intRes = TRUE,
.full_intRes = FALSE,
...
)
train |
a data frame for training |
test |
a data frame for testing |
form |
a formula describing the model to learn |
model |
the name of the algorithm to use |
time |
the name of the column in |
site_id |
the name of the column in |
resample.grid |
a data.frame whose columns are the resample.pars to test using internal.est. For any NA value in resample.grid, the corresponding argument is set to NULL. |
resample.pars |
parameters to be passed to re-sample function. Default is NULL. |
internal.est |
character string identifying the internal estimator function to use |
internal.est.pars |
named list of internal estimator parameters (e.g., tr.perc or nfolds) |
internal.evaluator |
character string indicating internal evaluation function |
internal.eval.pars |
named list of parameters to feed to internal evaluation function |
metrics |
vector of names of two metrics to be used to determine the best parametrization (the second metric is only used in case of ties) |
metrics.max |
vector of Booleans indicating whether each metric in parameter metrics should be maximized (TRUE) or minimized (FALSE) for best results |
stat |
parameter indicating summary statistic that should be used to determine the best internal evaluation metric: "MED" (for median) or "MEAN" (for mean) |
handleNAs |
string indicating how to deal with NAs. If "centralImputNAs", training observations with at least 80% of non-NA columns will have their NAs substituted by the mean value, and testing observations will have their NAs filled in with the mean value regardless. Default is "centralImputNAs". |
min_train |
a minimum number of observations that must be
left to train a model. If there are not enough observations,
predictions will be |
nORp |
a maximum number or fraction of columns/rows with missing
values above which a row/column will be removed from train before
learning the model. Only works if |
.int_parallel |
a Boolean indicating whether rows in the grid search should be tested in parallel |
.intRes |
a Boolean indicating whether the evalRes object output by internal validation should be returned. Defaults to TRUE |
.full_intRes |
a Boolean indicating whether the full results object for internal validation should be returned as well. Defaults to FALSE |
... |
other parameters to feed to |
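As an illustration of the resample.grid argument, the following minimal sketch builds a small grid with base R and shows one way an NA entry could be mapped to a NULL argument. The parameter names C.perc and thr.rel and the helper row_to_pars are hypothetical and are not taken from the package; the actual column names depend on the re-sampling function being tuned.

```r
# Hypothetical re-sampling parameters; C.perc and thr.rel are illustrative
# names only and must match the arguments of the re-sampling function in use.
resample.grid <- expand.grid(
  C.perc  = c(0.5, 1, NA),  # NA stands for "leave this argument as NULL"
  thr.rel = c(0.7, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one grid row into a named argument list,
# replacing NA entries with NULL as described for resample.grid.
row_to_pars <- function(row) {
  pars <- as.list(row)
  lapply(pars, function(v) if (is.na(v)) NULL else v)
}

# Third row of the grid: C.perc is NA, thr.rel is 0.7.
pars <- row_to_pars(resample.grid[3, ])
str(pars)
```

Each row of such a grid corresponds to one candidate parametrization that would be assessed by the internal estimator.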
a data frame containing time-stamps, location IDs, true values and predicted values
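The way metrics, metrics.max and stat are described as combining to pick a parametrization can be sketched in base R as follows. This is only an assumed reading of the argument descriptions, not the package's actual implementation; the scores data frame and its grid_row column are made up for the example.

```r
# Made-up internal validation scores: 3 folds for each of 3 grid rows.
scores <- data.frame(
  grid_row = rep(1:3, each = 3),
  F1.u     = c(0.60, 0.70, 0.65,  0.70, 0.65, 0.60,  0.50, 0.55, 0.40),
  rmse_phi = c(1.20, 1.10, 1.30,  0.90, 1.00, 0.95,  1.50, 1.40, 1.60)
)
metrics     <- c("F1.u", "rmse_phi")
metrics.max <- c(TRUE, FALSE)
stat        <- "MED"

# Summarise each metric per grid row with the chosen statistic.
summ_fun <- if (stat == "MED") median else mean
summ <- aggregate(scores[metrics], by = scores["grid_row"], FUN = summ_fun)

# Negate metrics that should be maximized so that smaller is always better,
# then order by the first metric and break ties with the second.
keys <- Map(function(m, mx) if (mx) -summ[[m]] else summ[[m]],
            metrics, metrics.max)
best <- summ$grid_row[do.call(order, unname(keys))[1]]
print(best)
```

In this toy data, grid rows 1 and 2 tie on the median of F1.u, so the second metric, rmse_phi (minimized), decides between them.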