This function builds a collection of models from a single input dataset. It can handle either classification or regression data; that is, either categorical or continuous data.
data
the input data frame, see
modelTypes
a character vector of model types to generate; one or more of
fx
(optional) a formula object specifying the variable relationships; will be generated from x and y if unspecified.
x
(optional) a vector of names of the 'predictor' variables to use; defaults to all columns other than y if fx is also not provided.
y
(optional) the name of the column of the 'response' variable; defaults to the first column if fx is also not provided. It can be either categorical or continuous data, and the function will attempt to coerce vectors of unknown types (e.g. boolean) into one of these two groups, albeit in a rather rudimentary fashion. If it cannot succeed it will complain.
grouping
(optional) a transformation vector for input classes; if not provided, no grouping will be used. See
echo
(optional) should the function report its progress? Defaults to TRUE; setting it to FALSE is useful for automation.
rf.args
(optional) a list of arguments to pass to random forest type models; defaults will be generated for unspecified values.
nn.args
(optional) a list of arguments to pass to nearest neighbour type models; defaults will be generated for unspecified values.
gbm.args
(optional) a list of arguments to pass to gbm; defaults will be generated for unspecified values.
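The coercion of the response variable described for y could be pictured with a sketch like the one below. The helper name guessResponseType is hypothetical and not part of the package, and the rules shown are only an assumption about what a rudimentary classification into categorical versus continuous might look like.

```r
# Hypothetical helper (not part of the package): decide whether a response
# vector should be treated as categorical or continuous.
guessResponseType <- function(v) {
  if (is.factor(v) || is.character(v) || is.logical(v)) {
    return("categorical")   # factors, strings, and booleans are treated as classes
  }
  if (is.numeric(v)) {
    return("continuous")    # integers and doubles are treated as regression targets
  }
  # Anything else cannot be placed in either group.
  stop("cannot coerce response of class '", class(v)[1],
       "' to categorical or continuous")
}

guessResponseType(c(TRUE, FALSE, TRUE))   # boolean response
guessResponseType(c(1.5, 2.25, 3))        # numeric response
guessResponseType(factor(c("a", "b")))    # factor response
```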
In the most basic sense, this function is a loop wrapping the code to generate a model. However, it also standardizes the inputs for all the model packages and generates meaningful default arguments for all the supported packages. It is possible to pass the function either a formula object, or a list of x and y names from which to generate the models—it will compute whichever is not specified.
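The two-way conversion between a formula and x and y names can be illustrated with base R alone. This is a minimal sketch, not the package's internal code; the data frame and its column names are made up for illustration.

```r
# Toy data frame; the column names are made up for illustration.
df <- data.frame(class = factor(c("a", "b", "a")),
                 ndvi  = c(0.2, 0.5, 0.7),
                 slope = c(3, 10, 6))

# Case 1: x and y are given, so build the formula from them.
y  <- "class"
x  <- setdiff(names(df), y)          # all columns other than y
fx <- reformulate(x, response = y)   # class ~ ndvi + slope

# Case 2: only a formula is given, so recover the names from it.
fx2 <- class ~ ndvi + slope
y2  <- all.vars(fx2)[1]              # response name
x2  <- all.vars(fx2)[-1]             # predictor names
```

Note that all.vars() only returns names that appear literally in the formula, so this simple approach assumes the formula does not use the `.` shorthand.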
The various arguments are the most complex part of this function. Reasonably meaningful default values are generated within the function, but the user always has the option to override them. In most cases it is likely there will be at least a few arguments that will need to be provided. The argument lists are divided up by model type, not package:
Random Forest—currently: randomForest, and randomForestSRC.
mtry = floor(sqrt(length(x)))
the two implementations of random forests both state that they compute the number of variables to try at each node split in the same way, but they arrive at different answers internally; that is, given the defaults, they do not generate the same output. Specifying the value here, using the same formula that both packages give as their default, makes it possible to ensure that they are doing the same thing.
importance = ‘permute’
one of the benefits of random forests is that it is relatively easy to compute a variable importance metric (VIMP). While only randomForestSRC currently allows multiple options for methods, these options can be specified here (including ‘none’), and the arguments for randomForest will be generated automatically.
na.action = na.omit
what to do when NA values are encountered.
proximity = FALSE
should proximity information be computed; see packages for more help.
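As a rough illustration of how generated defaults and user overrides might combine, the sketch below builds the default list shown above and merges a user-supplied list into it with base R's modifyList(). This is an assumption about the general pattern, not the package's actual code; the predictor names are made up.

```r
# Made-up predictor names, used only to compute the default mtry.
x <- c("ndvi", "slope", "aspect", "elevation", "wetness")

# Defaults as listed above; mtry uses the formula stated in the text.
rf.defaults <- list(
  mtry       = floor(sqrt(length(x))),
  importance = "permute",
  na.action  = na.omit,
  proximity  = FALSE
)

# User-supplied values override the defaults; unspecified ones are kept.
rf.user <- list(proximity = TRUE)
rf.args <- modifyList(rf.defaults, rf.user)

rf.args$mtry        # 2, since floor(sqrt(5)) is 2
rf.args$proximity   # TRUE, taken from the user-supplied list
```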
Nearest Neighbour—currently: FNN, class, and kknn.
k = 2
the number of neighbours considered (for FNN and class).
kernel = ‘rectangular’
the kknn package allows a choice of kernel functions for weighting the distances; this specifies which to use. More than one can be given, in which case it will optimize over them all.
scale = TRUE
should the data be scaled before fitting the model.
GBM—currently: gbm
n.trees = 1000
the maximum number of trees to grow. Note that this is not the optimal number of trees! This is an overfit model; use gbm.perf to find the optimal model.
keep.data = TRUE
should the data be embedded in the model. It is kept because other methods in this package need the data; this also prevents the data from potentially being stored twice.
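The advice to follow an overfit fit with gbm.perf can be sketched directly against the gbm package. The example below uses synthetic data and calls gbm itself, not this package's wrapper; it is only a sketch of the workflow and runs only if gbm is installed.

```r
if (requireNamespace("gbm", quietly = TRUE)) {
  set.seed(1)
  # Synthetic regression data, made up for illustration.
  df <- data.frame(x1 = runif(200), x2 = runif(200))
  df$y <- 2 * df$x1 - df$x2 + rnorm(200, sd = 0.1)

  # Deliberately grow many trees, keeping the data in the model object.
  fit <- gbm::gbm(y ~ x1 + x2, data = df, distribution = "gaussian",
                  n.trees = 1000, keep.data = TRUE)

  # Estimate the best number of trees from the out-of-bag improvement.
  best <- gbm::gbm.perf(fit, method = "OOB", plot.it = FALSE)

  # Predict with the selected number of trees rather than all 1000.
  pred <- predict(fit, newdata = df, n.trees = best)
}
```

The out-of-bag method is used here because it needs no extra arguments; gbm.perf also supports "test" and "cv" when the model was fitted with a held-out fraction or cross-validation folds.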
A named list of models with attributes specifying the data, the function used, and the class.
See the package help NPEL.Classification for an overview of the analysis process.
For reading-in model data: readTile, readShapePoints, and extractPoints; or the raster package help for reading-in raster files directly.
For examples of computing derived raster variables (e.g. NDVI, slope), see the example code in egTile.
For examples of what to do with the generated models, see: modelAccs, writeTile, and plotTile.
Also see any of the supported packages, currently: randomForest, randomForestSRC, FNN, class, kknn, and gbm.