mzrf: Random Forests Analysis of LCMS data.

View source: R/mzrf.R

Description

mzrf() was used to perform the random forests analysis of the LCMS data (./data/mzdata.rda).

Usage

mzrf(parallel = TRUE, save.model = FALSE, view.plot = TRUE,
  save.plot = FALSE, plot.name = "mzrf_cv_plot",
  model.name = "mzrf_model", seed = 1978, pred.results = TRUE, ...)

Arguments

parallel

Logical indicating if parallel processing should be used.

save.model

Logical indicating if the model should be saved.

view.plot

Logical indicating if a plot of accuracy vs. mtry should be displayed in the plot viewer.

save.plot

Logical indicating if the plot should be saved as a .pdf in the ./figs directory.

plot.name

Name of plot if save.plot = TRUE.

model.name

Name of model if save.model = TRUE.

seed

An integer for setting the RNG state.

pred.results

Logical indicating if the results of predicting the test data should be printed to the console.

...

Other arguments passed on to individual methods.

Details

mzrf() loads mzdata and performs an RF analysis of the data using mzdata$class as the outcome. The process is outlined as follows:

  1. The data is split into training and test sets using an 80:20 split, stratified by the combined class and day variable, mzdata$class_day.

  2. A list of random seeds is produced for each iteration of the CV process. For the 10-fold CV repeated 3 times used here, there are 10 * 3 = 30 resamples, and each resample requires one seed per mtry value assessed (i.e., the tuning grid length).

  3. Define a tuning grid of mtry values. In this case we assess mtry values c(25, 75, 100, seq(from = 100, to = 500, by = 50)), giving a tuning grid length of 12. Note that 100 appears twice in this expression, so the grid contains only 11 distinct values.

  4. Define the CV parameters: 10 folds, 3 repeats and the default summary. We also define the method for selecting the best tune. In this case, the best tune is the simplest model within one standard error of the empirically optimal model. This rule, described by Breiman et al. (1984), may help avoid overfitting the model. Note that k-fold CV performed using trainControl(method = "repeatedcv") stratifies sampling according to class.

  5. The data is centred by subtracting each predictor's mean from its values.

  6. The data is scaled by dividing each predictor's values by its standard deviation.

  7. The model is run.
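Step 1 can be illustrated with a minimal base R sketch of an 80:20 split stratified by a grouping vector such as mzdata$class_day. The helper stratified_split() and the class/day labels below are hypothetical; mzrf() may perform this split with a different function (for example caret::createDataPartition()).

```r
# Hypothetical helper: an 80:20 split stratified by a grouping vector such as
# mzdata$class_day. This is an illustration, not the code inside mzrf().
stratified_split <- function(strata, p = 0.8, seed = 1978) {
  set.seed(seed)
  strata <- droplevels(as.factor(strata))
  # Row indices grouped by stratum
  by_stratum <- split(seq_along(strata), strata)
  # Within each stratum, draw round(p * n) rows for the training set
  train_idx <- lapply(by_stratum, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(train_idx, use.names = FALSE))
}

# Hypothetical class/day labels: 10 + 10 + 20 + 20 samples
class_day <- rep(c("A_d1", "A_d2", "B_d1", "B_d2"), times = c(10, 10, 20, 20))
train_idx <- stratified_split(class_day, p = 0.8)
test_idx  <- setdiff(seq_along(class_day), train_idx)

table(class_day[train_idx])  # 8, 8, 16, 16 rows per stratum
```

Sampling within each stratum keeps the class-by-day proportions of the training and test sets close to those of the full data.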
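Steps 2 and 3 can be sketched together in base R. caret's trainControl() documents its seeds argument as a list of B + 1 elements, where B is the number of resamples: the first B elements are integer vectors with one seed per tuning-parameter combination, and the last is a single integer used for the final model. The helper make_cv_seeds() is hypothetical and assumes mzrf() passes its seeds to trainControl() in that layout.

```r
# Hypothetical helper: build a seeds list in the layout expected by
# caret::trainControl(seeds = ...): one vector of grid_length seeds per
# resample, plus a single seed for the final model.
make_cv_seeds <- function(folds = 10, repeats = 3, grid_length, seed = 1978) {
  set.seed(seed)
  n_resamples <- folds * repeats
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(n = 1000000L, size = grid_length)
  }
  seeds[[n_resamples + 1]] <- sample.int(n = 1000000L, size = 1)
  seeds
}

# Tuning grid from step 3; the value 100 appears twice in this expression
mtry_grid <- c(25, 75, 100, seq(from = 100, to = 500, by = 50))
length(mtry_grid)          # 12
length(unique(mtry_grid))  # 11

seeds <- make_cv_seeds(folds = 10, repeats = 3, grid_length = length(mtry_grid))
length(seeds)              # 31 (30 resamples + 1 final model)
```

Fixing every seed in advance makes the resampling reproducible even when the folds are processed in parallel (parallel = TRUE).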
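The one-standard-error rule from step 4 can be illustrated with a small base R function. caret provides this rule as its "oneSE" selection function (trainControl(selectionFunction = "oneSE")); the hypothetical one_se_select() below only mirrors the idea. It assumes rows are ordered from simplest to most complex (here, smallest mtry first), and the accuracy values are made up, not results from mzrf().

```r
# Hypothetical one-SE rule (Breiman et al., 1984) for a metric to maximise.
# Rows of `results` must be ordered from simplest to most complex model;
# here smaller mtry is assumed to be simpler.
one_se_select <- function(results, n_resamples) {
  best <- which.max(results$Accuracy)
  # Standard error of the best model's accuracy across resamples
  se <- results$AccuracySD[best] / sqrt(n_resamples)
  threshold <- results$Accuracy[best] - se
  # First (simplest) row whose accuracy is within one SE of the best
  min(which(results$Accuracy >= threshold))
}

# Made-up resampling summary for illustration only
results <- data.frame(
  mtry       = c(25, 75, 150, 300, 500),
  Accuracy   = c(0.80, 0.86, 0.88, 0.87, 0.85),
  AccuracySD = c(0.10, 0.11, 0.12, 0.11, 0.10)
)

chosen <- one_se_select(results, n_resamples = 30)
results$mtry[chosen]  # 75, although mtry = 150 has the highest accuracy
```

Here the threshold is 0.88 - 0.12 / sqrt(30), roughly 0.858, so the simpler mtry = 75 model (accuracy 0.86) is preferred over the empirically best mtry = 150 model.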
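Steps 5 and 6 amount to standardising each predictor. A minimal base R sketch, assuming (as caret's preProcess(method = c("center", "scale")) does) that the means and standard deviations are estimated on the training data and then reused for the test data. The helper names and the simulated matrices are hypothetical.

```r
# Hypothetical helpers: estimate centring/scaling parameters on the training
# predictors, then apply the same transformation to any data set.
center_scale_fit <- function(x_train) {
  list(mean = colMeans(x_train), sd = apply(x_train, 2, sd))
}

center_scale_apply <- function(x, params) {
  # Subtract each predictor's mean, then divide by its standard deviation
  centred <- sweep(x, 2, params$mean, FUN = "-")
  sweep(centred, 2, params$sd, FUN = "/")
}

set.seed(1978)
# Simulated predictor matrices (4 predictors) standing in for mz intensities
x_train <- matrix(rnorm(40, mean = 50, sd = 10), nrow = 10)
x_test  <- matrix(rnorm(8, mean = 50, sd = 10), nrow = 2)

params  <- center_scale_fit(x_train)
z_train <- center_scale_apply(x_train, params)
z_test  <- center_scale_apply(x_test, params)

# Training predictors now have mean 0 and standard deviation 1;
# test predictors are transformed with the training statistics.
round(colMeans(z_train), 10)
round(apply(z_train, 2, sd), 10)
```

Reusing the training statistics keeps information from the test set out of the preprocessing, so the test predictions remain an honest estimate of performance.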

Value

Returns a list of class train.

Note

Although this function is exported, mzrf() was not intended to be used outside of this package.

Author(s)

Benjamin R. Gordon

See Also

train, ggplot, The caret Package by Max Kuhn (2017)


brgordon17/coralclass documentation built on June 15, 2020, 9:21 p.m.