
Source: R/rmw_do_all.R

rmw_do_all    R Documentation

Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables and then immediately normalise a variable for "average" meteorological conditions.

Description

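rmw_do_all is a user-level function that conducts the meteorological normalisation process in one step: it trains a random forest model and then uses that model to normalise the variable. To run the two steps separately, see rmw_train_model and rmw_normalise.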
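The normalisation step can be pictured as repeated resampling. The meteorological variables are shuffled together while the trend term stays in place. The model then predicts on each resampled data set, and the predictions are averaged for each observation. The base R sketch below illustrates this general idea under stated assumptions and is not the package's code. It assumes any fitted model with a predict() method: lm() stands in for the ranger random forest that rmweather uses, and normalise_sketch() is a hypothetical helper.

```r
# Hypothetical sketch of normalisation by resampling; not rmweather's code.
# Assumes `model` is any fitted model with a predict() method.
normalise_sketch <- function(model, df, variables_sample, n_samples = 300,
                             replace = TRUE) {
  predictions <- vapply(seq_len(n_samples), function(i) {
    # One row index for all sampled variables, so they move together
    index <- sample(nrow(df), replace = replace)
    df_sampled <- df
    df_sampled[variables_sample] <- lapply(
      df[variables_sample], function(x) x[index]
    )
    # date_unix (the trend term) is not in variables_sample, so it is untouched
    as.numeric(predict(model, newdata = df_sampled))
  }, numeric(nrow(df)))
  # Aggregate: average the n_samples predictions for each observation
  rowMeans(predictions)
}

# Usage with simulated data and a linear model standing in for a random forest
set.seed(1)
df <- data.frame(date_unix = 1:50, ws = runif(50), air_temp = rnorm(50))
df$value <- 0.1 * df$date_unix + 2 * df$ws + df$air_temp + rnorm(50, sd = 0.1)
model <- lm(value ~ date_unix + ws + air_temp, data = df)
normalised <- normalise_sketch(model, df, c("ws", "air_temp"), n_samples = 100)
```

Averaging over many resampled weather scenarios suppresses the weather-driven variation, so the normalised series mostly follows the trend term.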

Usage

rmw_do_all(
  df,
  variables,
  variables_sample = NA,
  n_trees = 300,
  min_node_size = 5,
  mtry = NULL,
  keep_inbag = TRUE,
  n_samples = 300,
  replace = TRUE,
  se = FALSE,
  aggregate = TRUE,
  n_cores = NA,
  verbose = FALSE
)

Arguments

df

Input data frame after preparation with rmw_prepare_data. df must satisfy a number of constraints, which are checked before modelling.

variables

Independent/explanatory variables used to predict "value".

variables_sample

Variables to use for the normalisation step. If not supplied, the default is all variables used to train the model except date_unix, the trend term (see rmw_normalise).

n_trees

Number of trees to grow to make up the forest.

min_node_size

Minimal node size.

mtry

Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.

keep_inbag

Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.

n_samples

Number of times to sample df and then predict.

replace

Should variables be sampled with replacement?

se

Should the standard error of the predictions be calculated too? The standard error method is the "infinitesimal jackknife for bagging" and will slow down the predictions significantly.

aggregate

Should all the n_samples predictions be aggregated?

n_cores

Number of CPU cores to use for the model calculation. Default is the system's total number of cores minus one.

verbose

Should the function give messages?
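Several of the arguments above have computed defaults. The snippet below shows what those descriptions work out to for the eight variables used in the Examples section. It is illustrative base R (plus the parallel package that ships with R), not the package's internal code. The guard that keeps n_cores at least one is an added assumption, not something the documentation states.

```r
# Illustrative computation of the documented defaults; not rmweather's code
variables <- c("ws", "wd", "air_temp", "rh", "date_unix", "day_julian",
               "weekday", "hour")

# mtry: rounded-down square root of the number of variables
mtry <- floor(sqrt(length(variables)))

# variables_sample: all training variables except the trend term date_unix
variables_sample <- setdiff(variables, "date_unix")

# n_cores: the system's total number of cores minus one
# (assumed guard: never below one, and NA from detectCores() falls back to one)
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
```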

Value

Named list.

Author(s)

Stuart K. Grange

See Also

rmw_prepare_data, rmw_train_model, rmw_normalise

Examples




# Load packages (data_london ships with rmweather)
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>% 
  filter(variable == "no2") %>% 
  rmw_prepare_data()

# Use the example data to conduct the steps needed for meteorological
# normalisation
list_normalised <- rmw_do_all(
  df = data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300,
  n_samples = 300
)




skgrange/rmweather documentation built on Nov. 29, 2023, 2:39 a.m.