rmw_train_model: Function to train a random forest model to predict (usually)...

View source: R/rmw_train_model.R

rmw_train_model R Documentation

Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.

Description

Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.

Usage

rmw_train_model(
  df,
  variables,
  n_trees = 300,
  mtry = NULL,
  min_node_size = 5,
  keep_inbag = TRUE,
  n_cores = NA,
  verbose = FALSE
)

Arguments

df

Input tibble after preparation with rmw_prepare_data. The tibble must satisfy a number of constraints, which are checked before modelling.

variables

Character vector of independent/explanatory variable names used to predict "value".

n_trees

Number of trees to grow to make up the forest.

mtry

Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
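As a rough illustration of the default described above, the sketch below computes the rounded-down square root of the number of variables in base R. The variables vector is copied from the Examples section, and default_mtry is a hypothetical name used only here; how the function derives the value internally is not shown in this documentation.

```r
# Explanatory variables, as used in the Examples section
variables <- c(
  "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
)

# Rounded-down square root of the number of variables
# (default_mtry is a hypothetical name for illustration)
default_mtry <- floor(sqrt(length(variables)))
default_mtry
```

With these eight variables the result is 2, because the square root of 8 is about 2.83.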

min_node_size

Minimum node size.

keep_inbag

Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.
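To show why the in-bag data matter, here is a minimal sketch that calls ranger directly instead of rmw_train_model. It assumes that keep_inbag corresponds to ranger's keep.inbag argument, and it uses the built-in mtcars data purely as a stand-in for prepared air quality data. The names model_demo and pred are hypothetical.

```r
library(ranger)

# Keep things reproducible
set.seed(123)

# Train a small regression forest and keep the in-bag counts
# (mtcars is a stand-in dataset, not air quality data)
model_demo <- ranger(
  mpg ~ .,
  data = mtcars,
  num.trees = 300,
  keep.inbag = TRUE
)

# Standard errors of the predictions rely on the stored in-bag counts
pred <- predict(model_demo, data = mtcars, type = "se")

# Predicted values and their estimated standard errors
head(pred$predictions)
head(pred$se)
```

If the forest is trained without the in-bag counts, ranger cannot compute these standard errors, so the default of TRUE is the safer choice when uncertainty estimates may be needed later.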

n_cores

Number of CPU cores to use for the model calculation. Default is the system's total number of cores minus one.
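The default described above can be approximated in base R as follows. This is only a sketch of the idea: whether the function uses parallel::detectCores internally is an assumption, and n_cores_default is a hypothetical name.

```r
# Total number of cores reported by the system (can be NA on some platforms)
total_cores <- parallel::detectCores()

# Leave one core free, but never go below one
# (n_cores_default is a hypothetical name for illustration)
n_cores_default <- if (is.na(total_cores)) 1L else max(1L, total_cores - 1L)
n_cores_default
```

Passing an explicit value such as n_cores = 2 avoids relying on the detected core count, which can be useful on shared machines.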

verbose

Should the function give messages?

Value

A ranger model object (a named list).
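Because the return value is a ranger object, its elements can be inspected like those of any named list. The sketch below builds a small forest with ranger directly on the built-in mtcars data, which is a stand-in only, and looks at a few elements. The name model_demo is hypothetical; an object returned by rmw_train_model should expose the same kind of elements.

```r
library(ranger)

# Keep things reproducible
set.seed(123)

# Small regression forest on a stand-in dataset
model_demo <- ranger(
  mpg ~ .,
  data = mtcars,
  num.trees = 300,
  min.node.size = 5
)

# The object has class "ranger" and is a named list underneath
class(model_demo)
names(model_demo)

# A few commonly inspected elements
model_demo$num.trees         # number of trees grown
model_demo$prediction.error  # out-of-bag mean squared error (regression)
model_demo$r.squared         # out-of-bag R squared (regression)
```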

Author(s)

Stuart K. Grange

See Also

rmw_prepare_data, rmw_normalise

Examples




# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>% 
  filter(variable == "no2") %>% 
  rmw_prepare_data()

# Calculate a model using common meteorological and time variables
model <- rmw_train_model(
  data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300
)




skgrange/rmweather documentation built on Nov. 29, 2023, 2:39 a.m.