In skgrange/rmweather: Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data

rmw_prepare_data


Function to prepare a data frame for modelling with rmweather.

Description

rmw_prepare_data checks and prepares a data frame for further use with rmweather.

Usage

rmw_prepare_data(
  df,
  value = "value",
  na.rm = FALSE,
  replace = FALSE,
  fraction = 0.8
)

Arguments

`df`	Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables.
`value`	Name of the dependent variable. Usually a pollutant, for example, `"no2"` or `"pm10"`.
`na.rm`	Should observations with missing values (`NA`) in `value` be removed?
`replace`	When adding the date-derived variables, should they replace any versions already present in the data frame?
`fraction`	Fraction of the observations assigned to the training set. The default is 0.8 (80 %); the remaining observations form the testing set.
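To illustrate how the `value` and `na.rm` arguments interact, here is a minimal base R sketch. The helper `prepare_value` is hypothetical and is not part of rmweather; it assumes that the chosen dependent variable is simply renamed to `"value"` and that `na.rm = TRUE` drops the rows where it is missing.

```r
# Hypothetical helper, not rmweather's implementation: rename the chosen
# dependent variable to "value" and optionally drop rows where it is NA
prepare_value <- function(df, value = "value", na.rm = FALSE) {
  if (!value %in% names(df)) {
    stop("`", value, "` is not a variable in the data frame.", call. = FALSE)
  }
  # Rename the dependent variable so later steps can rely on "value"
  names(df)[names(df) == value] <- "value"
  # Optionally remove observations with a missing dependent variable
  if (na.rm) {
    df <- df[!is.na(df$value), , drop = FALSE]
  }
  df
}

# Small made-up example: a no2 series with one missing observation
df <- data.frame(no2 = c(41.2, NA, 38.5), ws = c(3.1, 2.4, 5.0))
prepare_value(df, value = "no2", na.rm = TRUE)
```

With `na.rm = FALSE` the row with the missing `no2` value is kept and only the rename happens.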

Details

rmw_prepare_data checks that a date variable is present and of the correct data type, imputes missing numeric and categorical values, randomly splits the input into training and testing sets, and renames the dependent variable to "value". The date variable is also used to calculate new variables, such as date_unix, day_julian, weekday, and hour, which can be used as independent variables. These variables are required by other rmweather functions.
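The date-derived variables and the imputation step can be sketched in base R as follows. This is not rmweather's code: the helpers `add_date_variables` and `impute_values` are hypothetical, the input is assumed to hold a `POSIXct` column named `date`, weekday is assumed to be coded as an ISO day number (Monday = 1) stored as a factor, and missing values are assumed to be filled with the median (numeric) or the most frequent level (categorical).

```r
# Hypothetical sketch: derive time variables from an assumed POSIXct `date`
add_date_variables <- function(df) {
  df$date_unix <- as.numeric(df$date)                  # seconds since 1970-01-01 UTC
  df$day_julian <- as.integer(format(df$date, "%j"))   # day of year, 1 to 366
  df$weekday <- factor(as.integer(format(df$date, "%u")), levels = 1:7)  # Monday = 1
  df$hour <- as.integer(format(df$date, "%H"))         # hour of day, 0 to 23
  df
}

# Hypothetical sketch: fill missing numeric values with the median and
# categorical values with the most frequent level (assumed strategy)
impute_values <- function(df, skip = c("date", "value")) {
  for (nm in setdiff(names(df), skip)) {
    x <- df[[nm]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- stats::median(x, na.rm = TRUE)
    } else if (is.factor(x) || is.character(x)) {
      counts <- table(x)  # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[nm]] <- x
  }
  df
}

# Small made-up example with a missing wind speed and a missing site code
df <- data.frame(
  date = as.POSIXct(
    c("2020-01-06 08:00:00", "2020-01-07 13:00:00", "2020-07-01 23:00:00"),
    tz = "UTC"
  ),
  ws = c(3.1, NA, 5.0),
  site = c("a", NA, "a")
)

out <- impute_values(add_date_variables(df))
out
```

The dependent variable is skipped during imputation here so that the model is never trained on filled-in targets; whether rmweather does the same is not stated on this page.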

Because the training and testing sets are assigned randomly, call set.seed before rmw_prepare_data to keep results reproducible.
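The following base R sketch shows why seeding matters for the random split. The helper `split_sets` and the `set` column it adds are assumptions made for illustration; it marks a random `fraction` of rows as `"training"` and the rest as `"testing"`, and calling `set.seed()` with the same seed before each call reproduces the same assignment.

```r
# Hypothetical helper: flag a random `fraction` of rows as training data
split_sets <- function(df, fraction = 0.8) {
  n_training <- round(nrow(df) * fraction)
  training_rows <- sample.int(nrow(df), size = n_training)
  df$set <- ifelse(seq_len(nrow(df)) %in% training_rows, "training", "testing")
  df
}

df <- data.frame(value = rnorm(10))

# The same seed before each call gives the same split
set.seed(123)
first <- split_sets(df)
set.seed(123)
second <- split_sets(df)

identical(first$set, second$set)
table(first$set)
```

Without the `set.seed()` calls, repeated runs would generally assign different rows to the training set, and models trained on them would differ slightly.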

Value

A tibble: the input data, transformed and ready for modelling with rmweather.

Author(s)

Stuart K. Grange

Examples


# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data for modelling, only use no2 data here
data_london_prepared <- data_london %>% 
  filter(variable == "no2") %>% 
  rmw_prepare_data()

skgrange/rmweather documentation built on July 4, 2025, 8:28 p.m.