rmw_prepare_data: Function to prepare a data frame for modelling with...


rmw_prepare_data R Documentation

Function to prepare a data frame for modelling with rmweather.

Description

rmw_prepare_data will test and prepare a data frame for further use with rmweather.

Usage

rmw_prepare_data(
  df,
  value = "value",
  na.rm = FALSE,
  replace = FALSE,
  fraction = 0.8
)

Arguments

df

Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables.

value

Name of the dependent variable. Usually a pollutant, for example, "no2" or "pm10".

na.rm

Should missing values (NA) be removed from value?

replace

When adding the date variables to the data frame, should they replace variables of the same name that already exist in it?

fraction

Fraction of the observations to make up the training set; the remaining observations form the testing set. Default is 0.8 (80 %).

Details

rmw_prepare_data checks that a date variable is present and of the correct data type, imputes missing numeric and categorical values, randomly splits the input into training and testing sets, and renames the dependent variable to "value". The date variable is also used to calculate new variables such as date_unix, day_julian, weekday, and hour, which can be used as independent variables. Other rmweather functions require data prepared in this way to operate.
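
The steps described above can be illustrated with a rough base R sketch. This is not the rmweather implementation: sketch_prepare, df_example, and the set column name are hypothetical names used for illustration only, median imputation is an assumed method, and categorical imputation is left out for brevity.

```r
# Hypothetical helper, not part of rmweather: mimics the documented steps
sketch_prepare <- function(df, value = "value", fraction = 0.8) {

  # A date variable must be present and be a date-time (POSIXct)
  stopifnot("date" %in% names(df), inherits(df$date, "POSIXct"))

  # Rename the dependent variable to "value"
  names(df)[names(df) == value] <- "value"

  # Impute missing numeric predictors with the median (assumed method)
  is_numeric <- vapply(df, is.numeric, logical(1))
  for (name in setdiff(names(df)[is_numeric], "value")) {
    x <- df[[name]]
    x[is.na(x)] <- median(x, na.rm = TRUE)
    df[[name]] <- x
  }

  # Derive time variables from the date variable
  df$date_unix <- as.numeric(df$date)
  df$day_julian <- as.integer(format(df$date, "%j"))
  df$weekday <- factor(weekdays(df$date))
  df$hour <- as.integer(format(df$date, "%H"))

  # Randomly allocate observations to the training and testing sets
  n <- nrow(df)
  index_training <- sample.int(n, size = floor(n * fraction))
  df$set <- "testing"
  df$set[index_training] <- "training"

  df
}

# Small made-up hourly series with two missing wind speed values
set.seed(1)
df_example <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"), by = "hour", length.out = 10),
  no2 = c(10, 12, 9, 15, 11, 14, 13, 8, 16, 12),
  ws = c(2.1, NA, 3.4, 1.8, 2.9, NA, 4.0, 3.1, 2.2, 2.7)
)

df_prepared <- sketch_prepare(df_example, value = "no2")
table(df_prepared$set)
```

With fraction = 0.8 and ten rows, eight observations are allocated to the training set and two to the testing set; which rows end up in each set depends on the random seed.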

Use set.seed in an R session to keep results reproducible.
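
As a minimal illustration of why the seed matters, the sketch below draws the same random sample of row indices twice with base R's sample.int. It assumes, as the note above implies, that the random split relies on R's random number generator; the object names are hypothetical.

```r
# Draw 8 of 10 row indices, resetting the seed before each draw
set.seed(123)
split_first <- sample.int(10, size = 8)

set.seed(123)
split_second <- sample.int(10, size = 8)

# The same seed gives the same "training" rows
identical(split_first, split_second)
```

Without the second call to set.seed, the two draws would generally differ, and so would the training and testing sets between runs.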

Value

Tibble, the input data transformed ready for modelling with rmweather.

Author(s)

Stuart K. Grange

See Also

set.seed, rmw_train_model, rmw_normalise

Examples


# Load packages
library(rmweather)
library(dplyr)

# Keep things reproducible
set.seed(123)

# Prepare example data for modelling, only use no2 data here
data_london_prepared <- data_london %>% 
  filter(variable == "no2") %>% 
  rmw_prepare_data()


rmweather documentation built on June 22, 2024, 9:33 a.m.