View source: R/rmw_prepare_data.R
rmw_prepare_data | R Documentation |
rmw_prepare_data will test and prepare a data frame for further use with rmweather.
rmw_prepare_data(
df,
value = "value",
na.rm = FALSE,
replace = FALSE,
fraction = 0.8
)
df | Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables. |
value | Name of the dependent variable. Usually a pollutant, for example, |
na.rm | Should missing values ( |
replace | When adding the date variables to the set, should they replace the versions already contained in the data frame if they exist? |
fraction | Fraction of the observations to make up the training set. Default is 0.8, 80 %. |
rmw_prepare_data will check if a date variable is present and is of the correct data type, impute missing numeric and categorical values, randomly split the input into training and testing sets, and rename the dependent variable to "value". The date variable will also be used to calculate new variables such as date_unix, day_julian, weekday, and hour which can be used as independent variables. These attributes are needed for other rmweather functions to operate.
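The date-derived variables described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the example data frame is hypothetical, and the exact coding rmweather uses (for instance, which weekday is numbered 1, or whether weekday is stored as a factor) is an assumption here.

```r
# Hypothetical hourly time series; `date` is expected to be POSIXct
df <- data.frame(
  date = seq(
    as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
    by = "hour",
    length.out = 48
  ),
  value = seq_len(48)
)

# Check that the date variable is present and of the expected type
stopifnot("date" %in% names(df), inherits(df$date, "POSIXct"))

# Derive date-based predictors (illustrative only, coding is an assumption)
df$date_unix <- as.numeric(df$date)                 # seconds since 1970-01-01 UTC
df$day_julian <- as.integer(format(df$date, "%j"))  # day of the year
df$weekday <- as.integer(format(df$date, "%u"))     # ISO weekday, 1 = Monday
df$hour <- as.integer(format(df$date, "%H"))        # hour of the day, 0 to 23

head(df)
```

In practice rmw_prepare_data adds these variables itself, so a sketch like this is only useful for understanding what the extra columns represent.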
Use set.seed in an R session to keep results reproducible.
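To show why the seed matters, here is a small base R sketch of a random training/testing split. The helper split_rows is hypothetical and only mimics the idea of the fraction argument; it is not how rmweather performs the split internally.

```r
n <- 100         # hypothetical number of observations
fraction <- 0.8  # same meaning as the `fraction` argument

# Hypothetical helper: label each row as "training" or "testing"
split_rows <- function(n, fraction) {
  # Randomly choose the training rows; the remainder form the testing set
  index_training <- sample(seq_len(n), size = floor(n * fraction))
  ifelse(seq_len(n) %in% index_training, "training", "testing")
}

# Setting the same seed before each call gives the same split
set.seed(123)
set_first <- split_rows(n, fraction)

set.seed(123)
set_second <- split_rows(n, fraction)

identical(set_first, set_second)
table(set_first)
```

Without the calls to set.seed, the two splits would generally differ, and so would any model trained on them.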
Tibble, the input data transformed ready for modelling with rmweather.
Stuart K. Grange
set.seed, rmw_train_model, rmw_normalise
# Load packages
library(dplyr)
library(rmweather)
# Keep things reproducible
set.seed(123)
# Prepare example data for modelling, only use no2 data here
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()