In NSAPH/airpred: A Framework For Building Air Pollution Estimation Models

Workflow Overview

To generate predictions, we need previously trained models and a data frame with the same variables and variable names as were used to train the models.

Configuration File

The prediction step uses many of the same settings as the model training step, most notably the training_output and training_models fields, which determine where the package looks for the stored models. The site_list field should also be updated with the coordinates of the locations for which predictions are being generated.

The new fields of importance to the prediction step are the following:

predict_data: The input data for a given round of prediction.

predict_mid_process: The directory that holds all saved files generated in the prediction process.

predict_output: The directory that holds the generated predictions.
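As a rough illustration, the three prediction fields can be thought of as paths that should exist before a prediction run. The sketch below uses only base R. The list predict_settings, the directory names, and the file name predict_input.rds are hypothetical placeholders, not values or structures defined by airpred, and the sketch does not reproduce the package's actual configuration file format.

```r
# Hypothetical settings list mirroring the three prediction fields.
# The paths are placeholders under a temporary directory, not airpred defaults.
base_dir <- file.path(tempdir(), "airpred_example")

predict_settings <- list(
  predict_data        = file.path(base_dir, "input", "predict_input.rds"),
  predict_mid_process = file.path(base_dir, "mid_process"),
  predict_output      = file.path(base_dir, "output")
)

# Collect the directories that the prediction step would read from or write to.
dirs <- c(
  dirname(predict_settings$predict_data),
  predict_settings$predict_mid_process,
  predict_settings$predict_output
)

# Create any directory that is missing.
for (d in dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every directory now exists.
dirs_ok <- all(dir.exists(dirs))
```

Checking the directories up front is a cheap way to catch a mistyped path before a long prediction run starts.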

Data Prep

Each data set should be prepared in a similar way to the data set used for training. If the two-stage modelling procedure is followed, the most important required variable is a numeric column named site, which contains an ID number running from 1 to the total number of sites in the data set. This column is used, in conjunction with the list of sites provided in the site_list field, to generate the nearby terms.
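One possible way to build the site column is sketched below in base R. It assumes that the ID of a site corresponds to its row position in the site list; that correspondence, the site_code identifier, and all values shown are hypothetical and used only for illustration.

```r
# Hypothetical site list: one row per prediction location.
site_list <- data.frame(
  site_code = c("A01", "B07", "C12"),
  lat = c(42.36, 40.71, 41.88),
  lon = c(-71.06, -74.01, -87.63),
  stringsAsFactors = FALSE
)

# Hypothetical prediction data keyed by the same site codes.
predict_df <- data.frame(
  site_code = c("B07", "A01", "C12", "A01"),
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02")),
  stringsAsFactors = FALSE
)

# Assumption: the numeric ID is the row position of the site in the site list,
# so IDs run from 1 to the total number of sites.
predict_df$site <- match(predict_df$site_code, site_list$site_code)

# Sanity checks: numeric, no unmatched sites, and within 1..n_sites.
stopifnot(
  is.numeric(predict_df$site),
  !anyNA(predict_df$site),
  all(predict_df$site %in% seq_len(nrow(site_list)))
)
```

Using match() keeps the IDs tied to the order of the site list, so a site that is missing from the list shows up as NA and is caught by the checks.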

There are two ways to recreate the normalization, transformation, and imputation processes used in training.

The recommended method is to call

load_predict_data()

which uses the files saved during the training cleaning process to repeat the same calculations. The other option is to call

airpred.predict(prepped = FALSE)

which cleans the data before generating predictions. However, this is not recommended: because the cleaning and prediction steps then run in a single call, an error at any point means both steps must be rerun, which increases processing time.

Generating Predictions

Predictions are generated by calling airpred.predict(). By default, it assumes that the data has been cleaned and stored in the predict_mid_process directory. The data is read in, along with the h2o models saved during the training step, and the model workflow is followed. The resulting predictions are stored in a data.frame containing the detransformed and denormalized predictions together with the date and site number associated with each prediction.
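To make "detransformed and denormalized" concrete, the base R sketch below reverses a log transform followed by z-score normalization. These two steps are assumptions chosen for illustration; the transformations airpred actually applies depend on the training configuration. The helper names denormalize and detransform and all values are hypothetical.

```r
# Hypothetical values on the original scale.
train_y <- c(4.2, 7.9, 12.5, 9.1, 6.3)

# Assumed forward steps: log transform, then z-score normalization.
log_y <- log(train_y)
mu <- mean(log_y)
sigma <- sd(log_y)
norm_y <- (log_y - mu) / sigma

# Reverse steps, applied in the opposite order.
denormalize <- function(z, mu, sigma) z * sigma + mu  # undo the z-score
detransform <- function(x) exp(x)                     # undo the log

recovered <- detransform(denormalize(norm_y, mu, sigma))

# Assemble a result table with the same kinds of columns the text describes:
# site number, date, and prediction on the original scale.
results <- data.frame(
  site = 1:5,
  date = as.Date("2020-01-01"),
  prediction = recovered
)
```

The reverse steps need the same mean and standard deviation that were computed during training, which is why the statistics saved by the training cleaning process have to be available at prediction time.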

NSAPH/airpred documentation built on May 7, 2020, 10:49 a.m.
