make_standardizer: Make data standardizer


Source file: R/make-standardizer.R

Description

Create a data standardizer function from the means and standard deviations (SDs) of the columns in an input data frame.

Usage

make_standardizer(x)

Arguments

x

a base::data.frame() or similar object

Value

The returned object is a function that standardizes all non-binary variables using the means and SDs of the corresponding columns in the input data frame x.
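A minimal base-R sketch of how such a standardizer could work is shown below. It assumes that a "binary" variable is a numeric column whose non-missing values are all 0 or 1, and that non-numeric columns pass through unchanged. The names make_standardizer_sketch() and is_binary() are hypothetical; this is not the code in R/make-standardizer.R.

```r
# Hypothetical sketch, not the demspaces implementation.
# Assumption: a column is "binary" if all its non-missing values are 0 or 1.
is_binary <- function(v) {
  is.numeric(v) && all(v[!is.na(v)] %in% c(0, 1))
}

make_standardizer_sketch <- function(x) {
  # Columns to standardize: numeric and non-binary
  keep <- vapply(x, function(v) is.numeric(v) && !is_binary(v), logical(1))
  cols <- names(x)[keep]
  # Record the training means and SDs once, when the standardizer is created
  means <- vapply(x[cols], mean, numeric(1), na.rm = TRUE)
  sds   <- vapply(x[cols], sd, numeric(1), na.rm = TRUE)
  # The returned closure applies the stored statistics to any new data.
  # Note: a constant column has SD 0 and would produce NaN here.
  function(newx) {
    for (col in cols) {
      newx[[col]] <- (newx[[col]] - means[[col]]) / sds[[col]]
    }
    newx
  }
}

# Usage on toy data: column b is binary and is left unchanged
train <- data.frame(a = c(1, 2, 3, 4), b = c(0, 1, 0, 1))
test  <- data.frame(a = c(5, 6),       b = c(1, 0))
standardize <- make_standardizer_sketch(train)
prepped_train <- standardize(train)
prepped_test  <- standardize(test)  # uses the training mean (2.5) and SD of a
```

Because the means and SDs live in the closure's environment, the same standardize() object can be applied at the prediction stage without passing the statistics around separately.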

The reason for returning a function is that the training data means and SDs must be recorded, so that test data can be standardized with the same values at the prediction stage.
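For comparison, base R's scale() stores the centering and scaling values it used in the attributes "scaled:center" and "scaled:scale", and these can be reused on test data. This sketch assumes all columns are numeric and ignores the binary-variable exception; it only illustrates why the training statistics have to be kept. The toy data are made up for illustration.

```r
# Toy numeric training and test data (illustrative values only)
train <- data.frame(a = c(1, 2, 3, 4), c = c(10, 20, 30, 40))
test  <- data.frame(a = c(5, 6),       c = c(50, 60))

# Standardize the training data; scale() stores the means and SDs as attributes
scaled_train <- scale(train)
train_means <- attr(scaled_train, "scaled:center")
train_sds   <- attr(scaled_train, "scaled:scale")

# At the prediction stage, reuse the training statistics, not the test ones
scaled_test <- scale(test, center = train_means, scale = train_sds)
```

Calling scale(test) on its own would instead center the test data on its own means, which is exactly what recording the training statistics avoids.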

An alternative would be to return the training data means and SDs themselves, but these would in any case be passed to another function that does the standardization, so that function might as well be created here.
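The two designs can be compared with a short sketch. The helpers get_scaling_stats() and apply_scaling() are hypothetical and not part of demspaces; wrapping them in a closure gives the same single-function interface that make_standardizer() returns. For brevity, this sketch standardizes every numeric column and does not skip binary ones.

```r
# Hypothetical helper: record the training means and SDs of numeric columns
get_scaling_stats <- function(x) {
  num <- vapply(x, is.numeric, logical(1))
  list(
    mean = vapply(x[num], mean, numeric(1)),
    sd   = vapply(x[num], sd, numeric(1))
  )
}

# Hypothetical helper: standardize new data with previously recorded stats
apply_scaling <- function(newx, stats) {
  for (col in names(stats$mean)) {
    newx[[col]] <- (newx[[col]] - stats$mean[[col]]) / stats$sd[[col]]
  }
  newx
}

# Closure version: compute the stats once, return a function that applies them
make_closure_standardizer <- function(x) {
  stats <- get_scaling_stats(x)
  function(newx) apply_scaling(newx, stats)
}

train <- data.frame(a = c(1, 2, 3, 4), c = c(10, 20, 30, 40))
test  <- data.frame(a = c(5, 6),       c = c(50, 60))

# Both routes give the same standardized test data
via_helpers <- apply_scaling(test, get_scaling_stats(train))
via_closure <- make_closure_standardizer(train)(test)
```

With the closure, callers carry one object from training to prediction instead of a statistics list plus a separate function.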

Examples

library("dplyr")
library("stats")
library("demspaces")  # provides make_standardizer() and the states data
data("states")

# assume that the input consists entirely of features and contains no NAs
train_x <- states %>%
  filter(year < 2010) %>%
  select(-starts_with("dv_"), -gwcode, -year) %>%
  filter(complete.cases(.))

test_x <- states %>%
  filter(year > 2009) %>%
  select(-starts_with("dv_"), -gwcode, -year) %>%
  filter(complete.cases(.))

# create the standardizer from the training data, then apply it
standardize <- make_standardizer(train_x)
prepped_train_x <- standardize(train_x)

# non-binary training columns should now have mean 0 and SD 1
do.call(rbind,
        lapply(prepped_train_x, function(x) {
          data.frame(mean = mean(x), SD = sd(x))
        }))

# the test data are standardized with the training data means and SDs,
# so their means and SDs will generally not be exactly 0 and 1
prepped_test_x <- standardize(test_x)
do.call(rbind,
        lapply(prepped_test_x, function(x) {
          data.frame(mean = mean(x), SD = sd(x))
        }))

andybega/demspaces documentation built on April 18, 2021, 11:05 p.m.