View source: R/make-standardizer.R
Create a data standardizer function using the relevant statistics in an input data frame.
make_standardizer(x)
x | a data frame
The returned object is a function that will standardize all non-binary
variables using the means and SDs of the relevant columns in the input
data frame x.
The point of doing it this way is that we need to record the relevant training data means and SDs so we can standardize test data at the prediction stage as well.
One option would be to record the training data means and SDs instead, but we would in any case pass them to another function that does the standardization, so we might as well create that function now.
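To illustrate the idea, here is a minimal sketch of how such a function factory could be written as a closure. It is not the package's actual source: the name make_standardizer_sketch, the rule used to detect binary columns (numeric columns containing only 0 and 1), and the toy data frames are all assumptions made for the example.

```r
# Hypothetical sketch, not the package's implementation.
make_standardizer_sketch <- function(x) {
  stopifnot(is.data.frame(x))

  # Assumption: a column counts as binary if it is numeric and only holds 0/1
  is_num    <- vapply(x, is.numeric, logical(1))
  is_binary <- vapply(x, function(col) {
    is.numeric(col) && all(col %in% c(0, 1))
  }, logical(1))
  cols <- names(x)[is_num & !is_binary]

  # Record the training means and SDs once; the closure below keeps them.
  # Note: a constant non-binary column would have SD 0 and is not handled here.
  means <- vapply(x[cols], mean, numeric(1))
  sds   <- vapply(x[cols], sd, numeric(1))

  function(new_x) {
    for (col in cols) {
      new_x[[col]] <- (new_x[[col]] - means[[col]]) / sds[[col]]
    }
    new_x
  }
}

# Toy data (made up for illustration)
train <- data.frame(a = c(1, 2, 3, 4), b = c(0, 1, 0, 1))
test  <- data.frame(a = c(5, 6),       b = c(1, 0))

standardize <- make_standardizer_sketch(train)
standardize(train)  # column a now has mean 0 and SD 1; b is untouched
standardize(test)   # column a is scaled with the training mean and SD
```

Because the means and SDs live in the enclosing environment of the returned function, the same function can be applied to test data later without passing those statistics around separately.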
library("dplyr")
library("stats")
data("states")

# assume that the input consists entirely of features, and no NAs
train_x <- states %>%
  filter(year < 2010) %>%
  select(-starts_with("dv_"), -gwcode, -year) %>%
  filter(complete.cases(.))

test_x <- states %>%
  filter(year > 2009) %>%
  select(-starts_with("dv_"), -gwcode, -year) %>%
  filter(complete.cases(.))

standardize <- make_standardizer(train_x)

prepped_train_x <- standardize(train_x)
do.call(rbind,
        lapply(prepped_train_x, function(x) {
          data.frame(mean = mean(x), SD = sd(x))
        }))

# the training data means and SDs are used for standardization
prepped_test_x <- standardize(test_x)
do.call(rbind,
        lapply(prepped_test_x, function(x) {
          data.frame(mean = mean(x), SD = sd(x))
        }))