View source: R/random_rotation.R
This function takes a data frame and performs the following tasks:
(1) For each numeric column, it creates a ranking function based only on in-sample data.
(2) It applies this function to all numeric columns and to both in- and out-of-sample data (it could also be applied to online data).
(3) For each numeric column, it computes the median of only the in-sample data.
(4) It imputes missing values in numeric columns with these in-sample medians (this could also be applied to online data).
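The four steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `rank_and_impute` is hypothetical, the ranking function is assumed to be an empirical CDF built with `ecdf()`, and the median is assumed to be taken on the already-ranked in-sample values.

```r
# Hypothetical sketch; the real post_sample_preprocessing() may differ.
rank_and_impute <- function(X, r_train) {
  is_num <- vapply(X, is.numeric, logical(1))
  for (col in names(X)[is_num]) {
    x <- X[[col]]
    x_in <- x[r_train]
    x_in <- x_in[!is.na(x_in)]
    if (length(x_in) == 0) next          # no in-sample values to learn from

    # (1) ranking function built from in-sample data only (assumed: ECDF)
    rank_fn <- ecdf(x_in)

    # (2) apply it to every row, in- and out-of-sample; NAs stay NA
    ranked <- rep(NA_real_, length(x))
    ok <- !is.na(x)
    ranked[ok] <- rank_fn(x[ok])

    # (3) median of the in-sample (ranked) values only
    med_in <- median(ranked[r_train], na.rm = TRUE)

    # (4) impute missing values with the in-sample median
    ranked[!ok] <- med_in
    X[[col]] <- ranked
  }
  X
}

# Usage: rows 1-3 are in-sample, rows 4-5 are out-of-sample
X <- data.frame(a = c(10, 20, NA, 40, 25), b = letters[1:5])
out <- rank_and_impute(X, r_train = 1:3)
```

Non-numeric columns such as `b` pass through unchanged, and out-of-sample rows never influence the ranking function or the median.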
post_sample_preprocessing(Xpre, Ypre, r_train, r_test)
Xpre      data frame
Ypre      data frame, currently unused (but will make it easier to add processing that depends on it later)
r_train   vector of in-sample indices into the data frames
r_test    vector of out-of-sample indices into the data frames, currently unused
Additional pre-processing functions could be added here. For now, Ypre and r_test are not used.
Note that these steps occur AFTER the file is split into training and testing data. As long as only in-sample data is used to create the transformations, no bias is introduced when training a classifier.
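A tiny illustration of why the in-sample restriction matters, using made-up numbers (the vector `x` and the indices here are purely hypothetical):

```r
# Hypothetical data: rows 1-3 are in-sample, rows 4-5 are out-of-sample
x <- c(1, 2, 3, 100, 200)
r_train <- 1:3

# Fitted on in-sample rows only: no information from the test rows is used
median_in_sample <- median(x[r_train])

# Fitted on all rows: the statistic already "sees" the test rows,
# so anything imputed with it leaks out-of-sample information
median_all_rows <- median(x)
```

The two medians differ, so a transformation fitted on all rows would carry information about the test data into training.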
Also see: pre_sample_preprocessing() for a function that already gets called BEFORE the file is split into training and testing data.
The trade-off is speed versus bias. Putting everything here leads to slower run-times but leaves no potential for bias; moving steps into pre_sample_preprocessing() gives faster run-times but can introduce bias.
# Select in-sample row indices
r_train <- generate_training_row_indices(nrow(df), 0.6)
# Call the function; the last argument (r_test) is currently unused, so 0 is passed
post_sample_preprocessing(dfX, dfY, r_train, 0)