View source: R/preprocess_data.R

Generates a tibble with features optimized for machine learning

Usage

```
preprocess_data(x, target = "Truth", reduce_cols = FALSE,
                factor_y = TRUE, impute = "zero", corr_cutoff = 0.9,
                freq_cut = 95/5, unique_cut = 10, k = 10)
```

Arguments

| Argument | Description |
|---|---|
| `x` | A data frame or tibble. |
| `target` | Name of the classifier (target) column. |
| `reduce_cols` | If `TRUE`, columns are removed based on near-zero variance and correlation; if `FALSE`, no columns are removed. |
| `factor_y` | If `FALSE`, recodes `pred` to 0 and 1; if `TRUE`, recodes `pred` as a factor. |
| `impute` | Method used to impute `NA` values: `"knn"`, `"mean"`, or `"zero"`. `"knn"` takes substantially longer to compute; `"zero"` replaces `NA` with 0. |
| `corr_cutoff` | Correlation coefficient threshold used to cut off highly correlated columns. Defaults to 0.90. |
| `freq_cut` | Cutoff for the ratio of the most common value to the second most common value. |
| `unique_cut` | Cutoff for the percentage of distinct values out of the total number of samples. |
| `k` | Number of nearest neighbours to use for imputation (defaults to 10). |
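When `reduce_cols = TRUE`, columns are filtered on near-zero variance and on correlation. The sketch below shows one way these two filters can work in base R. It assumes a near-zero-variance rule modelled on caret's `nearZeroVar` (a column is flagged when it has at most one distinct non-missing value, or when its frequency ratio exceeds `freq_cut` and its percentage of distinct values is at most `unique_cut`) and uses a simplified greedy correlation filter in the spirit of caret's `findCorrelation`, not its exact algorithm. The helpers `near_zero_var_cols()` and `high_corr_cols()` are hypothetical and not part of this package; `preprocess_data()` may implement the filters differently.

```r
# Hypothetical helper: flag near-zero-variance columns using freq_cut and
# unique_cut (rule assumed, modelled on caret::nearZeroVar).
near_zero_var_cols <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  flags <- vapply(df, function(col) {
    counts <- sort(table(col[!is.na(col)]), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)  # constant or all-NA column
    freq_ratio <- counts[[1]] / counts[[2]]            # most / second most common
    percent_unique <- 100 * length(counts) / length(col)
    freq_ratio > freq_cut && percent_unique <= unique_cut
  }, logical(1))
  names(df)[flags]
}

# Hypothetical helper: greedily drop numeric columns until no pairwise
# |correlation| exceeds corr_cutoff (simplified, not caret's exact method).
high_corr_cols <- function(df, corr_cutoff = 0.9) {
  num <- df[vapply(df, is.numeric, logical(1))]
  r <- abs(cor(num, use = "pairwise.complete.obs"))
  diag(r) <- 0
  dropped <- character(0)
  while (ncol(r) > 1 && max(r, na.rm = TRUE) > corr_cutoff) {
    # Take the most correlated pair and drop the member with the
    # larger mean absolute correlation to all remaining columns.
    idx <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
    pair <- colnames(r)[idx]
    drop <- pair[which.max(colMeans(r[, pair, drop = FALSE], na.rm = TRUE))]
    dropped <- c(dropped, drop)
    keep <- colnames(r) != drop
    r <- r[keep, keep, drop = FALSE]
  }
  dropped
}

# Usage on made-up data.
set.seed(1)
n <- 100
df <- data.frame(
  a = rnorm(n),
  const = rep(1, n),              # zero variance
  rare = c(rep(0, 98), 1, 2)      # dominated by a single value
)
df$b <- df$a + rnorm(n, sd = 0.01)  # nearly a copy of a

nzv <- near_zero_var_cols(df)
high_corr_cols(df[setdiff(names(df), nzv)])
```

Running the correlation filter after the variance filter avoids computing correlations on constant columns, for which `cor()` returns `NA`.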

Details

This function returns a `tibble` of optimized features.
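As an illustration of the imputation and recoding steps that shape the returned features, the base-R sketch below covers only `impute = "zero"` and `impute = "mean"` (k-NN imputation is not shown). The helper `impute_and_recode()` is hypothetical. It also assumes that the column recoded by `factor_y` is the `target` column and that, when recoding to 0 and 1, the target is binary; these are assumptions, not documented behaviour of `preprocess_data()`.

```r
# Hypothetical helper: impute numeric feature columns and recode the target.
# Assumes the recoded column is `target` and, for factor_y = FALSE, that it
# has exactly two distinct values.
impute_and_recode <- function(df, target = "Truth", impute = c("zero", "mean"),
                              factor_y = TRUE) {
  impute <- match.arg(impute)
  for (col in setdiff(names(df), target)) {
    x <- df[[col]]
    if (!is.numeric(x) || !anyNA(x)) next   # skip non-numeric or complete columns
    fill <- if (impute == "zero") 0 else mean(x, na.rm = TRUE)
    x[is.na(x)] <- fill
    df[[col]] <- x
  }
  y <- df[[target]]
  if (factor_y) {
    df[[target]] <- factor(y)
  } else {
    lv <- sort(unique(y))
    stopifnot(length(lv) == 2)              # binary target assumed
    df[[target]] <- as.integer(y == lv[2])  # second sorted level becomes 1
  }
  df
}

# Usage on made-up data.
raw <- data.frame(
  f1 = c(1, NA, 3),
  f2 = c(NA, 2, 4),
  Truth = c("yes", "no", "yes")
)
impute_and_recode(raw, impute = "mean", factor_y = FALSE)
```

The sketch returns a plain data frame; if the tibble package is installed, `tibble::as_tibble()` converts the result to a `tibble`.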

Author: Dallin Webb <[email protected]>

See also: `preProcess` (caret package)

