cleanFeatures: Automated data cleaning


View source: R/cleanFeatures.R

Description

Cleans features in a dataset for machine learning purposes, using the edaFrame generated by exploreData. Cleaning involves imputing missing values, clipping outliers, and creating tracking features.
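As a rough illustration of the three cleaning steps named above, the following base R sketch applies them to a single numeric vector. It is not the package's implementation. The helper name `clean_numeric_sketch`, the percentile bounds, the order of the steps, and the choice to flag missing values as the tracking feature are all assumptions made for this example. In cleanFeatures itself, the clipping method is whatever was specified in exploreData.

```r
# Hypothetical helper, not part of autoML: clean one numeric vector.
clean_numeric_sketch <- function(v, lower = 0.01, upper = 0.99) {
  # Tracking feature (assumed form): 1 where the original value was missing.
  was_missing <- as.integer(is.na(v))

  # Clip outliers to assumed percentile bounds; NA values stay NA here.
  bounds <- quantile(v, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  clipped <- pmin(pmax(v, bounds[1]), bounds[2])

  # Median imputation of the missing values.
  clipped[is.na(clipped)] <- median(clipped, na.rm = TRUE)

  data.frame(value = clipped, was_missing = was_missing)
}

v <- c(1, 2, 3, NA, 100)
result <- clean_numeric_sketch(v)
print(result)
```

The tracking column preserves the fact that a value was originally missing even after it has been imputed, which is the kind of information a tree-based model can split on.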

Usage

cleanFeatures(x, feats, edaFrame, trackingFeats = TRUE,
  clipOutliers = TRUE, imputeMissing = TRUE, progress = FALSE,
  autoCode = TRUE)

Arguments

x

[data.frame | Required] Data.frame containing numeric features to transform

feats

[character vector | Required] Character vector of features to clean

edaFrame

[data.frame | Required] Data.frame object returned by the exploreData function

trackingFeats

[logical | Optional] Should tracking features be created? Tracking features are binary features that keep track of the data as it was before changes were applied; they are useful for tree-type models

clipOutliers

[logical | Optional] Should outliers be clipped using the method specified in the exploreData function?

imputeMissing

[logical | Optional] Should missing values be imputed, using the median for numeric features and the mode for categorical features?

progress

[logical | Optional] Should progress be displayed?

autoCode

[logical | Optional] Should code be generated when running the function?
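The mode imputation described under imputeMissing can be sketched in base R as follows. This is only an illustration of the technique, not the package's code: the helper name `impute_mode_sketch` is hypothetical, and breaking ties by taking the first level in sorted order is an assumption of this example.

```r
# Hypothetical helper, not part of autoML: fill NA with the most frequent value.
impute_mode_sketch <- function(v) {
  counts <- table(v)                              # table() ignores NA by default
  mode_value <- names(counts)[which.max(counts)]  # first maximum wins on ties
  v[is.na(v)] <- mode_value
  v
}

species <- c("setosa", NA, "virginica", "setosa")
imputed <- impute_mode_sketch(species)
print(imputed)
```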

Value

List containing a data.frame with the cleaned features and, when autoCode is TRUE, the generated code

Author(s)

Xander Horn

Examples

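# Build the exploratory summary that cleanFeatures relies on
eda <- exploreData(iris)
# Clean every column of iris using the default settings
cleaned <- cleanFeatures(x = iris, feats = names(iris), edaFrame = eda)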

XanderHorn/autoML documentation built on Aug. 5, 2020, 11:45 a.m.