trim_df: Trim a Dataframe's Numeric Columns

Description

Trim the numeric variables of a dataframe by capping extreme values, using either an IQR-based rule or the 1st and 99th percentiles.

Usage

trim_df(data, type, perc = NULL)

Arguments

data

The dataframe you want to trim.

type

A string specifying the trimming technique. Current options are "iqr" and "1_99".

perc

A dataframe containing percentile and interquartile range values, laid out as described in Details. If not provided (the default, NULL), the function computes the percentiles from the data itself and uses those for trimming.

Details

If type="iqr", then for each numeric variable:

- Values more than 1.5 x IQR below the 25th percentile are set to exactly the 25th percentile minus 1.5 x IQR.

- Values more than 1.5 x IQR above the 75th percentile are set to exactly the 75th percentile plus 1.5 x IQR.
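
The IQR rule above can be sketched for a single numeric vector in base R. This is only an illustration, not the package's implementation: trim_iqr is a hypothetical helper, and it assumes percentiles are computed with the default quantile() method (type 7), which may differ from what trim_df uses internally.

```r
# Hypothetical helper illustrating the IQR rule; not part of the package.
trim_iqr <- function(x) {
  # 25th and 75th percentiles (assumes quantile()'s default type 7)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr  # lower fence
  upper <- q[2] + 1.5 * iqr  # upper fence
  # Clamp values to [lower, upper]; values inside the fences are unchanged
  pmin(pmax(x, lower), upper)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
trim_iqr(x)
# The quartiles are 3.25 and 7.75, so IQR = 4.5 and the upper fence is 14.5:
# the value 100 is replaced by 14.5, all other values are kept.
```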


If type="1_99", then for each numeric variable:

- Values below the 1st percentile are set to exactly the 1st percentile value.

- Values above the 99th percentile are set to exactly the 99th percentile value.
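
The percentile rule above can be sketched the same way. trim_1_99 is a hypothetical helper written for illustration only, again assuming the default quantile() method; it is not taken from the package source.

```r
# Hypothetical helper illustrating the 1st/99th percentile rule.
trim_1_99 <- function(x) {
  # 1st and 99th percentiles (assumes quantile()'s default type 7)
  q <- quantile(x, probs = c(0.01, 0.99), na.rm = TRUE, names = FALSE)
  # Raise values below the 1st percentile, lower values above the 99th
  pmin(pmax(x, q[1]), q[2])
}

x <- 0:100
trimmed <- trim_1_99(x)
range(trimmed)
# For 0:100 the 1st and 99th percentiles are 1 and 99,
# so 0 is raised to 1 and 100 is lowered to 99.
```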


If perc is provided, it must be a dataframe with one column for each numeric variable, in the same order as the numeric variables in data, and with the following rows:

1. The first row contains the 1st percentile values.

2. The second row contains the 25th percentile values.

3. The third row contains the 50th percentile (median) values. (Not used for trimming.)

4. The fourth row contains the 75th percentile values.

5. The fifth row contains the 99th percentile values.

6. The sixth row contains the interquartile range values.
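
A table in the six-row layout above could be built in base R roughly as follows. make_perc is a hypothetical helper, not a function documented on this page, and the sketch relies only on row order; whether trim_df expects particular row or column names is not stated here.

```r
# Hypothetical helper: build a percentile table in the six-row layout above.
make_perc <- function(data) {
  # Keep only the numeric columns, preserving their order
  num <- data[vapply(data, is.numeric, logical(1))]
  as.data.frame(lapply(num, function(x) {
    q <- quantile(x, probs = c(0.01, 0.25, 0.50, 0.75, 0.99),
                  na.rm = TRUE, names = FALSE)
    # Rows 1-5: percentiles; row 6: interquartile range (75th minus 25th)
    c(q, q[4] - q[2])
  }))
}

perc <- make_perc(iris)
dim(perc)  # 6 rows, one column per numeric variable in iris
```

A typical use would be to compute such a table on training data and pass it as perc when trimming new data, so that both sets are capped at the same thresholds.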

Value

A dataframe in which the numeric columns have been trimmed. Non-numeric columns are returned unchanged.
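
The behaviour described above, where only numeric columns are modified, can be illustrated with a small base R sketch. trim_numeric_cols and cap01 are hypothetical names used for illustration; the actual function may be implemented differently.

```r
# Hypothetical helper: apply a trimming function to numeric columns only.
trim_numeric_cols <- function(data, trim_fun) {
  is_num <- vapply(data, is.numeric, logical(1))
  # Replace numeric columns in place; other columns are left untouched
  data[is_num] <- lapply(data[is_num], trim_fun)
  data
}

# Example trimming function with fixed bounds, chosen only for the demo
cap01 <- function(x) pmin(pmax(x, 0), 1)

df <- data.frame(id = c("a", "b", "c"),
                 x = c(-10, 0.5, 10),
                 stringsAsFactors = FALSE)
out <- trim_numeric_cols(df, cap01)
out$x   # 0.0 0.5 1.0
out$id  # "a" "b" "c", unchanged
```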

Author(s)

Andrew Kostandy (andrew.kostandy@gmail.com)

Examples

trim_df(iris, type = "iqr")

AndrewKostandy/MLtoolkit documentation built on May 7, 2019, 9:51 p.m.