filter_NA: Filter features


View source: R/filter_features.R

Description

Filters columns (features) of Cell Profiler data, removing any that contain NA/NaN values, have low variance, or are highly correlated.

Usage

filter_NA(df, NA_cutoff, out_cols = FALSE)

filter_lowVar(df, freqCut = 95/5, uniqueCut = 10, out_cols = FALSE)

filter_cor(df, cor_cutoff, out_cols = FALSE)

Arguments

df

Dataframe of Cell Profiler data to be filtered.

NA_cutoff

Optional argument. Columns whose total count of NA or NaN values is greater than NA_cutoff are removed.

out_cols

Single logical indicating whether the filtered column names should be output as well.

freqCut

Passed to caret::nearZeroVar. The cutoff for the ratio of the most common value to the second most common value.

uniqueCut

Passed to caret::nearZeroVar. The cutoff for the percentage of distinct values out of the number of total samples.
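The cutoffs described above can be illustrated with a small base R sketch. This is not the CellProfileR source: the toy data, the column names, and the helpers freq_ratio and percent_unique are hypothetical. The rule for combining freqCut and uniqueCut (flag a column when its frequency ratio is above freqCut and its percentage of unique values is at or below uniqueCut) is an assumption based on how caret::nearZeroVar is documented, and the handling of constant columns is simplified.

```r
# Toy data standing in for Cell Profiler output (made-up column names).
n <- 40
df <- data.frame(
  area      = c(NA, seq_len(n - 1)),          # 1 missing value
  intensity = c(NA, NaN, NA, seq_len(n - 3)), # 3 missing values
  flag      = c(rep(0, n - 1), 1)             # almost constant
)

# NA_cutoff: is.na() is TRUE for both NA and NaN, so this counts both.
NA_cutoff <- 2
na_counts <- colSums(is.na(df))
too_many_na <- names(df)[na_counts > NA_cutoff]

# freqCut: ratio of the most common value to the second most common value.
freq_ratio <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # table() drops NA/NaN by default
  if (length(counts) < 2) return(Inf)          # simplification for constant columns
  as.numeric(counts[1] / counts[2])
}

# uniqueCut: distinct values as a percentage of the number of samples.
percent_unique <- function(x) {
  100 * length(unique(x[!is.na(x)])) / length(x)
}

freqCut <- 95 / 5
uniqueCut <- 10
fr <- sapply(df, freq_ratio)
pu <- sapply(df, percent_unique)

# Assumed rule: a column is "low variance" only if both conditions hold.
low_var <- names(df)[fr > freqCut & pu <= uniqueCut]
```

In this toy data only intensity exceeds the NA cutoff, and only flag is marked as low variance, since its most common value outnumbers the next by 39 to 1 and it has just 2 distinct values in 40 rows.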

Value

A dataframe with the filtered columns removed. If out_cols is set to TRUE, a list of length 2 is returned instead, where the 1st element is the filtered dataframe and the 2nd element is a vector of the names of the columns that were filtered out.
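As a rough illustration of the two return shapes, the sketch below uses a hypothetical stand-in, filter_sketch, written in base R. It is not the package implementation. It assumes that the 2nd list element holds the names of the removed columns and that the list is unnamed, so elements are accessed by position.

```r
# Hypothetical stand-in that mimics the documented return shape only.
filter_sketch <- function(df, NA_cutoff, out_cols = FALSE) {
  drop <- colSums(is.na(df)) > NA_cutoff
  filtered <- df[, !drop, drop = FALSE]
  if (out_cols) {
    # Assumed layout: filtered dataframe first, removed column names second.
    return(list(filtered, names(df)[drop]))
  }
  filtered
}

df <- data.frame(a = c(1, 2, 3), b = c(NA, NA, 3))

# Default: just the filtered dataframe.
plain <- filter_sketch(df, NA_cutoff = 1)

# With out_cols = TRUE: unpack both elements by position.
both <- filter_sketch(df, NA_cutoff = 1, out_cols = TRUE)
filtered_df  <- both[[1]]
removed_cols <- both[[2]]
```

Keeping the removed column names is useful for logging which features were dropped, or for dropping the same columns from another dataset.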

lucyleeow/CellProfileR documentation built on May 21, 2019, 2:30 a.m.