View source: R/filter_features.R
Description
Filters the columns (features) of Cell Profiler data, removing any columns that contain NA/NaN values, have low variance, or are highly correlated.
Usage
filter_NA(df, NA_cutoff, out_cols = FALSE)
filter_lowVar(df, freqCut = 95/5, uniqueCut = 10, out_cols = FALSE)
filter_cor(df, cor_cutoff, out_cols = FALSE)
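Arguments
df: Dataframe of Cell Profiler data to be filtered.
NA_cutoff: Optional. Columns whose total number of NA or NaN values is greater than NA_cutoff are removed. If NA_cutoff is not given, any column containing an NA or NaN is removed.
out_cols: A single logical value indicating whether the names of the filtered (removed) columns should also be returned.
freqCut: Passed to caret::nearZeroVar (the cutoff for the ratio of the most common value to the second most common value).
uniqueCut: Passed to caret::nearZeroVar (the cutoff for the percentage of distinct values out of the total number of samples).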
Value
A dataframe with the filtered columns removed. If out_cols = TRUE, a list of length 2 whose first element is the filtered dataframe and whose second element is a vector of the names of the removed columns.
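Because the return type depends on out_cols, calling code has to handle two shapes. The sketch below uses a hypothetical helper, unpack_filtered(), which is not part of this package, to normalise either shape into one named list. The example inputs are built by hand to mirror the documented return value; they are not produced by the package.

```r
# Hypothetical helper (not part of the package): normalise the documented
# return value into list(data = <data.frame>, removed = <names or NULL>).
unpack_filtered <- function(res) {
  if (is.data.frame(res)) {
    # out_cols = FALSE: only the filtered data frame is returned
    list(data = res, removed = NULL)
  } else {
    # out_cols = TRUE: list of 2, data frame first, removed names second
    list(data = res[[1]], removed = res[[2]])
  }
}

# Hand-built values mimicking the two documented shapes
df_only  <- data.frame(a = 1:3)
with_out <- list(data.frame(a = 1:3), c("b", "c"))

unpack_filtered(df_only)$removed   # NULL
unpack_filtered(with_out)$removed  # "b" "c"
```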
Functions
filter_NA: Removes columns that contain any NA/NaN values or, if NA_cutoff is given, columns whose total number of NA/NaN values exceeds NA_cutoff.
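The NA rule can be illustrated in a few lines of base R. This is a minimal sketch, not the package's source: na_filter_sketch() and the column names are hypothetical, and it assumes the behaviour described above (no cutoff means any NA/NaN removes the column; with a cutoff, a column is removed only when its count exceeds the cutoff).

```r
# Hypothetical sketch of the NA/NaN column rule described for filter_NA.
na_filter_sketch <- function(df, NA_cutoff = NULL, out_cols = FALSE) {
  # is.na() is TRUE for both NA and NaN, so one count covers both
  na_counts <- colSums(is.na(df))
  drop <- if (is.null(NA_cutoff)) na_counts > 0 else na_counts > NA_cutoff
  removed <- names(df)[drop]
  kept <- df[, !drop, drop = FALSE]
  # mirror the documented return shape
  if (out_cols) list(kept, removed) else kept
}

cp_na <- data.frame(
  Area       = c(10, 12, 11, 13),   # no missing values
  Median_DNA = c(1, NA, 3, 4),      # 1 missing value
  Mean_DNA   = c(NaN, NA, 2, 5)     # 2 missing values (NaN and NA)
)

names(na_filter_sketch(cp_na))                               # "Area"
names(na_filter_sketch(cp_na, NA_cutoff = 1))                # "Area" "Median_DNA"
na_filter_sketch(cp_na, NA_cutoff = 1, out_cols = TRUE)[[2]] # "Mean_DNA"
```

Note that the comparison is strictly "greater than": with NA_cutoff = 1, a column with exactly one missing value is kept.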
filter_lowVar: Removes columns with low variance (e.g. a column consisting entirely of the same value) using the caret::nearZeroVar function. See the caret documentation for details on this function.
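To show what freqCut and uniqueCut control, here is a base-R re-implementation of the near-zero-variance rule documented for caret::nearZeroVar: a column is flagged if it has a single distinct value, or if its frequency ratio exceeds freqCut and its percentage of distinct values is at most uniqueCut. filter_lowVar calls caret itself; nzv_sketch() is a hypothetical illustration and may differ from caret in edge cases such as columns with missing values.

```r
# Hypothetical base-R version of the nearZeroVar criterion (illustration only).
nzv_sketch <- function(df, freqCut = 95/5, uniqueCut = 10) {
  flag <- vapply(df, function(x) {
    x <- x[!is.na(x)]
    counts <- sort(table(x), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)           # constant column: zero variance
    freq_ratio <- counts[[1]] / counts[[2]]         # most common / second most common
    pct_unique <- 100 * length(counts) / length(x)  # distinct values as % of samples
    freq_ratio > freqCut && pct_unique <= uniqueCut
  }, logical(1))
  names(df)[flag]  # names of columns that would be removed
}

set.seed(1)
cp_var <- data.frame(
  Constant = rep(5, 100),           # one distinct value
  Rare     = c(rep(0, 98), 1, 2),   # freq ratio 98/1 = 98, 3% distinct
  Varied   = rnorm(100)             # every value distinct
)

nzv_sketch(cp_var)                  # "Constant" "Rare"
nzv_sketch(cp_var, freqCut = 100)   # "Constant"
```

Raising freqCut or lowering uniqueCut flags fewer columns; a constant column is always flagged.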
filter_cor: Calculates the Pearson correlation matrix for the 'Median' columns, then removes columns that are highly correlated with each other using caret::findCorrelation. Note that you should run filter_lowVar on your data before filtering for high correlations, because a constant column has zero standard deviation and its Pearson correlations are undefined (NA).
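The correlation step can be sketched in base R as follows. This is a simplified greedy stand-in in the spirit of caret::findCorrelation, not its exact algorithm, and cor_filter_sketch() is hypothetical. It assumes that "'Median' columns" means columns whose names contain "Median" and that cor_cutoff is an absolute Pearson correlation threshold.

```r
# Hypothetical greedy correlation filter (NOT caret::findCorrelation's exact algorithm).
cor_filter_sketch <- function(df, cor_cutoff = 0.9) {
  med <- df[, grepl("Median", names(df)), drop = FALSE]  # assumed 'Median' selection
  r <- abs(cor(med, method = "pearson"))
  diag(r) <- 0                                 # ignore self-correlation
  removed <- character(0)
  while (ncol(r) > 1 && max(r) > cor_cutoff) {
    # locate the most strongly correlated remaining pair
    pair <- which(r == max(r), arr.ind = TRUE)[1, ]
    # drop whichever member has the higher average |r| with the other columns
    mean_r <- colMeans(r)
    drop <- if (mean_r[pair[1]] >= mean_r[pair[2]]) pair[1] else pair[2]
    removed <- c(removed, colnames(r)[drop])
    r <- r[-drop, -drop, drop = FALSE]
  }
  removed  # names of Median columns that would be removed
}

set.seed(42)
x <- rnorm(50)
cp_cor <- data.frame(
  Median_DNA     = x,
  Median_DNA_dup = 2 * x + 0.01 * rnorm(50),  # near-duplicate of Median_DNA
  Median_Actin   = rnorm(50),
  Mean_DNA       = x                          # ignored: not a Median column
)

cor_filter_sketch(cp_cor, cor_cutoff = 0.9)  # one of "Median_DNA" / "Median_DNA_dup"
```

If a constant Median column were present, cor() would return NA for it (with a warning) and the loop condition would fail with "missing value where TRUE/FALSE needed", which is one concrete reason to remove low-variance columns first.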