df.clean: Remove Columns in a Data Frame Based on a Threshold


Description

This function removes variables (columns) from a data frame if their mean is smaller or greater than a threshold, or if they are highly correlated with other variables.
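
The mean-based criterion can be illustrated with a minimal base R sketch. The helper drop_by_mean below is hypothetical and is not the package's implementation; in particular, the strict comparison (< or >) and the handling of missing values are assumptions.

```r
# Hypothetical helper, not part of the package: drop columns by their mean.
drop_by_mean <- function(df, threshold, smaller = TRUE) {
  means <- colMeans(df, na.rm = TRUE)  # one mean per column
  # Assumption: strict comparison against the threshold
  to_drop <- if (smaller) means < threshold else means > threshold
  df[, !to_drop, drop = FALSE]         # keep the remaining columns
}

toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(0.1, 0.2, 0.3))
names(drop_by_mean(toy, threshold = 1))                   # "a" "b"
names(drop_by_mean(toy, threshold = 1, smaller = FALSE))  # "c"
```

Here the column means are 2, 20 and 0.2, so a threshold of 1 removes column c when smaller means are dropped, and columns a and b when greater means are dropped.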

Usage

df.clean(df=NULL, type="correlation", threshold=NULL,
        smaller=TRUE, index=NULL)

Arguments

df

a data frame or matrix upon which the removal process is applied.

type

a character string indicating which removal criterion to apply. Options are "correlation" (the default) and "mean".

threshold

numeric; if type="mean", the mean of each column in the data frame is compared against the threshold; if type="correlation", the pairwise correlation coefficients among the variables are compared against the threshold.

smaller

logical; used only when type="mean". Default is TRUE, in which case columns with a mean smaller than the threshold are removed. If set to FALSE, columns with a mean greater than the threshold are removed instead.

index

integers connected by a colon (e.g. 3:10) indicating which columns in the data frame should be considered. If omitted, all columns are considered by default.
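
As a rough illustration of how the correlation criterion and the index argument might interact, here is a base R sketch. The helper drop_by_correlation is hypothetical and is not the package's implementation. It assumes that absolute correlations are used and that, within a highly correlated pair, the later column is the one removed; the documentation above does not specify either point.

```r
# Hypothetical helper, not part of the package.
drop_by_correlation <- function(df, threshold, index = seq_len(ncol(df))) {
  # Absolute pairwise correlations among the considered columns only
  cors <- abs(cor(df[, index, drop = FALSE], use = "pairwise.complete.obs"))
  cors[lower.tri(cors, diag = TRUE)] <- 0  # keep each pair once (upper triangle)
  # Assumption: a column is dropped if it is highly correlated
  # with any earlier considered column
  to_drop <- index[apply(cors > threshold, 2, any)]
  df[, setdiff(seq_len(ncol(df)), to_drop), drop = FALSE]
}

toy <- data.frame(
  id     = 101:110,                          # excluded from the check via index
  x      = 1:10,
  x_copy = 2 * (1:10) + 1,                   # perfectly correlated with x
  y      = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)   # weakly correlated with x
)
names(drop_by_correlation(toy, threshold = 0.95, index = 2:4))  # "id" "x" "y"
```

The id column is perfectly correlated with x but is never examined, because index = 2:4 restricts the check to the remaining three columns; it is returned untouched.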

Author(s)

Zehua Wu

See Also

df.rm

Examples

data(US_Unemployment)
mydata<-df.clean(US_Unemployment, threshold=0.95,
        index=3:ncol(US_Unemployment))

google-trends-v1/gtm documentation built on June 5, 2019, 5:13 p.m.