get_top_corrs: Get the correlation of variables in a dataset with a given...


Description

This function computes the correlation of each input variable in a data frame with a given response variable and returns a tbl listing the variables from most to least correlated. NAs are removed before each correlation is computed, and only numeric variables are considered.
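The computation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the modellingTools implementation: the function name top_corrs_sketch is hypothetical, it returns a plain data frame rather than a tbl, it drops NAs with cor(..., use = "complete.obs"), and it assumes that "most to least correlated" means ordering by absolute correlation.

```r
# Hypothetical base-R sketch; NOT the modellingTools source code.
top_corrs_sketch <- function(dat, response_var) {
  # Accept either a column name or a column position for the response
  if (is.numeric(response_var)) response_var <- names(dat)[response_var]
  stopifnot(is.character(response_var), response_var %in% names(dat))
  y <- dat[[response_var]]
  stopifnot(is.numeric(y))

  # Keep numeric inputs only, excluding the response itself
  is_num <- vapply(dat, is.numeric, logical(1))
  inputs <- setdiff(names(dat)[is_num], response_var)

  # One correlation per input, dropping rows where either value is NA
  corrs <- vapply(inputs,
                  function(v) cor(dat[[v]], y, use = "complete.obs"),
                  numeric(1))

  out <- data.frame(var_name = inputs, correlation = unname(corrs),
                    stringsAsFactors = FALSE)
  # Assumption: rank by absolute correlation, strongest first
  out <- out[order(abs(out$correlation), decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

top_corrs_sketch(iris, "Petal.Length")
```

Because iris's Species column is a factor, it is skipped, and the three remaining numeric columns are ranked against Petal.Length. Passing the position 3 instead of the name gives the same result.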

Usage

get_top_corrs(dat, response_var, parallel = FALSE)

Arguments

dat

a tbl (or data frame) containing the response variable and the candidate input variables

response_var

a character string naming the variable in dat with which the correlations should be computed, or an integer giving that variable's column position

parallel

logical. If TRUE, correlations are computed with a parallel foreach loop; if FALSE, a single-threaded foreach loop is used, which is still efficient. Default is FALSE.
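The choice controlled by parallel can be sketched with the foreach package, which provides %do% for sequential loops and %dopar% for parallel ones. The helper corrs_foreach is hypothetical and assumes only that get_top_corrs switches between these two operators; it is not the package's actual code. Note that %dopar% runs in parallel only after a backend (for example, one from doParallel) has been registered; otherwise foreach warns and runs the loop sequentially.

```r
library(foreach)

# Hypothetical sketch: pick the foreach operator based on a `parallel` flag.
corrs_foreach <- function(dat, response_var, inputs, parallel = FALSE) {
  # %dopar% needs a registered backend to actually run in parallel;
  # without one, foreach warns and falls back to sequential execution
  `%op%` <- if (parallel) `%dopar%` else `%do%`
  y <- dat[[response_var]]
  # One iteration per input variable; results are combined into a vector
  foreach(v = inputs, .combine = c) %op% {
    cor(dat[[v]], y, use = "complete.obs")
  }
}

inputs <- c("Sepal.Length", "Sepal.Width", "Petal.Width")
corrs_foreach(iris, "Petal.Length", inputs)
```

Binding the operator to a local name keeps a single loop body for both modes, so the sequential and parallel paths cannot drift apart.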

Details

Use this function in the early stages of data analysis to screen out variables and to get a feel for how each input variable, taken on its own, relates to the response variable of interest. It is not recommended as a formal variable selection technique: because each input is considered in isolation, interactions between inputs are ignored.
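To see why marginal correlations can miss interactions, consider a response that depends on two inputs only through their product. In this simulated example (not part of modellingTools), both marginal correlations are near zero, so a correlation screen would rank both inputs as unimportant, yet a linear model with an interaction term recovers the relationship exactly.

```r
# Illustration only: y depends on x1 and x2 solely through their product
set.seed(1)
n <- 1000
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
y <- x1 * x2

# Marginal correlations are near zero, so a correlation screen misses both
cor(y, x1)
cor(y, x2)

# Including the interaction term fits the data exactly
fit <- lm(y ~ x1 * x2)
coef(fit)
```

Since x1 and x2 are independent with mean zero, the covariance of x1 * x2 with either input is zero in expectation, which is why the screen sees nothing.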

Value

a tbl with two columns: var_name gives the name of each variable and correlation gives its correlation with response_var.

See Also

Other descriptive: proc_freq

Examples

library(modellingTools)
# Species is a factor, so only the numeric columns are considered
x <- iris
get_top_corrs(x, "Petal.Length")

awstringer/modellingTools documentation built on May 11, 2019, 4:11 p.m.