listTopCorrelatedVariables: List the variables that are highly correlated with each other

Description Usage Arguments Value Author(s) Examples

View source: R/listTopCorrelatedVariables.R

Description

This function computes the Pearson, Spearman, or Kendall correlation for each specified variable in the data set and returns a list of the variables that are correlated to them. It also provides a short variable list without the highly correlated variables.

Usage

1
2
3
4
5
	listTopCorrelatedVariables(variableList,
	                           data,
	                           pvalue = 0.001,
	                           corthreshold = 0.9,
	                           method = c("pearson", "kendall", "spearman"))

Arguments

variableList

A data frame with two columns. The first one must have the names of the candidate variables and the other one the description of such variables

data

A data frame where all variables are stored in different columns

pvalue

The maximum p-value, associated to method, allowed for a pair of variables to be defined as significantly correlated

corthreshold

The minimum correlation score, associated to method, allowed for a pair of variables to be defined as significantly correlated

method

Correlation method: Pearson product-moment ("pearson"), Spearman's rank ("spearman"), or Kendall rank ("kendall")

Value

correlated.variables

A data frame with two columns:

  1. cor.var.names: The variables that are correlated

  2. cor.var.value: The correlation value

short.list

A vector with a list of variables that are not correlated to each other. For every correlated pair, only the variable that first entered the correlation analysis was kept

Author(s)

Jose G. Tamez-Pena and Antonio Martinez-Torteya

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
	## Not run: 
	# Start the graphics device driver to save all plots in a pdf format
	pdf(file = "Example.pdf")
	# Get the stage C prostate cancer data from the rpart package
	library(rpart)
	data(stagec)
	# Split the stages into several columns
	dataCancer <- cbind(stagec[,c(1:3,5:6)],
	                    gleason4 = 1*(stagec[,7] == 4),
	                    gleason5 = 1*(stagec[,7] == 5),
	                    gleason6 = 1*(stagec[,7] == 6),
	                    gleason7 = 1*(stagec[,7] == 7),
	                    gleason8 = 1*(stagec[,7] == 8),
	                    gleason910 = 1*(stagec[,7] >= 9),
	                    eet = 1*(stagec[,4] == 2),
	                    diploid = 1*(stagec[,8] == "diploid"),
	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
	# Remove the incomplete cases
	dataCancer <- dataCancer[complete.cases(dataCancer),]
	# Load a pre-stablished data frame with the names and descriptions of all variables
	data(cancerVarNames)
	# Get the variables that have a correlation coefficient larger 
	# than 0.65 at a p-value of 0.05
	cor <- listTopCorrelatedVariables(variableList = cancerVarNames,
	                                  data = dataCancer,
	                                  pvalue = 0.05,
	                                  corthreshold = 0.65,
	                                  method = "pearson")
	# Shut down the graphics device driver
	dev.off()
## End(Not run)

FRESA.CAD documentation built on Jan. 13, 2021, 3:39 p.m.