correlate.data.frame: Compute the correlation coefficient between two numerical...

Description

correlate() computes the correlation coefficient (Pearson's by default) between numerical variables.

Usage

correlate(.data, ...)

## S3 method for class 'data.frame'
correlate(.data, ..., method = c("pearson", "kendall", "spearman"))

## S3 method for class 'grouped_df'
correlate(.data, ..., method = c("pearson", "kendall", "spearman"))

Arguments

.data

a data.frame or a tbl_df.

method

a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; the name can be abbreviated.

...

one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, correlate() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

See vignette("EDA") for an introduction to these concepts.
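As a rough illustration of the method argument described above, the sketch below calls stats::cor() directly, since the Details section names it as the underlying function. The built-in mtcars data and the two columns chosen are assumptions made only for this illustration; they are not part of this package.

```r
# Illustration only: stats::cor() on the built-in mtcars data,
# not the package's heartfailure data.
x <- mtcars$mpg
y <- mtcars$wt

# The three coefficients accepted by the method argument
r_pearson  <- cor(x, y, method = "pearson")
r_kendall  <- cor(x, y, method = "kendall")
r_spearman <- cor(x, y, method = "spearman")

# stats::cor() accepts an unambiguous abbreviation of the method name
r_abbrev <- cor(x, y, method = "spear")

c(pearson = r_pearson, kendall = r_kendall, spearman = r_spearman)
```

Kendall's and Spearman's coefficients are rank-based, so they can differ noticeably from Pearson's when the relationship is monotonic but not linear.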

Details

This function is most useful together with the group_by() function of the dplyr package. To compute correlation coefficients within each level of the categorical variables you are interested in, rather than over all observations, pass a grouped_df created by group_by(). The coefficients are computed by stats::cor() with the use = "pairwise.complete.obs" option.
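The following is a minimal base-R sketch of what pairwise-complete and group-wise computation mean. It is not this package's implementation. It assumes the built-in airquality data (which has missing values in Ozone and Solar.R), and the names vars, overall, and by_month are hypothetical.

```r
# Illustration only: built-in airquality data, which has missing values
# in Ozone and Solar.R.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Pairwise deletion: each pair of columns uses every row where both
# values are present, so different pairs may be based on different rows.
overall <- cor(airquality[, vars], use = "pairwise.complete.obs")

# Group-wise version: one correlation matrix per level of Month,
# roughly analogous to computing the coefficients within each group.
by_month <- lapply(
  split(airquality[, vars], airquality$Month),
  cor,
  use = "pairwise.complete.obs"
)

overall["Wind", "Temp"]
by_month[["5"]]["Wind", "Temp"]
```

With pairwise deletion, a row with a missing Ozone value still contributes to the Wind and Temp coefficient, which is why no rows need to be dropped beforehand.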

Correlation coefficient information

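The information derived from the numerical data is as follows. The examples on this page refer to these columns of the result: var1 and var2 (the two variables of each pair) and coef_corr (the correlation coefficient).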

See Also

cor, correlate.tbl_dbi.

Examples

# Correlation coefficients of all numerical variables
correlate(heartfailure)

# Select the variable to compute
correlate(heartfailure, creatinine, sodium)
correlate(heartfailure, -creatinine, -sodium)
correlate(heartfailure, "creatinine", "sodium")
correlate(heartfailure, 1)
# Non-parametric correlation coefficient by kendall method
correlate(heartfailure, creatinine, method = "kendall")
 
# Using dplyr::grouped_df
library(dplyr)

gdata <- group_by(heartfailure, smoking, death_event)
correlate(gdata, "creatinine")
correlate(gdata)

# Using pipes ---------------------------------
# Correlation coefficients of all numerical variables
heartfailure %>%
  correlate()
# Positive values select variables
heartfailure %>%
  correlate(creatinine, sodium)
# Negative values to drop variables
heartfailure %>%
  correlate(-creatinine, -sodium)
# Positional values select variables
heartfailure %>%
  correlate(1)
# Negative positional values drop variables
heartfailure %>%
  correlate(-1, -3, -5, -7)
# Non-parametric correlation coefficient by spearman method
heartfailure %>%
  correlate(creatinine, sodium, method = "spearman")
 
# ---------------------------------------------
# Correlation coefficient
# that eliminates redundant combination of variables
heartfailure %>%
  correlate() %>%
  filter(as.integer(var1) > as.integer(var2))

heartfailure %>%
  correlate(creatinine, sodium) %>%
  filter(as.integer(var1) > as.integer(var2))

# Using pipes & dplyr -------------------------
# Compute the correlation coefficient of the 'creatinine' variable by
# 'smoking' and 'death_event', and keep only the rows whose absolute
# correlation coefficient is at least 0.2
heartfailure %>%
  group_by(smoking, death_event) %>%
  correlate(creatinine) %>%
  filter(abs(coef_corr) >= 0.2)

# Keep only the observations where 'smoking' is "Yes", compute the
# correlation coefficient of the 'creatinine' variable by
# 'hblood_pressure' and 'death_event', and keep only the negative
# coefficients whose absolute value is greater than 0.5
heartfailure %>%
  filter(smoking == "Yes") %>%
  group_by(hblood_pressure, death_event) %>%
  correlate(creatinine) %>%
  filter(coef_corr < 0) %>%
  filter(abs(coef_corr) > 0.5)
  

bit2r/kodlookr documentation built on Dec. 19, 2021, 9:49 a.m.