The correlate() function computes correlation coefficients (Pearson's by default) between the numerical variables of the data.
.data : a data.frame or a
method : a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; can be abbreviated.
... : one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, correlate() automatically starts with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. See vignette("EDA") for an introduction to these concepts.
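The three method values name the same coefficients that stats::cor() offers. As a base-R sketch of how they relate (using a made-up toy vector pair, not the heartfailure data), Spearman's rho is Pearson's r computed on ranks, and Kendall's tau, when there are no ties, is the share of concordant minus discordant pairs:

```r
# Toy data (hypothetical values, for illustration only; no ties)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 7, 20)

pearson  <- cor(x, y, method = "pearson")   # linear association on raw values
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")

# Spearman's rho is Pearson's r computed on the ranks of the data
spearman_by_hand <- cor(rank(x), rank(y))
isTRUE(all.equal(spearman, spearman_by_hand))  # TRUE

# Kendall's tau without ties: (concordant - discordant) / number of pairs
pairs <- combn(length(x), 2)
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) * sign(y[pairs[1, ]] - y[pairs[2, ]])
kendall_by_hand <- sum(s) / ncol(pairs)
isTRUE(all.equal(kendall, kendall_by_hand))    # TRUE
```

Because both rank-based coefficients depend only on the ordering of values, they are less affected by the outlying value 20 than Pearson's r.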
This function is useful in combination with the group_by() function of the dplyr package.
To compute the coefficients separately for each level of the categorical variables of interest,
rather than over all observations, pass a grouped_df
created with group_by().
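Conceptually, grouping means computing the coefficient separately within each subset of rows. A minimal base-R sketch of that idea with split(), using a hypothetical data frame df with a grouping column g and numeric columns a and b (this is an illustration, not how correlate() is implemented):

```r
# Hypothetical data: one grouping column and two numeric columns
df <- data.frame(
  g = c("A", "A", "A", "B", "B", "B"),
  a = c(1, 2, 3, 1, 2, 3),
  b = c(2, 4, 6, 3, 2, 1)
)

# Split the rows by group level, then correlate a and b within each subset
by_group <- sapply(split(df, df$g), function(d) {
  cor(d$a, d$b, use = "pairwise.complete.obs")
})
by_group  # named numeric vector: A is 1, B is -1
```

Within group A, b rises with a; within group B, it falls. Pooling both groups would hide that difference, which is why per-group coefficients can be informative.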
Internally, the coefficients are computed with the stats::cor() function using the option use = "pairwise.complete.obs".
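With use = "pairwise.complete.obs", each coefficient uses every row in which both variables of that pair are present, whereas use = "complete.obs" first drops every row with any missing value. A small base-R sketch on a hypothetical toy data frame d with one NA:

```r
# Toy data (hypothetical): one missing value in column z, row 3
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 5, 4, 5),
  z = c(5, 3, NA, 2, 1)
)

# Pairwise deletion: each pair uses all rows complete for that pair
pw <- cor(d, use = "pairwise.complete.obs")

# Listwise deletion: every pair uses only rows complete in ALL columns
lw <- cor(d, use = "complete.obs")

# x-y uses all 5 rows under pairwise deletion ...
isTRUE(all.equal(pw["x", "y"], cor(d$x, d$y)))          # TRUE
# ... but only 4 rows under listwise deletion, because z is NA in row 3
isTRUE(all.equal(lw["x", "y"], cor(d$x[-3], d$y[-3])))  # TRUE
```

The trade-off is that pairwise coefficients may be based on different subsets of rows, so they are not all computed from the same observations.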
The result contains the following information for each pair of numerical variables:
var1 : name of the numerical variable
var2 : name of the corresponding numerical variable
coef_corr : correlation coefficient (Pearson's by default, otherwise the one selected by method)
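To see how a result with the columns var1, var2 and coef_corr relates to an ordinary correlation matrix, here is a base-R sketch. The helper cor_long() is hypothetical (not part of the package); it keeps numeric columns only, lists both orderings of every pair, and, as a design choice of this sketch, drops self-pairs:

```r
# Hypothetical helper: reshape a correlation matrix into long format,
# one row per ordered pair of distinct variables (var1, var2, coef_corr)
cor_long <- function(data, method = "pearson") {
  num <- data[vapply(data, is.numeric, logical(1))]  # keep numeric columns only
  m <- cor(num, use = "pairwise.complete.obs", method = method)
  vars <- colnames(m)
  # expand.grid varies its first argument fastest, so rows are grouped by var1
  grid <- expand.grid(var2 = vars, var1 = vars, stringsAsFactors = FALSE)
  grid <- grid[grid$var1 != grid$var2, c("var1", "var2")]
  # index the matrix by (row name, column name) pairs
  grid$coef_corr <- m[cbind(grid$var1, grid$var2)]
  rownames(grid) <- NULL
  grid
}

# iris has 4 numeric columns plus the factor Species, which is skipped
res <- cor_long(iris)
head(res)
```

With 4 numeric variables this yields 4 x 3 = 12 rows, each unordered pair appearing twice.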
library(dlookr)
# Correlation coefficients of all numerical variables
correlate(heartfailure)
# Select the variable to compute
correlate(heartfailure, creatinine, sodium)
correlate(heartfailure, -creatinine, -sodium)
correlate(heartfailure, "creatinine", "sodium")
correlate(heartfailure, 1)
# Non-parametric correlation coefficient by Kendall's method
correlate(heartfailure, creatinine, method = "kendall")
# Using dplyr::grouped_df
library(dplyr)
gdata <- group_by(heartfailure, smoking, death_event)
correlate(gdata, "creatinine")
correlate(gdata)
# Using pipes ---------------------------------
# Correlation coefficients of all numerical variables
heartfailure %>%
correlate()
# Positive values select variables
heartfailure %>%
correlate(creatinine, sodium)
# Negative values drop variables
heartfailure %>%
correlate(-creatinine, -sodium)
# Positive positions select variables
heartfailure %>%
correlate(1)
# Negative positions drop variables
heartfailure %>%
correlate(-1, -3, -5, -7)
# Non-parametric correlation coefficient by Spearman's method
heartfailure %>%
correlate(creatinine, sodium, method = "spearman")
# ---------------------------------------------
# Correlation coefficients with redundant
# combinations of variables removed
heartfailure %>%
correlate() %>%
filter(as.integer(var1) > as.integer(var2))
heartfailure %>%
correlate(creatinine, sodium) %>%
filter(as.integer(var1) > as.integer(var2))
# Using pipes & dplyr -------------------------
# Compute the correlation coefficients of the 'creatinine' variable by 'smoking'
# and 'death_event' variables, and keep only those whose absolute
# value is greater than or equal to 0.2
heartfailure %>%
group_by(smoking, death_event) %>%
correlate(creatinine) %>%
filter(abs(coef_corr) >= 0.2)
# Keep only observations whose 'smoking' level is "Yes",
# compute the correlation coefficients of the 'creatinine' variable
# by 'hblood_pressure' and 'death_event' variables,
# and keep only negative coefficients whose absolute value is greater than 0.5
heartfailure %>%
filter(smoking == "Yes") %>%
group_by(hblood_pressure, death_event) %>%
correlate(creatinine) %>%
filter(coef_corr < 0) %>%
filter(abs(coef_corr) > 0.5)
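The examples filter on as.integer(var1) > as.integer(var2) to remove redundant combinations: each unordered pair appears twice, as (a, b) and (b, a), and comparing integer codes keeps exactly one of the two. This assumes var1 and var2 are factors sharing the same level order (as.integer() on a character column would give NA instead). A self-contained sketch under that assumption, with a hand-built, hypothetical result:

```r
# Hypothetical long-format result containing both orderings of each pair.
# Assumption: var1 and var2 are factors with the same level order.
lv <- c("a", "b", "c")
res <- data.frame(
  var1 = factor(c("a", "a", "b", "b", "c", "c"), levels = lv),
  var2 = factor(c("b", "c", "a", "c", "a", "b"), levels = lv),
  coef_corr = c(0.5, -0.2, 0.5, 0.9, -0.2, 0.9)
)

# as.integer() on a factor returns the level code, so for each unordered
# pair exactly one ordering satisfies code(var1) > code(var2)
dedup <- res[as.integer(res$var1) > as.integer(res$var2), ]
dedup  # keeps (b, a), (c, a), (c, b)
```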