compare_numeric.data.frame: Compare numerical variables

View source: R/compare.R

compare_numeric    R Documentation

Compare numerical variables

Description

compare_numeric() computes information for examining the relationships between numerical variables.

Usage

compare_numeric(.data, ...)

## S3 method for class 'data.frame'
compare_numeric(.data, ...)

Arguments

.data

a data.frame or a tbl_df.

...

one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

Details

Understanding the relationships between numerical variables is an important part of EDA. compare_numeric() compares every pairwise combination of the numerical variables and returns an object of class compare_numeric, which is based on a list.
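
The pairwise comparison described above can be sketched in base R. This is only an illustration of the idea, not dlookr's internal code: the built-in mtcars data stands in for the user's data, and the names df, num_vars, pairs and pairwise_corr are hypothetical. Whether compare_numeric() lists each pair once or in both orders is not stated here; the sketch lists each pair once.

```r
# Base-R sketch of pairwise correlation; NOT dlookr's internal code.
# `mtcars` (shipped with R) stands in for the user's data.
df <- mtcars[, c("mpg", "hp", "wt")]

# Keep numeric columns only, then enumerate every unordered pair.
num_vars <- names(df)[vapply(df, is.numeric, logical(1))]
pairs <- t(combn(num_vars, 2))   # one row per pair of variable names

# Pearson's correlation coefficient for each pair.
pairwise_corr <- data.frame(
  var1 = pairs[, 1],
  var2 = pairs[, 2],
  coef_corr = apply(pairs, 1, function(p) cor(df[[p[1]]], df[[p[2]]])),
  stringsAsFactors = TRUE
)
pairwise_corr
```

The resulting data frame has the same column names as the correlation component described under Value, which makes it convenient for checking a single pair by hand.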

Value

An object of class compare_numeric, based on a list. Its components hold the information used to examine the relationships between numerical variables.

- correlation component : Pearson's correlation coefficient.

  • var1 : factor. The name of the first variable in the pair, stored as a factor level.

  • var2 : factor. The name of the second variable in the pair, stored as a factor level.

  • coef_corr : double. Pearson's correlation coefficient.

- linear component : linear model summaries

  • var1 : factor. The name of the first variable in the pair, stored as a factor level.

  • var2 : factor. The name of the second variable in the pair, stored as a factor level.

  • r.squared : double. The proportion of variance explained by the model.

  • adj.r.squared : double. r.squared adjusted based on the degrees of freedom.

  • sigma : double. The square root of the estimated residual variance.

  • statistic : double. F-statistic.

  • p.value : double. p-value from the F test, describing whether the full regression is significant.

  • df : integer. Degrees of freedom.

  • logLik : double. The log-likelihood of the data under the model.

  • AIC : double. The Akaike Information Criterion.

  • BIC : double. The Bayesian Information Criterion.

  • deviance : double. Deviance.

  • df.residual : integer. Residual degrees of freedom.
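
For a single pair, most of the statistics listed for the linear component can be reproduced with base R. The sketch below is an assumption-laden illustration rather than dlookr's implementation: mtcars stands in for the user's data, the choice of response and predictor is arbitrary, and the names fit, s, f and stats are hypothetical. The df column is left out because its exact definition is not stated here.

```r
# Base-R sketch of the per-pair model statistics; NOT dlookr's internal code.
fit <- lm(mpg ~ wt, data = mtcars)   # mtcars stands in for the user's data
s <- summary(fit)
f <- s$fstatistic                    # named vector: value, numdf, dendf

stats <- data.frame(
  r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,
  statistic     = unname(f["value"]),
  # p-value of the overall F test (upper tail of the F distribution)
  p.value       = unname(pf(f["value"], f["numdf"], f["dendf"],
                            lower.tail = FALSE)),
  logLik        = as.numeric(logLik(fit)),
  AIC           = AIC(fit),
  BIC           = BIC(fit),
  deviance      = deviance(fit),
  df.residual   = df.residual(fit)
)
stats
```

With one predictor, r.squared equals the squared Pearson correlation of the two variables, so the linear component and the correlation component can be cross-checked against each other.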

Attributes of return object

The attributes of the compare_numeric class are as follows.

  • raw : a data.frame or a tbl_df. Data containing the variables to be compared. It is saved for visualization with plot.compare_numeric().

  • variables : character. List of variables selected for comparison.

  • combination : matrix. It consists of pairs of variables to compare.
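
Attributes such as these are read with base R's attr(). The sketch below builds a stand-in object by hand to show the mechanics; it is not a real compare_numeric result. The class name compare_numeric_sketch is hypothetical, the components are left empty, and the layout of the combination matrix (one pair per row) is an assumption.

```r
# Illustrative only: a hand-built list carrying the three attributes
# described above. The class name "compare_numeric_sketch" is hypothetical.
df <- mtcars[, c("mpg", "hp", "wt")]   # mtcars stands in for the user's data
vars <- names(df)

obj <- structure(
  list(correlation = NULL, linear = NULL),   # components left empty here
  class = c("compare_numeric_sketch", "list"),
  raw = df,
  variables = vars,
  combination = t(combn(vars, 2))            # assumed: one pair per row
)

attr(obj, "variables")     # character vector of selected variable names
attr(obj, "combination")   # matrix of variable pairs
dim(attr(obj, "raw"))      # dimensions of the stored data
```

Assuming the attributes are stored under the names listed above, the same attr() calls should apply to an object returned by compare_numeric().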

See Also

correlate, summary.compare_numeric, print.compare_numeric, plot.compare_numeric.

Examples


# Generate data for the example
library(dlookr)  # provides compare_numeric() and the heartfailure data
library(dplyr)

heartfailure2 <- heartfailure[, c("platelets", "creatinine", "sodium")]

# Compare all numerical variables
all_var <- compare_numeric(heartfailure2)

# Print compare_numeric class object
all_var

# Correlations for pairs that involve the sodium variable
all_var %>% 
  "$"(correlation) %>% 
  filter(var1 == "sodium" | var2 == "sodium") %>% 
  arrange(desc(abs(coef_corr)))
  
# Correlations with abs(coef_corr) > 0.1
all_var %>% 
  "$"(correlation) %>% 
  filter(abs(coef_corr) > 0.1)
  
# Linear models for pairs that involve the sodium variable
all_var %>% 
  "$"(linear) %>% 
  filter(var1 == "sodium" | var2 == "sodium") %>% 
  arrange(desc(r.squared))
  
# Compare two numerical variables
two_var <- compare_numeric(heartfailure2, sodium, creatinine)

# Print the compare_numeric class object
two_var

# Summarize all cases; returns an invisible copy of the object
stat <- summary(all_var)

# Correlation only
summary(all_var, method = "correlation")

# Correlation only, with the threshold thres_corr = 0.1
summary(all_var, method = "correlation", thres_corr = 0.1)

# Linear model summaries, with the R^2 threshold thres_rs = 0.05
summary(all_var, thres_rs = 0.05)

# With verbose = FALSE
summary(all_var, verbose = FALSE)
  
# Plot all pairs of variables
plot(all_var)

# Plot a single pair of variables
plot(two_var)

# Plot all pairs of variables, with a prompt
plot(all_var, prompt = TRUE)

# Plot a pair of variables without the focus on typographic elements
plot(two_var, typographic = FALSE)



dlookr documentation built on July 9, 2023, 6:31 p.m.