corr_fun    R Documentation

Compute Correlation type analysis with Statistical Significance

Description

Compute a correlation-type analysis between two columns of a data frame; the two columns may be of different classes. The data frame may contain columns of four classes: integer, numeric, factor and character. Character columns are treated as categorical variables.

Usage

corr_fun(
  df,
  nx,
  ny,
  p.value = 0.05,
  verbose = TRUE,
  num.s = 1000,
  rk = F,
  comp = c("greater", "less"),
  alternative = c("two.sided", "less", "greater"),
  cor.nn = c("pearson", "mic", "dcor", "pps"),
  cor.nc = c("lm", "pps"),
  cor.cc = c("cramersV", "uncoef", "pps"),
  lm.args = list(),
  pearson.args = list(),
  dcor.args = list(),
  mic.args = list(),
  pps.args = list(),
  cramersV.args = list(),
  uncoef.args = list(),
  ...
)

Arguments

df

[data.frame(1)]
Input data frame.

nx

[character(1)]
Column name of the independent (predictor) variable.

ny

[character(1)]
Column name of the dependent (target) variable.

p.value

[numeric(1)]
Cutoff applied to the p-values estimated by the tests. A p-value is the probability of obtaining the observed results of a test, assuming that the null hypothesis is correct. Default is p.value = 0.05.

verbose

[logical(1)]
Activate verbose mode.

num.s

[numeric(1)]
Used in the permutation test. Number of samples drawn with replacement from the numeric vector y.

rk

[logical(1)]
Used in the permutation test. If TRUE, the numeric vectors x and y are transformed to their sample ranks.

comp

[character(1)]
Direction of the comparison: whether p.value must be greater or less than the p-values estimated by the tests and correlations.

alternative

[character(1)]
A character string specifying the alternative hypothesis for the correlation inference. It must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.

cor.nn

[character(1)]
Correlation type used for integer/numeric pair inference. The options are 'pearson' (Pearson Correlation), 'mic' (Maximal Information Coefficient), 'dcor' (Distance Correlation) and 'pps' (Predictive Power Score). Default is 'pearson'.

cor.nc

[character(1)]
Correlation type used for integer/numeric - factor/categorical pair inference. The options are 'lm' (Linear Model) and 'pps' (Predictive Power Score). Default is 'lm'.

cor.cc

[character(1)]
Correlation type used for factor/categorical pair inference. The options are 'cramersV' (Cramer's V), 'uncoef' (Uncertainty coefficient) and 'pps' (Predictive Power Score). Default is 'cramersV'.

lm.args

[list(1)]
Additional parameters for the 'lm' (Linear Model) method.

pearson.args

[list(1)]
Additional parameters for the 'pearson' (Pearson Correlation) method.

dcor.args

[list(1)]
Additional parameters for the 'dcor' (Distance Correlation) method.

mic.args

[list(1)]
Additional parameters for the 'mic' (Maximal Information Coefficient) method.

pps.args

[list(1)]
Additional parameters for the 'pps' (Predictive Power Score) method.

cramersV.args

[list(1)]
Additional parameters for the 'cramersV' (Cramer's V) method.

uncoef.args

[list(1)]
Additional parameters for the 'uncoef' (Uncertainty coefficient) method.

...

Additional arguments (TODO).
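
The resampling arguments (num.s, rk, alternative) can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the helper name resample_test is hypothetical, the statistic is assumed to be Pearson's r, and the null distribution is assumed to be built by resampling y with replacement, as the description of num.s suggests.

```r
# Hypothetical helper, NOT part of corrP: a resampling test for a numeric pair.
resample_test <- function(x, y, num.s = 1000, rk = FALSE,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  if (rk) {
    # Optional rank transform, mirroring the description of `rk`.
    x <- rank(x)
    y <- rank(y)
  }
  obs <- cor(x, y)  # observed statistic (Pearson's r, by assumption)
  # Null distribution: recompute the statistic on resampled copies of y.
  null <- replicate(num.s, cor(x, sample(y, replace = TRUE)))
  # Share of resampled statistics at least as extreme as the observed one.
  p <- switch(alternative,
    two.sided = mean(abs(null) >= abs(obs), na.rm = TRUE),
    greater   = mean(null >= obs, na.rm = TRUE),
    less      = mean(null <= obs, na.rm = TRUE)
  )
  list(statistic = obs, p.value = p)
}

set.seed(1)
res <- resample_test(iris$Sepal.Length, iris$Petal.Length, num.s = 200)
res$statistic
res$p.value
```

A larger num.s gives a finer-grained p-value estimate at the cost of run time.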

Value

A list with all statistical results.
- All statistical tests are controlled by the threshold set in the p.value parameter. If a test does not reach a significance greater/less than p.value (in the direction given by comp), the value of 'isig' is FALSE.
- There is no statistical significance test for the pps algorithm, so 'isig' is TRUE by default.
- If any error occurs during the computation, the association measure ('infer.value') is NA by default.
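
One way to read the 'isig' rule is sketched below. The helper is_significant is hypothetical and not part of corrP. It assumes that comp = "greater" means "p.value must be greater than the estimated p-value", which is one plausible reading of the comp description; the actual internal logic may differ.

```r
# Hypothetical helper, NOT part of corrP: one reading of the `comp` rule.
is_significant <- function(p.est, p.value = 0.05,
                           comp = c("greater", "less")) {
  comp <- match.arg(comp)
  if (is.na(p.est)) return(NA)  # no estimate, so no decision
  if (comp == "greater") {
    p.value > p.est   # cutoff must exceed the estimated p-value
  } else {
    p.value < p.est   # cutoff must be below the estimated p-value
  }
}

is_significant(0.01)                 # estimated p-value below the cutoff
is_significant(0.20)                 # estimated p-value above the cutoff
is_significant(0.20, comp = "less")  # reversed comparison
```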

Details (Types)

- integer/numeric pair: Pearson Correlation, using the cor function. The value lies between -1 and 1.
- integer/numeric pair: Distance Correlation, using the dcorT.test function. The value lies between 0 and 1.
- integer/numeric pair: Maximal Information Coefficient, using the mine function. The value lies between 0 and 1.
- integer/numeric pair: Predictive Power Score, using the score function. The value lies between 0 and 1.
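
For the Pearson case, the same kind of quantities can be reproduced with base R alone. The sketch below uses stats::cor.test to obtain both the coefficient and a p-value; that corr_fun derives its significance this way is an assumption, and the variable names are illustrative only.

```r
# Base R sketch (stats package only); not the corrP implementation.
x <- iris$Sepal.Length
y <- iris$Sepal.Width

ct <- cor.test(x, y, method = "pearson", alternative = "two.sided")

r    <- unname(ct$estimate)  # Pearson's r, between -1 and 1
p    <- ct$p.value           # p-value of the test H0: r = 0
isig <- p < 0.05             # significance flag at the default cutoff

c(r = r, p = p)
isig
```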

- integer/numeric - factor/categorical pair: correlation coefficient, i.e. the square root of the R^2 coefficient of the linear regression of the integer/numeric variable on the factor/categorical variable, using the lm function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair: Predictive Power Score, using the score function. The value lies between 0 and 1.
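
The linear-model measure above can be sketched in base R as follows. Taking the p-value from the overall F-test of the regression is an assumption made for illustration; the source does not say which test corr_fun uses here.

```r
# Base R sketch (stats package only); not the corrP implementation.
fit <- lm(Sepal.Length ~ Species, data = iris)  # numeric ~ factor
s   <- summary(fit)

r2    <- s$r.squared  # share of variance explained by the factor
assoc <- sqrt(r2)     # association measure, between 0 and 1

# Assumed significance: p-value of the overall F-test of the model.
f <- s$fstatistic
p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

c(assoc = assoc, p = p)
```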

- factor/categorical pair: Cramer's V, computed from the chi-squared test using the cramersV function. The value lies between 0 and 1.
- factor/categorical pair: Uncertainty coefficient, using the UncertCoef function. The value lies between 0 and 1.
- factor/categorical pair: Predictive Power Score, using the score function. The value lies between 0 and 1.
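
Cramer's V can be derived from the chi-squared statistic as V = sqrt(chi^2 / (n * (min(rows, cols) - 1))). The helper below, cramers_v, is a hypothetical base R sketch of that formula and not the cramersV function the package calls. Disabling the continuity correction (correct = FALSE) is a choice made here so that perfectly associated 2x2 tables reach 1.

```r
# Hypothetical helper, NOT part of corrP: Cramer's V from a chi-squared test.
cramers_v <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the pair
  # Small expected counts trigger a warning; it is silenced in this sketch.
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))
  n <- sum(tab)                           # number of observations
  k <- min(nrow(tab), ncol(tab)) - 1      # normalising degrees of freedom
  list(v = sqrt(unname(chi$statistic) / (n * k)),
       p.value = chi$p.value)
}

res <- cramers_v(mtcars$cyl, mtcars$gear)
res$v
res$p.value
```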

Author(s)

Igor D.S. Siciliani

References

KS Srikanth, sidekicks, cor2, 2020. URL https://github.com/talegari/sidekicks/.

Paul van der Laken, ppsr, 2021. URL https://github.com/paulvanderlaken/ppsr.

Examples

## Not run: 

corr_fun(iris, nx = "Sepal.Length", ny = "Sepal.Width", cor.nn = "dcor")

## End(Not run)


meantrix/corrP documentation built on Oct. 22, 2024, 10:16 a.m.