corrp: Compute correlation-type analyses with a parallel backend

View source: R/corrp.R

corrp R Documentation

Compute correlation-type analyses with a parallel backend.

Description

Compute correlation-type analyses on the mixed-class columns of large data frames using a parallel backend. The data frame may contain columns of these four classes: integer, numeric, factor, and character. Character columns are treated as categorical variables.

Usage

corrp(
  df,
  parallel = TRUE,
  n.cores = 1,
  p.value = 0.05,
  verbose = TRUE,
  num.s = 1000,
  rk = F,
  comp = c("greater", "less"),
  alternative = c("two.sided", "less", "greater"),
  cor.nn = c("pearson", "mic", "dcor", "pps"),
  cor.nc = c("lm", "pps"),
  cor.cc = c("cramersV", "uncoef", "pps"),
  lm.args = list(),
  pearson.args = list(),
  dcor.args = list(),
  mic.args = list(),
  pps.args = list(),
  cramersV.args = list(),
  uncoef.args = list(),
  ...
)

Arguments

df

[data.frame(1)]
Input data frame.

parallel

[logical(1)]
If TRUE, run the operations on a parallel backend.

n.cores

[numeric(1)]
The number of cores to use for parallel execution.

p.value

[numeric(1)]
Cutoff value for the p-value, i.e. the probability of obtaining the observed results of a test assuming that the null hypothesis is correct. By default p.value = 0.05.

verbose

[logical(1)]
Activate verbose mode.

num.s

[numeric(1)]
Used in the permutation test. The number of samples drawn with replacement from the numeric vector y.

rk

[logical(1)]
Used in the permutation test. If TRUE, transform the numeric vectors x and y into their sample ranks.

comp

[character(1)]
Whether the p.value parameter must be greater or less than the p-values estimated in the tests and correlations.

alternative

[character(1)]
A character string specifying the alternative hypothesis for the correlation inference. It must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.

cor.nn

[character(1)]
Correlation type to use for integer/numeric pair inference. The options are 'pearson' (Pearson Correlation), 'mic' (Maximal Information Coefficient), 'dcor' (Distance Correlation) and 'pps' (Predictive Power Score). The default is 'pearson'.

cor.nc

[character(1)]
Correlation type to use for integer/numeric - factor/categorical pair inference. The options are 'lm' (Linear Model) and 'pps' (Predictive Power Score). The default is 'lm'.

cor.cc

[character(1)]
Correlation type to use for factor/categorical pair inference. The options are 'cramersV' (Cramer's V), 'uncoef' (Uncertainty coefficient) and 'pps' (Predictive Power Score). The default is 'cramersV'.

lm.args

[list(1)]
Additional parameters for the 'lm' method.

pearson.args

[list(1)]
Additional parameters for the 'pearson' method.

dcor.args

[list(1)]
Additional parameters for the 'dcor' method.

mic.args

[list(1)]
Additional parameters for the 'mic' method.

pps.args

[list(1)]
Additional parameters for the 'pps' method.

cramersV.args

[list(1)]
Additional parameters for the 'cramersV' method.

uncoef.args

[list(1)]
Additional parameters for the 'uncoef' method.

...

Additional arguments (TODO).

Value

A list with two tables: data and index.
- The '$data' table contains all the statistical results.
- The '$index' table contains the pairs of indices used in each inference of the data table.
- All statistical tests are controlled by the cutoff set in the p.value parameter. If a statistical test does not obtain a significance greater/less than p.value, the value of the variable 'isig' will be 'FALSE'.
- There is no statistical significance test for the pps algorithm, so 'isig' is TRUE by default.
- If any error occurs during the operations, the association measure ('infer.value') will be 'NA'.
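
The relationship between p.value, comp and 'isig' described above can be sketched in base R. This is only an illustration of one possible reading of the documentation: the helper is_significant and its argument p_estimated are hypothetical names that do not exist in corrp, and the package's internal logic may differ.

```r
# Hypothetical helper, NOT part of corrp: one reading of how the
# p.value cutoff and the comp argument could determine 'isig'.
is_significant <- function(p_estimated, p.value = 0.05,
                           comp = c("greater", "less")) {
  comp <- match.arg(comp)
  if (comp == "greater") {
    # the cutoff must be greater than the estimated p-value
    p.value > p_estimated
  } else {
    # the cutoff must be less than the estimated p-value
    p.value < p_estimated
  }
}

sig_small <- is_significant(0.01)                # estimated p below the cutoff
sig_large <- is_significant(0.20)                # estimated p above the cutoff
sig_less  <- is_significant(0.20, comp = "less") # reversed comparison
```

Under this assumed reading, the default comp = "greater" corresponds to the usual rule of flagging a result when its estimated p-value falls below the cutoff.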

Details (Types)

- integer/numeric pair: Pearson Correlation, computed with the cor function. The value lies between -1 and 1.
- integer/numeric pair: Distance Correlation, computed with the dcorT.test function. The value lies between 0 and 1.
- integer/numeric pair: Maximal Information Coefficient, computed with the mine function. The value lies between 0 and 1.
- integer/numeric pair: Predictive Power Score, computed with the score function. The value lies between 0 and 1.
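
As a rough illustration of the Pearson case, the following base R sketch computes the coefficient for one numeric pair and runs a one-sided test. It uses stats::cor and stats::cor.test directly on two airquality columns. Whether corrp obtains its p-values through cor.test is an assumption here, not something the documentation states.

```r
# Base R illustration only; corrp's internal calls may differ.
x <- airquality$Wind
y <- airquality$Temp

# Pearson correlation coefficient, always within [-1, 1]
r <- cor(x, y, method = "pearson")

# One-sided test: the alternative is that the true correlation is below zero
test_less <- cor.test(x, y, alternative = "less", method = "pearson")

# Compare the test p-value against a 0.05 cutoff
significant <- test_less$p.value < 0.05
```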

- integer/numeric - factor/categorical pair: correlation coefficient, i.e. the square root of the R^2 coefficient of the linear regression of the integer/numeric variable on the factor/categorical variable, computed with the lm function. The value lies between 0 and 1.
- integer/numeric - factor/categorical pair: Predictive Power Score, computed with the score function. The value lies between 0 and 1.
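
The linear-model measure for a numeric/categorical pair can be sketched in base R as follows. The choice of iris columns and the use of the overall F-test for the p-value are assumptions made for illustration. corrp may extract these quantities differently.

```r
# Base R illustration only: regress a numeric column on a factor column.
fit <- lm(Sepal.Length ~ Species, data = iris)
fit_summary <- summary(fit)

# Association measure: square root of R^2, which lies in [0, 1]
r_squared <- fit_summary$r.squared
assoc <- sqrt(r_squared)

# p-value of the overall F-test (assumed here as the significance test)
f <- fit_summary$fstatistic
p_val <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
```

For a least-squares fit with an intercept, this square root equals the correlation between the fitted and observed values, which is why it can be read as a correlation coefficient.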

- factor/categorical pair: Cramer's V, based on the chi-squared test and computed with the cramersV function. The value lies between 0 and 1.
- factor/categorical pair: Uncertainty coefficient, computed with the UncertCoef function. The value lies between 0 and 1.
- factor/categorical pair: Predictive Power Score, computed with the score function. The value lies between 0 and 1.
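
To show how Cramer's V follows from the chi-squared statistic, here is a minimal base R sketch that computes it by hand for two categorical columns of mtcars. It uses the textbook formula rather than the cramersV function that corrp calls, so small differences (for example in continuity correction) are possible.

```r
# Base R illustration only: Cramer's V from a contingency table.
tab <- table(mtcars$cyl, mtcars$gear)

# Chi-squared test without continuity correction; warnings about small
# expected counts are suppressed for this small example
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

n <- sum(tab)                      # number of observations
k <- min(nrow(tab), ncol(tab))     # smaller table dimension

# V = sqrt(chi^2 / (n * (k - 1))), which lies in [0, 1]
v <- sqrt(unname(chi$statistic) / (n * (k - 1)))
```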

Author(s)

Igor D.S. Siciliani

References

KS Srikanth, sidekicks, cor2, 2020. URL https://github.com/talegari/sidekicks/.

Paul van der Laken, ppsr, 2021. URL https://github.com/paulvanderlaken/ppsr.

Examples

## Not run: 

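# Compute the associations between all column pairs of airquality
air_cor <- corrp(airquality)
# Convert the result into a correlation matrix
air_m <- corr_matrix(air_cor, isig = FALSE)
# Plot the matrix with the corrplot package
corrplot::corrplot(air_m)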

## End(Not run)


meantrix/corrP documentation built on Oct. 22, 2024, 10:16 a.m.