corrigraph: igraph of correlated variables global or in relation to y

View source: R/corrigraph.R

corrigraph    R Documentation

igraph of correlated variables global or in relation to y

Description

Draws an igraph network of the correlated variables of a data.frame, either globally or in relation to one or more Y variables.

Usage

corrigraph(
  data,
  colY = c(),
  colX = c(),
  type = "x",
  alpha = 0.05,
  exclude = c(0, 0, 0),
  ampli = 4,
  return = FALSE,
  wash = "stn",
  multi = TRUE,
  mu = FALSE,
  prop = FALSE,
  layout = "fr",
  cluster = TRUE,
  verbose = FALSE,
  NAfreq = 1,
  NAcat = FALSE,
  level = 2,
  evolreg = FALSE
)

Arguments

data

a data.frame

colY

a vector of column indices or names of the variables to predict. Use it to restrict the correlogram to the variables correlated with the selected Y variables.

colX

a vector of column indices or names of the variables to follow. Only the variables connected to them over one or more levels (see the level parameter) are kept.

type

"x" or "y". To force the display in correlogram mode (colX, type = "x") or in prediction mode (colY, type = "y").

alpha

the maximum permissible p-value for the display

exclude

the minimum threshold for displayed correlations, or a vector of thresholds in this order: c(cor, mu, prop)

ampli

coefficient of amplification of vertices

return

if return=TRUE, returns the matrix of significant correlations.

wash

automatically eliminates variables, using one of several methods, when there are too many variables (method = NA, stn (signal-to-noise ratio), sum, length).

multi

to ignore multiple regressions and control only single regressions.

mu

to display the effect on median/mean identified by m.test().

prop

to display the dependencies between categorical variables identified by GTest().

layout

to choose the network organization method - choose "fr", "circle", "kk" or "3d".

cluster

to make automatic clustering of variables or not.

verbose

to see the comments.

NAfreq

from 0 to 1. Proportion of NA allowed in each variable. 1 by default (100% of NA tolerated).

NAcat

TRUE or FALSE. If TRUE, missing data are recognized as categories.

level

to be used with colY. Number of variable layers allowed (minimum 2, default 2).

evolreg

TRUE or FALSE. Not yet available. Allows you to use the evolreg function to improve the predictive ability (R squared) for the variables specified in colY.
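As a rough illustration of what NAfreq describes, the sketch below computes the share of missing values in each column of airquality and keeps the columns at or below a cutoff. The na_cutoff name and the filtering step are assumptions made for illustration; corrigraph() may handle missing data differently internally.

```r
# Illustrative sketch only: base R, not the package's internal code.
data(airquality)

# Share of missing values in each column (0 = complete, 1 = all NA).
na_share <- colMeans(is.na(airquality))

# Hypothetical cutoff playing the role of NAfreq.
na_cutoff <- 0.1

# Columns whose share of NA does not exceed the cutoff.
kept <- names(na_share)[na_share <= na_cutoff]

print(round(na_share, 3))
print(kept)
```

With the default NAfreq = 1 described above, every column would pass such a filter; lowering the value is what would start to exclude sparse variables.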

Value

Correlation graph network (igraph) of the variables of a data.frame. Pay attention to the possible presence of non-numeric variables or missing data.

Grouping of correlated variables: the vertices (circles) correspond to the variables. The more connected a variable is, the larger it appears. The color of the lines reflects the sign of the correlation (BLUE: positive correlation, RED: negative correlation). The width of the lines reflects the value of the correlation, from 0 to 1. All displayed correlations are significant (p-value below alpha, 0.05 by default). The coloured groupings reflect families of inter-correlated variables.
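To make the filtering by significance and by minimum correlation more concrete, here is a minimal base R sketch on the swiss data. It assumes Pearson correlations tested with cor.test(); the names res, min_cor and edges are hypothetical, and this is not presented as the code corrigraph() actually runs.

```r
# Illustrative sketch only: base R, not the package's internal code.
data(swiss)

vars  <- names(swiss)
pairs <- t(combn(vars, 2))   # every unordered pair of variables

# One row per pair, to be filled with the correlation and its p-value.
res <- data.frame(
  from = pairs[, 1],
  to   = pairs[, 2],
  cor  = NA_real_,
  p    = NA_real_,
  stringsAsFactors = FALSE
)
for (i in seq_len(nrow(res))) {
  test <- cor.test(swiss[[res$from[i]]], swiss[[res$to[i]]])
  res$cor[i] <- unname(test$estimate)
  res$p[i]   <- test$p.value
}

alpha   <- 0.05   # maximum p-value kept (same role as the alpha argument)
min_cor <- 0.3    # hypothetical threshold (same role as exclude[1])

# Keep the pairs that are both significant and strong enough.
edges <- res[res$p <= alpha & abs(res$cor) >= min_cor, ]
edges$sign  <- ifelse(edges$cor > 0, "positive", "negative")  # blue / red
edges$width <- abs(edges$cor)                                 # from 0 to 1
print(edges)
```

Each remaining row corresponds to one edge of the kind described above: its sign would drive the color and its absolute value the width.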

When mu or prop is TRUE, the graph also shows the connections due to a mean effect (orange) and to a G (~chisq) effect (pink).

The width of the orange and pink edges depends on the p-values of kruskal.test() and GTest(): -1*log10(p-value)/10.
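The width formula quoted above can be checked by hand. The helper name edge_width below is hypothetical and the p-values are made up for illustration.

```r
# Hypothetical helper: maps a p-value to an edge width
# using the formula -1*log10(p-value)/10 quoted above.
edge_width <- function(p) -1 * log10(p) / 10

# Made-up p-values, from barely significant to very small.
p_values <- c(0.05, 0.001, 1e-10)
widths   <- edge_width(p_values)

print(round(widths, 3))
```

Under this formula a p-value of 1e-10 gives a width of 1, so smaller p-values produce thicker orange or pink edges.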

When indicating Y's in colY, the correlogram will identify the correlated X's, then the remaining X's correlated to these X's, and so on.

X's not related to these Y's are excluded.
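The layer-by-layer selection described above can be sketched as a breadth-first walk starting from the Y variables. The adjacency list below is a toy example rather than real correlation results, the function name layers_from is hypothetical, and counting the Ys as level 1 is an assumption about how levels are numbered.

```r
# Illustrative sketch only: a toy adjacency list, not real correlation results.
# Each entry lists the variables assumed to be significantly correlated with it.
links <- list(
  Y  = c("X1", "X2"),
  X1 = c("Y", "X3"),
  X2 = c("Y"),
  X3 = c("X1"),
  X4 = c("X5"),   # X4 and X5 are only linked to each other
  X5 = c("X4")
)

# Walk outward from the starting variables, one layer at a time.
layers_from <- function(links, start, max_level) {
  layer_of <- setNames(rep(1L, length(start)), start)
  frontier <- start
  level <- 1L
  while (length(frontier) > 0 && level < max_level) {
    level <- level + 1L
    neighbours <- unique(unlist(links[frontier], use.names = FALSE))
    new_vars <- setdiff(neighbours, names(layer_of))  # not yet assigned a layer
    layer_of[new_vars] <- level
    frontier <- new_vars
  }
  layer_of
}

layers <- layers_from(links, start = "Y", max_level = 3)
print(layers)
```

In this toy graph X1 and X2 land in the second layer, X3 in the third, and X4 and X5 never appear because nothing connects them to Y.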

Blue always displays positive correlations and red negative correlations. A green edge means that the predictive capacity (~correlation) of the variable can be reinforced by adding a second variable in a multiple regression model (X1+X2, X1*X2 or X1+X1:X2), which then predicts better than X1 or X2 alone.

Correlations between X's or Y's of the same level are ignored.

The color of the vertices identifies the variables that are significantly correlated on their own (blue: positive, red: negative, purple: positive or negative depending on the Y).

The values displayed to the right of the Ys (colY) correspond to the maximum predictive capacity of these Ys by one or two variables.

Examples

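# Load the package that provides corrigraph()
library(KefiR)
# Example 1: global correlogram of a fully numeric data.frame
data(swiss)
corrigraph(swiss)
# Example 2: same kind of graph with the "3d" layout
data(airquality)
corrigraph(airquality, layout = "3d")
# Example 3: prediction mode for Ozone and Wind
data(airquality)
corrigraph(airquality, colY = c("Ozone", "Wind"), type = "y")
# Example 4: also display the effects on median/mean (mu)
data(iris)
corrigraph(iris, mu = TRUE)
# Example 5: mu and prop effects, with thresholds given as c(cor, mu, prop)
library(MASS); data(Aids2)
corrigraph(Aids2, prop = TRUE, mu = TRUE, exclude = c(0.3, 0.3, 0))
# Example 6: same call as Example 3, forced into correlogram mode
data(airquality)
corrigraph(airquality, c("Ozone", "Wind"), type = "x")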

Antoine-Masse/KefiR documentation built on Feb. 22, 2024, 5:54 a.m.