corrigraph: igraph of correlated variables, global or in relation to Y

View source: R/corrigraph.R

corrigraph    R Documentation

Description

Draws an igraph network of the correlated variables of a data.frame, either globally or in relation to one or more Y variables.

Usage

corrigraph(
  data,
  colY = c(),
  colX = c(),
  type = "x",
  alpha = 0.05,
  exclude = c(0, 0, 0),
  ampli = 4,
  return = FALSE,
  wash = "stn",
  multi = TRUE,
  mu = FALSE,
  prop = FALSE,
  layout = "fr",
  cluster = TRUE,
  verbose = FALSE,
  NAfreq = 1,
  NAcat = FALSE,
  level = 2,
  evolreg = FALSE
)

Arguments

data

a data.frame

colY

a vector of indices or names of the variables to predict. Forces the correlogram to display only the variables correlated with the selected Y variables.

colX

a vector of indices or names of the variables to follow. Only the variables connected to them across one or more levels (see the level parameter) are kept.

type

"x" or "y". Forces the display in correlogram mode (colX, type = "x") or in prediction mode (colY, type = "y").

alpha

the maximum permissible p-value for the display

exclude

the minimum threshold below which correlations are not displayed, either a single value or a vector of thresholds in this order: c(cor, mu, prop).

ampli

coefficient of amplification of vertices

return

if return = TRUE, returns the correlation matrix of significant correlations.

wash

automatically eliminates variables, using one of several methods, when there are too many variables. Possible values: NA, "stn" (signal-to-noise ratio), "sum", "length".

multi

to ignore multiple regressions and control only single regressions.

mu

to display the effect on median/mean identified by m.test().

prop

to display the dependencies between categorical variables identified by GTest().

layout

the network layout method: "fr", "circle", "kk" or "3d".

cluster

TRUE or FALSE. Whether to cluster the variables automatically.

verbose

TRUE or FALSE. Whether to print comments while the function runs.

NAfreq

from 0 to 1. Proportion of NA allowed in each variable. 1 by default (up to 100% of NA tolerated).

NAcat

TRUE or FALSE. Whether missing data must be treated as a category.

level

to be used with colY or colX. Number of variable layers allowed (minimum 2; the default in the signature above is 2).

evolreg

TRUE or FALSE. Not yet available. Allows you to use the evolreg function to improve the predictive ability (R squared) for the variables specified in colY.
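
The NAfreq threshold can be pictured with a few lines of base R. This is only a sketch of the idea, not the package's code: the helper name keep_by_na_share is hypothetical, and it assumes that a variable is kept when its share of NA does not exceed NAfreq.

```r
# Hypothetical helper (not part of KefiR): keep the columns whose share of
# missing values is at most `NAfreq`. Assumption: the rule is "share <= NAfreq".
keep_by_na_share <- function(data, NAfreq = 1) {
  na_share <- colMeans(is.na(data))  # proportion of NA in each column
  data[, na_share <= NAfreq, drop = FALSE]
}

data(airquality)
# Ozone has 37 NA out of 153 rows (about 24%), so it is dropped at NAfreq = 0.1;
# Solar.R has 7 NA (about 5%) and is kept.
names(keep_by_na_share(airquality, NAfreq = 0.1))
```

With the default NAfreq = 1, every column passes the filter, which matches the "100% of NA tolerated" wording above.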

Value

Depending on the parameters:

igraph

A correlation graph network (igraph) of the variables of a data.frame. Non-numeric variables or missing data may be present. Vertices (circles) represent variables, with size indicating connectivity. The color of the edges reflects the nature of the correlation (positive in blue, negative in red). The width of the edges represents the strength of the correlation.

mu/prop

If mu or prop is TRUE: connections also display mean effects (orange) and dependencies between categorical variables (pink). The edge sizes depend on the p-values from kruskal.test() and GTest().

Y specification

When colY is specified: The correlogram identifies X variables correlated to Y, iterating through layers specified by level. X variables not related to Y are excluded.

vertex colors

The color of vertices indicates significant correlations (blue for positive, red for negative, purple for both).

max predictive capacity

Values displayed next to Y variables (colY) indicate the maximum predictive capacity by one or two variables.

correlation matrix

If return is TRUE, the function returns the correlation matrix of significant correlations.
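
To make the idea of a matrix of significant correlations concrete, here is a minimal base-R sketch. It is not the corrigraph implementation: the helper signif_cor is hypothetical, it assumes Pearson correlations tested with cor.test(), and it sets to 0 any correlation whose p-value exceeds alpha or whose absolute value is below a threshold that plays the role of the first element of exclude.

```r
# Hypothetical helper (not part of KefiR): pairwise Pearson correlations,
# keeping only those with p-value <= alpha and |r| >= exclude.
signif_cor <- function(data, alpha = 0.05, exclude = 0) {
  # Keep numeric columns only (assumption made for this sketch)
  num <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  p <- ncol(num)
  out <- matrix(0, p, p, dimnames = list(names(num), names(num)))
  if (p < 2) return(out)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      test <- cor.test(num[[i]], num[[j]])  # uses complete pairs only
      r <- unname(test$estimate)
      if (test$p.value <= alpha && abs(r) >= exclude) {
        out[i, j] <- out[j, i] <- r         # fill both triangles
      }
    }
  }
  out  # the diagonal is left at 0 in this sketch
}

data(swiss)
round(signif_cor(swiss, alpha = 0.05, exclude = 0.3), 2)
```

The matrix actually returned by corrigraph(..., return = TRUE) may differ in its test, its handling of non-numeric variables and its diagonal; the sketch only shows how alpha and a correlation threshold can combine.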

Examples

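# Example 1: global correlogram of the swiss dataset
data(swiss)
corrigraph(swiss)
# Example 2: same kind of graph with a 3D layout
data(airquality)
corrigraph(airquality, layout = "3d")
# Example 3: prediction mode (type = "y") for Ozone and Wind
data(airquality)
corrigraph(airquality, c("Ozone", "Wind"), type = "y")
# Example 4: also display effects on the mean/median (mu)
data(iris)
corrigraph(iris, mu = TRUE)
# Example 5: categorical dependencies and mean effects, thresholds c(cor, mu, prop)
library(MASS); data(Aids2)
corrigraph(Aids2, prop = TRUE, mu = TRUE, exclude = c(0.3, 0.3, 0))
# Example 6: correlogram mode (type = "x") with the same variable selection
data(airquality)
corrigraph(airquality, c("Ozone", "Wind"), type = "x")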

Antoine-Masse/KefiR documentation built on July 4, 2024, 11:40 a.m.