splines2d: Spline-based dependency measure for pairs of variables


View source: R/twod.R

Description

The function calculates a smoothing spline-based measure for quantifying functional dependencies between two variables. The function gam from package mgcv is used.

Usage

splines2d(x, y = NULL, binning = FALSE, b = 50, anchor = "min", parallel=FALSE)

Arguments

x

A numeric vector, a numeric matrix or a data frame. In case of a data frame only the numeric variables are used.

y

A numeric vector.

binning

A logical value or a character string. Whether or not binning should be used. TRUE, "equi" for equidistant binning, "quant" for quantile-based binning or "hexb" for hexagonal binning. Default is FALSE.

b

A positive integer. Number of bins in each variable.

anchor

A character string or a numeric value. How should the anchor point be chosen? "min" (default) for the minimum of each variable, "ggplot" for the method used in ggplot graphics, "nice" for a "pretty" anchor point, or a user-specified value.

parallel

A logical value. Whether or not parallelization should be used. Default is FALSE.
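
To make the binning options more concrete, the following base R sketch contrasts equidistant bins that start at the minimum of a variable with quantile-based bins. It only illustrates the two schemes for a single variable. How splines2d uses the binned data internally is not covered here, and the helper names equi_breaks and quant_breaks are hypothetical.

```r
# Hypothetical helpers illustrating two ways to cut a variable into b bins.
equi_breaks <- function(x, b) {
  # b equally wide bins starting at the minimum of x
  seq(min(x), max(x), length.out = b + 1)
}

quant_breaks <- function(x, b) {
  # b bins holding roughly equal numbers of observations
  unique(quantile(x, probs = seq(0, 1, length.out = b + 1), names = FALSE))
}

set.seed(1)
x <- rexp(1000)  # right-skewed example data

equi  <- cut(x, breaks = equi_breaks(x, 5),  include.lowest = TRUE)
quant <- cut(x, breaks = quant_breaks(x, 5), include.lowest = TRUE)

table(equi)   # counts differ strongly between equally wide bins
table(quant)  # counts are roughly equal across quantile bins
```

With skewed data, equidistant bins leave most observations in a few bins, while quantile-based bins spread them evenly.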

Details

For each pair of variables x and y, two models are fitted: one where x depends on y and one where y depends on x. The proportion of explained variance is calculated for both models and the maximum is returned. The "cr" (cubic regression spline) basis is used for faster calculation.
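
The two-model comparison described above can be sketched roughly as follows. This is a minimal illustration and not the package's source code: the helper names explained_variance and spline_measure are hypothetical, the basis dimension k is fixed at 10 for simplicity, and the proportion of explained variance is computed as 1 - RSS/TSS, which is an assumption about how the measure is defined.

```r
library(mgcv)

# Hypothetical helper: proportion of the variance in `target` explained by a
# smooth function of `predictor`, using a cubic regression spline ("cr") basis.
explained_variance <- function(target, predictor, k = 10) {
  fit <- gam(target ~ s(predictor, bs = "cr", k = k))
  rss <- sum((target - fitted(fit))^2)        # residual sum of squares
  tss <- sum((target - mean(target))^2)       # total sum of squares
  1 - rss / tss
}

# Hypothetical helper: fit both directions and keep the larger value.
spline_measure <- function(x, y, k = 10) {
  max(explained_variance(y, x, k), explained_variance(x, y, k))
}

set.seed(1)
x <- runif(200, -2, 2)
y <- x^2 + rnorm(200, sd = 0.2)
spline_measure(x, y)  # close to 1: y is nearly a smooth function of x
```

Taking the maximum over both directions makes the measure symmetric in its arguments, so a non-invertible relationship such as the parabola above is still detected.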

The number of start knots depends on the number of unique values in the independent variable. If the number is smaller than 20, 3 start knots are used, 10 otherwise.
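
The rule for the number of start knots can be written as a small helper. The function name start_knots is hypothetical, and passing its result as the basis dimension k of a smooth term in gam is an assumption about how the value is used.

```r
# Hypothetical helper mirroring the rule stated above: 3 start knots when the
# independent variable has fewer than 20 unique values, 10 otherwise.
start_knots <- function(x) {
  if (length(unique(x)) < 20) 3 else 10
}

start_knots(c(1, 2, 2, 3, 5))  # 3
start_knots(seq_len(100))      # 10
```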

The smoothing parameter is determined by cross validation.

Value

A numeric value giving the value of the measure if a pair of vectors is given. Otherwise a data frame with the following variables:

splines2d

Value of the measure.

x1

Number of first variable.

x2

Number of second variable.

nx1

Name of first variable (missing if x is not a data frame).

nx2

Name of second variable (missing if x is not a data frame).

tarvar

The variable that was used as the target variable (the one that yielded the higher value of the measure).

Author(s)

Katrin Grimm

References

S. N. Wood (2006) Generalized Additive Models: An Introduction with R. CRC Press, London.

S. N. Wood (2016). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. https://cran.r-project.org/package=mgcv

See Also

gam in mgcv, dcor2d

Examples

library(mbgraphic)
data(Election2005)
## Not run: 
# spline-based measure for all pairs of variables
spl <- splines2d(Election2005)

# order the pairs by decreasing value of the measure (first column)
o_spl <- spl[order(spl[, 1], decreasing = TRUE), ]

# show the 10 pairs with the highest values
o_spl[1:10, ]

# show the 4 scatterplots with the highest values
par(mfrow = c(2, 2))
for (i in 1:4) {
  # look up the two variables of the i-th pair by name
  v1 <- as.character(o_spl$nx1[i])
  v2 <- as.character(o_spl$nx2[i])
  plot(Election2005[[v1]], Election2005[[v2]],
       xlab = v1, ylab = v2, pch = 19)
}

## End(Not run)

mbgraphic documentation built on May 2, 2019, 2:45 a.m.