selCorrelation: Variable Selection with Highest Correlation


selCorrelation    R Documentation

Variable Selection with Highest Correlation

Description

This function selects up to a given number of variables in a linear regression setting. The selected variables are those with the highest absolute correlation with the response.
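The selection rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sel_by_correlation is hypothetical, Pearson correlation is assumed, constant columns are treated as uncorrelated, and returning the indices in increasing order is a choice made here.

```r
# Hypothetical helper illustrating correlation-based screening.
# It is NOT the splitFlip source code; names and tie handling are assumptions.
sel_by_correlation <- function(X, Y, target) {
  # absolute Pearson correlation of each column of X with Y
  r <- abs(as.vector(stats::cor(X, Y)))
  # constant columns yield NA; treat them as uncorrelated (assumption)
  r[is.na(r)] <- 0
  # keep at most `target` variables, and never more than ncol(X)
  k <- min(target, ncol(X))
  # indices of the k largest absolute correlations, in increasing index order
  sort(order(r, decreasing = TRUE)[seq_len(k)])
}

# toy data: only columns 2 and 5 enter the linear model for Y
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 6), nrow = n, ncol = 6)
Y <- 3 * X[, 2] - 3 * X[, 5] + rnorm(n)
sel <- sel_by_correlation(X, Y, target = 2)
print(sel)
```

Because the ranking uses absolute values, a variable with a strong negative correlation (column 5 in the toy data) is ranked the same way as one with an equally strong positive correlation.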

Usage

selCorrelation(X, Y, target)

Arguments

X

numeric design matrix (excluding the intercept), where columns correspond to variables, and rows to observations.

Y

numeric response vector.

target

maximum number of variables to be selected.

Value

selCorrelation returns a numeric vector containing the indices of the selected variables.

Author(s)

Anna Vesely.

Examples

# generate linear regression data with 20 variables and 10 observations
res <- simData(m1=2, m=20, n=10, rho=0.5, type="toeplitz", SNR=5, seed=42)
X <- res$X # design matrix
Y <- res$Y # response vector
active <- res$active # indices of active variables

# choose target as twice the number of active variables
target <- 2*length(active)

# selection of at most target variables using highest correlations
selCorrelation(X, Y, target)

annavesely/splitFlip documentation built on July 27, 2024, 4:23 a.m.