lm_CLV: linear model based on CLV


lm_CLV    R Documentation

linear model based on CLV

Description

Prediction of a response variable, y, based on clusters of predictor variables, X. A boosting-like procedure identifies groups of predictors, together with their associated latent components, that are well correlated with the current residuals of the response variable y. Sparsity is allowed through the strategy options ("sparselv" or "kplusone") and the rho parameter.
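The boosting-like idea in the description can be sketched in base R. This is only an illustration under strong assumptions, not the package's implementation: the groups are fixed in advance (lm_CLV finds them by clustering), each latent component is taken to be the mean of the standardized variables in its group, and the names `groups`, `comps` and `sse` are hypothetical.

```r
set.seed(1)
n <- 80

# Two hypothetical groups of three predictors, each driven by one latent variable
z1 <- rnorm(n)
z2 <- rnorm(n)
X <- cbind(sapply(1:3, function(j) z1 + rnorm(n, sd = 0.3)),
           sapply(1:3, function(j) z2 + rnorm(n, sd = 0.3)))
X <- scale(X)                       # standardize columns, as with sX = TRUE
y <- 2 * z1 - z2 + rnorm(n, sd = 0.5)

groups <- list(1:3, 4:6)            # assumed known here, for illustration only

# Assumed latent component of a group: mean of its standardized variables
comps <- sapply(groups, function(g) rowMeans(X[, g, drop = FALSE]))

shrinkp <- 0.5                      # shrinkage applied to each update
res <- y - mean(y)                  # current residuals of the response
sse <- numeric(0)

for (it in 1:20) {
  # pick the component most correlated with the current residuals
  k <- which.max(abs(cor(comps, res)))
  # least-squares coefficient of the residuals on that component
  alpha <- sum(comps[, k] * res) / sum(comps[, k]^2)
  # shrunken update of the residuals
  res <- res - shrinkp * alpha * comps[, k]
  sse <- c(sse, sum(res^2))
}
```

With a shrinkage value in (0, 1], each such update cannot increase the residual sum of squares, which is why a stabilization criterion on that quantity is a natural stopping rule.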

Usage

lm_CLV(
  X,
  y,
  method = "directional",
  sX = TRUE,
  shrinkp = 0.5,
  strategy = "none",
  rho = 0.3,
  validation = FALSE,
  id.test = NULL,
  maxiter = 100,
  threshold = 1e-05
)

Arguments

X

: The matrix of the predictors, the variables to be clustered

y

: The response variable (usually numeric). If y is a binary factor, a 0/1 indicator variable is generated, and a Bayes rule is used to compute class probabilities.
The performance criterion is the RMSE for a numeric response; the RMSE and the error rate for a binary factor.

method

: The criterion to be used in the cluster analysis.
1 or "directional" : the squared covariance is used as a measure of proximity (directional groups).
2 or "local" : the covariance is used as a measure of proximity (local groups).

sX

: TRUE/FALSE, i.e. whether or not the columns of X are standardized (TRUE by default)

shrinkp

: shrinkage parameter used in the boosting (maximum 1; 0.5 by default).
If shrinkp is a vector of values greater than 0 and less than or equal to 1, the outputs are given for each value.

strategy

: "none" (by default), or "kplusone" (an additional cluster for the unclassifiable variables), or "sparselv" (zero loadings for the unclassifiable variables)

rho

: a threshold of correlation between 0 and 1 (used in "kplusone" or "sparselv" strategy, 0.3 by default)

validation

: TRUE/FALSE, i.e. whether or not a test set is used (FALSE by default)

id.test

: if validation = TRUE, the row numbers of the observations used as the test set

maxiter

: the maximum number of components extracted (100 by default)

threshold

: used in a stopping rule that is triggered when the relative calibration sum of squared errors stabilizes (1e-05 by default)
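A call using the arguments above might look like the following sketch. The simulated data, the 70/30 split and the object name `fit` are assumptions made for illustration, and id.test is assumed to hold row indices. The call is guarded so that the data preparation still runs when ClustVarLV is not installed.

```r
set.seed(42)
n <- 100

# Simulated predictors: three latent variables, four noisy copies of each
z <- matrix(rnorm(n * 3), n, 3)
X <- z[, rep(1:3, each = 4)] + matrix(rnorm(n * 12, sd = 0.4), n, 12)
colnames(X) <- paste0("x", 1:12)

# Numeric response depending on the first and third latent variables
y <- 2 * z[, 1] - z[, 3] + rnorm(n, sd = 0.5)

# Row indices of the observations held out as the test set (assumed meaning)
id.test <- sort(sample(seq_len(n), 30))

if (requireNamespace("ClustVarLV", quietly = TRUE)) {
  fit <- ClustVarLV::lm_CLV(X, y,
                            method = "directional", sX = TRUE,
                            shrinkp = 0.5, strategy = "none",
                            validation = TRUE, id.test = id.test,
                            maxiter = 100, threshold = 1e-05)
  # Inspect which elements were returned rather than assuming their layout
  print(names(fit))
}
```

Setting strategy to "kplusone" or "sparselv" together with rho would be the way to request sparsity, as described for those arguments.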

Value

Group

a list of the groups of X variables, in the order in which they were first extracted.

Comp

a list of the latent components associated with the groups of X variables extracted.

Load

a list of the loadings of the X variables in the latent components.

Alpha

a list of the regression coefficients to be applied to the latent components.
The coefficients are aggregated when the same latent component is extracted several times during the iterative steps.

Beta

a list of the beta coefficients to be applied to the pretreated predictors.
For a model with the first A latent components, the first A elements of the list must be added together.

GroupImp

Group importance, i.e. the decrease in the residual variance provided by the CLV components in the model.

RMSE.cal

the root mean square error for the calibration set, at each step of the procedure.

ERRrate.cal and rocAUC.cal

when y is a binary factor, the classification error rate and the area under the ROC curve (AUC), on the basis of the calibration set, at each step of the procedure.

RMSE.val

as RMSE.cal but for the test set, if provided.

ERRrate.val and rocAUC.val

as for calibration set but for the test set, if provided.
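The rule given for Beta (add together the first A elements of the list) can be illustrated with a toy list in base R. The coefficient values, the variable names and the matrix `Xnew` below are made up, and the exact layout of the returned Beta element (for example when shrinkp is a vector) is not assumed here. Any intercept or back-transformation of the response is left out because this page does not describe it.

```r
# Hypothetical Beta list: one coefficient vector per extracted component
Beta <- list(c(x1 = 0.4, x2 = 0.4, x3 = 0),
             c(x1 = 0,   x2 = 0,   x3 = -0.3),
             c(x1 = 0.1, x2 = 0.1, x3 = 0))

A <- 2                                   # number of latent components kept

# Coefficients of the model with the first A components: sum of the first A elements
beta_A <- Reduce(`+`, Beta[seq_len(A)])

# Hypothetical pretreated predictors for two new observations
Xnew <- matrix(c(1, 0,  2,
                 0, 1, -1),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, names(beta_A)))

# Linear predictor on the pretreated scale (no intercept added here)
pred_centered <- drop(Xnew %*% beta_A)
```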

See Also

CLV, CLV_kmeans


ClustVarLV documentation built on May 28, 2022, 5:05 p.m.