ElasticNetCMA: Classification and variable selection by the ElasticNet


Description

Zou and Hastie (2005) proposed a combined L1/L2 penalty for regularization and variable selection. The Elastic Net penalty encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The computation is done with the function glmpath from the package of the same name.
The method can also be used for variable selection alone; see GeneSelection.
For S4 method information, see ElasticNetCMA-methods.

Usage

ElasticNetCMA(X, y, f, learnind, norm.fraction = 0.1, alpha=0.5, models=FALSE, ...)

Arguments

X

Gene expression data. Can be one of the following:

  • A matrix. Rows correspond to observations, columns to variables.

  • A data.frame, when f is not missing (see below).

  • An object of class ExpressionSet. Note: by default, the predictors are scaled to have unit variance and zero mean. This can be changed by passing standardize = FALSE via the ... argument.

y

Class labels. Can be one of the following:

  • A numeric vector.

  • A factor.

  • A character string specifying the phenotype variable, if X is an ExpressionSet.

  • missing, if X is a data.frame and a proper formula f is provided.

WARNING: The class labels will be re-coded to range from 0 to K-1, where K is the total number of different classes in the learning set.
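The effect of this recoding can be sketched in a few lines of plain Python (an illustration only; CMA's internal recoding is not reproduced here and its ordering may differ):

```python
# Sketch of 0..(K-1) label recoding, as described in the warning above.
# Labels are mapped to integer codes by the sorted order of their unique values.
def recode_labels(y):
    """Map arbitrary class labels to integer codes 0..K-1."""
    classes = sorted(set(y))                 # K distinct classes
    code = {c: i for i, c in enumerate(classes)}
    return [code[c] for c in y]

print(recode_labels(["AML", "ALL", "ALL", "AML"]))  # [1, 0, 0, 1]
```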

f

A two-sided formula, if X is a data.frame. The left part corresponds to the class labels, the right part to the variables.

learnind

An index vector specifying the observations that belong to the learning set. May be missing; in that case, the learning set consists of all observations and predictions are made on the learning set.

norm.fraction

L1 shrinkage intensity, expressed as the fraction of the coefficient L1 norm relative to the maximum possible L1 norm (which corresponds to fraction = 1). Lower values correspond to stronger shrinkage. Note that the default (0.1) need not produce good results; tuning of this parameter is recommended.
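The meaning of norm.fraction can be illustrated numerically (a Python sketch, not CMA code; beta_max and beta_shrunk below are hypothetical coefficient vectors):

```python
# Illustration of norm.fraction = ||beta||_1 / ||beta_max||_1,
# where beta_max is the coefficient vector with maximal L1 norm on the path.
def l1_norm(beta):
    return sum(abs(b) for b in beta)

beta_max = [3.0, -1.0, 2.0]        # hypothetical unshrunken solution, L1 norm = 6
beta_shrunk = [0.9, 0.0, -0.3]     # a heavily shrunken solution on the path

fraction = l1_norm(beta_shrunk) / l1_norm(beta_max)
print(fraction)  # approximately 0.2: much stronger shrinkage than fraction = 1
```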

alpha

The elastic net mixing parameter, with 0 < alpha <= 1. The penalty is defined as

(1 - alpha)/2 ||beta||_2^2 + alpha ||beta||_1.

alpha = 1 gives the lasso penalty. Currently, alpha < 0.01 is not reliable unless you supply your own lambda sequence.
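The penalty above can be checked with a short Python sketch (symbols follow the formula; none of this code comes from the CMA or glmpath sources):

```python
# Elastic net penalty: (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
def elastic_net_penalty(beta, alpha):
    l1 = sum(abs(b) for b in beta)           # ||beta||_1
    l2_sq = sum(b * b for b in beta)         # ||beta||_2^2
    return (1 - alpha) / 2 * l2_sq + alpha * l1

beta = [1.0, -2.0]
print(elastic_net_penalty(beta, 1.0))   # 3.0  -> pure lasso (L1 term only)
print(elastic_net_penalty(beta, 0.5))   # 2.75 -> equal mix of ridge and lasso
```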

models

A logical value indicating whether the model object should be returned.

...

Further arguments passed to the function glmpath from the package of the same name.

Value

An object of class clvarseloutput.

Note

For a strongly related method, see LassoCMA.
Up to now, this method can only be applied to binary classification.

Author(s)

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

References

Zou, H., Hastie, T. (2005).
Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301-320.

Park, M. Y., Hastie, T. (2007).
L1-regularization path algorithm for generalized linear models.
Journal of the Royal Statistical Society B, 69(4), 659-677.

See Also

compBoostCMA, dldaCMA, fdaCMA, flexdaCMA, gbmCMA, knnCMA, ldaCMA, LassoCMA, nnetCMA, pknnCMA, plrCMA, pls_ldaCMA, pls_lrCMA, pls_rfCMA, pnnCMA, qdaCMA, rfCMA, scdaCMA, shrinkldaCMA, svmCMA

Examples

### load Golub AML/ALL data
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression
golubX <- as.matrix(golub[,-1])
### select learningset
ratio <- 2/3
set.seed(111)
learnind <- sample(length(golubY), size=floor(ratio*length(golubY)))
### run ElasticNet - penalized logistic regression (no tuning)
result <- ElasticNetCMA(X=golubX, y=golubY, learnind=learnind, norm.fraction = 0.2, alpha=0.5)
show(result)
ftable(result)
plot(result)

Example output

Loaded glmnet 2.0-16

Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  :
  one multinomial or binomial class has fewer than 8  observations; dangerous ground
binary Classification with Elastic Net
number of predictions: 13
number of missclassifications:  6 
missclassification rate:  0.462 
sensitivity: 0 
specificity: 1 
    predicted
true 0 1
   0 7 0
   1 6 0

CMA documentation built on Nov. 8, 2020, 5:02 p.m.