widenet: Extends the relaxnet Package with Polynomial Basis Expansions

Description

Expands the basis according to the order argument, then runs relaxnet in order to select a subset of the basis functions. Multiple values of order and alpha (the elastic net tuning parameter) may be specified, leading to selection of a specific value by cross-validation.
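To make the idea of a polynomial basis expansion concrete, here is a minimal base-R sketch of a second-order expansion. The helper name `expand_order2` and the column-naming scheme are hypothetical. This is not the package's internal code, which may construct or name the basis differently (for example, for binary columns).

```r
# Hypothetical helper (not part of widenet): second-order polynomial
# expansion of a numeric matrix with unique column names.
expand_order2 <- function(x) {
  stopifnot(is.matrix(x), ncol(x) >= 2, !is.null(colnames(x)))

  # Squared terms, one per original column
  sq <- x^2
  colnames(sq) <- paste0(colnames(x), "^2")

  # Pairwise products, one per unordered pair of columns
  pairs <- combn(ncol(x), 2)
  inter <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste(colnames(x)[pairs[1, ]],
                           colnames(x)[pairs[2, ]], sep = ":")

  # Original columns first, then the added basis functions
  cbind(x, sq, inter)
}

set.seed(1)
x <- matrix(rnorm(20), 5, 4, dimnames = list(NULL, paste0("x", 1:4)))
xx <- expand_order2(x)
dim(xx)  # 5 rows and 4 + 4 + 6 = 14 columns
```

The number of columns grows quickly with the order, which is why a selection step such as relaxnet is applied after the expansion.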

Usage

widenet(x, y, family = c("gaussian", "binomial"),
        order = 1:3,
        alpha = 1,
        nfolds = 10,
        foldid,
        screen.method = c("none", "cor", "ttest"),
        screen.num.vars = 50,
        multicore = FALSE,
        mc.cores,
        mc.seed = 123,
        ...)

Arguments

x

Input matrix, each row is an observation vector. Sparse matrices are not yet supported for the widenet function. Must have unique colnames.

y

Response variable. Quantitative for family="gaussian". For family="binomial", y should be either a factor with two levels, or a two-column matrix of counts or proportions.

family

Response type (see above).

order

The order of basis expansion. Elements must be in the set c(1, 2, 3). If there is more than one element, cross-validation is used to choose the order with the best cross-validated performance.

alpha

The elastic net mixing parameter, see glmnet. If there is more than one element, cross-validation is used to choose the value with the best cross-validated performance.

nfolds

Number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3.

foldid

An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
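One common way to build a fold-assignment vector of the kind foldid expects is sketched below in base R. The variable names and the balanced-fold scheme are illustrative choices, not something widenet requires.

```r
set.seed(42)
n <- 300       # number of observations (rows of x)
nfolds <- 10   # number of cross-validation folds

# Repeat the fold labels 1..nfolds until there is one per observation,
# then shuffle so that fold membership is random but balanced.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)  # 30 observations in each of the 10 folds
```

Passing the same foldid vector to several runs keeps the folds fixed, so differences in cross-validated risk are not caused by different random splits.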

screen.method

The method used to screen variables before basis expansion is applied. The default, "none", applies no screening. "cor" screens by bivariate correlation with the outcome. "ttest" is meant for binary outcomes (family = "binomial"). The screening methods are adapted from the SuperLearner package, the author of which is Eric Polley.

screen.num.vars

The number of variables (columns of x) to screen in when using screening.
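The following base-R sketch shows what correlation-based screening could look like: rank the columns of x by absolute correlation with y and keep the top few. The function `screen_by_cor` is hypothetical, and the package's actual screening code (adapted from SuperLearner) may rank variables differently.

```r
# Hypothetical helper (not part of widenet): keep the num.vars columns
# of x with the largest absolute correlation with y.
screen_by_cor <- function(x, y, num.vars = 50) {
  abs_cor <- abs(drop(cor(x, y)))
  k <- min(num.vars, ncol(x))
  keep <- order(abs_cor, decreasing = TRUE)[seq_len(k)]
  sort(keep)  # return column indices in their original order
}

set.seed(7)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- 3 * x[, 2] - 2 * x[, 9] + rnorm(200)

# With a signal this strong, columns 2 and 9 are expected to rank first
screen_by_cor(x, y, num.vars = 2)
```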

multicore

Should execution be parallelized over cv folds (for cv.relaxnet) or over alpha values (for cv.alpha.relaxnet) using multicore functionality from R's parallel package?

mc.cores

Number of cores/cpus to be used for multicore processing. Parallelization is over cross-validation folds.

mc.seed

Integer value with which to seed the RNG when using parallel processing (internally, RNGkind will be called to set the RNG to "L'Ecuyer-CMRG"). Will be ignored if multicore is FALSE. If multicore is FALSE, one should be able to get reproducible results by setting the seed normally (with set.seed) prior to running.
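The base-R sketch below illustrates the two seeding situations described for mc.seed. It only demonstrates the RNG behavior itself, and the variable names are illustrative; it does not call widenet.

```r
# Serial case: setting the seed before each run reproduces the draws.
set.seed(123)
a <- rnorm(3)
set.seed(123)
b <- rnorm(3)
identical(a, b)  # TRUE

# Parallel case: "L'Ecuyer-CMRG" is the generator R's parallel package
# uses for reproducible streams. RNGkind() returns the previous settings.
old_kind <- RNGkind("L'Ecuyer-CMRG")
set.seed(123)
c1 <- rnorm(3)
set.seed(123)
c2 <- rnorm(3)
identical(c1, c2)  # TRUE

# Restore the generator that was active before
RNGkind(old_kind[1])
```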

...

Further arguments passed to relaxnet or cv.relaxnet, which should also be passed on to glmnet. Use with caution as this has not been tested.

Details

The type.measure argument has not yet been implemented. For family = "gaussian" models, mean squared error is used, and for family = "binomial" models, binomial deviance is used.
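For reference, the two loss measures named above can be written in base R as follows. The function names are hypothetical, and the scaling (the mean over observations, with the deviance multiplied by 2) follows the usual glmnet convention; that the package uses exactly this scaling is an assumption.

```r
# Mean squared error for quantitative responses
mse <- function(y, pred) {
  mean((y - pred)^2)
}

# Binomial deviance for 0/1 responses and predicted probabilities.
# Probabilities are clipped away from 0 and 1 to avoid log(0).
binomial_deviance <- function(y, prob, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -2 * mean(y * log(prob) + (1 - y) * log(1 - prob))
}

mse(c(1, 2, 3), c(1, 2, 5))              # 4/3
binomial_deviance(c(0, 1), c(0.5, 0.5))  # 2 * log(2)
```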

Value

Returns an object of class "widenet" with the following elements:

call

A copy of the call which generated this object

order

The value of the order argument

alpha

The value of the alpha argument

screen.method

The value of the screen.method argument

screened.in.index

A vector which indexes the columns of x, indicating those variables which were screened in for the run on the full data

colsBinary

A vector of length ncol(x) representing which of the columns of x contained binary data. These columns will be represented by a 2. The other columns will have a 3.

cv.relaxnet.results

A list of lists containing "cv.relaxnet" objects, one for each combination of values of alpha and order.

min.cvm.mat

A matrix containing the minimum cross-validated risk for each combination of values of alpha and order

which.order.min

The order which "won" the cross-validation, i.e. resulted in minimum cross-validated risk.

which.alpha.min

The alpha value which "won" the cross-validation.

total.time

Total time in seconds to produce this result.
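The relationship between min.cvm.mat, which.order.min and which.alpha.min can be sketched with a made-up matrix. The risk values below are invented for illustration, and the layout (rows indexed by alpha, columns by order) is an assumption; check the dimnames of the matrix in a real result.

```r
# Invented cross-validated risks: rows = alpha values, columns = orders.
min.cvm.mat <- matrix(c(1.90, 1.20, 1.30,
                        1.80, 1.10, 1.25),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("0.5", "1"), c("1", "2", "3")))

# Locate the smallest entry as a (row, column) pair
idx <- arrayInd(which.min(min.cvm.mat), dim(min.cvm.mat))

best.alpha <- as.numeric(rownames(min.cvm.mat)[idx[1, 1]])
best.order <- as.numeric(colnames(min.cvm.mat)[idx[1, 2]])

c(alpha = best.alpha, order = best.order)  # alpha = 1, order = 2
```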

Note

This is a preliminary release and several additional features are planned for later versions.

Author(s)

Stephan Ritter, with design contributions from Alan Hubbard.

Much of the code (and some help file content) is adapted from the glmnet package, whose authors are Jerome Friedman, Trevor Hastie and Rob Tibshirani.

References

Stephan Ritter and Alan Hubbard, Tech report (forthcoming).

See Also

predict.widenet, relaxnet, cv.relaxnet

Examples

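library(widenet)

n <- 300   # number of observations
p <- 5     # number of predictors

set.seed(23)
x <- matrix(rnorm(n * p), n, p)

# widenet requires unique column names
colnames(x) <- paste0("x", seq_len(ncol(x)))

# The true model contains an interaction and a squared term,
# both of which are second-order basis functions
y <- x[, 1] + x[, 2] + x[, 3] * x[, 4] + x[, 5]^2 + rnorm(n)

widenet.result <- widenet(x, y, family = "gaussian",
                          order = 2, alpha = 0.5)

summary(widenet.result)

# Extract the coefficients and show only the nonzero ones
coefs <- drop(predict(widenet.result, type = "coef"))
coefs[coefs != 0]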

widenet documentation built on May 2, 2019, 2:10 p.m.
