thinRandomGLM: Random generalized linear model predictor thinning

View source: R/randomGLM.R

thinRandomGLMR Documentation

Random generalized linear model predictor thinning

Description

This function allows the user to define a "thinned" version of a random generalized linear model predictor by focusing on those features that occur relatively frequently.

Usage

thinRandomGLM(rGLM, threshold)

Arguments

rGLM

a randomGLM object such as one returned by randomGLM.

threshold

integer specifying the minimum of times a feature was selected across the bags in rGLM for the feature to be kept. Note that only features selected threshold +1 times and more are retained. For the purposes of this count, appearances in interactions are not counted. Features that appear threshold times or fewer are removed from the underlying regression models when the models are re-fit.

Details

The function "thins out" (reduces) a previously-constructed random generalized linear model predictor by removing rarely selected features and refitting each (generalized) linear model (GLM). Each GLM (per bag) is refit using only those features that occur more than threshold times across the nBags number of bags. The occurrence count excludes interactions (in other words, the threshold will be applied to the first row of timesSelectedByForwardRegression).

Value

The function returns a valid randomGLM object (see randomGLM for details) that can be used as input to the predict() method (see predict.randomGLM). The returned object contains a copy of the input rGLM in which the following components were modified:

predictedOOB

the updated continuous prediction (if classify is FALSE) or predicted classification (if classify is TRUE) of the input data based on out-of-bag samples.

predictedOOB.response

In case of a binary outcome, the updated predicted probability of each outcome specified by y based on out-of-bag samples. In case of a continuous outcome, this is the predicted value based on out-of-bag samples (i.e., a copy of predictedOOB).

featuresInForwardRegression

features selected by forward selection in each bag. A list with one component per bag. Each component is a matrix with maxInteractionOrder rows. Each column represents one interaction obtained by multiplying the features indicated by the entries in each column (0 means no feature, i.e. a lower order interaction).

coefOfForwardRegression

coefficients of forward regression. A list with one component per bag. Each component is a vector giving the coefficients of the model determined by forward selection in the corresponding bag. The order of the coefficients is the same as the order of the terms in the corresponding component of featuresInForwardRegression.

interceptOfForwardRegression

a vector with one component per bag giving the intercept of the regression model in each bag.

timesSelectedByForwardRegression

a matrix of maxInteractionOrder rows and number of features columns. Each entry gives the number of times the corresponding feature appeared in a predictor model at the corresponding order of interactions. Interactions where a single feature enters more than once (e.g., a quadratic interaction of the feature with itself) are counted once.

models

the "thinned" regression models for each bag.

Author(s)

Lin Song, Steve Horvath, Peter Langfelder

References

Lin Song, Peter Langfelder, Steve Horvath: Random generalized linear model: a highly accurate and interpretable ensemble predictor. BMC Bioinformatics (2013)

Examples


## binary outcome prediction
# data generation
data(iris)
# Restrict data to first 100 observations
iris=iris[1:100,]
# Turn Species into a factor
iris$Species = as.factor(as.character(iris$Species))
# Select a training and a test subset of the 100 observations
set.seed(1)
indx = sample(100, 67, replace=FALSE)
xyTrain = iris[indx,]
xyTest = iris[-indx,]
xTrain = xyTrain[, -5]
yTrain = xyTrain[, 5]

xTest = xyTest[, -5]
yTest = xyTest[, 5]

# predict with a small number of bags - normally nBags should be at least 100.
RGLM = randomGLM(
   xTrain, yTrain, 
   nCandidateCovariates=ncol(xTrain), 
   nBags=30, 
   keepModels = TRUE, nThreads = 1)
table(RGLM$timesSelectedByForwardRegression[1, ])
# 0  7 23 
# 2  1  1 

thinnedRGLM = thinRandomGLM(RGLM, threshold=7)
predicted = predict(thinnedRGLM, newdata = xTest, type="class")
predicted = predict(RGLM, newdata = xTest, type="class")


randomGLM documentation built on April 11, 2022, 1:06 a.m.