LBoost: LBoost


View source: R/LBoost.R

Description

Constructs an ensemble of logic regression models using boosting, for classification and for the identification of important predictors and predictor interactions.

Usage

LBoost(resp, Xs, anneal.params, nBS = 100, kfold = 5, nperm = 1,
       PI.imp = NULL, pred.imp = FALSE)

Arguments

resp

numeric vector of binary response values.

Xs

matrix or data frame of zeros and ones for all predictor variables.

anneal.params

a list containing the parameters for simulated annealing. See the help file for the function logreg.anneal.control in the LogicReg package. If missing, default annealing parameters are set at start=1, end=-2, and iter=50000.
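
The stated fallback values can be written out explicitly; a minimal sketch (in practice you would build this object with logreg.anneal.control from the LogicReg package, as in the Examples below; the plain list here only mirrors the three named defaults for illustration):

```r
# Sketch only: the defaults LBoost falls back to when anneal.params is
# missing, per the help text above (start = 1, end = -2, iter = 50000).
default.anneal <- list(start = 1, end = -2, iter = 50000)

# The annealing temperature should decrease over the run
stopifnot(default.anneal$end < default.anneal$start)
```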

nBS

number of logic regression trees to be fit in the LBoost model.

kfold

the number of folds into which the data are split when constructing the ensemble.
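
As a toy illustration (base R only, not the package's internals), assigning n observations to kfold roughly equal groups might look like:

```r
# Randomly assign 20 observations to 5 folds of equal size
set.seed(1)
n <- 20
kfold <- 5
fold.id <- sample(rep(seq_len(kfold), length.out = n))
table(fold.id)  # each fold receives 4 observations
```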

nperm

If measuring predictor importance or interaction importance using the permutation-based measure, nperm is the number of permutations used in determining predictor or interaction importance.

PI.imp

A character string describing which measure of interaction importance will be used. Possible values include "Permutation", "AddRemove", and "Both". Using "Permutation" will provide the permutation based measure of interaction importance, "AddRemove" will provide the add-in/leave-out based measure of interaction importance, and "Both" provides both measures of importance.

pred.imp

logical. If TRUE, predictor importance scores are computed; if FALSE, they are not measured.

Value

An object of class "LBoost" which is a list including values

CVmod

A list of all logic regression fits and the associated information in the LBoost model. Each item in the list contains the list of LR fits for a specific kfold data set, a matrix of weights given to each LR fit for that data set, and a matrix of the kfold training data used to construct the fits.

CVmisclass

a list including the mean cross-validation misclassification rate for the models and a list of vectors giving the predictions for each of the kfold test data sets.
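
A misclassification rate of this kind is simply the proportion of test-set observations whose predicted class disagrees with the observed class; a toy illustration (not the package's internals):

```r
# Predicted and observed binary classes for five test observations
pred <- c(1, 0, 1, 1, 0)
obs  <- c(1, 0, 0, 1, 1)

# Misclassification rate: mean disagreement
misclass <- mean(pred != obs)
misclass  # 0.4 (two of five observations misclassified)
```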

AddRemove.PIimport

If PI.imp is specified as either "AddRemove" or "Both", this is a vector of add-in/leave-out importance scores for all interactions that occur in the LBoost model. If PI.imp is not specified or is "Permutation", this will state "Not measured".

Perm.PIimport

If PI.imp is specified as either "Permutation" or "Both", this is a vector of permutation-based importance scores for all interactions that occur in the LBoost model. If PI.imp is not specified or is "AddRemove", this will state "Not measured".

Pred.import

If pred.imp is specified as TRUE, a vector of importance scores for all predictors in the data.

Pred.freq

a vector of frequencies with which predictors occur in the individual logic regressions in the LBoost model.

PI.frequency

a vector of frequencies with which interactions occur in the individual logic regressions in the LBoost model.

wt.mat

a list containing kfold matrices of observation weights for each tree for the kfold training data sets.

alphas

a list containing kfold vectors of tree specific weights for trees constructed from each of the kfold training data sets.
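
As a rough illustration only (the exact weighting scheme is defined in Wolf et al. (2012), not reproduced here), boosting algorithms in the AdaBoost family assign each tree a weight that grows as its weighted training error shrinks:

```r
# Generic AdaBoost-style tree weight; whether LBoost uses exactly this
# formula should be checked against the reference below.
tree.weight <- function(err) 0.5 * log((1 - err) / err)

# A more accurate tree (smaller error) receives a larger weight,
# and a tree no better than chance (err = 0.5) receives weight 0.
tree.weight(0.1) > tree.weight(0.4)  # TRUE
```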

data

A matrix of the original data used to construct the LBoost model.

PIimp

A character string describing which interaction importance measure was used.

PredImp

logical. If TRUE, predictor importance was measured.

Author(s)

Bethany Wolf wolfb@musc.edu

References

Wolf, B.J., Hill, E.G., Slate, E.H., Neumann, C.A., Kistner-Griffin, E. (2012). LBoost: A boosting algorithm with applications for epistasis discovery. PLoS One.

See Also

print.LBoost, predict.LBoost, BoostVimp.plot, submatch.plot, persistence.plot

Examples

data(LF.data)

#Set the annealing parameters using the logreg.anneal.control
#function from the LogicReg package
newanneal<-logreg.anneal.control(start=1, end=-2, iter=2000)

#Typically more than 2000 iterations (>25000) would be used for
#the annealing algorithm.  A typical LBoost model also contains at
#least 100 trees.  These parameters were set to allow for faster
#run time

#The data set LF.data contains 50 binary predictors and a binary response Ybin
#Looking at only the Permutation Measure
LBfit.1<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
                anneal.params=newanneal, nperm=2, PI.imp="Permutation")
print(LBfit.1)

#Looking at only the Add-in/Leave-out importance measure
LBfit.2<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
                anneal.params=newanneal, PI.imp="AddRemove")
print(LBfit.2)

#Looking at both measures of importance plus predictor importance
LBfit.3<-LBoost(resp=LF.data$Ybin, Xs=LF.data[,1:50], nBS=10, kfold=2,
                anneal.params=newanneal, nperm=2, PI.imp="Both", pred.imp=TRUE)
print(LBfit.3)

Example output

Loading required package: LogicReg
Loading required package: survival
Loading required package: CircStats
Loading required package: MASS
Loading required package: boot

Attaching package: 'boot'

The following object is masked from 'package:survival':

    aml

2 -fold training datasets,  5  trees per training dataset
Number of logic regression trees = 10

CV model error rate =  0.265

    Prime Implicant   Permutation Importance   Frequency
1   X4 & X5           1                        1        
2   X4 & X50          0.66                     1        
3   X5                0.53                     1        
4   X20 & X50         0.3                      1        
5   !X47 & X50        0.27                     1        
2 -fold training datasets,  5  trees per training dataset
Number of logic regression trees = 10

CV model error rate =  0.205

    Prime Implicant   Add-in/Leave-out Importance   Frequency
1   X5                1                             2        
2   X9 & X10          0.53                          1        
3   X4                0.42                          1        
4   X4 & !X9 & !X15   0.16                          1        
5   X4 & !X9 & !X20   0.16                          1        
2 -fold training datasets,  5  trees per training dataset
Number of logic regression trees = 10

CV model error rate =  0.21

    Predictor   Importance   Frequency
1   X5          1            4        
2   X4          0.51         3        
3   X50         0.25         3        
4   X43         0.05         1        
5   X32         0.03         4        

[1] 1 2 3 4 5
Top 5 prime implicants using the Add-in/Leave-out measure
    Prime Implicant           Add-in/Leave-out Importance   Frequency
1   X4 & X5                   1                             2        
2   X5                        0.6                           2        
3   !X48 & X50                0.13                          1        
4   X9 & !X37 & X47           0.03                          1        
5   !X17 & !X18 & X28 & X30   0.03                          1        


Top 5 prime implicants using the permutation measure
    Prime Implicant   Permutation Importance   Frequency
1   X4 & X5           1                        2        
2   X5                0.81                     2        
3   X32 & X50         0.33                     1        
4   X38 & X50         0.15                     1        
5   X20 & X50         0.13                     1        

LogicForest documentation built on May 30, 2017, 3:07 a.m.