logforest: Logic Forest

Description

Constructs an ensemble of logic regression models using bagging, for classification and for identifying important predictors and predictor interactions.

Usage

logforest(resp, Xs, nBSXVars, anneal.params, nBS = 100, h = 0.5,
          norm = TRUE, numout = 5)

Arguments

resp

numeric vector of binary response values.

Xs

matrix or dataframe of zeros and ones for all predictor variables.

nBSXVars

integer for the number of predictors used to construct each logic regression model. The default value is all predictors in the data.

anneal.params

a list containing the parameters for simulated annealing. See the help file for the function logreg.anneal.control in the LogicReg package. If missing, default annealing parameters are set at start=1, end=-2, and iter=50000.

nBS

number of logic regression trees to be fit in the logic forest model.

h

a number between 0 and 1 for the minimum proportion of trees in the logic forest that must predict a 1 for the prediction to be one.

norm

logical. If FALSE, predictor and interaction scores in model output are not normalized to range between zero and one.

numout

number of predictors and interactions to be included in model output.
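
As a rough illustration of how h and norm act, the base R sketch below applies a vote threshold to per-observation vote proportions and rescales a vector of importance scores. The helper names vote_to_class and normalize_scores are hypothetical and are not part of LogicForest. Treating the threshold as inclusive (proportion >= h) and normalizing by the largest absolute score are assumptions made for this sketch, not confirmed details of the package, and the scores are made up.

```r
# Hypothetical helper (not a LogicForest function): turn the proportion of
# trees voting 1 into a class label.
# Assumption: the threshold is inclusive, so proportion >= h predicts 1.
vote_to_class <- function(prop_ones, h = 0.5) {
  as.integer(prop_ones >= h)
}

# Hypothetical helper (not a LogicForest function): rescale importance
# scores so that the largest magnitude equals 1.
# Assumption: normalization divides by the maximum absolute score.
normalize_scores <- function(scores) {
  max_abs <- max(abs(scores))
  if (max_abs == 0) return(scores)  # nothing to rescale
  scores / max_abs
}

# Proportion of trees predicting 1 for five observations (toy values)
props <- c(1, 0.857, 0.1, 0.5, 0)
classes <- vote_to_class(props, h = 0.5)

# Made-up raw importance scores for three predictors
raw <- c(X4 = 12.0, X5 = 9.0, X10 = 0.6)
scaled <- normalize_scores(raw)
```

Raising h makes a prediction of 1 harder to reach, since more trees must agree before an observation is classified as 1.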

Value

An object of class "logforest", which is a list with the following components:

AllFits

A list of all logic regression fits in the logic forest model.

Top5.PI

a vector of the 5 interactions with the largest-magnitude variable importance scores.

Predictor.importance

a vector of importance scores for all predictors that occur in the logic forest.

PI.importance

a vector of importance scores for all interactions that occur in the logic forest.

Predictor.frequency

a vector giving how often each predictor occurs in the individual logic regression models in the logic forest.

PI.frequency

a vector giving how often each interaction occurs in the individual logic regression models in the logic forest.

ModelPI.import

a list of interaction importance measures for each logic regression model in the logic forest.

OOBmisclass

out-of-bag error estimate for the logic forest.

OOBprediction

a matrix. Column one is the out-of-bag prediction for each response in the original data. Column two is the proportion of out-of-bag trees that predicted the class value to be one.

IBdata

a list of all in-bag data sets for the logic forest model.

OOBdata

a list of all out-of-bag data sets for the logic forest model.

norm

logical. If TRUE the normalized predictor and interaction importance scores are returned.

numout

the number of predictors and interactions (based on the variable importance measure) to be returned by logforest.

predictors

number of predictor variables in the data used to construct the logic forest.
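
To make the in-bag and out-of-bag quantities above more concrete, here is a minimal base R sketch of a bootstrap split and of an out-of-bag error computed from per-tree votes. It is an illustration under stated assumptions, not the package's internal code: the toy vote matrix, the object names, and the inclusive threshold of 0.5 are all hypothetical.

```r
set.seed(1)

# One bootstrap replicate over n rows (toy size)
n <- 10
in_bag_idx <- sample(n, size = n, replace = TRUE)  # rows drawn with replacement
oob_idx <- setdiff(seq_len(n), in_bag_idx)         # rows never drawn are out-of-bag

# Toy out-of-bag votes: rows are observations, columns are trees.
# NA marks a tree for which that observation was in-bag, so it casts no vote.
votes <- rbind(
  c(1, NA, 1, 1),
  c(0, 0, NA, 1),
  c(NA, 1, 0, 0)
)
truth <- c(1, 0, 1)  # made-up true responses

# Proportion of out-of-bag trees predicting 1 for each observation
prop_one <- rowMeans(votes, na.rm = TRUE)

# Out-of-bag class prediction (assumes an inclusive threshold h = 0.5)
oob_pred <- as.integer(prop_one >= 0.5)

# Out-of-bag misclassification rate
oob_misclass <- mean(oob_pred != truth)
```

In this sketch, oob_pred and prop_one play the roles of the two columns described for OOBprediction, and oob_misclass plays the role of OOBmisclass.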

Author(s)

Bethany Wolf <wolfb@musc.edu>

References

Wolf, B.J., Slate, E.H., Hill, E.G. (2010) Logic Forest: An ensemble classifier for discovering logical combinations of binary markers. Bioinformatics.

See Also

print.logforest, predict.logforest, vimp.plot, submatch.plot, persistence.plot

Examples

library(LogicForest)
data(LF.data)

#Set annealing parameters using the logreg.anneal.control
#function from the LogicReg package

newanneal<-logreg.anneal.control(start=1, end=-2, iter=2500)

#Typically many more than 2500 iterations (iter>25000) would be used
#for the annealing algorithm, and a typical forest contains at
#least 100 trees. Smaller values are used here to allow for faster
#run times

#The data set LF.data contains 50 binary predictors and a binary
#response Ybin
LF.fit1 <- logforest(resp = LF.data$Ybin, Xs = LF.data[, 1:50], nBS = 20,
                     anneal.params = newanneal)
print(LF.fit1)
predict(LF.fit1)

#Changing print parameters
LF.fit2 <- logforest(resp = LF.data$Ybin, Xs = LF.data[, 1:50], nBS = 20,
                     anneal.params = newanneal, norm = TRUE, numout = 10)
print(LF.fit2)

Example output

Loading required package: LogicReg
Loading required package: survival
Loading required package: CircStats
Loading required package: MASS
Loading required package: boot

Attaching package: 'boot'

The following object is masked from 'package:survival':

    aml

Number of logic regression trees = 20

5 most important predictors 

    Top 5 Predictors   Normalized Predictor Importance   Frequency
1   X4                 1                                 18       
2   X5                 0.9966                            19       
3   X10                0.009                             1        
4   X23                0.006                             1        
5   X13                0.003                             1        

5 most important interactions 

    Top 5 Interactions   Normalized Interaction Importance   Frequency
1   X4 & X5              1                                   16       
2   X5                   0.0856                              2        
3   X4                   0.0532                              1        
4   X4 & X5 & !X23       0.0263                              1        
5   X4 & !X33            0.0184                              1        
OOB Predicted values

  [1] 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 0
 [38] 0 0 0 0 0 0 1 0 0 1 0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 0
 [75] 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0
[112] 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 1 1 1 0 0
[149] 1 0 1 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0
[186] 0 0 0 1 1 1 0 1 1 0 0 1 1 0 0

Proportion of OOB trees that predict 1
  [1] 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.85714286
  [7] 0.10000000 0.00000000 0.00000000 0.00000000 0.00000000 1.00000000
 [13] 1.00000000 0.00000000 0.00000000 1.00000000 0.00000000 1.00000000
 [19] 0.00000000 1.00000000 0.18181818 0.00000000 0.00000000 0.00000000
 [25] 0.00000000 1.00000000 1.00000000 0.85714286 1.00000000 0.00000000
 [31] 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000 1.00000000
 [37] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [43] 0.12500000 1.00000000 0.00000000 0.00000000 1.00000000 0.00000000
 [49] 1.00000000 1.00000000 1.00000000 1.00000000 0.00000000 1.00000000
 [55] 0.00000000 1.00000000 0.00000000 1.00000000 1.00000000 0.00000000
 [61] 1.00000000 0.16666667 1.00000000 0.00000000 0.90000000 0.07142857
 [67] 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000
 [73] 1.00000000 0.00000000 1.00000000 0.00000000 0.00000000 1.00000000
 [79] 0.00000000 0.00000000 1.00000000 0.87500000 0.00000000 1.00000000
 [85] 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000
 [91] 1.00000000 0.00000000 1.00000000 0.80000000 1.00000000 1.00000000
 [97] 0.00000000 1.00000000 0.12500000 0.00000000 0.00000000 0.00000000
[103] 0.00000000 0.00000000 1.00000000 0.00000000 1.00000000 1.00000000
[109] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[115] 0.00000000 1.00000000 0.00000000 0.14285714 0.00000000 0.00000000
[121] 0.00000000 1.00000000 0.00000000 1.00000000 0.00000000 1.00000000
[127] 1.00000000 0.90909091 0.00000000 0.00000000 0.00000000 0.00000000
[133] 0.11111111 1.00000000 1.00000000 1.00000000 0.00000000 1.00000000
[139] 1.00000000 1.00000000 0.00000000 0.28571429 1.00000000 1.00000000
[145] 1.00000000 1.00000000 0.00000000 0.00000000 1.00000000 0.00000000
[151] 1.00000000 0.00000000 1.00000000 0.90000000 1.00000000 0.25000000
[157] 0.14285714 0.00000000 0.85714286 0.00000000 1.00000000 0.00000000
[163] 0.00000000 0.00000000 0.16666667 0.00000000 0.00000000 1.00000000
[169] 1.00000000 1.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[175] 0.22222222 0.00000000 1.00000000 0.00000000 0.00000000 0.20000000
[181] 0.00000000 0.87500000 0.00000000 1.00000000 0.00000000 0.00000000
[187] 0.00000000 0.00000000 1.00000000 1.00000000 1.00000000 0.00000000
[193] 1.00000000 1.00000000 0.14285714 0.00000000 1.00000000 1.00000000
[199] 0.00000000 0.00000000
Number of logic regression trees = 20

10 most important predictors 

     Top 10 Predictors   Normalized Predictor Importance   Frequency
1    X4                  1                                 19       
2    X5                  0.9528                            19       
3    X10                 0.0479                            2        
4    X9                  0.029                             2        
5    X1                  0                                 <NA>     
6    X2                  0                                 <NA>     
7    X3                  0                                 <NA>     
8    X6                  0                                 <NA>     
9    X7                  0                                 <NA>     
10   X8                  0                                 <NA>     

10 most important interactions 

     Top 10 Interactions   Normalized Interaction Importance   Frequency
1    X4 & X5               1                                   18       
2    X5                    0.0537                              1        
3    X4                    0.0516                              1        
4    X4 & X10              0.0119                              1        
5    X9 & X10              0.0099                              1        
6    X5 & X9               0.0059                              1        
7    X9 & X10 & X40        0.0041                              1        
8    <NA>                  <NA>                                <NA>     
9    <NA>                  <NA>                                <NA>     
10   <NA>                  <NA>                                <NA>     

LogicForest documentation built on May 30, 2017, 3:07 a.m.