# recursiveTree: cross-validated feature contributions

### Description

Internal C++ function that computes feature contributions for a random forest.

### Usage

```
recTree(vars, obs, ntree, calculate_node_pred, X, Y, leftDaughter,
        rightDaughter, nodestatus, xbestsplit, nodepred, bestvar,
        inbag, varLevels, OOBtimes, localIncrements)
```

### Arguments

| Argument | Description |
|---|---|
| `vars` | Number of variables in `X`. |
| `obs` | Number of observations in `X`. |
| `ntree` | Number of trees the function should iterate over, starting from tree 1. Cannot be higher than the number of columns of `inbag`. |
| `calculate_node_pred` | Whether the node predictions should be recalculated (`true`) or reused from the `nodepred` matrix (`false`, regression). |
| `X` | Training feature matrix. |
| `Y` | Target vector, either a factor (classification) or a numeric response (regression). |
| `leftDaughter` | Matrix from the randomForest output, `rfo$forest$leftDaughter`: the node number (row number) of the left daughter of each node, one tree per column. |
| `rightDaughter` | Matrix from the randomForest output, `rfo$forest$rightDaughter`: the node number (row number) of the right daughter of each node, one tree per column. |
| `nodestatus` | Matrix from the randomForest output, `rfo$forest$nodestatus`: the status of a given node in a given tree. |
| `xbestsplit` | Matrix from the randomForest output, `rfo$forest$xbestsplit`. |
| `nodepred` | Matrix from the randomForest output, `rfo$forest$nodepred`: the in-bag target average for regression and the majority target class for classification. |
| `bestvar` | Matrix from the randomForest output, `rfo$forest$bestvar`: the variable used for the split at a given node in a given tree. |
| `inbag` | Matrix from the randomForest output, `rfo$inbag`, for regression. |
| `varLevels` | Number of levels of each variable: 1 for continuous and multinomial variables, >1 for categorical variables. Needed to interpret the binary split stored in `xbestsplit` for categorical variables. |
| `OOBtimes` | Number of times each observation was out-of-bag in the forest. Needed because feature contributions are the sum of local increments over out-of-bag observations, per feature, divided by `OOBtimes`. In the previous implementation, feature contributions were summed over all observations and divided by the number of trees. |
| `localIncrements` | An empty matrix that stores the local increments during computation. At the end, the `localIncrements` matrix holds the feature contributions. |
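The `varLevels` argument matters because randomForest stores a categorical split as a number whose binary expansion identifies which levels go to which daughter. Below is a minimal sketch of decoding such a split in plain C++ without Rcpp. The helper name `goesLeft` is hypothetical and not taken from the forestFloor source, and the convention that a set bit means "go left" is an assumption based on how randomForest describes its split encoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from the forestFloor source): decide whether an
// observation with categorical level `level` (1-based, as in R factors) goes
// to the left daughter. `split` is the value stored in xbestsplit and
// `nLevels` is the matching entry of varLevels (> 1 for categorical variables).
// Assumption: bit (level - 1) of the split value set to 1 means "go left".
bool goesLeft(double split, int level, int nLevels) {
    assert(nLevels > 1 && level >= 1 && level <= nLevels);
    assert(nLevels <= 53);  // a double holds integers exactly only up to 2^53
    const std::uint64_t bits = static_cast<std::uint64_t>(std::llround(split));
    return ((bits >> (level - 1)) & 1u) != 0;
}
```

Under this assumption, a split value of 5 (binary 101) on a three-level factor sends levels 1 and 3 to the left daughter and level 2 to the right.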

### Details

This function is executed by the function forestFloor.

This is a C++/Rcpp implementation for computing feature contributions. The main difference between this implementation and the rfFC package is that these feature contributions are summed only over out-of-bag samples, which provides a form of cross-validation. This implementation allows sampling with replacement but, unlike rfFC, does not support more than binary classification.
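The out-of-bag summation can be sketched as follows. This is a simplified illustration in plain C++, not the code from the package: the `Tree` struct, the function names, the 0-based node indexing with `-1` marking terminal nodes, and the restriction to numeric features in regression are all assumptions made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified layout for a single tree (numeric features only).
// Node indices are 0-based here; terminal nodes have left == -1.
struct Tree {
    std::vector<int> left, right;   // daughter node indices
    std::vector<int> bestvar;       // split variable per node
    std::vector<double> split;      // split threshold per node
    std::vector<double> nodepred;   // node prediction (in-bag target mean)
    std::vector<int> inbag;         // per observation: times drawn in-bag
};

using Matrix = std::vector<std::vector<double>>;  // indexed [obs][var]

// Add one tree's local increments to `inc`, for out-of-bag observations only.
void addTreeIncrements(const Tree& t, const Matrix& X, Matrix& inc) {
    assert(X.size() == inc.size() && X.size() == t.inbag.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        if (t.inbag[i] > 0) continue;  // skip in-bag observations
        int node = 0;                  // start at the root
        while (t.left[node] != -1) {
            const int v = t.bestvar[node];
            const int next =
                (X[i][v] <= t.split[node]) ? t.left[node] : t.right[node];
            // Local increment: change in node prediction, credited to v.
            inc[i][v] += t.nodepred[next] - t.nodepred[node];
            node = next;
        }
    }
}

// Divide each observation's summed increments by its out-of-bag count.
void normalizeByOOB(Matrix& inc, const std::vector<int>& oobTimes) {
    assert(inc.size() == oobTimes.size());
    for (std::size_t i = 0; i < inc.size(); ++i) {
        if (oobTimes[i] == 0) continue;  // never out-of-bag: leave zeros
        for (double& c : inc[i]) c /= oobTimes[i];
    }
}
```

Calling `addTreeIncrements` once per tree and then `normalizeByOOB` mirrors the roles of `localIncrements` and `OOBtimes` described in the arguments: the matrix is filled in place, and each row is averaged over the trees in which that observation was out-of-bag rather than over all trees.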

### Value

No output; the feature contributions are written directly to the `localIncrements` input.

### Author(s)

Soren Havelund Welling

### References

Interpretation of QSAR Models Based on Random Forest Methods, http://dx.doi.org/10.1002/minf.201000173

Interpreting random forest classification models using a feature contribution method, http://arxiv.org/abs/1312.1121

### Examples

```
## Not run:
rm(list=ls())
library(forestFloor)
library(randomForest)
# simulate data
obs = 2500
vars = 6
X = data.frame(replicate(vars, rnorm(obs)))
Y = with(X, X1^2 + sin(X2*pi) + 2 * X3 * X4 + 1 * rnorm(obs))
# grow a forest, remember to include inbag
rfo = randomForest(X, Y, keep.inbag = TRUE, sampsize = 1500, ntree = 500)
# compute topology, recTree is executed within forestFloor.
# See the source code of the forestFloor function for more details.
ff = forestFloor(rfo, X)
# print forestFloor
print(ff)
# plot partial functions of most important variables first
plot(ff)
## End(Not run)
```