View source: R/convolute_ff2.R
Low-level function to estimate a selected combination of feature contributions as a function of selected features, using leave-one-out k-nearest neighbors.
ff | forestFloor object of class "forestFloor_regression" or "forestFloor_multiClass", containing at least ff$X and ff$FCmatrix, two matrices of equal size.
Xi | integer vector of column indices of ff$X to estimate by.
FCi | integer vector of column indices of the feature contributions in ff$FCmatrix to estimate. If more than one index is given, these columns are summed by samples/rows. If NULL, FCi is set to match Xi.
k.fun | function that defines the number of neighbors, k, to consider. n.obs is a constant equal to the number of observations in ff$X, and k is defined as a function k.fun of n.obs. To set k to a constant, use e.g. k.fun = function() 10. k can also be overridden with userArgs.kknn = alist(kernel="gaussian",kmax=10).
userArgs.kknn | argument list passed to the train.kknn function for each convolution, see
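The FCi and k.fun descriptions above can be illustrated with a small base R sketch. It does not call convolute_ff2; the matrix values, the one-argument helper k.fun.example, and the square-root rule are assumptions chosen only for illustration, not the package's defaults.

```r
# Toy stand-in for ff$FCmatrix: 4 observations, 3 features (hypothetical values)
FCmatrix <- matrix(c( 0.1, 0.2, 0.3, 0.4,
                      1.0, 1.0, 1.0, 1.0,
                     -0.5, 0.0, 0.5, 1.0), ncol = 3)

# Selecting more than one column in FCi: the columns are summed by rows,
# giving one target value per observation
FCi <- 2:3
fc_target <- rowSums(FCmatrix[, FCi, drop = FALSE])

# k expressed as a function of the number of observations
# (illustrative rule only, written with an explicit argument for clarity)
n.obs <- nrow(FCmatrix)
k.fun.example <- function(n) max(1, round(sqrt(n) / 2))
k <- k.fun.example(n.obs)

# A constant k, in the zero-argument form described above
k.fun.const <- function() 10
```

Writing k as a function of the sample size lets the neighborhood grow with the data, while the constant form pins it regardless of how many observations ff$X holds.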
convolute_ff2 is a wrapper of train.kknn that estimates feature contributions from a set of features.
It is used, for example, to estimate the surface layer visualized by the show3d function.
Leave-one-out cross-validation (LOO CV) is used to quantify how much of the variation in a feature contribution can be explained by a given surface. In theory it can also be used to quantify higher-dimensional interaction effects, but randomForest does not learn many third-order (or higher) interactions. orderByImportance is not supported, so Xi and FCi refer to the column order of the training matrix X.
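The leave-one-out idea can be sketched in base R. This is a simplified, unweighted k-nearest-neighbor average written by hand, not the kernel-weighted train.kknn fit that the function wraps; the simulated data, the helper name loo_knn, and the choice of k are assumptions for illustration.

```r
set.seed(1)
n  <- 200
x  <- cbind(runif(n, -1, 1), runif(n, -1, 1))  # stand-in for ff$X[, Xi]
fc <- x[, 1] * x[, 2] + rnorm(n, sd = 0.05)    # stand-in for summed feature contributions

# Hypothetical helper: leave-one-out kNN estimate of y from x
loo_knn <- function(x, y, k) {
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # an observation never counts as its own neighbor
  vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k nearest other observations
    mean(y[nn])                      # unweighted average of their values
  }, numeric(1))
}

k      <- round(sqrt(n) / 2)   # illustrative choice of k
fc_hat <- loo_knn(x, fc, k)    # one estimate per observation

# Squared correlation between estimate and target: the share of variation
# in the feature contributions that the fitted surface explains
r2 <- cor(fc_hat, fc)^2
```

Because each estimate excludes the observation itself, r2 is a cross-validated measure rather than an in-sample fit.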
A numeric vector with one estimated feature contribution for each observation.
Soren Havelund Welling
## Not run:
library(forestFloor)
library(randomForest)
library(rgl)
#simulate data
obs=2500
vars = 6
X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + 2*sin(X2*pi) + 8 * X3 * X4)
Yerror = 15 * rnorm(obs)
cor(Y,Y+Yerror)^2 #relatively noisy system
Y= Y+Yerror
#grow a forest, remember to include inbag
rfo=randomForest(X,Y,keep.inbag=TRUE,ntree=1000,sampsize=800)
#compute feature contributions
ff = forestFloor(rfo,X)
#convolute the interacting feature contributions by their features to understand the relationship
fc34_convoluted = convolute_ff2(ff,Xi=3:4,FCi=3:4, #arguments for the wrapper
userArgs.kknn = alist(kernel="gaussian",k=25)) #arguments for train.kknn
#plot the joint convolution
plot3d(ff$X[,3],ff$X[,4],fc34_convoluted,
main="convolution of two feature contributions by their own variables",
#add some colour gradients to ease visualization
#box.outliers squeezes all observations into a 2 std.dev box,
#univariately for a vector or matrix, and normalizes to [0;1]
col=rgb(.7*box.outliers(fc34_convoluted),
.7*box.outliers(ff$X[,3]),
.7*box.outliers(ff$X[,4]))
)
## End(Not run)