A method to plot an object of class forestFloor. It plots the partial feature contributions of the most important variables. Colour gradients can be applied to reveal possible interactions.
## S3 method for class 'forestFloor_regression'
plot(
x,
plot_seq=NULL,
limitY=TRUE,
order_by_importance=TRUE,
cropXaxes=NULL,
crop_limit=4,
plot_GOF = FALSE,
GOF_col = "#33333399",
speedup_GOF = TRUE,
...)
## S3 method for class 'forestFloor_multiClass'
plot(
x,
plot_seq = NULL,
label.seq = NULL,
limitY = TRUE,
colLists = NULL,
order_by_importance = TRUE,
fig.columns = NULL,
plot_GOF = FALSE,
GOF_col = NULL,
speedup_GOF = TRUE,
jitter_these_cols = NULL,
jitter.factor = NULL,
compute_GOF = F,
...)
x |
forestFloor object, also abbreviated ff.
Computed topology of a randomForest model; the output of the forestFloor function |
plot_seq |
a numeric vector describing which variables to plot and in what sequence. By default the variables are ordered by importance; if order_by_importance = FALSE, they follow the feature/column order of the training data. |
label.seq |
a numeric vector describing which classes to plot and in what sequence. NULL plots all classes, ordered as the levels of x$Y in the forestFloor_multiClass object x. |
fig.columns |
for multi-plotting, how many columns per page. The default (NULL) gives 1 column for one plot, 2 for two plots, 3 for three, 2 for four, and 3 for more. |
limitY |
TRUE/FALSE. Constrain all y-axes to the same limits, so that the relevance of low-importance features is not overinterpreted. |
colLists |
List of colour vectors, of the same length as label.seq. Each element is a colour vector colouring the sample class predictions of one class. Each vector should either be of length 1, giving one colour for all predictions of that class, or of length equal to the number of training observations, giving a colour for every sample. NULL uses the default of one colour per class. |
jitter_these_cols |
vector designating the variables whose x-axis values are jittered in the plots. Useful for categorical variables. The default, NULL, applies no jitter. |
jitter.factor |
value deciding how much jitter to apply, often between 0.5 and 3. |
compute_GOF |
Boolean TRUE/FALSE. Should the goodness of fit (GOF) be computed? If FALSE, the other GOF arguments have no effect. |
order_by_importance |
TRUE/FALSE. Should the plots and plot_seq be ordered by importance? If TRUE, the most important feature is plotted first. |
cropXaxes |
a vector of indices of the variables whose x-axis should be zoomed in to exclude outliers. |
crop_limit |
a number, often between 1.5 and 5, giving the limit in standard deviations from the mean beyond which values count as outliers. If the limit is 2, the plots selected with cropXaxes zoom to +/- 2 standard deviations of the respective features. |
plot_GOF |
Boolean TRUE/FALSE. Should the goodness of fit be plotted as a line? |
GOF_col |
Color of plotted GOF line |
speedup_GOF |
Should the GOF be computed only on a reasonably sized subsample of the data set to speed up computation? GOF estimation by leave-one-out kNN becomes increasingly slow beyond 1500 samples. |
... |
... other arguments passed to generic plot functions |
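The rules stated for fig.columns and crop_limit can be written out in a few lines of base R. This is only a sketch of the documented behaviour, not the package's internal code: the helper names default_fig_columns and crop_range are hypothetical, and whether forestFloor clips the cropped limits to the observed data range is not stated above.

```r
# Hypothetical helpers, not part of forestFloor; they only restate the
# rules given in the argument descriptions above.

# Default number of figure columns for a given number of panels:
# 1 for one plot, 2 for two, 3 for three, 2 for four, 3 for more.
default_fig_columns <- function(n_plots) {
  if (n_plots <= 3) n_plots else if (n_plots == 4) 2 else 3
}

# x-axis limits that keep values within crop_limit standard deviations
# of the feature mean (assumed reading of cropXaxes / crop_limit).
crop_range <- function(x, crop_limit = 4) {
  mean(x) + c(-1, 1) * crop_limit * sd(x)
}

set.seed(1)
x <- c(rnorm(200), 25)          # one extreme outlier
range(x)                        # full range is dominated by the outlier
crop_range(x, crop_limit = 2)   # zoomed limits, mean +/- 2 sd
sapply(1:6, default_fig_columns)
```

Because a single extreme value inflates both the mean and the standard deviation, a smaller crop_limit may be needed when outliers are severe.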
The method plot.forestFloor visualizes partial plots, showing the most important variables first. Partial dependence plots are also available in the randomForest package, but such plots are single lines (1D slices) and do not answer the question:
Is this partial function (PF) a fair generalization, or is it subject to global or local interactions?
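One way to make this question concrete is a goodness-of-fit check: fit a one-dimensional curve of a feature's contributions against the feature itself and see how much variance the curve explains. The sketch below is illustrative only. It uses synthetic stand-ins for feature contributions rather than forestFloor output, and loess() in place of the leave-one-out kNN fit; the helper gof_r2 is hypothetical.

```r
# Synthetic stand-ins for feature contributions; real ones would come
# from forestFloor(). loess() replaces the leave-one-out kNN fit here.
set.seed(42)
n  <- 500
x1 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
fc_main  <- x1^2 + rnorm(n, sd = 0.1)  # main effect: depends on x1 only
fc_inter <- x3 * x4                    # interaction: depends on x3 and x4

# R^2 of a one-dimensional smooth of the contribution against its feature
gof_r2 <- function(x, fc) {
  fit <- loess(fc ~ x)
  1 - sum(residuals(fit)^2) / sum((fc - mean(fc))^2)
}

r2_main  <- gof_r2(x1, fc_main)   # high: the 1-d curve generalizes well
r2_inter <- gof_r2(x3, fc_inter)  # low: x3 alone explains little
c(main = r2_main, interaction = r2_inter)
```

A low value suggests that the contributions depend on other features as well, which is the situation where a colour gradient along a second feature becomes informative.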
Soren Havelund Welling
## Not run:
#Regression example:
#simulate data
obs=1000
vars = 6
X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + sin(X2*pi) + 2 * X3 * X4 + 0.5 * rnorm(obs))
#grow a forest, remember to include inbag
rfo=randomForest::randomForest(X,Y,keep.inbag=TRUE)
#compute topology
ff = forestFloor(rfo,X)
#print forestFloor
print(ff)
#plot partial functions of most important variables first
plot(ff,order_by_importance=TRUE)
#non-interacting functions are well displayed, whereas X3 and X4 are not
#by applying a different colour gradient, interactions reveal themselves
#a k-nearest neighbor fit is also applied to evaluate goodness of fit
Col=fcol(ff,3,orderByImportance=FALSE)
plot(ff,col=Col,plot_GOF=TRUE)
#if needed, the k-nearest neighbor parameters for goodness-of-fit can be accessed through convolute_ff
#a new fit will be calculated and added to the forestFloor object as ff$FCfit
ff = convolute_ff(ff,userArgs.kknn=alist(kernel="epanechnikov",kmax=5))
plot(ff,col=Col,plot_GOF=TRUE)
#Classification example:
library(randomForest)
library(forestFloor)
require(utils)
data(iris)
iris
X = iris[,!names(iris) %in% "Species"]
Y = iris[,"Species"]
as.numeric(Y)
rf = randomForest(X,Y,keep.forest=TRUE,replace=FALSE,keep.inbag=TRUE)
ff = forestFloor(rf,X)
pred = sapply(1:3,function(i) apply(ff$FCarray[,,i],1,sum))+1/3
rfPred = predict(rf,type="vote",norm.votes=TRUE)
rfPred[is.nan(rfPred)] = 1/3
if(cor(as.vector(rfPred),as.vector(pred))^2<0.99) stop("fail testMultiClass")
attributes(ff)
plot(ff)
## End(Not run)