show3d: make forestFloor 3D-plot of random forest feature...


View source: R/show3d.R

Description

Plots two features (horizontal XY-plane) against one combined feature contribution (vertical Z-axis). A response surface layer is estimated (kknn package) and plotted alongside the data points. The 3D graphics device is rgl. Dispatches to the methods show3d.forestFloor_regression for regression and show3d.forestFloor_multiClass for classification.

Usage

## S3 method for class 'forestFloor_regression'
 show3d(
      x,
      Xi  = 1:2,
      FCi = NULL,
      col = "#12345678",
      plotTest = NULL,
      orderByImportance = TRUE,
      surface=TRUE,   
      combineFC = sum,  
      zoom=1.2,       
      grid.lines=30,  
      limit=3, 
      cropPointsOutSideLimit = TRUE,
      kknnGrid.args = alist(),  
      plot.rgl.args = alist(),  
      surf.rgl.args = alist(),
      user.gof.args = alist(),
      plot_GOF = TRUE,
      ...)

## S3 method for class 'forestFloor_multiClass'
show3d(
      x,
      Xi,
      FCi=NULL,
      plotTest = NULL,
      label.seq=NULL,
      kknnGrid.args=list(NULL),
      plot.rgl.args=list(),
      plot_GOF=FALSE,
      user.gof.args=list(NULL),
      ...)
    

Arguments

x

a "forestFloor" class object

Xi

integer vector of length 2, indices of the feature columns

FCi

integer vector of length 1 to p (the number of variables), indices of the feature contribution columns

col

a colour vector: either one colour or a colour palette (vector).

plotTest

NULL (plot by test set if available), TRUE (plot by test set), FALSE (plot by training set), "andTrain" (plot by both test and training set)

orderByImportance

should indices be ordered by 'variable importance' or by matrix/data.frame order?

surface

should a surface also be plotted?

combineFC

a row function applied to the selected columns (FCi) of $FCmatrix or $FCarray. It determines how feature contributions are combined. Default is sum.
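As a rough illustration of what "a row function applied to the selected columns" means, the sketch below combines columns of a small made-up matrix in base R. The matrix values and the use of apply are assumptions for illustration only and are not taken from the package internals.

```r
# Toy feature contribution matrix: 4 observations, 3 features (made-up values)
FCmatrix <- matrix(c( 0.5, -0.2,  0.1,
                      1.0,  0.3, -0.4,
                     -0.7,  0.2,  0.0,
                      0.2,  0.2,  0.2),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(NULL, c("X1", "X2", "X3")))

FCi <- c(1, 3)      # combine the contributions of X1 and X3
combineFC <- sum    # the documented default

# Apply the row function to the selected columns: one combined value per observation
fc_combined <- apply(FCmatrix[, FCi, drop = FALSE], 1, combineFC)
fc_combined
```

Any other function that reduces a row to a single number, such as mean, could be substituted for sum in the same way.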

zoom

factor by which the grid is expanded in all directions

grid.lines

how many grid lines should be used. The total number of surface anchor points in the plot is grid.lines^2. May run slowly above 200-500, depending on hardware.
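To make the grid.lines^2 relation concrete, here is a minimal base R sketch that builds a regular grid over two features. The helper grid_axis and the way zoom widens the range around its midpoint are assumptions for illustration; the package may construct its grid differently.

```r
grid.lines <- 30
zoom <- 1.2

# Hypothetical feature values
x1 <- seq(-2, 2, length.out = 100)
x2 <- seq(0, 10, length.out = 100)

# Hypothetical helper: an evenly spaced axis, widened around the midpoint by 'zoom'
grid_axis <- function(x, n, zoom) {
  mid  <- mean(range(x))
  half <- diff(range(x)) / 2 * zoom
  seq(mid - half, mid + half, length.out = n)
}

# All combinations of the two axes form the surface anchor points
gridX <- expand.grid(X1 = grid_axis(x1, grid.lines, zoom),
                     X2 = grid_axis(x2, grid.lines, zoom))
nrow(gridX)  # equals grid.lines^2
```

Because the number of anchor points grows quadratically, doubling grid.lines roughly quadruples the number of surface predictions.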

limit

a number. Sizing of the grid ignores outliers beyond this limit, e.g. 3 standard deviations, evaluated univariately.

cropPointsOutSideLimit

if points exceed the standard deviation limit, they will not be plotted
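The sketch below shows one way a univariate standard deviation limit could be applied to crop points, using base R only. Standardising each column by its own mean and SD is an assumption here; the package's exact rule is not stated in this page.

```r
# Toy data: 50 regular observations plus one extreme value in X1 (row 51)
X <- data.frame(X1 = c(seq(-2, 2, length.out = 50), 20),
                X2 = c(seq(2, -2, length.out = 50), 0))
limit <- 3

# Standardise each feature separately (univariate), then flag rows
# where every feature lies within 'limit' standard deviations
z <- scale(X)
inside <- apply(abs(z) <= limit, 1, all)

# Keep only the points inside the limit
X_cropped <- X[inside, ]
nrow(X_cropped)
```

In this toy data only the last row is dropped, since its X1 value lies far beyond 3 standard deviations.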

kknnGrid.args

argument list, any possible arguments to kknn::kknn.
These default wrapper arguments can thereby be overwritten:
wrapper = alist( formula=fc~., # do not change
train=Data, # do not change
k=k, # integer < n_observations. k>100 may run slow.
kernel="gaussian", #distance kernel, other is e.g. kernel="triangular"
test=gridX #do not change
)
See kknn::kknn to understand the parameters. By default, k is set automatically to half the square root of the number of observations, which often gives a reasonable balance between robustness and adeptness. The number of neighbors k and the distance kernel can be changed by passing kknnGrid.args = alist(k=5,kernel="triangular",scale=FALSE), which overwrites the default k and the default kernel. Moreover, the scale argument is not specified by this wrapper and therefore does not conflict; it is simply appended.
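The overwrite-and-append behaviour described above can be sketched in base R. This is an illustration only: it uses plain lists and utils::modifyList rather than the package's own alist handling, and rounding the default k to an integer is an assumption.

```r
n_obs <- 2500

# Default k: half the square root of the number of observations
# (rounding is an assumption made for this sketch)
k_default <- round(0.5 * sqrt(n_obs))

# Defaults set by the wrapper (only the user-adjustable ones are shown)
wrapper_args <- list(k = k_default, kernel = "gaussian")

# Arguments supplied by the user
user_args <- list(k = 5, kernel = "triangular", scale = FALSE)

# Matching names are overwritten, new names are appended
final_args <- modifyList(wrapper_args, user_args)
str(final_args)
```

With 2500 observations the default k would be 25; after merging, k and kernel take the user's values and scale is appended.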

plot.rgl.args

argument list passed to rgl::plot3d, which defines the plotting space and plots the points. Any argument of this wrapper can be overridden. See plot3d for documentation of graphical arguments.

wrapper_arg = alist( x=xaxis, #do not change, x coordinates
y=yaxis, #do not change, y coordinates
z=zaxis, #do not change, z coordinates
col=col, #colouring evaluated within this wrapper function
xlab=names(X)[1], #xlab, label for x axis
ylab=names(X)[2], #ylab, label for y axis
zlab=paste(names(X[,FCi]),collapse=" - "), #zlab, label for z axis
alpha=.4, #points transparency
size=3, #point size
scale=.7, #z axis scaling
avoidFreeType = T, #disable freeType=T plug-in. (Postscript labels)
add=FALSE #do not change, should graphics be added to other rgl-plot?
)

surf.rgl.args

wrapper_arg = alist( x=unique(grid[,2]), #do not change, values of x-axis
y=unique(grid[,3]), #do not change, values of y-axis
z=grid[,1], #do not change, response surface values
add=TRUE, #do not change, surface added to plotted points
alpha=0.4 #transparency of surface, [0;1]
)
See rgl::persp3d for other graphical arguments. Note that the surface is added onto the plot of points, so e.g. labels cannot be changed from here.

label.seq

a numeric vector describing which classes to plot and in what sequence. NULL plots all classes, ordered as the levels of x$Y in the forestFloor_multiClass object x.

user.gof.args

argument list passed to the internal function convolute_ff2, which can modify how goodness-of-fit is computed. The number of neighbors and the kernel can be set manually with e.g. list(kmax=40,kernel="gaussian"). The default parameters should already work in most cases. The function convolute_ff2 computes leave-one-out cross-validated predictions of the feature contributions from the chosen context of the visualization.

plot_GOF

Boolean TRUE/FALSE. Should the goodness-of-fit be computed and shown in the main title of the 3D plot? If FALSE, the GOF input parameters have no effect.

...

not used at the moment

Details

show3d plots one or more combined feature contributions in the context of two features, with points representing each data point. The input object must be a "forestFloor_regression" or "forestFloor_multiClass" S3 class object, and should at least contain $X, the data.frame of training data, and $FCmatrix, the feature contributions matrix. Usually this object is formed with the function forestFloor, which takes a random forest model fit as input. The actual visualization differs for each class.

Value

no value

Author(s)

Soren Havelund Welling

Examples

## Not run: 
## avoid testing of rgl 3D plot on headless non-windows OS
## users can disregard this sentence.
if(!interactive() && Sys.info()["sysname"]!="Windows") skipRGL=TRUE

library(forestFloor)
library(randomForest)
#simulate data
obs=2500
vars = 6 

X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + sin(X2*pi) + 2 * X3 * X4 + 1 * rnorm(obs))


#grow a forest, remember to include inbag
rfo=randomForest(X,Y,keep.inbag = TRUE,sampsize=1500,ntree=500)

#compute topology
ff = forestFloor(rfo,X)


#print forestFloor
print(ff) 

#plot partial functions of most important variables first
plot(ff) 

#Non-interacting functions are well displayed, whereas X3 and X4 are not
#by applying a different colour gradient, interactions reveal themselves
Col = fcol(ff,3)
plot(ff,col=Col) 

#in 3D the interaction between X3 and X4 reveals itself completely
show3d(ff,3:4,col=Col,plot.rgl=list(size=5)) 

#although there is no interaction, X1 and X2 have a joint additive effect
Col = fcol(ff,1:2,X.m=FALSE,RGB=TRUE) #colour by FC-component FC1 and FC2 summed
plot(ff,col=Col) 
show3d(ff,1:2,col=Col,plot.rgl=list(size=5)) 

#...or a two-way gradient is formed from the FC-components of X1 and X2.
Col = fcol(ff,1:2,X.matrix=TRUE,alpha=0.8) 
plot(ff,col=Col) 
show3d(ff,1:2,col=Col,plot.rgl=list(size=5))

## End(Not run)

sorhawell/forestFloor documentation built on Oct. 23, 2021, 2:20 a.m.