Predict grid topology by convolution

Description

n-dimensional grid wrapper of kknn (not train.kknn). Predicts a grid by convolution of feature contributions. Can be used to construct one 2D surface in a 3D plot (see show3d example), or to construct multiple 2D slices of a 3D surface in a 4D plot (see show4d).

Usage

convolute_grid(ff,
               Xvars,
               FCvars = NULL,
               grid = 30,
               limit = 3,
               zoom = 3,
               k.fun = function() round(sqrt(n.obs)/2),
               userArgs.kknn = alist(kernel = "gaussian"))

Arguments

ff

forestFloor object (class="forestFloor") consisting of at least ff$X and ff$FCmatrix, two matrices of equal size

Xvars

integer vector of column indices of ff$X to convolute by, often of length 2 or 3. Note that the total number of predictions equals grid^length(Xvars), so computation and visualization can become demanding.

FCvars

integer vector of column indices of ff$FCmatrix: the feature contributions to combine (sum) and convolute. If none is provided, the Xvars vector is copied, which is the trivial choice.

grid

Either an integer giving the number of grid lines in each dimension (the trivial choice), or a fully defined matrix of grid positions in the format defined by this function. In the latter case, this function does not define grid positions itself and uses the provided matrix.

limit

numeric scalar, the number of standard deviations from the mean, in any dimension, beyond which observations are disregarded as outliers when spanning the observations with the grid. Set limit=Inf if outliers should never be disregarded.

zoom

numeric scalar, the size of the grid relative to the univariate range of the data. If zoom=2, the grid spans twice the range of the observations in every dimension. Outliers are disregarded according to the limit argument.

k.fun

function defining the number of neighbors k to consider. n.obs is a constant, the number of observations in ff$X, so k is defined as a function k.fun of n.obs. To set k to a constant, use e.g. k.fun = function() 10. k can also be overridden through userArgs.kknn, e.g. userArgs.kknn = alist(kernel="gaussian",k=10).

userArgs.kknn

argument list to pass to the kknn function for each convolution. See kknn.args. Arguments in this list take priority over any defaults passed by this wrapper function. See the argument merger append.overwrite.alists.
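
To give a feel for the cost implied by grid and Xvars, and for the default k.fun, the arithmetic can be sketched in base R. The value of n.obs below is an assumption chosen for illustration, and the variable names n.points.2d, n.points.3d and k.default are hypothetical; nothing here calls the package itself.

```r
# Illustrative arithmetic only; the values below are assumptions, not package output.
n.obs <- 5000   # assumed number of rows in ff$X
grid  <- 30     # default number of grid lines per dimension

# Number of grid points to predict for 2 and 3 features: grid^length(Xvars)
n.points.2d <- grid^length(2:3)   # 30^2 = 900
n.points.3d <- grid^length(1:3)   # 30^3 = 27000

# Default neighborhood size, following the k.fun shown in Usage
k.default <- round(sqrt(n.obs) / 2)   # 35

print(c(n.points.2d, n.points.3d, k.default))
```

Going from two to three features multiplies the number of predictions by grid, which is why a smaller grid is usually chosen for 4D plots.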

Details

This function predicts a grid with kknn, i.e. k-nearest neighbors with Gaussian weighting. The wrapper can be used to systematically construct a surface of two feature contributions, or a structure of three feature contributions, by convoluting the summed feature contributions over the corresponding features. This function is still experimental.
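
The idea of Gaussian-weighted nearest-neighbor prediction can be sketched in base R as follows. This is a simplified stand-in and not the algorithm implemented in kknn, which scales distances and weights differently; the function name gauss_knn_predict and the choice of bandwidth are assumptions made for this illustration.

```r
# Simplified stand-in for Gaussian-weighted kNN regression (NOT kknn's implementation).
# train_x: numeric matrix of feature coordinates (one row per observation)
# train_y: numeric vector, e.g. summed feature contributions
# query_x: numeric matrix of grid points to predict
gauss_knn_predict <- function(train_x, train_y, query_x, k) {
  train_x <- as.matrix(train_x)
  query_x <- as.matrix(query_x)
  apply(query_x, 1, function(q) {
    # Euclidean distance from the query point to every training point
    d <- sqrt(colSums((t(train_x) - q)^2))
    idx <- order(d)[seq_len(k)]          # indices of the k nearest neighbors
    bandwidth <- max(d[idx])             # assumed bandwidth: distance to the k-th neighbor
    if (bandwidth == 0) return(mean(train_y[idx]))
    w <- exp(-0.5 * (d[idx] / bandwidth)^2)   # Gaussian weights, closer points count more
    sum(w * train_y[idx]) / sum(w)            # weighted mean of the neighbors' responses
  })
}

# Usage on simulated data: predict at a single grid point
set.seed(1)
train_x <- matrix(runif(200, -1, 1), ncol = 2)
train_y <- train_x[, 1] + 2 * train_x[, 2]
pred <- gauss_knn_predict(train_x, train_y, matrix(c(0, 0), nrow = 1), k = 10)
```

Evaluating such a predictor at every grid point gives the smoothed surface that is drawn on top of the raw feature contributions.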

Value

a data frame with 1 + length(Xvars) columns. The first column is the predicted summed feature contributions, as a function of the feature coordinates given in the following columns.
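
For a two-feature grid, the returned layout can be turned into a matrix of heights for surface plotting. The sketch below uses a mock data frame in place of real convolute_grid output; the column names fc, X2 and X3 are hypothetical, and the reshaping uses match() so that it does not depend on any assumed row order.

```r
# Mock stand-in for a 2-feature result: column 1 = prediction, columns 2-3 = coordinates.
# Column names are hypothetical; real output may be named differently.
coords <- expand.grid(X2 = seq(-1, 1, length.out = 5),
                      X3 = seq(-1, 1, length.out = 5))
g <- data.frame(fc = coords$X2 + 2 * coords$X3, coords)

# Unique grid lines along each dimension
x <- sort(unique(g[[2]]))
y <- sort(unique(g[[3]]))

# Fill a length(x) by length(y) matrix of heights, independent of row order
z <- matrix(NA_real_, nrow = length(x), ncol = length(y))
z[cbind(match(g[[2]], x), match(g[[3]], y))] <- g[[1]]

# z can now be passed to a surface plotter, e.g. graphics::persp(x, y, z)
```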

Author(s)

Soren Havelund Welling

Examples

## Not run: 
#simulate data
obs=5000
vars = 6 
X = data.frame(replicate(vars,runif(obs)))*2-1
Y = with(X, X1*2 + 2*sin(X2*pi) + 3* (X3+X2)^2 )
Yerror = 1 * rnorm(obs)
var(Y)/var(Y+Yerror)
Y= Y+Yerror

#grow a forest, remember to include inbag
rfo=randomForest::randomForest(X,Y,
                               keep.inbag=TRUE,
                               ntree=1000,
                               replace=TRUE,
                               sampsize=1500,
                               importance=TRUE)

#compute topology
ff = forestFloor(rfo,X)

#print forestFloor
print(ff) 

#plot partial functions of most important variables first
Col=fcol(ff,1)
plot(ff,col=Col,order_by_importance=TRUE) 


#the pure feature contributions
library(rgl) #provides plot3d and persp3d
plot3d(ff$X[,2],ff$X[,3],apply(ff$FCmatrix[,2:3],1,sum),
       #add some colour gradients to ease visualization
       #box.outliers squeezes all observations into a 2 std.dev box
       #univariately for a vector or matrix and normalizes to [0;1]
       col=fcol(ff,2,orderByImportance=FALSE))

#add grid convolution/interpolation
#make grid with current function
grid23 = convolute_grid(ff,Xvars=2:3,userArgs.kknn= alist(k=25,kernel="gaussian"),grid=50,zoom=1.2)
#apply grid on 3d-plot
persp3d(unique(grid23[,2]),unique(grid23[,3]),grid23[,1],alpha=0.3,col=c("black","grey"),add=TRUE)
#anchor points of grid could be plotted also
plot3d(grid23[,2],grid23[,3],grid23[,1],alpha=0.3,col=c("black"),add=TRUE)

## we see that there is almost no variance out of the surface; thus FC2 and FC3
## are well explained by the feature context of both X2 and X3

### the next example shows how to plot a 3D grid + feature contribution
## this 4D application is very experimental

#Make grid of three effects, 25^3 = 15625 anchor points
grid123 = convolute_grid(ff,
                         Xvars=c(1:3),
                         FCvars=c(1:3),
                         userArgs.kknn = alist(
                           k= 100,
                           kernel = "gaussian",
                           distance = 1),
                         grid=25,
                         zoom=1.2)

#Select a dimension to place in layers
uni2 = unique(grid123[,2])  #2 points to X1 and FC1
uni2=uni2[c(7,9,11,13,14,16,18)] #select some layers to visualize

## plotting any combination of X2 X3 in each layer(from red to green) having different value of X1
count = 0
add=FALSE
for(i in uni2) {
  count = count +1 
  this34.plane = grid123[grid123[,2]==i,]
  if (count==2) add=TRUE 
  
  #  plot3d(ff$X[,1],ff$X[,2]
  persp3d(unique(this34.plane[,3]),
          unique(this34.plane[,4]),
          this34.plane[,1], add=add, col=rgb(count/length(uni2),1-count/length(uni2),0),alpha=0.1)
}



## plotting any combination of X1 X2 in each layer(from red to green) having different value of X3
uni3 = unique(grid123[,4])  #4 points to X3
uni3=uni3[c(7,9,11,13,14,16,18)] #select some layers to visualize
count = 0
add=FALSE
for(i in uni3) {
  count = count +1 
  this34.plane = grid123[grid123[,4]==i,]
  if (count==2) add=TRUE
  
  #plot3d(ff$X[,1],ff$X[,2])
  persp3d(unique(this34.plane[,2]),
          unique(this34.plane[,3]),
          this34.plane[,1], add=add,
          col=rgb(count/length(uni3),1-count/length(uni3),0),alpha=0.1)
} 

## End(Not run)