partitionMap: Partition Maps


Description

Embeds observations in a low-dimensional space using the multiclass output of a Random Forest.

Usage

partitionMap(X, Y, XTEST = NULL, YTEST = NULL,  method = "pm", dimen = 2, 
                   force = TRUE, ntree = 100,
                   plottrain = TRUE, addjitter = 0.03, ...)

Arguments

X

matrix with predictor variables in the training dataset

Y

response variable, a factor with multiple classes

XTEST

The matrix of predictor variables for the test dataset (optional)

YTEST

Class labels of test observations, used for coloring the test embeddings in the plot (optional). If not supplied, test observations are shown in grey.

method

"pm" for "partitionMap" or "ha" for "Homogeneity Analysis"

dimen

dimension of embedding, typically 2 or 3

force

use the force-based variation of the "partitionMap" algorithm? Has no effect if method="ha"

ntree

number of trees to use for randomForest prediction

plottrain

plot embedding for training data?

addjitter

amount of jitter to add to the plots to avoid overlapping observations (set addjitter=0 for no jitter)

...

other arguments to be passed to randomForest
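
As a minimal sketch of preparing inputs in the form described above, the following uses the built-in iris data (an assumption for illustration; it is not shipped with this package) to build a numeric predictor matrix and a factor response. The call to partitionMap is left commented out because it requires the package to be installed, and the argument values shown are only examples.

```r
# Illustrative input preparation with the built-in iris data (not from this package).
X <- as.matrix(iris[, 1:4])        # numeric predictor matrix, one row per observation
Y <- droplevels(iris$Species)      # factor response; drop any unused levels

# Basic checks matching the argument descriptions above.
stopifnot(is.matrix(X), is.factor(Y), nrow(X) == length(Y), nlevels(Y) >= 2)

# With the package loaded, a call could then look like this:
# pm <- partitionMap(X, Y, method = "pm", dimen = 2, ntree = 100, addjitter = 0)
```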

Value

A list with values

Samples

low-dimensional co-ordinates of embedded training samples

Rules

low-dimensional co-ordinates of embedded Rules (nodes in the trees)

Z

a binary matrix with as many rows as training samples and as many columns as rules. A value of 1 in row i and column j indicates that observation i is part of rule j

Samplestest

low-dimensional co-ordinates of embedded test samples

Ztest

a binary matrix with as many rows as test samples and as many columns as rules. A value of 1 in row i and column j indicates that observation i in the test data is part of rule j

rf

the trained Random Forest classifier
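
To make the layout of Z concrete, here is a small hand-built sketch. The three rules below are made-up threshold conditions standing in for tree nodes; they are not produced by partitionMap or randomForest, and the toy data frame x is likewise hypothetical.

```r
# Toy data: five observations with two predictors.
x <- data.frame(a = c(1, 2, 3, 4, 5), b = c(5, 3, 4, 1, 2))

# Hypothetical rules (stand-ins for tree nodes), each a logical condition.
rules <- list(
  r1 = x$a <= 2,
  r2 = x$a > 2 & x$b > 3,
  r3 = x$b <= 3
)

# Z has one row per observation and one column per rule;
# Z[i, j] == 1 means observation i falls into rule j.
Z <- sapply(rules, as.integer)

rowSums(Z)  # number of rules each observation belongs to
colSums(Z)  # number of observations covered by each rule
```

An observation can belong to several rules at once, which is why the row sums are not restricted to 1.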

Author(s)

Nicolai Meinshausen <meinshausen@stats.ox.ac.uk>

References

Nicolai Meinshausen (2011). Partition Maps. JCGS 20(4), 1007-1028.

Examples

##---- load Soybean data ----
	library(partitionMap)
	data(Soybean)
	X <- Soybean[,-1]
	Y <- Soybean$Y
	
##---- divide into training and test data ----
	indtrain <- rep(0,nrow(X))
	indtrain[sample(1:length(indtrain), ceiling(nrow(X)/3*2))] <- 1
	XTEST <- X[indtrain==0,]
	YTEST <- Y[indtrain==0]
	X <- X[indtrain==1,]
	Y <- Y[indtrain==1]


##---- compute Partition Map solution ----
	pm <- partitionMap(X,Y,XTEST=XTEST,method="pm",force=TRUE,
                                dimen=2,ntree=80,plottrain=TRUE)


##---- plot the embedded training and test samples ----
	par(mfrow=c(1,3))
	plot(pm$Samples,col=Y,pch=20,cex=1.5,main="Training Data",
                                    xlab="Dimension 1",ylab="Dimension 2")
	points(pm$Rules,pch=".")
	plot(pm$Samplestest,col=YTEST,pch=20,cex=1.5,main="Test Data",
                                     xlab="Dimension 1",ylab="Dimension 2")
	points(pm$Rules,pch=".")
	plot(pm$Samples,col=Y,pch=20,cex=1.5,xlab="",ylab="",type="n",axes=FALSE)
	## colours follow the factor codes of Y, so label the legend by levels(Y)
	legend(quantile(pm$Samples[,1],0),quantile(pm$Samples[,2],1),levels(Y),
                              col=seq_along(levels(Y)),fill=seq_along(levels(Y)),border=0)
	par(mfrow=c(1,1))
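
One possible way to use the returned coordinates, sketched here under assumptions: assign each test sample the label of its nearest training sample in the embedding. The helper nearest_label is hypothetical and not part of the package, and the toy coordinates below stand in for pm$Samples and pm$Samplestest.

```r
# Hypothetical helper: 1-nearest-neighbour labelling in the embedded space.
nearest_label <- function(train, labels, test) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  idx <- apply(test, 1, function(p) {
    # squared Euclidean distance from p to every training point
    d2 <- rowSums(sweep(train, 2, p)^2)
    which.min(d2)
  })
  labels[idx]
}

# Toy stand-ins for pm$Samples, Y and pm$Samplestest.
train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
labels <- factor(c("a", "a", "b", "b"))
test <- rbind(c(0.2, 0.4), c(5.5, 4.8))

pred <- nearest_label(train, labels, test)

# With a fitted object this could be applied as:
# pred <- nearest_label(pm$Samples, Y, pm$Samplestest)
# mean(pred == YTEST)   # proportion of test labels matched
```

This is only a rough check of how well classes separate in the embedding; it is not the classifier stored in rf.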

partitionMap documentation built on May 2, 2019, 2:43 a.m.