nodeHarvest: Node Harvest

Description

Computes the node harvest estimator for regression or binary classification.

Usage

nodeHarvest(X, Y, nodesize = 10,
            nodes = 1000,
            maxinter = 2,
            mode = "mean",
            lambda = Inf,
            addto = NULL,
            onlyinter = NULL,
            silent = FALSE,
            biascorr = FALSE)

Arguments

X

An n x p-dimensional data matrix, where n is the sample size and p is the dimensionality of the predictor variable. Factor variables are currently converted to numerical variables (this will change in a future version). Missing values are supported.

Y

A numerical vector of length n, containing the observations of the response variable. Can be continuous (regression) or binary 0/1 (classification).

nodesize

Minimal number of samples in each node.

nodes

Number of nodes in the initial large ensemble of nodes.

maxinter

Maximal interaction depth (1 = only main effects; 2 = two-factor interactions, etc.).

mode

If mode is "mean", predictions are weighted group means. If mode is "outbag" (experimental), the diagonal elements of the smoothing matrix are set to 0.

lambda

Optional upper bound on the inverse of the average weighted fraction of samples within each node.

addto

A previous node harvest estimator to which additional nodes should be attached (useful for iterative growth of the estimator when hitting memory constraints).

onlyinter

Allow interactions only for this list of variables.

silent

If TRUE, no comments are printed.

biascorr

Whether to apply a bias correction (experimental). Can be useful for data with a high signal-to-noise ratio.
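The requirements on X and Y above can be met with base R alone. This is a minimal data-preparation sketch using the built-in iris data; the Size factor and the two missing values are made up for illustration. data.matrix() replaces a factor by its integer level codes, which makes the numeric coding explicit rather than leaving it to nodeHarvest() (whether the package uses exactly the same codes internally is an assumption). Missing values are left in place, since the X argument is documented to support them.

```r
## Built-in iris data; the Size factor below is hypothetical, added for illustration
df <- iris
df$Size <- factor(ifelse(df$Sepal.Length > 5.8, "large", "small"))
df$Petal.Width[c(3, 17)] <- NA   # a couple of missing values, left in place

## Predictors: data.matrix() converts the factor Size to its integer level codes
X <- data.matrix(df[, c("Sepal.Length", "Sepal.Width", "Petal.Length",
                        "Petal.Width", "Size")])

## Response: binary 0/1 vector for classification (versicolor vs. the rest)
Y <- as.numeric(df$Species == "versicolor")
```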
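To make nodesize and maxinter concrete, a node can be pictured as a box in predictor space: a conjunction of at most maxinter interval conditions, admissible only if it contains at least nodesize training samples. The sketch below is hypothetical; the list representation and the helper in_node() are not the package's internal format, and in nodeHarvest() the candidate nodes are generated automatically.

```r
## Hypothetical node representation (not the package's internal format):
## a node is a conjunction of conditions lower[j] < X[, var[j]] <= upper[j]
in_node <- function(X, node) {
  keep <- rep(TRUE, nrow(X))
  for (j in seq_along(node$var)) {
    x <- X[, node$var[j]]
    keep <- keep & x > node$lower[j] & x <= node$upper[j]
  }
  keep
}

X <- as.matrix(iris[, 1:4])
nodesize <- 10   # default of nodeHarvest()
maxinter <- 2    # default of nodeHarvest(): up to two-factor interactions

## A two-factor interaction node: 4 < Petal.Length <= 5 and 1 < Petal.Width <= 1.8
node <- list(var = c("Petal.Length", "Petal.Width"),
             lower = c(4, 1), upper = c(5, 1.8))

members <- in_node(X, node)

## The node is admissible if it uses at most maxinter variables
## and contains at least nodesize samples
valid <- length(node$var) <= maxinter && sum(members) >= nodesize
```

A node with no conditions at all covers every sample, which is why in_node() starts from an all-TRUE vector.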
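The two settings of mode can be illustrated with a smoothing matrix S, whose rows hold the weights that turn the training responses into fitted values. This toy sketch assumes (our reading of the reference) that a sample's prediction is the weighted average of the means of the nodes containing it, normalized by the total weight of those nodes. The membership matrix I and the node weights w are made up; in the package the weights are determined by the fit. For "outbag" the diagonal of S is set to 0 as documented; renormalizing the rows afterwards is our assumption.

```r
## Toy setting (hypothetical): 6 training samples, 3 nodes, fixed node weights
Y <- c(1, 2, 3, 4, 5, 6)
I <- cbind(c(1, 1, 1, 0, 0, 0),   # I[i, k] = 1 if sample i lies in node k
           c(0, 0, 1, 1, 1, 0),
           c(0, 0, 0, 0, 1, 1))
w <- c(0.5, 0.3, 0.8)             # node weights (made up for illustration)

## S[i, j] = sum_k w_k * I[i, k] * I[j, k] / n_k, rows normalized to sum to 1
n_k <- colSums(I)
S <- I %*% diag(w / n_k) %*% t(I)
S <- S / rowSums(S)
pred_mean <- drop(S %*% Y)        # mode = "mean": weighted average of node means

## mode = "outbag" (sketch): a sample no longer contributes to its own fit;
## the row renormalization is an assumption, not documented behaviour
S_out <- S
diag(S_out) <- 0
S_out <- S_out / rowSums(S_out)
pred_outbag <- drop(S_out %*% Y)
```

For example, sample 3 lies in nodes 1 and 2 with means 2 and 4, so its "mean" prediction is (0.5 * 2 + 0.3 * 4) / 0.8 = 2.75.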

Details

The number of nodes should be chosen as large as the available computational resources allow. If these resources are limited, an estimator can be built by calling the function iteratively, each time passing the previous estimator via the addto argument.
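A sketch of this iterative approach, using only the documented arguments nodes, addto and silent. The simulated data, the batch size of 200 nodes and the number of extra rounds are arbitrary choices for illustration; in practice they would be dictated by the available memory.

```r
library(nodeHarvest)

## Simulated data with a two-factor interaction (illustrative only)
set.seed(1)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
Y <- X[, 1] * X[, 2] + rnorm(n, sd = 0.5)

## First batch of nodes
NH <- nodeHarvest(X, Y, nodes = 200, silent = TRUE)

## Further batches: pass the previous estimator via addto
for (b in 1:2) {
  NH <- nodeHarvest(X, Y, nodes = 200, addto = NH, silent = TRUE)
}
```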

Feedback and feature requests are more than welcome (email below).

Value

A list with entries

nodes

A list of all selected nodes

predicted

Predicted values on training data

connection

Connectivity matrix between selected nodes (used for plotting)

varnames

Variable names

Y

The original observations
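A short sketch of reading these components on simulated data. The data and the training-error calculation are illustrative additions, not part of the package.

```r
library(nodeHarvest)

## Small simulated regression problem (illustrative only)
set.seed(2)
n <- 150
X <- as.data.frame(matrix(runif(n * 3), ncol = 3))
Y <- 2 * X[, 1] + (X[, 2] > 0.5) + rnorm(n, sd = 0.3)

NH <- nodeHarvest(X, Y, nodes = 300, silent = TRUE)

## Inspect the returned list
n_selected <- length(NH$nodes)               # number of selected nodes
train_mse  <- mean((NH$Y - NH$predicted)^2)  # training error from stored Y and fits
NH$varnames                                  # variable names
```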

Author(s)

Nicolai Meinshausen meinshausen@stats.ox.ac.uk

http://www.stats.ox.ac.uk/~meinshau

References

Meinshausen, N. 'Node harvest: simple and interpretable regression and classification' (arXiv:0910.2145)

http://arxiv.org/abs/0910.2145

See Also

predict.nodeHarvest, plot.nodeHarvest

Examples

## Load the packages and the Boston Housing dataset (BostonHousing ships with mlbench)
    library(nodeHarvest)
    library(mlbench)
    data(BostonHousing)
    X <- BostonHousing[,1:13]
    Y <- BostonHousing[,14]

## Divide data into training and test data (fix the seed for a reproducible split)
    set.seed(1)
    n <- nrow(X)
    training <- sample(1:n,round(n/2))
    testing <- (1:n)[-training]

## Train Node Harvest and plot and print the estimator
    NH <- nodeHarvest( X[training,], Y[training], nodes=500 )
    plot(NH)
    print(NH, nonodes=6)

## Predict on test data and explain prediction of the first sample in the test set
    predicttest <- predict(NH, X[testing,], explain=1)
    plot( predicttest, Y[testing] )

nodeHarvest documentation built on May 2, 2019, 2:45 a.m.