A function to calculate the proximity matrix.

Description

This function computes the proximity matrix using the Random Forest algorithm. Proximity values range from 0 (least similar) to 1 (perfect match).
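For readers unfamiliar with random forest proximity, the sketch below illustrates the usual definition from Breiman (2001): the proximity of two samples is the fraction of trees in which both fall into the same terminal node. The `nodes` matrix and the helper `proximity_from_nodes` are hypothetical and serve only as an illustration; this is not the internal code of Proximity.

```r
# Hypothetical terminal-node assignments (illustration only):
# rows = 4 samples, columns = 3 trees, entries = terminal node IDs.
nodes <- matrix(c(1, 1, 2,
                  1, 1, 2,
                  2, 1, 3,
                  2, 3, 3),
                nrow = 4, byrow = TRUE)

# Proximity = share of trees in which two samples land in the same node.
proximity_from_nodes <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(nodes))) {
    # outer() compares every pair of samples within tree t
    prox <- prox + outer(nodes[, t], nodes[, t], "==")
  }
  prox / ncol(nodes)
}

prox <- proximity_from_nodes(nodes)
prox[1, 2]  # 1: samples 1 and 2 share a node in every tree
prox[1, 4]  # 0: samples 1 and 4 never share a node
```

The result is symmetric with ones on the diagonal, which matches the 0 to 1 range described above.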

Usage

Proximity(train, train.label, test = NULL, N = 50, 
         Parallel = FALSE, ncpus = 2)

Arguments

train

An object of class ExpressionSet, or a data frame or matrix, containing the predictors for the training set, where columns correspond to samples and rows to features.

train.label

A vector of actual class labels (0 or 1) for the training set. Must be numeric, not a factor.

test

An object of class ExpressionSet, or a data frame or matrix, containing the predictors for the test set, where columns correspond to samples and rows to features.

N

Number of repetitions of the proximity calculation; the final proximity matrix is the average over these repeats. We recommend a large value so that a stable proximity matrix is produced. Default is 50.

Parallel

Logical. Should the proximity calculation be run in parallel? Default is FALSE.

ncpus

Number of cores assigned to the parallel computation. Default is 2.
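The averaging controlled by N can be sketched as follows. This is a minimal illustration that assumes the randomForest package from CRAN is installed; whether Proximity relies on it internally is not stated here, and the simulated data and the names `x`, `y`, `runs` and `prox.avg` are made up for the example. Note that randomForest expects samples in rows, whereas Proximity takes samples in columns.

```r
library(randomForest)  # assumption: installed from CRAN

set.seed(1)
# Simulated data (illustration only): 40 samples in rows, 5 features.
x <- matrix(rnorm(40 * 5), nrow = 40)
# randomForest needs a factor response for classification.
y <- factor(rep(c(0, 1), each = 20))

N <- 5  # number of repeats; a larger value gives a more stable average

# Fit N independent forests and keep each training proximity matrix.
runs <- lapply(seq_len(N), function(i) {
  randomForest(x, y, proximity = TRUE)$proximity
})

# Element-wise mean of the N proximity matrices.
prox.avg <- Reduce(`+`, runs) / N
dim(prox.avg)
```

Each forest is grown from different bootstrap samples, so individual proximity matrices differ from run to run; averaging over more repeats reduces that variation at the cost of run time.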

Value

A list object with the following components:

prox.train

A square symmetric matrix containing the proximity values of the training set.

prox.test

A rectangular matrix containing the proximity values between the test set (rows) and the training set (columns). Only returned when a test set is supplied.
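To make the two shapes concrete, the sketch below builds a train-by-train matrix and a test-by-train matrix with the randomForest package. It is an assumption that this mirrors what Proximity returns; the simulated data and the local names `x.train`, `x.test`, `y.train` and `n.test` are hypothetical. In randomForest, the test proximity component holds the test-to-test columns first and the test-to-train columns after them, so only the latter are kept.

```r
library(randomForest)  # assumption: installed from CRAN

set.seed(2)
# Simulated data (illustration only): samples in rows, 4 features.
x.train <- matrix(rnorm(30 * 4), nrow = 30)
x.test  <- matrix(rnorm(10 * 4), nrow = 10)
y.train <- factor(rep(c(0, 1), each = 15))

rf <- randomForest(x.train, y.train, xtest = x.test, proximity = TRUE)

# Training proximity: square, one row and column per training sample.
prox.train <- rf$proximity

# Test proximity: drop the leading test-to-test columns and keep
# the columns that correspond to training samples.
n.test <- nrow(x.test)
prox.test <- rf$test$proximity[, n.test + seq_len(nrow(x.train)), drop = FALSE]

dim(prox.train)  # 30 30
dim(prox.test)   # 10 30
```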

Author(s)

Askar Obulkasim

Maintainer: Askar Obulkasim <askar703@gmail.com>

References

Breiman, L. (2001), Random Forests, Machine Learning, 45(1), 5-32.

Examples

data(CNS)
train <- t(CNS$cli[1:40,])
test <- t(CNS$cli[41:60,])
train.label <- CNS$class[1:40]
## Without parallel processing
Prox <- Proximity(train, train.label, test, N = 2)
## With parallel processing
## Not run: 
Prox <- Proximity(train, train.label, test,
                  N = 50, Parallel = TRUE, ncpus = 10)
## End(Not run)