The mAP-KL algorithm

Description

We first employ the mt.maxT function from the "multtest" package to rank the genes of the training set, and then we keep the top N genes (e.g. N = 200) for further analysis. Prior to the clustering analysis with Affinity Propagation (AP), we apply the index of Krzanowski and Lai, as included in the "clusterSim" package, to determine the number of clusters k, using only the disease samples of the training set. The final step is the cluster analysis with the AP clustering method, as implemented in the "apcluster" package, which detects k clusters among the top N genes and provides a list of the most representative gene of each cluster, the so-called exemplars.
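The ranking and clustering steps described above can be sketched roughly as follows. This is an illustration on synthetic data, not the package's internal code: the matrix X, the labels cl, the sizes, and the hand-picked k are all assumptions made for the example, and the Krzanowski and Lai step is replaced by a fixed k to keep the sketch short. Clustering the genes across all samples is likewise an assumption.

```r
library(multtest)   # mt.maxT: permutation-based gene ranking
library(apcluster)  # apclusterK, negDistMat: affinity propagation

set.seed(1)

## Synthetic expression matrix (assumption): 300 genes x 20 samples,
## columns 1-10 are controls (0), columns 11-20 are disease samples (1).
nGenes <- 300
nPerClass <- 10
X <- matrix(rnorm(nGenes * 2 * nPerClass), nrow = nGenes,
            dimnames = list(paste0("g", seq_len(nGenes)), NULL))
X[1:40, 11:20] <- X[1:40, 11:20] + 2   # make the first 40 genes differential
cl <- rep(c(0, 1), each = nPerClass)

## Step 1: rank genes with mt.maxT and keep the top N.
## res$index holds the row indices of X ordered by significance.
res <- mt.maxT(X, classlabel = cl, test = "t", B = 1000)
N <- 50
top <- X[res$index[seq_len(N)], ]

## Step 2 (placeholder): mAPKL derives k from the Krzanowski and Lai
## index; here k is simply fixed by hand.
k <- 5

## Step 3: ask affinity propagation for (approximately) k clusters of
## genes, using negative squared Euclidean distances as similarities.
ap <- apclusterK(negDistMat(r = 2), top, K = k, bimaxit = 50,
                 verbose = FALSE)

## The exemplars are the representative genes, one per cluster.
exemplars <- rownames(top)[ap@exemplars]
print(exemplars)
```

Note that apclusterK searches for an input preference that yields the requested number of clusters, so the number of exemplars returned may deviate slightly from k if the bisection does not converge.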

Usage

mAPKL(trObj,classLabels,valObj=NULL,dataType=6,statTest="t",permutations=1000,
features=200,minClusters=2,maxClusters=50,FC="limma",bimaxit=50,r=2)

Arguments

trObj

The training eSet object.

classLabels

The varLabels name in the eSet object under which the class labels are stored, e.g. "type".

valObj

The validation eSet object (if not NULL).

dataType

The type of the data, e.g. 6 for ratio data without normalization, or 7 for interval or mixed (ratio and interval) data without normalization, as described in the "clusterSim" package.

statTest

The statistical test applied to the gene intensities. The available tests are described in the mt.maxT documentation of the "multtest" package.

permutations

The number of permutations used in the mt.maxT analysis.

features

The top N genes to be kept.

minClusters

The minimum number of clusters that can be identified.

maxClusters

The maximum number of clusters that can be identified.

FC

The method used to compute the fold change of the exemplars. The default is "limma"; alternatively, the "SAM" approach may be used.

bimaxit

The maximum number of bisection steps performed by the AP algorithm. The default value is 50.

r

The exponent used to transform the resulting distances, which are raised to the r-th power. Use r=2 (the default) to obtain negative squared distances, as used by Frey and Dueck.
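To make the role of r concrete, the following base R sketch builds a negative distance matrix by hand for r=1 and r=2. It is meant only to illustrate the transformation described above, assuming Euclidean distances between rows; the toy matrix x and the variable names are invented for the example and are not part of mAPKL.

```r
set.seed(42)

## Toy data (assumption): 4 genes measured on 3 samples.
x <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("gene", 1:4), NULL))

## Pairwise Euclidean distances between the rows (genes).
d <- as.matrix(dist(x))

## Similarity = negative distance raised to the r-th power.
## In R, ^ binds tighter than unary minus, so -d^r is -(d^r).
simR1 <- -d^1   # r = 1: negative distances
simR2 <- -d^2   # r = 2: negative squared distances

print(round(simR2, 3))
```

With r=2, pairs of genes that are far apart are penalized more strongly than with r=1, while identical genes keep a similarity of 0.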

Value

rankedIntens

The top N ranked genes with their intensity values

exemplTrain

The intensity values of the exemplars in the training set

exemplTest

The intensity values of the exemplars in the validation set (if valObj is not NULL)

statistic

A list with the overall results of the "mt.maxT" analysis

adjp

The adjusted p-values according to the statistical analysis

pVal

The raw p-values according to the statistical analysis

fc

The Fold Change of the exemplars

exemplars

The selected "significant" probe ids/genes

clusters

The probe ids/genes per cluster

Author(s)

Argiris Sakellariou

References

A. Sakellariou, D. Sanoudou, and G. Spyrou, "Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data," BMC Bioinformatics, vol. 13, p. 270, 2012.

Examples

## Using separate train-test samples
## Load the breast cancer data included in the mAPKLData package

library(mAPKL)
library(mAPKLData)
data(mAPKLData)
breast <- sampling(Data=mAPKLData, valPercent=40, classLabels="type", seed=135)
normTrainData <- preprocess(breast$trainData)
normTestData <- preprocess(breast$testData)

exprs(breast$trainData) <- normTrainData$clL2.normdata
exprs(breast$testData) <- normTestData$clL2.normdata

out.clL2 <- mAPKL(trObj=breast$trainData, classLabels="type",
                  valObj=breast$testData, dataType=7)