mAPKL: The mAP-KL algorithm
In mAPKL: A Hybrid Feature Selection method for gene expression data

Description Usage Arguments Value Author(s) References Examples

The mAP-KL method proceeds in three steps. First, we employ the mt.maxT function from the "multtest" package to rank the genes of the training set and retain the top N genes (e.g., N = 200) for further analysis. Second, prior to cluster analysis with Affinity Propagation (AP), we apply the Krzanowski and Lai index, as implemented in the "clusterSim" package, to determine the number of clusters using only the disease samples of the training set. Finally, we perform the cluster analysis with the AP method from the "apcluster" package, which detects n clusters (n = k, the number indicated by the Krzanowski and Lai index) among the top N genes and returns the most representative gene of each cluster, the so-called exemplars.
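The first two steps can be illustrated with a minimal base-R sketch on simulated data. It is not the package's implementation: a plain Welch t-statistic stands in for the permutation-based mt.maxT ranking, Ward hierarchical clustering stands in for the partitions that the Krzanowski and Lai index is evaluated on, and all object names (expr, labels, top_genes, wss, k_best) are hypothetical. The AP clustering step is not shown.

```r
## Toy data (simulated): 50 genes (rows) x 12 samples (columns).
set.seed(1)
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))
labels <- rep(c(0, 1), each = 6)                       # 0 = control, 1 = disease
expr[1:5, labels == 1] <- expr[1:5, labels == 1] + 6   # 5 strongly shifted genes

## Step 1 (simplified): rank genes by absolute Welch t-statistic, keep the top N.
## Assumption: no permutation-based adjustment is done here, unlike mt.maxT.
tstat <- apply(expr, 1, function(g)
  unname(t.test(g[labels == 1], g[labels == 0])$statistic))
names(tstat) <- rownames(expr)
topN <- 10
top_genes <- names(sort(abs(tstat), decreasing = TRUE))[seq_len(topN)]

## Step 2 (simplified): Krzanowski-Lai index on the disease samples only.
x <- expr[top_genes, labels == 1]      # top genes x disease samples
p <- ncol(x)                           # number of variables
tree <- hclust(dist(x), method = "ward.D2")

## Within-cluster sum of squares for a partition into k clusters.
wss <- function(k) {
  cl <- cutree(tree, k)
  sum(vapply(unique(cl), function(j) {
    m <- x[cl == j, , drop = FALSE]
    sum(sweep(m, 2, colMeans(m))^2)
  }, numeric(1)))
}

ks <- 2:5                                          # candidate cluster numbers
W <- vapply(1:(max(ks) + 1), wss, numeric(1))      # W[k] for k = 1..6

## DIFF(k) = (k-1)^(2/p) * W[k-1] - k^(2/p) * W[k];  KL(k) = |DIFF(k) / DIFF(k+1)|
diff_k <- function(k) (k - 1)^(2 / p) * W[k - 1] - k^(2 / p) * W[k]
kl <- vapply(ks, function(k) abs(diff_k(k) / diff_k(k + 1)), numeric(1))
k_best <- ks[which.max(kl)]            # cluster number with the largest KL value
```

In the package itself, k_best would correspond to the number of clusters requested from the AP algorithm, bounded by minClusters and maxClusters.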

mAPKL(trObj, classLabels, valObj=NULL, dataType=6, statTest="t", permutations=1000,
      features=200, minClusters=2, maxClusters=50, FC="limma", bimaxit=50, r=2)

`trObj`	The train eSet object.
`classLabels`	The varLabels name in the eSet object where the class labels are stored, e.g., "type".
`valObj`	The validation eSet object (if not NULL).
`dataType`	The type of the data, e.g., 6 for ratio data without normalization and 7 for interval or mixed (ratio & interval) data without normalization, as described in the "clusterSim" package.
`statTest`	The statistical test applied to the gene intensities. The available tests are described in the mt.maxT documentation of the "multtest" package.
`permutations`	The number of permutations.
`features`	The top N genes to be kept.
`minClusters`	The minimum number of clusters that can be identified.
`maxClusters`	The maximum number of clusters that can be identified.
`FC`	The method used to compute the fold change of the exemplars: "limma" (default) or, alternatively, the "SAM" approach.
`bimaxit`	The maximum number of bisection steps performed by the AP algorithm. The default value is 50.
`r`	The exponent used to transform the resulting distances by computing their r-th power. To obtain negative squared distances, as used by Frey and Dueck, use r=2 (the default).
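As a small illustration of what the r argument does, the sketch below computes negative r-th power Euclidean distances in base R. The helper neg_dist_pow is hypothetical and written only for this example; within the "apcluster" package, similarities of this kind are produced by negDistMat.

```r
## Three points whose pairwise Euclidean distances are 5, 10 and 5.
x <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))
d <- as.matrix(dist(x))

## Hypothetical helper: similarity = -(distance ^ r).
neg_dist_pow <- function(d, r = 2) -(d^r)

s <- neg_dist_pow(d, r = 2)
s["a", "b"]   # -25  (distance 5, squared and negated)
s["a", "c"]   # -100 (distance 10, squared and negated)
```

With r=1 the similarities would simply be the negated distances (-5 and -10 here), so a larger r penalizes distant pairs more heavily.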

`rankedIntens`	The top N ranked genes with their intensity values
`exemplTrain`	The intensity values of the exemplars in the training set
`exemplTest`	The intensity values of the exemplars in the validation set (if valObj is not NULL)
`statistic`	A list with the overall results of the "mt.maxT" analysis
`adjp`	The adjusted p-values according to the statistical analysis
`pVal`	The raw p-values according to the statistical analysis
`fc`	The Fold Change of the exemplars
`exemplars`	The selected "significant" probe ids/genes
`clusters`	The probe ids/genes per cluster

Argiris Sakellariou

A. Sakellariou, D. Sanoudou, and G. Spyrou, "Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data," BMC Bioinformatics, vol. 13, p. 270, 2012.

## Using separate train-test samples
## Load the necessary files based on Breast cancer data as included in the
## package mAPKLData

library(mAPKL)
library(mAPKLData)
data(mAPKLData)
breast <- sampling(Data=mAPKLData, valPercent=40, classLabels="type", seed=135)
normTrainData <- preprocess(breast$trainData)
normTestData <- preprocess(breast$testData)

exprs(breast$trainData) <- normTrainData$clL2.normdata
exprs(breast$testData) <- normTestData$clL2.normdata

## Run mAP-KL on the training set and evaluate the exemplars on the validation set
out.clL2 <- mAPKL(trObj=breast$trainData, classLabels="type",
                  valObj=breast$testData, dataType=7)