SWAP.Train.KTSP: Function for training the K-TSP classifier.

Description

SWAP.Train.KTSP trains a binary K-TSP classifier. The resulting classifier can be passed to SWAP.KTSP.Classify to classify samples.

Usage

SWAP.Train.KTSP(inputMat, phenoGroup, classes = NULL, krange = 2:10,
  FilterFunc = SWAP.Filter.Wilcoxon, RestrictedPairs = NULL, 
  handleTies = FALSE, disjoint = TRUE,
  k_selection_fn = KbyTtest, k_opts = list(), score_fn = signedTSPScores, 
  score_opts = NULL, verbose = FALSE, ...)

Arguments

inputMat

is a numerical matrix containing the measurements (e.g., gene expression data) to be used to build the K-TSP classifier. The columns represent samples and the rows represent the features (e.g., genes). The number of columns must agree with the length of phenoGroup. Note that rownames(inputMat) will be used as the feature names (e.g., gene symbols) in all subsequent analyses.

phenoGroup

is a factor with two levels containing the phenotype information used to train the K-TSP classifier. In order to identify the best TSPs to be included in the classifier, the features contained in inputMat will be compared between the two groups defined by this factor. The levels of phenoGroup will also be used to reorder the features in each TSP such that the first feature is larger than the second one in the group corresponding to the first level, and vice versa.
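The constraints on inputMat and phenoGroup described above can be checked before training. The following is a minimal base R sketch on simulated data; the feature names, sample names, and class labels are hypothetical and are not taken from the package.

```r
set.seed(1)

# Hypothetical toy data: 6 features (rows) by 8 samples (columns)
inputMat <- matrix(rnorm(6 * 8), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))

# Two-level factor with one entry per column of inputMat
phenoGroup <- factor(rep(c("Bad", "Good"), each = 4))

# Checks mirroring the documented requirements
stopifnot(is.numeric(inputMat),
          ncol(inputMat) == length(phenoGroup),
          nlevels(phenoGroup) == 2,
          !is.null(rownames(inputMat)))
```

If any of these checks fails, the inputs do not match the description of the two arguments given here.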

classes

is a character vector of length 2 providing the phenotype class labels (case followed by control). If NULL, the levels of phenoGroup will be taken as the labels.

krange

an integer (or a vector of integers) defining the candidate numbers of Top Scoring Pairs (TSPs) from which the algorithm chooses to build the final classifier. The algorithm uses the mechanism in Afsari et al (AOAS, 2014) to select both the number of pairs and the pairs of features. Default is the range from 2 to 10.

FilterFunc

is a filtering function used to reduce the starting number of features from which the Top Scoring Pairs (TSPs) are identified. The default filter is a differential expression test based on the Wilcoxon rank-sum test; alternative filtering functions can be passed as well (see SWAP.Filter.Wilcoxon for details). The output of the function must be a subset of rownames(inputMat).
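As an illustration of an alternative filter, the sketch below keeps the features with the largest variance. The function name topVarianceFilter is hypothetical, and the argument order (phenoGroup, inputMat, ...) is an assumption modelled on SWAP.Filter.Wilcoxon; check that function's help page before relying on it.

```r
# Hypothetical filter: keep the featureNo rows with the largest variance.
# The argument order is assumed to match SWAP.Filter.Wilcoxon.
topVarianceFilter <- function(phenoGroup, inputMat, featureNo = 100, ...) {
  vars <- apply(inputMat, 1, var)          # per-feature variance, named by rowname
  n <- min(featureNo, nrow(inputMat))
  names(sort(vars, decreasing = TRUE))[seq_len(n)]
}

# Toy check that the output is a subset of rownames(inputMat)
toyMat <- rbind(a = c(1, 1, 1, 1),
                b = c(0, 10, 0, 10),
                c = c(1, 2, 1, 2))
toyGroup <- factor(c("x", "x", "y", "y"))
selected <- topVarianceFilter(toyGroup, toyMat, featureNo = 2)
stopifnot(all(selected %in% rownames(toyMat)))
```

Under the same assumption, such a function would be supplied through the FilterFunc argument, with featureNo forwarded through the ... argument.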

RestrictedPairs

is a character matrix with two columns containing the feature pairs to be considered for score calculations. Each row should contain a pair of feature names matching the rownames of inputMat. If RestrictedPairs is NULL (the default), all available feature pairs will be considered.
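A matrix of the shape described above can be built with rbind. This is a minimal sketch; the gene names are hypothetical, and featureNames stands in for rownames(inputMat).

```r
# Stand-in for rownames(inputMat); hypothetical names
featureNames <- paste0("gene", 1:6)

# One candidate pair per row, two columns of feature names
RestrictedPairs <- rbind(c("gene1", "gene2"),
                         c("gene3", "gene5"),
                         c("gene4", "gene6"))

# Every name must match a feature in the input matrix
stopifnot(is.character(RestrictedPairs),
          ncol(RestrictedPairs) == 2,
          all(RestrictedPairs %in% featureNames))
```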

handleTies

is a logical value indicating whether tie handling should be enabled or not. FALSE by default.

disjoint

is a logical value indicating whether only disjoint pairs should be considered in the final set of selected pairs; i.e., each feature occurs at most once among the set of TSPs. TRUE by default.
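To make the notion of disjoint pairs concrete, the sketch below walks through a ranked list of candidate pairs and keeps a pair only if neither of its features has been used already. This is an illustrative greedy procedure on hypothetical data, not necessarily how the package selects pairs internally.

```r
# Candidate pairs, assumed already sorted by decreasing score (hypothetical)
pairs <- rbind(c("g1", "g2"),
               c("g1", "g3"),
               c("g4", "g5"),
               c("g2", "g6"))

keepDisjoint <- function(pairs) {
  used <- character(0)
  keep <- logical(nrow(pairs))
  for (i in seq_len(nrow(pairs))) {
    # Keep the pair only if neither feature appeared in an earlier kept pair
    if (!any(pairs[i, ] %in% used)) {
      keep[i] <- TRUE
      used <- c(used, pairs[i, ])
    }
  }
  pairs[keep, , drop = FALSE]
}

disjointPairs <- keepDisjoint(pairs)
```

In this toy input the second and fourth pairs are dropped because they reuse g1 and g2.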

k_selection_fn

is a function for selecting the optimal k once the TSP scores have been calculated for all the candidate pairs. This can be either SWAP.Kby.Measurement or SWAP.Kby.Ttest (default), or a user-defined function.

k_opts

a list of additional arguments to be passed on to a custom k selection function.

score_fn

is a function for calculating TSP scores. By default, the signed TSP scores as calculated by SWAP.Calculate.SignedTSPScores will be used. The user can also provide SWAP.Calculate.BasicTSPScores to obtain basic TSP scores. The output of any custom function should have the same structure as the output of these two functions.
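For orientation, the classic TSP score of a feature pair (i, j) from the top scoring pairs literature is the absolute difference between the two classes in the proportion of samples where feature i is smaller than feature j. The base R sketch below computes that quantity for a single pair on hypothetical data. Whether SWAP.Calculate.BasicTSPScores computes exactly this value, and what structure it returns, is an assumption not confirmed here; the helper basicTspScore is illustrative only.

```r
# Classic TSP score for one pair: |P(x_i < x_j | class 1) - P(x_i < x_j | class 2)|
basicTspScore <- function(x_i, x_j, group) {
  lv <- levels(group)
  p1 <- mean(x_i[group == lv[1]] < x_j[group == lv[1]])
  p2 <- mean(x_i[group == lv[2]] < x_j[group == lv[2]])
  abs(p1 - p2)
}

# Hypothetical data: the ordering of the two features flips between classes
group <- factor(rep(c("A", "B"), each = 3))
x_i <- c(1, 2, 3, 9, 8, 7)
x_j <- c(5, 6, 7, 1, 2, 3)
score <- basicTspScore(x_i, x_j, group)
```

A pair whose ordering flips perfectly between the two classes, as in this toy input, reaches the maximum score.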

score_opts

is a list of additional variables that will be passed on to the scoring function as the score_opts argument.

verbose

is a logical value indicating whether status messages will be printed or not throughout the function. FALSE by default.

...

Additional arguments passed to the filtering function FilterFunc.

Value

The KTSP classifier, in the form of a list, which contains the following components:

name

The classifier name.

TSPs

A k by 2 matrix, containing the feature names for each TSP. These names correspond to rownames(inputMat). In this matrix each row corresponds to a specific TSP. For each TSP (i.e. row in the TSPs matrix) the order of the features is such that the first one is on average smaller than the second one in the phenotypic group defined by the first level of the phenoGroup factor, and vice versa. The algorithm uses the mechanism in Afsari et al (2014) to select the number of pairs and the pairs of features.

score

the TSP scores for the top k TSPs.

label

the class labels. These labels correspond to the phenoGroup factor levels and will be used to label any new sample classified by the SWAP.KTSP.Classify function.

tieVote

indicates which class the pair would vote for in case of a tie.
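To show how a list with these components could drive a prediction, the sketch below applies a simple majority vote over the pairs. The classifier object and the sample are hypothetical, and the vote direction (a pair votes for the first label when its first feature is larger than its second) is an assumption made for illustration; SWAP.KTSP.Classify is the function to use in practice.

```r
# Hypothetical classifier-like list with three pairs and two class labels
toyClassifier <- list(
  TSPs  = rbind(c("g1", "g2"), c("g3", "g4"), c("g5", "g6")),
  label = c("Bad", "Good")
)

# Hypothetical new sample: named vector of feature measurements
newSample <- c(g1 = 5, g2 = 1, g3 = 2, g4 = 7, g5 = 9, g6 = 3)

# Assumed direction: first feature > second feature is a vote for label[1]
votesFirst <- newSample[toyClassifier$TSPs[, 1]] > newSample[toyClassifier$TSPs[, 2]]

# Majority vote over the k pairs (k is odd here, so no tie can occur)
predicted <- if (sum(votesFirst) > length(votesFirst) / 2) {
  toyClassifier$label[1]
} else {
  toyClassifier$label[2]
}
```

With an even number of pairs, a tie between the two classes is possible and would need a separate rule.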

Author(s)

Bahman Afsari bahman.afsari@gmail.com, Luigi Marchionni marchion@jhu.edu, Wikum Dinalankara wdinala1@jhmi.edu

References

See switchBox for the references.

See Also

SWAP.KTSP.Classify, SWAP.Filter.Wilcoxon, SWAP.CalculateSignedScore

Examples

##################################################
### Load the package and the gene expression data for the training set
library(switchBox)
data(trainingData)


### Show group variable for the TRAINING set
table(trainingGroup)


##################################################
### Train a classifier using default filtering function based on the Wilcoxon test
classifier <- SWAP.Train.KTSP(matTraining, trainingGroup)

### Show the classifier
classifier


##################################################
### Train another classifier from the top 4 best features
### according to the default filtering function
classifier <- SWAP.Train.KTSP(matTraining, trainingGroup,
                              FilterFunc = SWAP.Filter.Wilcoxon, featureNo = 4)

### Show the classifier
classifier

marchion/switchBox documentation built on May 9, 2019, 4:07 p.m.