scudoClassify: Performs classification using SCUDO


scudoClassify R Documentation

Performs classification using SCUDO

Description

Performs supervised classification of samples in a testing set using a network of samples generated by SCUDO during a training step.

Usage

scudoClassify(trainExpData, testExpData, N, nTop, nBottom,
    trainGroups, maxDist = 1, weighted = TRUE, complete = FALSE, beta = 1,
    alpha = 0.1, foldChange = TRUE, featureSel = TRUE, logTransformed = NULL,
    parametric = FALSE, pAdj = "none", distFun = NULL)

Arguments

trainExpData

either an ExpressionSet, a SummarizedExperiment, a data.frame or a matrix of gene expression data, with a column for each sample and a row for each feature

testExpData

either an ExpressionSet, a SummarizedExperiment, a data.frame or a matrix of gene expression data, with a column for each sample and a row for each feature

N

a number between 0 and 1, representing the fraction of the signature-to-signature distances that will be used to draw the graph

nTop

number of up-regulated features to include in the signatures

nBottom

number of down-regulated features to include in the signatures

trainGroups

factor containing group labels for each sample in trainExpData

maxDist

an integer. Only nodes whose distance from a testing node is less than or equal to maxDist are used to perform the classification

weighted

logical, whether to use the distances associated with the edges when computing the classification scores. For a description of the classification method, see Details below

complete

logical, whether to consider all the nodes in the training set to perform the classification. If TRUE, the arguments N, maxDist, weighted and beta are ignored. For a description of the classification method, see Details below

beta

a coefficient used to down-weight the influence of distant nodes on the classification outcome. For a description of the classification method, see Details below

alpha

p-value cutoff for the optional feature selection step. If feature selection is skipped, alpha is ignored

foldChange

logical, whether or not to compute fold-changes from expression data

featureSel

logical, whether or not to perform a feature selection. Feature selection is performed using one of four tests: Student's t-test, ANOVA, Wilcoxon-Mann-Whitney test, or Kruskal-Wallis test. The test used depends on the number of groups and the parametric argument

logTransformed

logical or NULL. It indicates whether the data is log-transformed. If NULL, an attempt is made to guess if the data is log-transformed

parametric

logical, whether to use a parametric or a non-parametric test for the feature selection

pAdj

method used to adjust the p-values in the feature selection step. See p.adjust.methods for a list of adjustment methods

distFun

the function used to compute the distance between two samples. See Details of scudoTrain for the specification of this function

Details

This function performs supervised classification of samples in a testing set, using as a model a network similar to the one generated by scudoTrain and scudoNetwork.

For each sample S in the testing set, a new distance matrix is computed using the expression profiles in the training set and the expression profile of S. The distance matrix is computed as described in the Details of scudoTrain.

If the argument complete is TRUE, the distance matrix is converted into a similarity score matrix. Then, the similarity scores between S and all the samples in the training set are aggregated by group. The mean similarity score is computed for each group, and classification scores are generated by dividing these means by their sum, obtaining values between 0 and 1.
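The scoring used when complete is TRUE can be sketched in base R as follows. This is only an illustration of the description above, not the package's code: the distances d are made-up values, and the conversion from distance to similarity (1 - d) is an assumption, since the exact conversion is not specified here.

```r
# Hypothetical distances between one test sample S and five training samples
d <- c(0.2, 0.4, 0.3, 0.8, 0.9)
trainGroups <- factor(c("A", "A", "A", "B", "B"))

# Assumed distance-to-similarity conversion (not taken from the package)
sim <- 1 - d

# Mean similarity between S and the training samples of each group
meanSim <- tapply(sim, trainGroups, mean)

# Rescale the group means so that the scores lie between 0 and 1 and sum to 1
scores <- meanSim / sum(meanSim)

# The predicted group is the one with the largest score
predicted <- names(scores)[which.max(scores)]
```

Because every training sample contributes to the mean of its group, no sample is ever left without a score in this mode.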

If the argument complete is FALSE, the distance matrix obtained from S and the training set is used to generate a network of samples, using the parameter N as a threshold for edge selection (see Details of scudoNetwork for a more complete description). Then the neighbors of S in the network are explored, up to a distance controlled by the parameter maxDist. If the weighted parameter is FALSE, the classification score for each group is computed as the number of edges connecting S or one of its neighbors to a node of that group. If the weighted parameter is TRUE, the classification score for each group is computed as the sum of the similarity scores associated with the edges connecting S or one of its neighbors to nodes of that group. In both cases, the scores are then rescaled by dividing them by their sum, in order to obtain values between 0 and 1. The parameter beta can be used to down-weight the contribution to the classification scores of edges connecting nodes distant from S, in both the weighted and unweighted cases.
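The difference between the unweighted and weighted scores can be illustrated with a small base R sketch. It assumes the simplest setting, maxDist = 1, so only edges that touch S directly are considered and beta plays no role. The edge similarities in edgeSim are hypothetical values, and a value of 0 is used here merely as a convention for "no edge was kept after thresholding with N".

```r
# Hypothetical similarity scores on the edges between S and five training
# nodes; 0 means that no edge connects S to that node
edgeSim <- c(t1 = 0.9, t2 = 0.7, t3 = 0, t4 = 0.4, t5 = 0)
trainGroups <- factor(c("A", "A", "A", "B", "B"))

hasEdge <- edgeSim > 0

# weighted = FALSE: count the edges reaching each group, then rescale
counts <- tapply(hasEdge, trainGroups, sum)
unweightedScores <- counts / sum(counts)

# weighted = TRUE: sum the edge similarities per group, then rescale
sums <- tapply(edgeSim, trainGroups, sum)
weightedScores <- sums / sum(sums)
```

In this toy example group A receives 2/3 of the unweighted score but 0.8 of the weighted score, because its two edges also carry higher similarity.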

The predicted group for each sample is the one with the largest classification score. Both predictions and classification scores are returned. Note that if the argument complete is FALSE, the classification scores for a sample may all be zero, which happens when the corresponding node is isolated in the network of samples. In this case the predicted group is NA. The tuning of the parameters can be performed automatically using the train function from the package caret and the function scudoModel.
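The rule that maps classification scores to predictions, including the NA case for isolated nodes, can be sketched as below. The scores data.frame and the helper predictGroup are hypothetical and only mirror the behavior described above; they are not part of the package.

```r
# Hypothetical classification scores for three test samples;
# s2 is isolated in the network, so all of its scores are zero
scores <- data.frame(A = c(0.8, 0, 0.3), B = c(0.2, 0, 0.7),
    row.names = c("s1", "s2", "s3"))

# Illustrative helper: NA when every score is zero, otherwise the group
# with the largest score
predictGroup <- function(row) {
    if (all(row == 0)) NA_character_ else names(row)[which.max(row)]
}

predicted <- factor(apply(scores, 1, predictGroup),
    levels = colnames(scores))
```

When working with real output, checking anyNA on the predicted factor is a quick way to detect test samples that ended up isolated, which may suggest increasing N or setting complete = TRUE.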

Value

A list containing a factor with the predicted class for each sample in testExpData and a data.frame of the classification scores used to generate the predictions.

Author(s)

Matteo Ciciani matteo.ciciani@gmail.com, Thomas Cantore cantorethomas@gmail.com

See Also

scudoTrain, scudoModel

Examples

expData <- data.frame(a = 1:10, b = 2:11, c = 10:1, d = 11:2,
    e = c(1:4, 10:5), f = c(7:10, 6:1), g = c(8:4, 1:3, 10, 9),
    h = c(6:10, 5:1), i = c(5:1, 6:10))
rownames(expData) <- letters[1:10]
groups <- factor(c(1,1,1,2,2,2,1,1,1))
inTrain <- 1:5

# perform classification
res <- scudoClassify(expData[, inTrain], expData[, -inTrain], 0.9, 3, 3,
    groups[inTrain], featureSel = FALSE)

# explore predictions
predictions <- res$predicted
scores <- res$scores


Matteo-Ciciani/scudo documentation built on Feb. 3, 2024, 9:41 a.m.