biosign: Builds the molecular signature.
In biosigner: Signature discovery from omics data

Description Usage Arguments Value Author(s) See Also Examples

Main function of the 'biosigner' package. For each of the available classifiers (PLS-DA, Random Forest, and SVM), the significant features are selected and the corresponding models are built.

## S4 method for signature 'MultiDataSet'
biosign(
  x,
  y,
  seedI = NULL,
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2],
  ...
)

## S4 method for signature 'ExpressionSet'
biosign(x, y, ...)

## S4 method for signature 'data.frame'
biosign(x, y, ...)

## S4 method for signature 'matrix'
biosign(
  x,
  y,
  methodVc = c("all", "plsda", "randomforest", "svm")[1],
  bootI = 50,
  pvalN = 0.05,
  permI = 1,
  fixRankL = FALSE,
  seedI = 123,
  plotSubC = NA,
  fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
  info.txtC = c("none", "interactive", "myfile.txt")[2],
  printL = TRUE,
  plotL = TRUE,
  .sinkC = NULL,
  ...
)

`x`	Numerical data frame or matrix (observations x variables), or ExpressionSet object with non empty assayData and phenoData; NAs are allowed for PLS-DA but for SVM, samples with NA will be removed
`y`	Two-level factor corresponding to the class labels, or a character indicating the name of the column of the pData to be used, when x is an ExpressionSet object
`seedI`	integer: optional seed to obtain exactly the same signature when rerunning biosigner; default is '123'; set to NULL to prevent seed setting
`fig.pdfC`	Character: File name with '.pdf' extension for the figure; if 'interactive' (default), figures will be displayed interactively; if 'none', no figure will be generated
`info.txtC`	Character: File name with '.txt' extension for the printed results (call to sink()'); if 'interactive' (default), messages will be printed on the screen; if 'none', no verbose will be generated
`...`	Currently not used.
`methodVc`	Character vector: Either one or all of the following classifiers: Partial Least Squares Discriminant Analysis ('plsda'), or Random Forest ('randomforest'), or Support Vector Machine ('svm')
`bootI`	Integer: Number of bootstaps for resampling
`pvalN`	Numeric: To speed up the selection, only variables which significantly improve the model up to two times this threshold (to take into account potential fluctuations) are computed
`permI`	Integer: Random permutation are used to assess the significance of each new variable included into the model (forward selection)
`fixRankL`	Logical: Should the initial ranking be computed with the full model only, or as the median of the ranks from the models built on the sampled dataset?
`plotSubC`	Character: Graphic subtitle
`printL`	Logical: deprecated: use the 'info.txtC' argument instead
`plotL`	Logical: deprecated: use the 'fig.pdfC' argument instead
`.sinkC`	Character: deprecated: use the 'info.txtC' argument instead

An S4 object of class 'biosign' containing the following slots: 1) 'methodVc' character vector: selected classifier(s) ('plsda', 'randomforest', and/or 'svm'), 2) 'accuracyMN' numeric matrix: balanced accuracies for the full models, and the models restricted to the 'S' and 'AS' signatures (predictions are obtained by using the resampling scheme selected with the 'bootI' and 'crossvalI' arguments), 3) 'tierMC' character matrix: contains the tier ('S', 'A', 'B', 'C', 'D', or 'E') of each feature for each classifier (features with tier 'S' have been found significant in all backward selections; features with tier 'A' have been found significant in all but the last selection, and so on), 4) modelLs list: selected classifier(s) trained on the subset restricted to the 'S' features, 5) signatureLs list: 'S' signatures for each classifier; and 6) 'AS' list: 'AS' signatures and corresponding trained classifiers, in addition to the dataset restricted to tiers 'S' and 'A' ('xMN') and the labels ('yFc')

Philippe Rinaudo and Etienne Thevenot (CEA)

predict.biosign, plot.biosign

## loading the diaplasma dataset

data(diaplasma)
attach(diaplasma)

## restricting to a smaller dataset for this example

featureSelVl <- variableMetadata[, "mzmed"] >= 490 & variableMetadata[, "mzmed"] < 500
dataMatrix <- dataMatrix[, featureSelVl]
variableMetadata <- variableMetadata[featureSelVl, ]

# signature selection for all 3 classifiers
# a bootI = 5 number of bootstraps is used for this example
# we recommend to keep the default bootI = 50 value for your analyzes

diaSign <- biosign(dataMatrix, sampleMetadata[, "type"], bootI = 5)

## Application to an ExpressionSet

diaSet <- ExpressionSet(assayData = t(dataMatrix), 
                        phenoData = new("AnnotatedDataFrame", 
                                        data = sampleMetadata), 
                        featureData = new("AnnotatedDataFrame", 
                                          data = variableMetadata),
                        experimentData = new("MIAME", 
                                             title = "diaplasma"))
                                             
diaSign <- biosign(diaSet, "type", bootI = 5)
diaSet <- getEset(diaSign)
head(fData(diaSet))

detach(diaplasma)

## Application to a MultiDataSet

# Loading the 'NCI60_4arrays' from the 'omicade4' package
data("NCI60_4arrays", package = "omicade4")
# Selecting two of the four datasets
setNamesVc <- c("agilent", "hgu95")
# Creating the MultiDataSet instance
nciMset <- MultiDataSet::createMultiDataSet()
# Adding the two datasets as ExpressionSet instances
for (setC in setNamesVc) {
  # Getting the data
  exprMN <- as.matrix(NCI60_4arrays[[setC]])
  pdataDF <- data.frame(row.names = colnames(exprMN),
                        cancer = substr(colnames(exprMN), 1, 2),
                        stringsAsFactors = FALSE)
  fdataDF <- data.frame(row.names = rownames(exprMN),
                        name = rownames(exprMN),
                        stringsAsFactors = FALSE)
  # Building the ExpressionSet
  eset <- Biobase::ExpressionSet(assayData = exprMN,
                                 phenoData = new("AnnotatedDataFrame",
                                                 data = pdataDF),
                                 featureData = new("AnnotatedDataFrame",
                                                   data = fdataDF),
                                 experimentData = new("MIAME",
                                                      title = setC))
  # Adding to the MultiDataSet
  nciMset <- MultiDataSet::add_eset(nciMset, eset, dataset.type = setC,
                                    GRanges = NA, warnings = FALSE)
}
# Restricting to the 'ME' and 'LE' cancer types
sampleNamesVc <- Biobase::sampleNames(nciMset[["agilent"]])
cancerTypeVc <- Biobase::pData(nciMset[["agilent"]])[, "cancer"]
nciMset <- nciMset[sampleNamesVc[cancerTypeVc %in% c("ME", "LE")], ]
# Summary of the MultiDataSet
nciMset
# Building PLS-DA models for the cancer type, and getting back the updated MultiDataSet
nciPlsda <- ropls::opls(nciMset, "cancer", predI = 2)
nciMset <- ropls::getMset(nciPlsda)
# Selecting the significant features for PLS-DA, RF, and SVM classifiers, and getting back the updated MultiDataSet
nciBiosign <- biosigner::biosign(nciMset, "cancer")
nciMset <- biosigner::getMset(nciBiosign)