edgeRselection: Feature Selection Based on Differential Expression for Count...


Description

Performs a differential expression analysis between classes and selects the top-ranked features which give the best resubstitution performance. Overdispersion in the count data is modelled.
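The idea of choosing features by resubstitution performance can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the ranking is a simple difference in class means rather than an edgeR test, a nearest-centroid rule stands in for the classifier, and the helper names balancedError and resubError are hypothetical.

```r
# Sketch: pick the number of top-ranked features by resubstitution error.
# Assumptions: stand-in ranking (difference in class means, not edgeR) and a
# stand-in nearest-centroid classifier; helper names are hypothetical.
set.seed(2)
nSamples <- 20
classes <- factor(rep(c("A", "B"), each = nSamples / 2))
# 50 features in rows; the first 5 are shifted in class B.
x <- matrix(rnorm(50 * nSamples), nrow = 50,
            dimnames = list(paste0("f", 1:50), NULL))
x[1:5, classes == "B"] <- x[1:5, classes == "B"] + 3

# Rank features from most to least different between the classes.
ranking <- order(abs(rowMeans(x[, classes == "A"]) - rowMeans(x[, classes == "B"])),
                 decreasing = TRUE)

# Balanced error: the mean of the per-class error rates.
balancedError <- function(truth, predicted) {
  mean(sapply(levels(truth), function(cl) mean(predicted[truth == cl] != cl)))
}

# Train on the top nTop features and predict the same samples (resubstitution).
resubError <- function(nTop) {
  sub <- x[ranking[1:nTop], , drop = FALSE]
  centroids <- sapply(levels(classes),
                      function(cl) rowMeans(sub[, classes == cl, drop = FALSE]))
  centroids <- matrix(centroids, nrow = nTop, dimnames = list(NULL, levels(classes)))
  distances <- sapply(levels(classes),
                      function(cl) colSums((sub - centroids[, cl])^2))
  predicted <- levels(classes)[apply(distances, 1, which.min)]
  balancedError(classes, predicted)
}

sizes <- c(5, 10, 20)                          # numbers of top features to try
errors <- sapply(sizes, resubError)
chosen <- ranking[1:sizes[which.min(errors)]]  # lower balanced error is better
```

The sizes vector plays the role of the nFeatures setting of ResubstituteParams, and which.min corresponds to better = "lower".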

Usage

## S4 method for signature 'matrix'
edgeRselection(counts, classes, ...)
## S4 method for signature 'DataFrame'
edgeRselection(counts, classes, datasetName,
               normFactorsOptions = NULL, dispOptions = NULL, fitOptions = NULL,
               trainParams, predictParams, resubstituteParams,
               selectionName = "edgeR LRT", verbose = 3)
## S4 method for signature 'MultiAssayExperiment'
edgeRselection(counts, targets = NULL, ...)

Arguments

counts

Either a matrix, DataFrame or MultiAssayExperiment containing the unnormalised counts.

classes

A factor of class labels, with the same length as the number of samples in counts. Not used if counts is a MultiAssayExperiment object.

targets

If counts is a MultiAssayExperiment, the names of the data tables of counts to be used.

...

Variables not used by the matrix or MultiAssayExperiment methods, which are passed into and used by the DataFrame method.

datasetName

A name for the data set used. Stored in the result.

normFactorsOptions

A named list of any options to be passed to calcNormFactors.

dispOptions

A named list of any options to be passed to estimateDisp.

fitOptions

A named list of any options to be passed to glmFit.

trainParams

A container of class TrainParams describing the classifier to use for training.

predictParams

A container of class PredictParams describing how prediction is to be done.

resubstituteParams

An object of class ResubstituteParams describing the performance measure to consider and the numbers of top features to try for resubstitution classification.

selectionName

A name to identify this selection method by. Stored in the result.

verbose

Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.
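As an illustration of the normFactorsOptions, dispOptions and fitOptions arguments, the sketch below shows how a named list of options can be merged into a call to an edgeR function with do.call. Whether edgeRselection forwards its option lists in exactly this way is an assumption; the count matrix is simulated, and method = "RLE" is just one of the values calcNormFactors accepts.

```r
# Sketch: forwarding a named list of options to calcNormFactors.
# Assumption: do.call is used here for illustration only; the simulated
# counts and the chosen option value are not taken from the package.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(4)
  counts <- matrix(rpois(500, lambda = 50), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("Sample", 1:10)))
  normFactorsOptions <- list(method = "RLE")  # named list, as in the argument
  dge <- edgeR::DGEList(counts)
  # The required first argument is combined with the user-supplied options.
  dge <- do.call(edgeR::calcNormFactors, c(list(dge), normFactorsOptions))
  dge$samples$norm.factors
}
```

Passing NULL (the default) leaves the argument list unchanged, because c(list(dge), NULL) is just list(dge).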

Details

The differential expression analysis follows the standard edgeR steps: estimating library size normalisation factors, estimating dispersion (robustly, in this case), fitting a generalised linear model and then performing a likelihood ratio test.
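The steps listed above can be sketched directly with edgeR on simulated counts. This follows the documented order of steps and is not a copy of the package source; the design matrix, the tested coefficient and the simulated data are assumptions made for illustration.

```r
# Sketch of the edgeR steps: normalisation factors, robust dispersion
# estimation, GLM fit and likelihood ratio test.
# Assumption: two classes, so the second design coefficient is tested.
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(1)
  nGenes <- 200
  nSamples <- 10
  counts <- matrix(rnbinom(nGenes * nSamples, mu = 100, size = 5), nrow = nGenes,
                   dimnames = list(paste0("gene", 1:nGenes),
                                   paste0("Sample", 1:nSamples)))
  classes <- factor(rep(c("Control", "DPN"), each = nSamples / 2))
  design <- model.matrix(~ classes)

  dge <- DGEList(counts = counts, group = classes)
  dge <- calcNormFactors(dge)                      # library size normalisation factors
  dge <- estimateDisp(dge, design, robust = TRUE)  # robust dispersion estimation
  fit <- glmFit(dge, design)                       # generalised linear model
  lrt <- glmLRT(fit, coef = 2)                     # likelihood ratio test between classes

  # Features ordered from most to least significant.
  ranked <- rownames(topTags(lrt, n = Inf)$table)
  head(ranked)
}
```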

Data tables which consist entirely of non-numeric data cannot be analysed. If counts is an object of class MultiAssayExperiment, the factor of sample classes must be stored in the DataFrame accessible by the colData function with column name "class".
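A minimal sketch of a MultiAssayExperiment laid out as described, with the class factor stored in colData under the column name "class". The experiment name "RNA" and the simulated counts are hypothetical; "RNA" is the kind of value that would be given to targets.

```r
# Sketch: storing sample classes in colData under the column name "class".
# Assumption: the experiment name "RNA" and the counts are made up.
if (requireNamespace("MultiAssayExperiment", quietly = TRUE)) {
  library(MultiAssayExperiment)
  set.seed(3)
  sampleIDs <- paste0("Sample", 1:6)
  counts <- matrix(rpois(60, lambda = 50), nrow = 10,
                   dimnames = list(paste0("gene", 1:10), sampleIDs))
  # Row names of colData must match the column names of the count table.
  sampleInfo <- S4Vectors::DataFrame(class = factor(rep(c("Control", "DPN"), each = 3)),
                                     row.names = sampleIDs)
  mae <- MultiAssayExperiment(experiments = list(RNA = counts), colData = sampleInfo)
  colData(mae)[, "class"]  # the factor of sample classes that would be used
}
```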

Value

An object of class SelectResult, or a list of such objects if the classifier used to determine the specified performance metric made several varieties of prediction.

Author(s)

Dario Strbenac

References

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Mark D. Robinson, Davis McCarthy, and Gordon Smyth, 2010, Bioinformatics, Volume 26 Issue 1, https://academic.oup.com/bioinformatics/article/26/1/139/182458.

Examples

  if(require(parathyroidSE) && require(PoiClaClu))
  {
    data(parathyroidGenesSE)
    expression <- assays(parathyroidGenesSE)[[1]]
    sampleNames <- paste("Sample", 1:ncol(parathyroidGenesSE))
    colnames(expression) <- sampleNames
    DPN <- which(colData(parathyroidGenesSE)[, "treatment"] == "DPN")
    control <- which(colData(parathyroidGenesSE)[, "treatment"] == "Control")
    expression <- expression[, c(control, DPN)]
    classes <- factor(rep(c("Control", "DPN"), c(length(control), length(DPN))))
    expression <- expression[rowSums(expression > 1000) > 8, ] # Make small data set.
    
    selected <- edgeRselection(expression, classes, "DPN Treatment",
                   trainParams = TrainParams(classifyInterface),
                   predictParams = PredictParams(NULL),
                   resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10),
                                        performanceType = "balanced error", better = "lower"))
                                        
    head(selected@rankedFeatures[[1]])
    plotFeatureClasses(expression, classes, "ENSG00000044574",
                       dotBinWidth = 500, xAxisLabel = "Unnormalised Counts")
  }


ClassifyR documentation built on Nov. 8, 2020, 6:53 p.m.