plot a biplot of a linear discriminant analysis of an eSet object

Description

esetLda reduces the dimension of the data contained in the eSet via a linear discriminant analysis on the specified grouping variable, using the lda function, and plots the resulting biplot, optionally with the sample and gene annotation contained in the eSet.
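
To make the reduction step concrete, here is a minimal sketch of a linear discriminant analysis outside of esetLda. It assumes that the lda function mentioned above is MASS::lda, and it uses the built-in iris data rather than an eSet; how esetLda arranges the expression matrix internally is not shown here.

```r
# Minimal sketch, assuming the lda function is MASS::lda.
# Uses the built-in iris data (not an eSet) purely for illustration.
library(MASS)

# samples in rows, features in columns
X <- as.matrix(iris[, 1:4])

# Species plays the role of the grouping variable (ldaVar)
fit <- lda(x = X, grouping = iris$Species)

# sample coordinates on the linear discriminants
# (3 groups give at most 2 discriminants)
sampleScores <- predict(fit, newdata = X)$x

# feature coefficients defining each discriminant
featureLoadings <- fit$scaling

dim(sampleScores)     # 150 samples x 2 discriminants
dim(featureLoadings)  # 4 features x 2 discriminants
```

A biplot overlays these two sets of coordinates, samples and features, on the selected pair of discriminants.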

Usage

esetLda(eset, ldaVar, psids = 1:nrow(eset), dim = c(1, 2),
  colorVar = NULL, color = if (is.null(colorVar)) "black" else NULL,
  shapeVar = NULL, shape = if (is.null(shapeVar)) 15 else NULL,
  sizeVar = NULL, size = if (is.null(sizeVar)) 2.5 else NULL,
  sizeRange = NULL, alphaVar = NULL, alpha = if (is.null(alphaVar)) 1 else
  NULL, alphaRange = NULL, title = "", symmetryAxes = c("combine",
  "separate", "none"), packageTextLabel = c("ggrepel", "ggplot2"),
  cloudGenes = TRUE, cloudGenesColor = "black",
  cloudGenesNBins = sqrt(length(psids)), cloudGenesIncludeLegend = FALSE,
  cloudGenesTitleLegend = "nGenes", topGenes = 10, topGenesCex = 2.5,
  topGenesVar = NULL, topGenesJust = c(0.5, 0.5), topGenesColor = "black",
  topSamples = 10, topSamplesCex = 2.5, topSamplesVar = NULL,
  topSamplesJust = c(0.5, 0.5), topSamplesColor = "black",
  geneSets = list(), geneSetsVar = NULL, geneSetsMaxNChar = NULL,
  topGeneSets = 10, topGeneSetsCex = 2.5, topGeneSetsJust = c(0.5, 0.5),
  topGeneSetsColor = "black", includeLegend = TRUE,
  includeLineOrigin = TRUE, typePlot = c("static", "interactive"),
  packageInteractivity = c("rbokeh", "ggvis"), figInteractiveSize = c(600,
  400), ggvisAdjustLegend = TRUE, interactiveTooltip = TRUE,
  interactiveTooltipExtraVars = NULL, returnAnalysis = FALSE)

Arguments

eset

expressionSet (or SummarizedExperiment) object with data

ldaVar

name of variable (in varLabels of the eset) used for grouping for lda; the usage above defines no default for this argument, so it has to be specified

psids

featureNames of genes to include in the plot, all by default

dim

dimensions of the analysis to represent, first two dimensions by default

colorVar

name of variable (in varLabels of the eset) used for coloring, NULL by default

color

specified color(s) for the points, replicated if needed, used only if colorVar is NULL, a factor or character. By default: 'black' if colorVar is not specified and default ggplot palette otherwise

shapeVar

name of variable (in varLabels of the eset) used for the shape, NULL by default

shape

specified shape(s) (pch) for the points, replicated if needed, used only if shapeVar is NULL, a factor or character. By default: '15' (filled square) if shapeVar is not specified and default ggplot shape(s) otherwise

sizeVar

name of variable (in varLabels of the eset) used for the size, NULL by default

size

specified size(s) (cex) for the points, replicated if needed, used only if sizeVar is NULL, a factor or character. By default: '2.5' if sizeVar is not specified and default ggplot size(s) otherwise

sizeRange

size (cex) range used in the plot, possible only if the sizeVar is 'numeric' or 'integer'

alphaVar

name of variable (in varLabels of the eset) used for the transparency, NULL by default. This parameter is currently only available for static plot.

alpha

specified transparency(s) for the points, replicated if needed, used only if alphaVar is NULL, a factor or character. By default: '1' if alphaVar is not specified and default ggplot alpha otherwise. This parameter is currently only available for static plot.

alphaRange

transparency (alpha) range used in the plot, possible only if the alphaVar is 'numeric' or 'integer'. This parameter is currently only available for static plot.

title

plot title, '' (empty string) by default

symmetryAxes

set symmetry for axes, either:

  • 'combine' (by default): both axes are symmetric and with the same limits

  • 'separate': each axis is symmetric and has its own limits

  • 'none': axes by default (plot limits)
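
The three options can be illustrated with a small sketch. The helper axisLimits below is hypothetical and is not part of the package; it only shows how the limits could be derived from the plotted coordinates, ignoring any padding that the plotting package may add.

```r
# Hypothetical helper (not part of the package): axis limits implied by
# each symmetryAxes option, computed from the plotted coordinates.
axisLimits <- function(x, y, symmetryAxes = c("combine", "separate", "none")) {
  symmetryAxes <- match.arg(symmetryAxes)
  switch(symmetryAxes,
    # both axes symmetric around 0, sharing the largest absolute value
    combine = {
      m <- max(abs(c(x, y)))
      list(x = c(-m, m), y = c(-m, m))
    },
    # each axis symmetric around 0, with its own largest absolute value
    separate = list(
      x = c(-1, 1) * max(abs(x)),
      y = c(-1, 1) * max(abs(y))
    ),
    # limits taken directly from the data
    none = list(x = range(x), y = range(y))
  )
}

x <- c(-1, 0.5, 3)
y <- c(-2, 0.2, 1)
axisLimits(x, y, "combine")   # x and y both span -3 to 3
axisLimits(x, y, "separate")  # x spans -3 to 3, y spans -2 to 2
axisLimits(x, y, "none")      # x spans -1 to 3, y spans -2 to 1
```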

packageTextLabel

package used to label the outlying genes/samples/gene sets, either ggrepel (by default, only used if package ggrepel is available), or ggplot2

cloudGenes

logical, if TRUE (by default), include the cloud of genes in the biplot

cloudGenesColor

if cloudGenes is TRUE, color for the cloud of genes, black by default

cloudGenesNBins

number of bins to use for the cloud of genes, by default the square root of the number of genes

cloudGenesIncludeLegend

logical, if TRUE (FALSE by default) include the legend for the cloud of genes (in the top position if multiple legends)

cloudGenesTitleLegend

string with title for the legend for the cloud of genes, 'nGenes' by default

topGenes

numeric indicating which percentile (if <1) or number (if >=1) of genes most distant to the origin of the plot to annotate, by default: 10 genes are selected. If no genes should be annotated, set this parameter to 0. Currently only available for static plot.
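
As an illustration of the two ways of specifying this parameter, the sketch below ranks elements by their Euclidean distance to the origin. The helper selectTop is hypothetical and is not the package's code; in particular, reading a value below 1 as the fraction of most distant elements to keep is an assumption about what "percentile" means here.

```r
# Hypothetical helper (not the package's implementation): pick the
# elements most distant from the origin of a 2D plot.
# Assumption: top < 1 is read as the fraction of elements to keep,
# top >= 1 as the number of elements to keep.
selectTop <- function(coords, top) {
  if (top <= 0) return(character(0))  # nothing to annotate
  distToOrigin <- sqrt(rowSums(coords^2))
  n <- if (top < 1) ceiling(top * nrow(coords)) else min(top, nrow(coords))
  names(sort(distToOrigin, decreasing = TRUE))[seq_len(n)]
}

coords <- cbind(x = c(0.1, 3, -4, 0.5), y = c(0.2, 0, 3, -0.5))
rownames(coords) <- c("g1", "g2", "g3", "g4")

selectTop(coords, top = 2)     # the two most distant genes: "g3" "g2"
selectTop(coords, top = 0.25)  # the most distant 25%: "g3"
selectTop(coords, top = 0)     # no annotation: character(0)
```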

topGenesCex

cex for gene annotation (used when topGenes > 0)

topGenesVar

variable of the featureData used to label the genes, by default: NULL, the featureNames are used for labelling (used when topGenes > 0)

topGenesJust

text justification for the genes (used when topGenes > 0 and if packageTextLabel is ggplot2), by default: c(0.5, 0.5) so centered

topGenesColor

text color for the genes (used when topGenes > 0), black by default

topSamples

numeric indicating which percentile (if <1) or number (if >=1) of samples most distant to the origin of the plot to annotate, by default: 10 samples are selected. If no samples should be annotated, set this parameter to 0. Currently only available for static plot.

topSamplesCex

cex for sample annotation (used when topSamples > 0)

topSamplesVar

variable of the phenoData used to label the samples, by default: NULL, the sampleNames are used for labelling (used when topSamples > 0)

topSamplesJust

text justification for the samples (used when topSamples > 0 and if packageTextLabel is ggplot2), by default: c(0.5, 0.5) so centered

topSamplesColor

text color for the samples (used when topSamples > 0), black by default

geneSets

list of gene sets/pathways, each containing identifiers of genes contained in the set, e.g. pathways from Gene Ontology databases output from the getGeneSetsForPlot function, or any custom list of pathways. The gene identifiers should correspond to the variable geneSetsVar contained in the featureData; if not specified, the featureNames are used. If several gene sets have the same name, they will be combined to extract the top gene sets.

geneSetsVar

variable of the featureData used to match the genes contained in geneSets, most probably ENTREZID; if not specified, the featureNames of the eSet are used. Only used when topGeneSets > 0 and the parameter geneSets is specified.

geneSetsMaxNChar

maximum number of characters for pathway names, by default keep entire names. Only used when topGeneSets > 0 and the parameter geneSets is specified. If returnAnalysis is set to TRUE and geneSetsMaxNChar is specified, the top pathways will be returned in the output object, named with the identifiers used in the plot (so with at most geneSetsMaxNChar characters).

topGeneSets

numeric indicating which percentile (if <=1) or number (if >1) of gene sets most distant to the origin of the plot to annotate, by default: 10 gene sets are selected. If no gene sets should be annotated, set this parameter to 0. Currently only available for static plot. Only used when the parameter geneSets is specified.

topGeneSetsCex

cex for gene sets annotation. Only used when topGeneSets > 0 and the parameter geneSets is specified.

topGeneSetsJust

text justification for the gene sets, by default: c(0.5, 0.5) so centered. Only used when topGeneSets > 0, the parameter geneSets is specified and if packageTextLabel is ggplot2.

topGeneSetsColor

color for the gene sets, black by default. Only used when topGeneSets > 0 and the parameter geneSets is specified.

includeLegend

logical, if TRUE (by default) include a legend, otherwise not

includeLineOrigin

if TRUE (by default) include vertical line at x = 0 and horizontal line at y = 0

typePlot

type of the plot returned, either 'static' (static) or 'interactive' (potentially interactive)

packageInteractivity

if typePlot is 'interactive', package used for interactive plot, either 'rbokeh' (by default) or 'ggvis'

figInteractiveSize

vector containing the size of the interactive plot, as [width, height] by default: c(600, 400). This is passed to the width and height parameters of:

  • for rbokeh plots: the bokeh::figure function

  • for ggvis plots: the ggvis::set_options function

ggvisAdjustLegend

logical, if TRUE (by default) adjust the legends in ggvis to avoid overlapping legends when multiple legends

interactiveTooltip

logical, if TRUE, add hover functionality showing sample annotation (variables used in the plot) in the plot

interactiveTooltipExtraVars

name of extra variable(s) (in varLabels of the eset) to add in tooltip to label the samples, NULL by default

returnAnalysis

logical, if TRUE (FALSE by default), return also the output of the analysis, and the outlying samples in the topElements element if any, otherwise only the plot object

Value

if returnAnalysis is TRUE, return a list:

  • analysis: output of the linear discriminant analysis, whose parameters can be given as input to the esetPlotWrapper function

    • dataPlotSamples: coordinates of the samples

    • dataPlotGenes: coordinates of the genes

    • esetUsed: expressionSet used in the plot

  • topElements: list with top outlying elements if any, possibly genes, samples and gene sets

  • plot: the plot output

otherwise return only the plot

Author(s)

Laure Cougnaud

References

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7 (2), 179–188

See Also

the function used internally: lda

Examples

# load data
library(ALL)
data(ALL)

# specify several variables in ldaVar (this might take a few minutes to run...)

# sample subsetting: currently cannot deal with missing values
samplesToRemove <- which(apply(pData(ALL)[, c("sex", "BT")], 1, anyNA)) 

# extract random features, because analysis is quite time consuming
retainedFeatures <- sample(featureNames(ALL), size = floor(nrow(ALL)/5))

# create the plot
esetLda(eset = ALL[retainedFeatures, -samplesToRemove], 
  ldaVar = "BT", colorVar = "BT", shapeVar = "sex", sizeVar = "age",
  title = "Linear discriminant analysis on the ALL dataset")
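
A caveat on the negative indexing used in this example: if no sample has missing values, which() returns an empty vector, and indexing with its negation selects no columns instead of all of them. The sketch below shows this on a plain matrix (an assumption made for simplicity; the example above subsets an ExpressionSet, which is not tested here) and a logical index that avoids the problem.

```r
# Sketch on a plain matrix (not an ExpressionSet): negative indexing
# with an empty vector drops every column instead of keeping them all.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("s1", "s2", "s3")))

# suppose no sample has missing values
hasNA <- c(FALSE, FALSE, FALSE)
toRemove <- which(hasNA)  # integer(0)

nColNegative <- ncol(m[, -toRemove, drop = FALSE])  # 0 columns kept
nColLogical <- ncol(m[, !hasNA, drop = FALSE])      # all 3 columns kept
```

With the ALL data some samples do have a missing sex value, so the example works as written; the logical form is simply the more defensive pattern when reusing it on other data.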