esetLda: plot a biplot of a linear discriminant analysis of an eSet...
In esetVis: Visualizations of expressionSet Bioconductor object


esetLda reduces the dimension of the data contained in the eSet via a linear discriminant analysis on the specified grouping variable with the lda function, and plots the resulting biplot, optionally with the sample and gene annotation contained in the eSet.
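
As a rough illustration of the underlying analysis step, the sketch below calls lda from the MASS package directly on a small built-in dataset. The iris data stands in for a samples-by-features expression matrix, and the variable names (exprsMat, grouping, sampleScores, variableLoadings) are hypothetical; how esetLda itself prepares the data and derives the biplot coordinates is not shown here and may differ.

```r
library(MASS)  # provides lda()

# iris stands in for a samples-by-features matrix (assumption for illustration)
exprsMat <- as.matrix(iris[, 1:4])
grouping <- iris$Species  # plays the role of the 'ldaVar' grouping variable

# fit the linear discriminant analysis
fit <- lda(x = exprsMat, grouping = grouping)

# coordinates of the samples on the first two discriminant axes
# (comparable to choosing dim = c(1, 2))
sampleScores <- predict(fit, exprsMat)$x[, c(1, 2)]

# coefficients of each variable on the same two axes
variableLoadings <- fit$scaling[, c(1, 2)]

head(sampleScores)
variableLoadings
```

A biplot overlays these two sets of coordinates: the samples as points and the variables (genes) as a second layer.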

esetLda(eset, ldaVar, psids = 1:nrow(eset), dim = c(1, 2),
  colorVar = character(), color = if (length(colorVar) == 0) "black"
  else character(), shapeVar = character(), shape = if
  (length(shapeVar) == 0) 15 else numeric(), sizeVar = character(),
  size = if (length(sizeVar) == 0) { ifelse(typePlot == "interactive"
  && length(packageInteractivity) == 1 && packageInteractivity == "rbokeh",
  5, 2.5) } else { numeric() }, sizeRange = numeric(),
  alphaVar = character(), alpha = if (length(alphaVar) == 0) 1 else
  numeric(), alphaRange = numeric(), title = "",
  symmetryAxes = c("combine", "separate", "none"),
  packageTextLabel = c("ggrepel", "ggplot2"), cloudGenes = TRUE,
  cloudGenesColor = "black", cloudGenesNBins = sqrt(length(psids)),
  cloudGenesIncludeLegend = FALSE, cloudGenesTitleLegend = "nGenes",
  topGenes = 10, topGenesCex = 2.5, topGenesVar = character(),
  topGenesJust = c(0.5, 0.5), topGenesColor = "black",
  topSamples = 10, topSamplesCex = 2.5, topSamplesVar = character(),
  topSamplesJust = c(0.5, 0.5), topSamplesColor = "black",
  geneSets = list(), geneSetsVar = character(),
  geneSetsMaxNChar = numeric(), topGeneSets = 10,
  topGeneSetsCex = 2.5, topGeneSetsJust = c(0.5, 0.5),
  topGeneSetsColor = "black", includeLegend = TRUE,
  includeLineOrigin = TRUE, typePlot = c("static", "interactive"),
  packageInteractivity = c("rbokeh", "ggvis"),
  figInteractiveSize = c(600, 400), ggvisAdjustLegend = TRUE,
  interactiveTooltip = TRUE, interactiveTooltipExtraVars = character(),
  returnAnalysis = FALSE, returnEsetPlot = FALSE)

`eset`	expressionSet (or SummarizedExperiment) object with data
`ldaVar`	name of variable (in varLabels of the `eset`) used for grouping for lda
`psids`	featureNames of genes to include in the plot, all by default
`dim`	dimensions of the analysis to represent, first two dimensions by default
`colorVar`	name of variable (in varLabels of the `eset`) used for coloring, empty by default
`color`	character or factor with specified color(s) for the points, replicated if needed. This is used only if `colorVar` is empty. By default: 'black' if `colorVar` is not specified and default `ggplot` palette otherwise
`shapeVar`	name of variable (in varLabels of the `eset`) used for the shape, empty by default
`shape`	character or factor with specified shape(s) (pch) for the points, replicated if needed. This is used only if `shapeVar` is empty. By default: '15' (filled square) if `shapeVar` is not specified and default `ggplot` shape(s) otherwise
`sizeVar`	name of variable (in varLabels of the `eset`) used for the size, empty by default
`size`	character or factor with specified size(s) (cex) for the points, replicated if needed. This is used only if `sizeVar` is empty. By default: '2.5' ('5' for interactive rbokeh plots) if `sizeVar` is not specified and default `ggplot` size(s) otherwise
`sizeRange`	size (cex) range used in the plot, possible only if the `sizeVar` is 'numeric' or 'integer'
`alphaVar`	name of variable (in varLabels of the `eset`) used for the transparency, empty by default. This parameter is currently only available for static plot and ggvis (only numeric in this case).
`alpha`	character or factor with specified transparency(s) for the points, replicated if needed. This is used only if `alphaVar` is empty. By default: '1' if `alphaVar` is not specified and default `ggplot` alpha otherwise. This parameter is currently only available for static and ggvis plots.
`alphaRange`	transparency (alpha) range used in the plot, possible only if the `alphaVar` is 'numeric' or 'integer'. This parameter is currently only available for static and ggvis plots.
`title`	plot title, '' (empty string) by default
`symmetryAxes`	set symmetry for axes, either: 'combine' (by default): both axes are symmetric and share the same limits; 'separate': each axis is symmetric and has its own limits; 'none': axes by default (plot limits)
`packageTextLabel`	package used to label the outlying genes/samples/gene sets, either `ggrepel` (by default, only used if package `ggrepel` is available), or `ggplot2`
`cloudGenes`	logical, if TRUE (by default), include the cloud of genes in the plot
`cloudGenesColor`	if `cloudGenes` is TRUE, color for the cloud of genes, black by default
`cloudGenesNBins`	number of bins to use for the cloud of genes, by default the square root of the number of genes
`cloudGenesIncludeLegend`	logical, if TRUE (FALSE by default) include the legend for the cloud of genes (in the top position if multiple legends)
`cloudGenesTitleLegend`	string with title for the legend for the cloud of genes, 'nGenes' by default
`topGenes`	numeric indicating which percentile (if <1) or number (if >=1) of genes most distant to the origin of the plot to annotate, by default: 10 genes are selected. If no genes should be annotated, set this parameter to 0. Currently only available for static plot.
`topGenesCex`	cex for gene annotation (used when `topGenes` > 0)
`topGenesVar`	variable of the featureData used to label the genes, by default: empty, the featureNames are used for labelling (used when `topGenes` > 0)
`topGenesJust`	text justification for the genes (used when `topGenes` > 0 and if `packageTextLabel` is `ggplot2`), by default: c(0.5, 0.5) so centered
`topGenesColor`	text color for the genes (used when `topGenes` > 0), black by default
`topSamples`	numeric indicating which percentile (if <1) or number (if >=1) of samples most distant to the origin of the plot to annotate, by default: 10 samples are selected. If no samples should be annotated, set this parameter to 0. Currently only available for static plot.
`topSamplesCex`	cex for sample annotation (used when `topSamples` > 0)
`topSamplesVar`	variable of the phenoData used to label the samples, by default: empty, the sampleNames are used for labelling (used when `topSamples` > 0)
`topSamplesJust`	text justification for the samples (used when `topSamples` > 0 and if `packageTextLabel` is `ggplot2`), by default: c(0.5, 0.5) so centered
`topSamplesColor`	text color for the samples (used when `topSamples` > 0), black by default
`geneSets`	list of gene sets/pathways, each containing identifiers of genes contained in the set. E.g. pathways from Gene Ontology databases output from the `getGeneSetsForPlot` function or any custom list of pathways. The gene identifiers should correspond to the variable `geneSetsVar` contained in the featureData; if not specified, the featureNames are used. If several gene sets have the same name, they will be combined to extract the top gene sets.
`geneSetsVar`	variable of the featureData used to match the genes contained in `geneSets`, most probably ENTREZID; if not specified, the featureNames of the eSet are used. Only used when `topGeneSets` > 0 and the parameter `geneSets` is specified.
`geneSetsMaxNChar`	maximum number of characters for pathway names, by default keep entire names. Only used when `topGeneSets` > 0 and the parameter `geneSets` is specified. If `returnAnalysis` is set to TRUE and `geneSetsMaxNChar` is specified, the top pathways will be returned in the output object, named with the identifiers used in the plot (so with maximum `geneSetsMaxNChar` number of characters)
`topGeneSets`	numeric indicating which percentile (if <=1) or number (if >1) of gene sets most distant to the origin of the plot to annotate, by default: 10 gene sets are selected. If no gene sets should be annotated, set this parameter to 0. Currently only available for static plot. Only used when the parameter `geneSets` is specified.
`topGeneSetsCex`	cex for gene sets annotation. Only used when `topGeneSets` > 0 and the parameter `geneSets` is specified.
`topGeneSetsJust`	text justification for the gene sets, by default: c(0.5, 0.5) so centered. Only used when `topGeneSets` > 0, the parameter `geneSets` is specified and if `packageTextLabel` is `ggplot2`.
`topGeneSetsColor`	color for the gene sets, black by default. Only used when `topGeneSets` > 0 and the parameter `geneSets` is specified.
`includeLegend`	logical if TRUE (by default) include a legend, otherwise not
`includeLineOrigin`	if TRUE (by default) include vertical line at x = 0 and horizontal line at y = 0
`typePlot`	type of the plot returned, either 'static' (static) or 'interactive' (potentially interactive)
`packageInteractivity`	if `typePlot` is 'interactive', package used for interactive plot, either 'rbokeh' (by default) or 'ggvis'
`figInteractiveSize`	vector containing the size of the interactive plot, as [width, height], by default: c(600, 400). This is passed to the `width` and `height` parameters of the `rbokeh::figure` function for rbokeh plots, and of the `ggvis::set_options` function for ggvis plots
`ggvisAdjustLegend`	logical, if TRUE (by default) adjust the legends in `ggvis` to avoid overlapping legends when multiple legends
`interactiveTooltip`	logical, if TRUE, add hover functionality showing sample annotation (variables used in the plot) in the plot
`interactiveTooltipExtraVars`	name of extra variable(s) (in varLabels of the `eset`) to add in rbokehEsetPlot to label the samples, empty by default
`returnAnalysis`	logical, if TRUE (FALSE by default), return also the output of the analysis, and the outlying samples in the topElements element if any, otherwise only the plot object
`returnEsetPlot`	logical, if TRUE return also the esetPlot object
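
The topGenes and topSamples arguments accept either a proportion (<1) or a count (>=1) of elements most distant from the origin. The base R sketch below shows one way such a rule could work. The helper selectTop and its reading of a value below 1 as a fraction of all elements are assumptions made for illustration, not the package's actual implementation.

```r
# Hypothetical helper (not part of esetVis): pick the elements most distant
# from the origin, given a matrix of 2D coordinates with row names.
selectTop <- function(coords, top) {
  # Euclidean distance of each element to the origin
  distOrigin <- sqrt(rowSums(coords^2))
  # assumption: top < 1 is read as a fraction of elements, otherwise as a count
  n <- if (top < 1) ceiling(top * nrow(coords)) else min(top, nrow(coords))
  names(sort(distOrigin, decreasing = TRUE))[seq_len(n)]
}

# toy coordinates for four genes
coords <- matrix(
  c(0, 0,
    3, 4,
    1, 1,
   -6, 8),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), c("X", "Y"))
)

selectTop(coords, 2)     # the two most distant genes
selectTop(coords, 0.25)  # the most distant quarter of the genes
selectTop(coords, 0)     # no annotation
```

Setting the value to 0 yields an empty selection, which matches the documented way to switch the annotation off.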

If returnAnalysis is TRUE, returns a list with:

analysis: output of the linear discriminant analysis, whose elements can be given as input to the esetPlotWrapper function
- dataPlotSamples: coordinates of the samples
- dataPlotGenes: coordinates of the genes
- esetUsed: expressionSet used in the plot
topElements: list with the top outlying elements, if any: possibly genes, samples and gene sets
plot: the plot output

Otherwise, returns only the plot.
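
A minimal sketch of how the returned list might be inspected when returnAnalysis is TRUE. It assumes the esetVis and ALL packages are installed (the requireNamespace guard skips the example otherwise), and the choice of the first 500 features is arbitrary, made only to keep the run short. The element names are those listed above.

```r
# run only if the required Bioconductor packages are available
havePkgs <- requireNamespace("esetVis", quietly = TRUE) &&
  requireNamespace("ALL", quietly = TRUE)

if (havePkgs) {
  library(esetVis)
  library(ALL)
  data(ALL)

  # keep samples with a known 'BT' value and an arbitrary subset of features
  samplesToKeep <- which(!is.na(pData(ALL)$BT))
  esetSubset <- ALL[featureNames(ALL)[1:500], samplesToKeep]

  res <- esetLda(eset = esetSubset, ldaVar = "BT", colorVar = "BT",
    returnAnalysis = TRUE)

  names(res)                           # elements of the returned list
  head(res$analysis$dataPlotSamples)   # coordinates of the samples
  res$topElements                      # top outlying elements, if any
  res$plot                             # the plot itself
}
```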

Laure Cougnaud

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7 (2), 179–188

the function used internally: lda

# load the package and the data
library(esetVis)
library(ALL)
data(ALL)

# sample subsetting: the analysis currently cannot deal with missing values,
# so keep only the samples with complete 'sex' and 'BT' annotation
samplesToKeep <- which(!apply(pData(ALL)[, c("sex", "BT")], 1, anyNA))

# extract random features, because the analysis is quite time consuming
# (this might still take a few minutes to run...)
retainedFeatures <- sample(featureNames(ALL), size = floor(nrow(ALL)/5))

# create the plot
esetLda(eset = ALL[retainedFeatures, samplesToKeep], 
  ldaVar = "BT", colorVar = "BT", shapeVar = "sex", sizeVar = "age",
  title = "Linear discriminant analysis on the ALL dataset")

Warning messages:
1: In lda.default(x, grouping, ...) : variables are collinear
2: Removed 2 rows containing missing values (geom_point).