- cellCountsPerCluster() and clusterCellCountsPerSample().
- sampleData(). seurat objects now share
  the same code as SingleCellExperiment, and return NULL if the sample
  data is not defined. The metrics() function continues to slot empty
  metadata in sampleID, sampleName, and interestingGroups() if not
  defined.
- SingleCellExperiment is used as default method over seurat where
  applicable: fetchData family, plotDimensionalReduction family,
  plotMarker family, plotFeature family, plotGene family,
  and plotCellTypesPerCluster(). The internal code hasn't changed; it is
  just defined primarily for SingleCellExperiment.
- dimRed argument has been renamed to reduction, where applicable.
- topBarcodes() can now return either a data.frame or list, containing
  the top barcodes grouped by sample.
- text as primary argument in markdownHeader() calls.
- mapCellsToSamples() utility function. This change
  helps simplify the internal code for cell2sample(). For example, in the
  pbmc4k dataset, the barcode IDs are sanitized from "TTTGGTTTCGCTAGCG-1" to
  "pbmc4k_1_TTTGGTTTCGCTAGCG". The "1" here denotes 1 sample in the matrix,
  which is how Cell Ranger denotes multiplexed samples in a single counts
  matrix. Note that the "-" character is illegal in names, so we consistently
  sanitize barcodes to contain "_" instead. See help("make.names") for
  more information on syntactically valid names in R.
- readCellRanger() no longer requires reference data defined by refdataDir,
  although this is still recommended.
- cellranger_small and seurat_small datasets to the
  publicly available pbmc4k dataset from 10X Genomics. Here we've subset the
  top 500 cells and genes by abundance. We'll use either the pbmc4k or pbmc8k
  dataset for the vignette in a future update.
- bcbioSingleCell() and readCellRanger() functions now consistently default
  to not requiring sampleMetadataFile, which is now NULL by default. For
  bcbioSingleCell(), if a custom sample metadata file is not provided, the
  function reads from the bcbio YAML metadata. For readCellRanger(), the
  function uses minimalSampleData() internally to return minimal metadata,
  containing sampleName and description columns.
- .sampleDirs() function to bcbioSingleCell() and
  readCellRanger() functions.
- plotMarker documentation examples to use mitochondrial genes.
- seurat_small in place of Seurat::pbmc_small in working examples.
- data.frame as much as possible,
  where applicable. Code is being updated to use tidyeval.
- metrics() or
  fetchData return column names.
- stripTranscriptVersions() command
  applied, to remove the Ensembl transcript versions if present.
- ggplot grid return.
- plotUMAP(),
  plotMarkerUMAP(), and plotFeatureUMAP(). Corresponding fetch functions,
  fetchUMAPData() and fetchUMAPExpressionData(), have also been added.
- plotGene(): Added seurat method support. If advanced customization of
  the plot is needed, use plotDot() or plotViolin() instead, or refer to the
  Seurat documentation for alternates.
- diffExp(): improved internal code to work directly on
  SingleCellExperiment, removing the need to pass design and group
  parameters internally. Also added unit testing against zinbwave,
  zingeR, and edgeR support. DESeq2 is supported but runs slowly.
- plotFeature() and plotMarker() family of functions. Improved the
  color palette support when dark = FALSE, now using a flipped viridis plasma
  color palette.
- aggregateReplicates() function has been reworked to return a
  SingleCellExperiment object instead of bcbioSingleCell. The v0.2.4
  update of bcbioRNASeq behaves similarly with this generic.
- seurat SingleCellExperiment method
  support, using as(x, "SingleCellExperiment") internally, which uses the
  new Seurat::Convert() function.
- plotClusters(),
  plotTSNEExpressionData(), loadSingleCellRun(), darkTheme(),
  pcCutoff(), quantileHeatmap(), plotKnownMarkers(), readMarkers(),
  readMarkersFile().
- plotFeatures(), plotMarker(), and plotMarkers() functions defunct.
- plotPCElbow() now returns a plot grid.
- sanitizeMarkers(): improved internal code for supported bcbio stashed
  metadata, including rowRanges.
- plotCellTypesPerCluster() is using dark = TRUE by default again.
- cell2sample() handling for multiplexed Cell Ranger data loaded up with
  readCellRanger(). Need to use stashed cell2sample factor saved in
  metadata(), rather than attempting to calculate on the fly with
  mapCellsToSamples().
- seurat objects in coercion
  method. This helps maintain the gene symbol appearance in plotting functions
  for genes with hyphens in the names.
- BiocParallel::SerialParam() internally for zinbwave in diffExp().
- cell2sample() internal code to always use mapCellsToSamples()
  instead of attempting to use a stashed vector inside metadata() for
  SingleCellExperiment method.
- .applyFilterCutoffs(), which is no longer necessary since
  this functionality is supported in the S4 subset method.
- fetchGene() functions.
- plotCellTypesPerCluster(): revert back to dark = TRUE by default.
- plotMarker and plotFeature functions in the documentation.
- sanitizeMarkers(): Improved gene identifier matching.
- topMarkers() now defaults to coding = FALSE, since not all
  datasets will contain biotype information.
- updateObject() method support for bcbioSingleCell class.
- validObject() validity check to not require sample-level metadata in
  colData() yet.
- plotUMAP() and fetchUMAPData() functions. These work similarly to the
  other plotDimensionalReduction() and fetchData() functions.
- colData() slot, for better downstream
  compatibility with other packages that work with SingleCellExperiment
  container class. Unique per-sample rows are still saved internally in the
  sampleData() slot.
- filterCells() now supports minUMIs = c("knee", "inflection") for automatic
  filtering based on the cellular barcode ranks. Internally this is handled
  by DropletUtils::barcodeRanks().
- libgsl-dev installation for zinbwave on
  Travis CI.
- diffExp().
- fetchData() functions in the documentation.
- plotDimensionalReduction() functions in the documentation.
- aggregateReplicates() internal code. This function again only
  supports aggregation of bcbioSingleCell objects that have been filtered
  using the filterCells() function.
- Seurat::Convert() internally to coerce seurat class object
  to SingleCellExperiment, using as(seurat, "SingleCellExperiment"). This
  utility function was added to Seurat v2.3.1.
- metrics() SingleCellExperiment method code to always merge
  colData() and sampleData().
- readCellRanger() internal code to match bcbioSingleCell()
  constructor, specifically handling sample-level metadata in colData().
- plotQC().
- readYAMLSampleData() internally instead of defunct sampleYAMLMetadata().
- sampleNames() method support for seurat.
- plotUMIsPerCell().
  Now uses the point argument and always labels per sample. Currently requires
  the geom = "ecdf" argument for labeling.
- plotBarcodeRanks().
- seurat class objects.
- plotQC() geom argument is now more consistent across the paneled plots.
- interestingGroups support to plotZerosVsDepth(), matching the other QC
  functions.
- sampleData() S4 methods to match update in bcbioBase. Now supports
  clean argument, which returns non-blacklisted factor columns only. See
  bcbioBase::metadataBlacklist for the blacklist.
- scales::pretty_breaks() internally.
- grid argument to plots, where applicable.
- bcb_small to indrops_small.
- interestingGroups() and metadata().
  These extend from SummarizedExperiment correctly now.
- plotCellCounts() and plotReadsPerCell().
- SingleCellExperiment-like methods,
  where applicable. This includes rowData, gene2symbol(), and
  interestingGroups().
- plotDot(), plotFeatureTSNE(),
  plotMarker().
- filterCells() call.
- bcbioSingleCell()
  constructor function.
- plotViolin() now uses a color border by default.
- cell2sample mapping internally for readCellRanger().
- bcbioSingleCell() instead of loadSingleCell() as the main
  constructor function to create a bcbioSingleCell object. loadSingleCell()
  is deprecated and still works, but will warn the user.
- loadCellRanger() to readCellRanger() for better name consistency.
- scale_color_hue() instead of scale_color_viridis() for example. The
  viridis color palette is still used by default for marker expression
  plots.
- "aggregate" instead of "sampleNameAggregate" to define
  aggregate/grouped samples in metadata.
- sym() in place of .data internally for tidy code.
- seurat blacklist for sampleData() generic.
- plotCellTypesPerCluster() and plotMarkerTSNE() now use an automatic color
  palette by default, which enables dynamic color palette support when
  dark = TRUE. Internally this is handled with the theme_midnight() and
  theme_paperwhite() ggplot2 themes.
- metrics() method support now defaults to matrix and works similarly
  for dgCMatrix sparse matrices. This is used in place of calculateMetrics()
  to generate the per cell quality control metrics.
- aggregateReplicates() internal code.
- aggregate is defined in metadata for quality
  control plots.
- sanitizeMarkers() to map the gene annotations
  from rowRanges() better.
- barcodeRanks() and barcodeRanksPerSample().
- plotMarker() in addition to plotMarkers().
- assay() is named counts instead of
  raw, for better consistency with SingleCellExperiment class. The
  counts() generic requires that the primary assay slot is named counts to
  work correctly. Nothing else here has changed, just the name.
- loadCellRanger() now returns a SingleCellExperiment object instead of a
  bcbioSingleCell object.
- transgeneNames and spikeNames when loading up a dataset.
- organism = NULL during the loadSingleCell() call.
- SingleCellExperiment rather than bcbioSingleCell
  where applicable, providing support for SingleCellExperiment objects created
  elsewhere.
- theme_midnight() and theme_paperwhite() internally for dimensionality
  reduction plots.
- stop(), warning(), and message().
- inflectionPoint() has been made defunct, in favor of using barcodeRanks().
- bcbioSingleCell S4 class now extends SingleCellExperiment instead of
  SummarizedExperiment. This requires definition of rowRanges() inside the
  object instead of rowData(). Similar functionality was added to the
  bcbioRNASeq package. Upgrade support will be provided using updateObject()
  in a future release.
- diffExp(), which uses
  zingeR/edgeR internally to calculate gene expression changes across
  cell groups.
- plotCumulativeUMIsPerCell() utility. This may be removed in a future
  update in favor of adding this plot into plotUMIsPerCell() using an ECDF
  plot.
- readCellTypeMarkers() to load marker data.frames, instead of
  readCellTypeMarkersFile(). This matches the conventions used in the
  bcbioBase package.
- loadSingleCell() organism calls
- prepareSingleCellTemplate() now uses _setup.R instead of setup.R.
- mapCellsToSamples() instead of cell2sample(). cell2sample() now
  simply acts as an accessor function, returning the internally stored
  cell2sample mappings rather than trying to calculate them.
  mapCellsToSamples() performs the actual mapping from cellular barcodes to
  sample identifiers.
- metricsPerSample()
- BiocParallel::bpmapply() to loop across the sparse matrix files
  per sample internally in the .sparseCountsList() function, which is shared
  between loadSingleCell() and loadCellRanger().
- prepareSingleCellTemplate() to explicitly state which files to
  include for each R Markdown template, rather than inheriting from
  bcbioBase::prepareTemplate().
- selectSamples() now fails on a sample mismatch, rather than warning.
- drop = FALSE argument to
  the cellular barcode matching call.
- programs metadata slot to programVersions, to improve consistency
  with bcbioRNASeq package.
- fetchGeneData() function, that wraps the functionality of
  Seurat::FetchData() for specific genes.
- .plotDR() internally.
- fetchTSNEExpressionData() to return a standard
  data.frame with the cellular barcodes as rows, instead of the previously
  grouped tibble method. Now this function returns aggregate gene marker
  calculations in the mean, median, and sum columns. Since this method has
  way fewer rows than the grouped tibble, the ggplot2 code for
  plotMarkerTSNE() now runs faster.
- viridis:: for color palettes, where applicable.
- colorPoints argument has been renamed to expression for
  plotMarkerTSNE().
- genes argument simply matches
  against the rownames in the counts matrix of the object.
- minCumPct argument for plotPCElbow() from 0.9 to
  0.8. This is more conservative and will return slightly fewer principal
  components for dimensionality reduction, by default.
- plotTSNE(), plotPCA(), and the other dimensionality reduction-related
  plotting functions now default to a smaller point size (0.5) and slight alpha
  transparency (0.8), to make superimposed points more obvious for large
  datasets with many cells.
- subsetPerSample().
- .onLoad() to .onAttach() method for automatically
  loading required dependency packages.
- plotPCA() now uses phase instead of Phase when plotting cell cycle
  regression as an interestingGroup (see Seurat clustering template).
  Previously some of the Seurat metadata columns were not consistently
  sanitized to lowerCamelCase (e.g. Phase, res.0.8, orig.ident).
- abort(),
  inform(), and warn().
- filterCells() function to enable per sample filtering cutoffs. This
  works by passing in a named numeric vector, where the names must match the
  internal sampleID metadata column (not sampleName).
- metrics()
  accessor. Now all count columns (e.g. nUMI) are consistently integers, and
  all character vector columns are consistently coerced to factors.
- orig.ident and the
  res.* metadata columns.
- .readSparseCounts() function.
- colnames and rownames handling for internal .readSparseCounts()
  function.
- loadCellRanger() function. refDataDir parameter has been renamed
  to refdataDir.
- organism and genomeBuild options to loadSingleCell(), to override
  the metadata set in the bcbio run, if necessary.
- .sparseCountsTx2Gene() to .transcriptToGeneLevelCounts().
- geomean bind method in fetchTSNEExpressionData().
- minNovelty default from 0.8 to 0.75.
- plotDot().
- plotKnownMarkersDetected(). Now uses
  tsneColor, violinFill, and dotColor. Also added pointsAsNumbers
  parameter.
- subtitle parameter for plotMarkerTSNE().
- tsneColor, violinFill, dotColor, and dark parameters for
  plotMarkers().
- plotTopMarkers() so that it renders correctly in
  R Markdown calls.
- plotViolin().
- cell2sample handling in subset method code.
- readMarkersFile() to readCellTypeMarkersFile().
- -2, for example.
- aggregateReplicates() code to work with basejump generic, which uses
  groupings instead of cells as the grouping parameter.
- detectOrganism().
- filterCells().
- gene2symbol() method support for bcbioSingleCell and seurat objects.
- knownMarkersDetected().
- plotCellTypesPerCluster().
- plotQuantileHeatmap() functionality into basejump, for use in
  bcbioRNASeq package.
- plotKnownMarkers() to plotKnownMarkersDetected().
- readMarkers() to readMarkersFile().
- gene2symbol() method support.
- plotDot() generic to basejump package.
- .checkFormat() function, which will check for ensgene or
  symbol input.
- .convertGenesToSymbols() utility function, for mapping
  Ensembl gene identifiers to gene symbols.
- loadSingleCell() function.
- annotable() function.
- bcbio<- assignment method support for seurat class objects.
- calculateMetrics() function now uses annotable = TRUE as default, instead
  of using missing() method.
- cell2sample() for seurat class objects.
- cellTypesPerCluster() now uses min and max default arguments that don't
  remove any rows.
- counts() function. This
  defaults to returning the raw counts (normalized = FALSE), but can also
  return log-normalized (normalized = TRUE) and scaled
  (normalized = "scaled") counts.
- fetchTSNEExpressionData().
- filterCells() now simply works in a destructive manner. We've removed the
  drop parameter. The messages displayed to the user during this function call
  have been improved, and now include more statistics on the step where the
  majority of cells are filtered.
- gene2symbol() method support for bcbioSingleCell and
  seurat class objects.
- interestingGroups<- assignment method support for seurat class
  objects.
- plotDot() function.
- plotFeatureTSNE() now uses a plural features parameter instead of
  feature, which is consistent with the syntax used in the other functions.
- plotMarkerTSNE(). The format
  argument still defaults to "symbol", for consistency with previous behavior.
  However, in the future we recommend that users pass in stable Ensembl gene
  identifiers here if possible, for better reproducibility.
- plotMarkers() now supports Ensembl gene identifiers.
- plotPCElbow() now silently returns the sequence of principal components
  (PCs) that we recommend to use for dimensionality reduction.
- geom_smooth() plotting, where
  applicable. See the plotQC() function code.
- plotQuantileHeatmap() to enable faster plotting.
  Now the dendrogram calculations are skipped by default, which take a long
  time for large datasets.
- plotStressGenes() function for now. Will try to add this
  in a future update.
- plotTopMarkers() if headerLevel = NULL.
- selectSamples() code to rely upon output of our bracket-based
  subsetting method. See subset.R file for more details.
- subsetPerSample() function now defaults to saving in the working directory.
- topBarcodes() function to rank by nUMI instead of nCount column,
  so it works with data from either bcbioSingleCell or seurat objects.
- data-raw/ directory!
- object@cellularBarcodes as a
  data.frame instead of a per sample list. This makes downstream subsetting
  operations on the barcodes simpler.
- cell2sample mapping.
- pcCutoff() to plotPCElbow(). The function now returns a PC
  sequence from 1 to the cutoff (e.g. 1:10) instead of just the final PC cutoff
  value. The R Markdown clustering template has been updated to reflect this
  change.
- quantileHeatmap() to plotQuantileHeatmap(), for consistency with
  other plotting functions.
- object@bcbio slot.
- darkTheme() to basejump package and reworked as midnightTheme(),
  with improved colors and axis appearance.
- pointsAsNumbers parameter to plotTSNE() and plotPCA() functions,
  to match the functionality in plotMarkerTSNE().
- loadCellRanger() to support multiplexed Cell Ranger matrix
  output. Cell Ranger adds a numeric suffix to the end of multiplexed
  barcodes (e.g. AAACCTGGTTTACTCT-1 denotes that cellular barcode
  AAACCTGGTTTACTCT is assigned to sample 1).
- cell2sample mapping in aggregateReplicates() function, which uses
  the sampleNameAggregate column in sample metadata to define the aggregate
  sample pairings. The summarize() step at line 101 is slow for datasets with
  many samples and should be changed in the future to speed things up.
- cell2sample() code to handle NULL stashed mappings
  better.
- midnightTheme() instead of
  darkTheme().
- plotMarkerTSNE().
- plotMitoRatio() where maxGenes cutoff was plotted instead of
  maxMitoRatio.
- legend parameter argument to plotQC() function. Also improved
  handling of NULL return for plotReadsPerCell(), which can happen with
  Cell Ranger output.
- plotZeroesVsDepth() to match the behavior in the
  other plotting functions.
- sampleNameAggregate present
  in sample metadata), but removing code support for wrapping by multiplexed
  FASTQ description.
- bcbioSingleCell objects with filterCells() applied.
  This information is stored in the metadata() slot as 3 variables: (1)
  filterParams, numeric vector of the parameters used to define the cell
  filtering cutoffs; (2) filterCells, character vector of the cellular barcode
  IDs that passed filtering; (3) filterGenes, character vector of the
  Ensembl gene identifiers that have passed filtering, as determined by the
  minCellsPerGene parameter.
- filterCells() return, we're now defaulting to a destructive operation,
  where the columns (cells) and rows (genes) of the object are adjusted to match
  the cells and genes that have passed filtering. Currently this can be adjusted
  with the drop argument for testing, but should generally be left as
  drop = TRUE.
- cell2sample named factor in the metadata() slot,
  which makes downstream quality control operations faster. This is generated
  on the fly for previously saved objects that don't have a stashed
  cell2sample.
- plotQC() utility function, which plots multiple quality
  control functions for easy visualization. This defaults to output as a cowplot
  grid (return = "grid"), but can alternatively be set to return
  R Markdown code (return = "markdown").
- aggregateReplicates() operation has been improved to properly slot raw
  cellular barcodes in object@bcbio$cellularBarcodes. The filterCells vector
  is adjusted, and sampleMetadata factors should be properly releveled.
- counts() accessor simply returns the sparse matrix contained in the
  assay() slot. The filterCells argument has been removed.
- filterCells() function, to help the user
  determine at which step the majority of cells are being filtered. We're
  keeping a non-destructive option using drop = FALSE for the time being, but
  this will likely be removed for improved simplicity in a future update.
- metrics() to use a simpler join operation on
  the colData, cell2sample and sampleMetadata.
- sampleNameAggregate.
- plotReadsPerCell() labels and legends. Additionally,
  plotReadsPerCell() more efficiently handles the stashed values in the
  nCount column of colData, for faster plotting than having to rely on
  manipulation of the raw cellularBarcodes list stashed in
  object@bcbio$cellularBarcodes.
- sampleMetadata() return is now consistently sanitized for bcbioSingleCell
  and seurat objects.
- plotFeatureTSNE() utility function. This improves on
  Seurat::FeaturePlot() and enables the user to overlay the cluster
  identifiers on top of the t-SNE plot. plotFeatures() is now deprecated in
  favor of this function.
- .fetchDimDataSeurat()
  function. This now keeps the cell ID as the rowname.
- color), as well as pointSize
  and labelSize for plotPCA() and plotTSNE().
- cell2sample() return.
- fetchTSNEExpressionData().
- metrics() accessor not including the cell ID as rownames.
- plotFeatureTSNE() to assess quality
  control metrics on t-SNE.
- cell2sample data.frame, which helps speed up
  operations on cellular barcode metrics calculations for quality control
  plots.
- annotable, ensemblVersion, gtfFile, and
  sampleMetadataFile arguments in loadSingleCell() function.
- filterCells() function call.
- metrics() function will now look for a stashed cell2sample
  data.frame, which speeds up operations for quality control plots.
- droplevels in a selectSamples() call.
  The bcbioRNASeq package has also been updated to work in a similar
  fashion, where all columns in the sample metadata data.frame are now defined
  as factors.
- bcbioSingleCell to seurat object coercion to stash all of the
  bcbio metadata, and simply return the basic seurat object, rather than
  trying to also perform normalization and scaling. These steps have instead
  been added back to the Seurat R Markdown clustering template.
- plotCellTypesPerCluster().
- plotMitoVsCoding(). I broke this code out from
  plotMitoRatio(). We could opt to keep this in plotMitoRatio with a
  geom = "scatterplot" argument.
- .applyFilterCutoffs() internal function, used to subset
  the object to contain only cells and genes that have passed quality control
  filtering.
- FindAllMarkers() sanitization.
- boxplot, histogram, ridgeline, and violin (default). Median
  labels are applied with the internal .medianLabels() function.
- cellID to sampleID
  matching with a different method. In the future, we'll stash a cell2sample
  data.frame inside the object, which makes this operation faster than the
  current mclapply() code.
- loadSingleCell() and loadCellRanger().
- counts(filterCells = TRUE) function.
- interestingGroups<-.
- selectSamples().
- topMarkers() to match Seurat v2.1 update.
- bcbioSingleCell to seurat coercion method with setAs().
- grepl(), gsub()).
- loadSingleCell() import function.
- bcbioSingleCell. Legacy
  bcbioSCDataSet class can be upgraded to bcbioSingleCell class using
  as(bcb, "bcbioSingleCell") coercion.
- metrics() and quality control functions by
  default.
- filterCells = FALSE where
  applicable.
- sampleNameAggregate column in sample metadata. This doesn't change the
  actual counts values. It only applies to visualization in the quality control
  plots currently.
- devtools::test().
- bcbioSCDataSet to bcbioSingleCell.
- filterCells() will now slot a named logical vector into
  metadata(object)[["filteredCells"]], which will be used to dynamically
  subset the slotted internal SummarizedExperiment data. Now that we're using
  this approach, we can return a modified bcbioSingleCell object rather than
  defining a separate bcbioSCFiltered class.
- loadSingleCellRun() to loadSingleCell(), to match bcbioRNASeq
  package.
- sampleNameAggregate in the sample metadata.
- darkTheme()), based on the Seurat theme.
- plotDot() function, based on Seurat::DotPlot().
- bcbioSCDataSet and bcbioSCFiltered, which
  will be deprecated in a future release.
- internal-projectDir.R,
  internal-readSampleMetadataFile.R, internal-sampleDirs.R. We may want to
  provide this code as a shared bcbio core package (e.g. bcbioBase) in the
  future.
- .validMarkers()).
- ensemblVersion).
- plotClusters() to plotMarkers(). Added soft deprecation.
- loadSingleCellRun() and loadCellRanger() from S4 generics back
  to standard functions.
- fetchTSNEData(), fetchTSNEExpressionData(),
  and plotTSNEExpressionData(). This enables plotting of geometric mean values
  of desired marker genes.
- cellCycleMarkers and cellTypeMarkers data. Now supports
  Drosophila.
- make.names() instead of camel(). This
  avoids undesirable coercion of some IDs (e.g. group1_1 into group11).
- 1 instead of 1L).
- DESCRIPTION file. The package now attaches
  Seurat automatically.
- quantileHeatmap() function.
- bcbioSCFiltered to seurat coercion to slot relevant bcbio
  metadata.
- bcbioSinglecell to bcbioSingleCell.
- organism in bcbioSCDataSet metadata, in addition to genomeBuild.
- bcbioSCFiltered to seurat coercion method to also run
  FindVariableGenes() and ScaleData() by default.
- bcbioSinglecell-package.R
  file.
- download() functionality to basejump package, to avoid
  collisions with bcbioRNASeq package. Function has been renamed to
  externalFile().
- .detectPipeline() function is no longer needed.
- .sampleDirs() function.
- .readSparseCounts() function.
- packageSE() to prepareSE(), matching the corresponding
  basejump function change.
- filter() to tidy_filter(), to avoid future
  NAMESPACE collisions with ensembldb package.
- bcbioSCSubset class to bcbioSCFiltered class.
- plotZeroesVsDepth() for datasets with high cell
  counts.
- selectSamples(). Will attempt to migrate this
  to bracket-based subsetting in a future update.
- selectSamples() to only work on
  bcbioSCFiltered class for the time being. We can add bracket-based
  subsetting or S4 method support in selectSamples() to properly work on
  bcbioSCDataSet class in a future update.
- bcbioSCFiltered class coercion to monocle
  CellDataSet class.
- subsetPerSample() function.
- lowerCamelCase from snake_case.
- loadRun() to loadSingleCellRun() for improved compatibility with
  bcbioRNASeq package. This helps avoid NAMESPACE collisions between
  packages.
- loadCellRanger().
- filteringCriteria to filterParams in @metadata slot.
- load_run().
- load_run() function for improved consistency with bcbioRNASeq
  package.
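
The barcode sanitization described in the mapCellsToSamples() entry above (for example "TTTGGTTTCGCTAGCG-1" becoming "pbmc4k_1_TTTGGTTTCGCTAGCG") can be sketched in base R. This is a minimal illustration and not the package's internal code: sanitizeBarcodes() is a hypothetical helper name, and deriving the cell-to-sample factor by stripping the trailing barcode is an assumption about how such a mapping could be recovered.

```r
# Hypothetical helper (not part of bcbioSingleCell): move the Cell Ranger
# numeric suffix in front of the barcode and prefix the sample name, so the
# resulting ID uses "_" instead of the syntactically invalid "-".
sanitizeBarcodes <- function(barcodes, sampleName) {
    # Expect Cell Ranger style input, e.g. "TTTGGTTTCGCTAGCG-1".
    stopifnot(all(grepl("^[ACGT]+-[0-9]+$", barcodes)))
    barcode <- sub("-[0-9]+$", "", barcodes)
    suffix <- sub("^[ACGT]+-", "", barcodes)
    paste(sampleName, suffix, barcode, sep = "_")
}

raw <- c("TTTGGTTTCGCTAGCG-1", "AAACCTGGTTTACTCT-1")
ids <- sanitizeBarcodes(raw, sampleName = "pbmc4k")
print(ids)

# For comparison, make.names() replaces the hyphen with a dot, which is why
# the raw barcodes cannot be used unchanged as syntactically valid names.
print(make.names(raw))

# Assumed approach: recover a named cell-to-sample factor by dropping the
# trailing barcode from each sanitized ID.
cell2sample <- factor(sub("_[ACGT]+$", "", ids))
names(cell2sample) <- ids
print(cell2sample)
```

Keeping the sample identifier as a prefix means the mapping from cell to sample can be rebuilt from the IDs alone, although the entries above note that the package prefers a stashed cell2sample mapping for multiplexed Cell Ranger data.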