bcbioRNASeq-class: 'bcbioRNASeq' Object and Constructor

Description

bcbioRNASeq is an S4 class that extends RangedSummarizedExperiment, and is designed to store a bcbio RNA-seq analysis.

Usage

bcbioRNASeq(uploadDir, level = c("genes", "transcripts"),
  caller = c("salmon", "kallisto", "sailfish"), organism, samples = NULL,
  sampleMetadataFile = NULL, interestingGroups = "sampleName",
  ensemblRelease = NULL, genomeBuild = NULL, transgeneNames = NULL,
  spikeNames = NULL, gffFile = NULL, transformationLimit = 50L, ...)

Arguments

uploadDir

Path to final upload directory. This path is set when running bcbio_nextgen -w template.

level

Import counts as "genes" (default) or "transcripts".

caller

Expression caller. Supports "salmon" (default), "kallisto", or "sailfish".

organism

Organism name. Use the full Latin name (e.g. "Homo sapiens"), since this will be input downstream to AnnotationHub and ensembldb, unless gffFile is set. If set to NULL (advanced use; not recommended), the function call will skip loading gene/transcript-level annotations into rowRanges(). This can be useful for poorly annotated genomes or experiments involving multiple genomes.

samples

Optional. Specify a subset of samples to load. The names must match the description specified in the bcbio YAML metadata. If a sampleMetadataFile is provided, that will take priority for sample selection. Typically this can be left unset.

sampleMetadataFile

Optional. Custom metadata file containing sample information. Otherwise defaults to sample metadata saved in the YAML file. Remote URLs are supported. Typically this can be left unset.

interestingGroups

Character vector denoting groups of interest that define the samples. If left unset, defaults to sampleName.

ensemblRelease

Optional. Ensembl release version. If unset, defaults to the current release; it does not typically need to be user-defined. Passed to AnnotationHub for EnsDb annotation matching, unless gffFile is set.

genomeBuild

Optional. Ensembl genome build name (e.g. "GRCh38"). This will be passed to AnnotationHub for EnsDb annotation matching, unless gffFile is set.

transgeneNames

Character vector indicating which assay() rows denote transgenes (e.g. EGFP, TDTOMATO).

spikeNames

Character vector indicating which assay() rows denote spike-in sequences (e.g. ERCCs).

gffFile

Advanced use; not recommended. By default, we recommend leaving this NULL for genomes that are supported on Ensembl. In this case, the row annotations (rowRanges()) will be obtained automatically from Ensembl by passing the organism, genomeBuild, and ensemblRelease arguments to AnnotationHub and ensembldb. For a genome that is not supported on Ensembl and/or AnnotationHub, a GFF/GTF (General Feature Format) file is required. Generally, we recommend using a GTF (GFFv2) file here over a GFF3 file if possible, although all GFF formats are supported. The function will internally generate a TxDb containing transcript-to-gene mappings and construct a GRanges object containing the genomic ranges (rowRanges()).

transformationLimit

Maximum number of samples for which the DESeq2::rlog() and DESeq2::varianceStabilizingTransformation() matrices are calculated. For large datasets, DESeq2 is slow to apply variance stabilization. In this case, we recommend loading the dataset in a high-performance computing environment. Use Inf to always apply and -Inf to always skip.

...

Additional arguments, slotted into the metadata() accessor.
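
The vector-valued defaults in the usage signature (level, caller) follow the common R match.arg() idiom, in which the first element acts as the default, while ... collects any extra named arguments. The sketch below illustrates both patterns with a hypothetical helper, sketchArgs(). It is not part of bcbioRNASeq, and whether the constructor uses match.arg() internally is an assumption.

```r
# Hypothetical helper (not part of bcbioRNASeq) mirroring the argument
# patterns shown in the usage signature.
sketchArgs <- function(
    level = c("genes", "transcripts"),
    caller = c("salmon", "kallisto", "sailfish"),
    ...
) {
    # match.arg() returns the first choice when the argument is missing
    # and raises an error for values outside the allowed set.
    level <- match.arg(level)
    caller <- match.arg(caller)
    # Extra named arguments are collected into a list, analogous to how
    # `...` is described as being slotted into metadata().
    list(level = level, caller = caller, metadata = list(...))
}

# `projectLabel` is an arbitrary illustrative name, not a documented argument.
res <- sketchArgs(caller = "kallisto", projectLabel = "pilot")
str(res)
```

Leaving level unset resolves it to "genes", and an unsupported value such as level = "exons" fails early with an error instead of propagating downstream.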

Details

Simply point to the final upload directory generated by bcbio, and this constructor function will take care of the rest. It automatically imports RNA-seq counts, metadata, and the program versions used.

This class contains raw read counts and length-scaled transcripts per million (TPM) generated by tximport::tximport(). Counts can be loaded at gene or transcript level.

Value

bcbioRNASeq.

Metadata

The metadata() accessor contains:

DESeq2

DESeq2 is run automatically when bcbioRNASeq() is called, and variance-stabilized counts are slotted into assays(). If the number of samples exceeds the transformationLimit argument, rlog and vst counts will not be slotted into assays(). In this case, we recommend visualization using tmm() counts, which are automatically calculated using edgeR.
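
The sample-count rule described here can be sketched as a simple comparison. shouldTransform() is a hypothetical helper written for illustration, not a function exported by bcbioRNASeq; it assumes the check is a plain test of whether the number of samples stays within transformationLimit.

```r
# Hypothetical helper (not part of bcbioRNASeq): decide whether the rlog
# and vst matrices would be calculated for a given number of samples.
shouldTransform <- function(nSamples, transformationLimit = 50L) {
    nSamples <= transformationLimit
}

shouldTransform(24L)                             # within the default limit
shouldTransform(96L)                             # exceeds the default limit
shouldTransform(96L, transformationLimit = Inf)  # Inf: always apply
shouldTransform(4L, transformationLimit = -Inf)  # -Inf: always skip
```

Under this assumption, Inf and -Inf need no special handling, because every finite sample count is below Inf and above -Inf.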

Remote Data

When working in RStudio, we recommend connecting to the bcbio run directory as a remote connection over sshfs.

Note

bcbioRNASeq extended SummarizedExperiment prior to v0.2.0, where we migrated to RangedSummarizedExperiment.

Author(s)

Michael Steinbaugh, Lorena Pantano, Rory Kirchner, Victor Barrera

See Also

Other S4 Class Definition: [,bcbioRNASeq,ANY,ANY,ANY-method, coerce, show, updateObject

Examples

uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq")

# Gene level
x <- bcbioRNASeq(
    uploadDir = uploadDir,
    level = "genes",
    caller = "salmon",
    organism = "Mus musculus",
    ensemblRelease = 87L
)
show(x)
is(x, "RangedSummarizedExperiment")
validObject(x)

# Transcript level
x <- bcbioRNASeq(
    uploadDir = uploadDir,
    level = "transcripts",
    caller = "salmon",
    organism = "Mus musculus",
    ensemblRelease = 87L
)
show(x)
validObject(x)

roryk/bcbioRnaseq documentation built on May 27, 2019, 10:44 p.m.