bcbioRNASeq: bcbio RNA-Seq data set
In hbc/bcbioRnaseq: Bcbio RNA-Seq

bcbioRNASeq

R Documentation

bcbio RNA-Seq data set

Description

bcbioRNASeq is an S4 class that extends RangedSummarizedExperiment, and is designed to store a bcbio RNA-seq analysis.

Usage

bcbioRNASeq(
  uploadDir,
  level = c("genes", "transcripts"),
  caller = c("salmon", "kallisto", "sailfish", "star", "hisat2"),
  samples = NULL,
  censorSamples = NULL,
  sampleMetadataFile = NULL,
  organism = NULL,
  genomeBuild = NULL,
  ensemblRelease = NULL,
  gffFile = NULL,
  transgeneNames = NULL,
  countsFromAbundance = "lengthScaledTPM",
  interestingGroups = "sampleName",
  fast = FALSE
)

Arguments

`uploadDir`	`character(1)`. Final upload directory path.
`level`	`character(1)`. Import counts at gene level ("`genes`"; default) or transcript level ("`transcripts`"; advanced use). Only tximport-compatible callers (e.g. salmon, kallisto, sailfish) can be loaded at transcript level. Aligned counts from featureCounts-compatible callers (e.g. STAR, HISAT2) can only be loaded at gene level.
`caller`	`character(1)`. Expression caller: `"salmon"` (default): Salmon alignment-free, quasi-mapped counts. `"kallisto"`: Kallisto alignment-free, pseudo-aligned counts. `"sailfish"`: Sailfish alignment-free, lightweight counts. `"star"`: STAR (Spliced Transcripts Alignment to a Reference) aligned counts. `"hisat2"`: HISAT2 (Hierarchical Indexing for Spliced Alignment of Transcripts) graph-based aligned counts.
`samples`	`character`. Sample identifiers.
`censorSamples`	`character`. Specify a subset of samples to censor.
`sampleMetadataFile`	`character(1)`. Sample metadata file path. CSV or TSV is preferred, but Excel worksheets are also supported. Check the documentation for conventions and required columns.
`organism`	`character(1)`. Full Latin organism name (e.g. `"Homo sapiens"`).
`genomeBuild`	`character(1)`. Ensembl genome build assembly name (e.g. `"GRCh38"`). If set `NULL`, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g. `"hg38"`).
`ensemblRelease`	`integer(1)`. Ensembl release version (e.g. `100`). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.
`gffFile`	`character(1)`. GFF/GTF (General Feature Format) file. Generally, we recommend using GTF (GFFv2) instead of GFFv3.
`transgeneNames`	`character`. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).
`countsFromAbundance`	`character(1)`. Whether to generate estimated counts using abundance estimates (recommended by default). `lengthScaledTPM` is a suitable default, and counts are scaled using the average transcript length over samples and then the library size. Refer to `tximport::tximport()` for more information on this parameter, but it should only ever be changed when loading some datasets at transcript level (e.g. for DTU analsyis).
`interestingGroups`	`character`. Groups of interest to use for visualization. Corresponds to factors describing the columns of the object.
`fast`	`logical(1)`. Fast mode. Skip internal DESeq2 calculations and transformations. Don't enable this setting when using the quality control R Markdown template. Note that some plotting functions, such as `plotPca()` will not work when this mode is enabled.

Details

Automatically imports RNA-seq counts, metadata, and the program versions used from a bcbio RNA-seq run. Simply point to the final upload directory generated by bcbio, and this generator function will take care of the rest.

Value

bcbioRNASeq.

Sample metadata

When loading a bcbio RNA-seq run, the sample metadata will be imported automatically from the project-summary.yaml file in the final upload directory. If you notice any typos in your metadata after completing the run, these can be corrected by editing the YAML file.

Alternatively, you can pass in a sample metadata file into the bcbioRNASeq() function call using the sampleMetadataFile argument. This requires either a CSV or Excel spreadsheet.

The samples in the bcbio run must map to the description column. The values provided in description must be unique. These values will be sanitized into syntactically valid names (see make.names() for more information), and assigned as the column names of the bcbioRNASeq object. The original values are stored as the sampleName column in colData, and are used for all plotting functions. Do not attempt to set a sampleId column, as this is used internally by the package.

Here is a minimal example of a properly formatted sample metadata file:

description	genotype
sample1	wildtype
sample2	knockout
sample3	wildtype
sample4	knockout

Valid names

R is strict about values that are considered valid for use in names() and dimnames() (i.e. rownames() and colnames(). Non-alphanumeric characters, spaces, and dashes are not valid. Use either underscores or periods in place of dashes when working in R. Also note that names should not begin with a number, and will be prefixed with an X when sanitized. Consult the documentation in the make.names() function for more information. We strongly recommend adhering to these conventions when labeling samples, to help avoid unexpected downstream behavior in R due to dimnames() mismatches.

Genome annotations

bcbioRNASeq() provides support for automatic import of genome annotations, which internally get processed into genomic ranges (GRanges) and are slotted into the rowRanges() of the object. Currently, we offer support for (1) Ensembl genome annotations from AnnotationHub via ensembldb (recommended); or (2) direct import from a GTF/GFF file using rtracklayer.

ensembldb requires the organism and ensemblRelease arguments to be defined. When both of these are set, bcbioRNASeq will attempt to download and use the pre-built Ensembl genome annotations from AnnotationHub. This method is preferred over direct loading of a GTF/GFF file because the AnnotationHub annotations contain additional rich metadata not defined in a GFF file, specifically description and entrezId values.

Alternatively, if you are working with a non-standard or poorly annotated genome that isn't available on AnnotationHub, we provide fall back support for loading the genome annotations directly from the GTF file used by the bcbio RNA-seq pipeline. This should be fully automatic for an R session active on the same server used to run bcbio.

Example bcbio GTF path: genomes/Hsapiens/hg38/rnaseq/ref-transcripts.gtf.

In the event that you are working from a remote environment that doesn't have file system access to the bcbio genomes directory, we provide additional fall back support for importing genome annotations from a GTF/GFF directly with the gffFile argument.

Internally, genome annotations are imported via the AcidGenomes package, specifically with either of these functions:

AcidGenomes::makeGRangesFromEnsembl().
AcidGenomes::makeGRangesFromGff().

Genome build

Ensure that the organism and genome build used with bcio match correctly here in the function call. In particular, for the legacy Homo sapiens GRCh37/hg19 genome build, ensure that genomeBuild = "GRCh37". Otherwise, the genomic ranges set in rowRanges() will mismatch. It is recommended for current projects that GRCh38/hg38 is used in place of GRCh37/hg19 if possible.

DESeq2

DESeq2 is run automatically when bcbioRNASeq() is called, unless fast = TRUE is set. Internally, this automatically slots normalized counts into assays(), and generates variance-stabilized counts.

Remote connections

When working on a local machine, it is possible to load bcbio run data over a remote connection using sshfs. When loading a large number of samples, it is preferable to call bcbioRNASeq() directly in R on the remote server, if possible.

Note

Updated 2022-05-09.

Author(s)

Michael Steinbaugh, Lorena Pantano, Rory Kirchner, Victor Barrera

Examples

uploadDir <- system.file("extdata/bcbio", package = "bcbioRNASeq")

## Gene level.
object <- bcbioRNASeq(
    uploadDir = uploadDir,
    level = "genes",
    caller = "salmon",
    organism = "Mus musculus",
    ensemblRelease = 87L
)
print(object)

## Transcript level.
object <- bcbioRNASeq(
    uploadDir = uploadDir,
    level = "transcripts",
    caller = "salmon",
    organism = "Mus musculus",
    ensemblRelease = 87L
)
print(object)

## Fast mode.
object <- bcbioRNASeq(uploadDir = uploadDir, fast = TRUE)

hbc/bcbioRnaseq documentation built on Sept. 15, 2024, 9:10 a.m.

hbc/bcbioRnaseq index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

hbc/bcbioRnaseq
Bcbio RNA-Seq

bcbioRNASeq: bcbio RNA-Seq data set
In hbc/bcbioRnaseq: Bcbio RNA-Seq

bcbio RNA-Seq data set

Description

Usage

Arguments

Details

Value

Sample metadata

Valid names

Genome annotations

Genome build

DESeq2

Remote connections

Note

Author(s)

See Also

Examples

Related to bcbioRNASeq in hbc/bcbioRnaseq...

R Package Documentation

Browse R Packages

We want your feedback!

hbc/bcbioRnaseq Bcbio RNA-Seq

bcbioRNASeq: bcbio RNA-Seq data set In hbc/bcbioRnaseq: Bcbio RNA-Seq

bcbio RNA-Seq data set

Description

Usage

Arguments

Details

Value

Sample metadata

Valid names

Genome annotations

Genome build

DESeq2

Remote connections

Note

Author(s)

See Also

Examples

Related to bcbioRNASeq in hbc/bcbioRnaseq...

R Package Documentation

Browse R Packages

We want your feedback!

hbc/bcbioRnaseq
Bcbio RNA-Seq

bcbioRNASeq: bcbio RNA-Seq data set
In hbc/bcbioRnaseq: Bcbio RNA-Seq