Home

/

GitHub

/

GoekeLab/bambu

/

bambu: long read isoform reconstruction and quantification

bambu: long read isoform reconstruction and quantification
In GoekeLab/bambu: Context-Aware Transcript Quantification from Long Read RNA-Seq data

View source: R/bambu.R

bambu

R Documentation

long read isoform reconstruction and quantification

Description

This function takes bam file of genomic alignments and performs isoform recontruction and gene and transcript expression quantification. It also allows saving of read class files of alignments, extending provided annotations, and quantification based on extended annotations. When multiple samples are provided, extended annotations will be combined across samples to allow comparison.

Usage

bambu(
  reads,
  annotations = NULL,
  genome = NULL,
  NDR = NULL,
  opt.discovery = NULL,
  opt.em = NULL,
  rcOutDir = NULL,
  discovery = TRUE,
  quant = TRUE,
  stranded = FALSE,
  ncore = 1,
  yieldSize = NULL,
  trackReads = FALSE,
  returnDistTable = FALSE,
  lowMemory = FALSE,
  fusionMode = FALSE,
  verbose = FALSE
)

Arguments

`reads`	A string or a vector of strings specifying the paths of bam files for genomic alignments, or a `BamFile` object or a `BamFileList` object (see `Rsamtools`). Alternatively string or a vector of strings specifying the read class files that are saved during previous run of `bambu`.
`annotations`	A path to a .gtf file or a `TxDb` object or a GRangesList object obtained by `prepareAnnotations`.
`genome`	A path to a fasta file or a BSGenome object.
`NDR`	specifying the maximum NDR rate to novel transcript output from detected read classes, defaults to an automatic recommendation
`opt.discovery`	A list of controlling parameters for isoform reconstruction process: remove.subsetTx indicating whether filter to remove read classes which are a subset of known transcripts(), defaults to TRUE min.readCount specifying minimum read count to consider a read class valid in a sample, defaults to 2 min.readFractionByGene specifying minimum relative read count per gene, highly expressed genes will have many high read count low relative abundance transcripts that can be filtered, defaults to 0.05 min.sampleNumber specifying minimum sample number with minimum read count, defaults to 1 min.exonDistance specifying minum distance to known transcript to be considered valid as new, defaults to 35bp min.exonOverlap specifying minimum number of bases shared with annotation to be assigned to the same gene id, defaults to 10bp min.primarySecondaryDist specifying the minimum number of distance threshold, defaults to 5bp min.primarySecondaryDistStartEnd1 specifying the minimum number of distance threshold, used for extending annotation, defaults to 5bp min.primarySecondaryDistStartEnd2 specifying the minimum number of distance threshold, used for estimating distance to annotation, defaults to 5bp min.txScore.multiExon specifying the minimum transcript level threshold for multi-exon transcripts during sample combining, defaults to 0 min.txScore.singleExon specifying the minimum transcript level threshold for single-exon transcripts during sample combining, defaults to 1 fitReadClassModel A boolean specifying if Bambu should attempt to train a transcript discovery model for all samples. Defaults to TRUE defaultModels A model object obtained by codetrainBambu or when returnModel is TRUE returnModel A boolean specifying if the trained model is output with the readclass files. Defaults to FALSE baselineFDR A number between 0 - 1, specifying the false discovery rate used during NDR recomendation. Defaults to 0.1 min.readFractionByEqClass indicating the minimum relative read count of a subset transcript compared to all superset transcripts (ie the relative read count within the minimum equivalent class). This filter is applied on the set of annotations across all samples using the total read count, this is not a per-sample filter. Please use with caution. defaults to 0 prefix specifying prefix for new gene Ids (genePrefix.number), defaults to "Bambu"
`opt.em`	A list of controlling parameters for quantification algorithm estimation process: maxiter specifying maximum number of run iterations, defaults to 10000 degradationBias correcting for degradation bias, defaults to TRUE conv specifying the covergence threshold control, defaults to 0.0001 minvalue specifying the minvalue for convergence consideration, defaults to 0.00000001 sig.digit specifying the maximum significant digits of the reported estimates
`rcOutDir`	A string variable specifying the path to where read class files will be saved.
`discovery`	A logical variable indicating whether annotations are to be extended. Defaults to TRUE
`quant`	A logical variable indicating whether quantification will be performed. If false the output type will change. Defaults to TRUE
`stranded`	A boolean for strandedness, defaults to FALSE.
`ncore`	specifying number of cores used when parallel processing is used, defaults to 1.
`yieldSize`	see `Rsamtools`.
`trackReads`	When TRUE read names will be tracked and output as metadata in the final output as readToTranscriptMaps detailing. the assignment of reads to transcripts. The output is a list with an entry for each sample.
`returnDistTable`	When TRUE the calculated distance table between read classes and annotations will be output as metadata as distTables. The output is a list with an entry for each sample.
`lowMemory`	Read classes will be processed by chromosomes when lowMemory is specified. This option provides an efficient way to process big samples.
`fusionMode`	A logical variable indicating whether run in fusion mode
`verbose`	A logical variable indicating whether processing messages will be printed.

Details

Main function

Value

bambu will output different results depending on whether quant mode is on. By default, quant is set to TRUE, so bambu will generate a SummarizedExperiment object that contains the transcript expression estimates. Transcript expression estimates can be accessed by counts(), including the following variables

counts: expression estimates
CPM: sequencing depth normalized estimates
fullLengthCounts: estimates of read counts mapped as full length reads for each transcript
uniqueCounts: counts of reads that are uniquely mapped to each transcript

Output annotations that are usually the annotations with/without novel transcripts/genes added, depending on whether discovery mode is on can be accessed by rowRanges() Transcript to gene map can be accessed by rowData(), with eqClass that defining equivalent class for each transcript

In the case when quant is set to FALSE, i.e., only transcript discovery is performed, bambu will report the grangeslist of the extended annotations.

Examples

## =====================
test.bam <- system.file("extdata",
    "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam",
    package = "bambu")
fa.file <- system.file("extdata", 
    "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", 
    package = "bambu")
gr <- readRDS(system.file("extdata", 
    "annotationGranges_txdbGrch38_91_chr9_1_1000000.rds",
    package = "bambu"))
se <- bambu(reads = test.bam, annotations = gr, 
    genome = fa.file,  discovery = TRUE, quant = TRUE)

GoekeLab/bambu documentation built on Feb. 14, 2025, 8:22 a.m.

GoekeLab/bambu index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

GoekeLab/bambu
Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu: long read isoform reconstruction and quantification
In GoekeLab/bambu: Context-Aware Transcript Quantification from Long Read RNA-Seq data

long read isoform reconstruction and quantification

Description

Usage

Arguments

Details

Value

Examples

Related to bambu in GoekeLab/bambu...

R Package Documentation

Browse R Packages

We want your feedback!

GoekeLab/bambu Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu: long read isoform reconstruction and quantification In GoekeLab/bambu: Context-Aware Transcript Quantification from Long Read RNA-Seq data

long read isoform reconstruction and quantification

Description

Usage

Arguments

Details

Value

Examples

Related to bambu in GoekeLab/bambu...

R Package Documentation

Browse R Packages

We want your feedback!

GoekeLab/bambu
Context-Aware Transcript Quantification from Long Read RNA-Seq data

bambu: long read isoform reconstruction and quantification
In GoekeLab/bambu: Context-Aware Transcript Quantification from Long Read RNA-Seq data