countRNA: Count RNA-seq data
In c-mertes/FRASER: Find RAre Splicing Events in RNA-Seq Data

countRNA

R Documentation

Count RNA-seq data

Description

The FRASER package provides multiple functions to extract and count both split and non-spliced reads from bam files. See Detail and Functions for more information.

Usage

countRNAData(
  fds,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  BPPARAM = bpparam(),
  genome = NULL,
  junctionMap = NULL,
  filter = TRUE,
  minExpressionInOneSample = 20,
  keepNonStandardChromosomes = TRUE,
  countDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds))),
  ...
)

getSplitReadCountsForAllSamples(
  fds,
  NcpuPerSample = 1,
  junctionMap = NULL,
  recount = FALSE,
  BPPARAM = bpparam(),
  genome = NULL,
  countFiles = NULL,
  keepNonStandardChromosomes = TRUE,
  outDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds)),
    "splitCounts")
)

getNonSplitReadCountsForAllSamples(
  fds,
  splitCountRanges,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  BPPARAM = bpparam(),
  longRead = FALSE,
  outDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds)),
    "nonSplitCounts")
)

addCountsToFraserDataSet(fds, splitCounts, nonSplitCounts)

countSplitReads(
  sampleID,
  fds,
  NcpuPerSample = 1,
  genome = NULL,
  recount = FALSE,
  keepNonStandardChromosomes = TRUE,
  bamfile = bamFile(fds[, sampleID]),
  pairedend = pairedEnd(fds[, sampleID]),
  strandmode = strandSpecific(fds[, sampleID]),
  cacheFile = getSplitCountCacheFile(sampleID, fds),
  scanbamparam = scanBamParam(fds),
  coldata = colData(fds)
)

mergeCounts(
  countList,
  fds,
  junctionMap = NULL,
  assumeEqual = FALSE,
  spliceSiteCoords = NULL,
  BPPARAM = SerialParam()
)

countNonSplicedReads(
  sampleID,
  splitCountRanges,
  fds,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  spliceSiteCoords = NULL,
  longRead = FALSE
)

Arguments

`fds`	A `FraserDataSet` object
`NcpuPerSample`	A BiocParallel param object or a positive integer to configure the parallel backend of the internal loop per sample
`minAnchor`	Minimum overlap around the Donor/Acceptor for non spliced reads. Default to 5
`recount`	if TRUE the cache is ignored and the bam file is recounted.
`BPPARAM`	the BiocParallel parameters for the parallelization
`genome`	NULL (default) or a character vector specifying the names of the reference genomes that were used to align the reads for each sample. The names have to be in a way accepted by the `getBSgenome` function. Available genomes can be listed using the `available.genomes` function from the BSgenome package. If genome is of length 1, the same reference genome will be used for all samples. If `genome` is supplied and `strandSpecific(fds) == 0L` (unstranded), then the strand information will be estimated by checking the dinucleotides found at the intron boundaries (see `summarizeJunctions` in GenomicAlignments package for details). This can e.g. help to avoid ambiguities when adding gene names from a gene annotation to the introns in a later step.
`junctionMap`	A object or file containing a map of all junctions of interest across all samples
`filter`	If TRUE, splice sites of introns with low read support in all samples are not considered when calculating the non-split reads. This helps to speed up the subsequent steps.
`minExpressionInOneSample`	The minimal split read count in at least one sample that is required for an intron to pass the filter.
`keepNonStandardChromosomes`	Logical value indicating if non standard chromosomes should also be counted. Defaults to `TRUE`.
`countDir`	The directory in which the tsv containing the position and counts of the junctions should be placed.
`...`	Further parameters passed on to Rsubread::featureCounts.
`countFiles`	If specified, the split read counts for all samples are read from the specified files. Should be a vector of paths to files containing the split read counts for the individual samples. Reading from files is only supported for tsv(.gz) or RDS files containing GRranges objects. The order of the individual sample files should correspond to the order of the samples in the fds.
`outDir`	The full path to the output folder containing the merged counts. If the given folder already exists and stores a SummarizedExperiment object, the counts from this folder will be read in and used in the following (i.e. the reads are not recounted), unless the option recount=TRUE is used. If this folder doesn't exist or if recount=TRUE, then it will be created after counting has finished.
`splitCountRanges`	The merged GRanges object containing the positions of all the introns in the dataset over all samples.
`longRead`	If TRUE, then the isLongRead option of Rsubread::featureCounts is used when counting the non spliced reads overlapping splice sites.
`splitCounts`	The SummarizedExperiment object containing the position and counts of all the introns in the dataset for all samples.
`nonSplitCounts`	The SummarizedExperiment object containing the position and non split read counts of all splice sites present in the dataset for all samples.
`sampleID`	The ID of the sample to be counted.
`bamfile`	The BAM file to be used to extract the counts. Defaults to the BAM file defined in the `FraserDataSet` object.
`pairedend`	`TRUE` or `FALSE` if the BAM file is paired end. Defaults to the value specified in the `FraserDataSet` object.
`strandmode`	0 (no, default), 1 (stranded), or 2 (revers) to specify the used protocol for the RNA-seq experiment.
`cacheFile`	File path to the cache, where counts are stored.
`scanbamparam`	The ScanBamParam object which is used for loading the reads from the BAM file before counting. Defaults to the params stored in the `FraserDataSet` object.
`coldata`	The colData as given by the `FraserDataSet` object.
`countList`	A list of GRanges objects containing the counts that should be merged into one object.
`assumeEqual`	Logical indicating whether all objects in `countList` can be assumed to contain counts for the same ranges. If FALSE, merging of the ranges is performed.
`spliceSiteCoords`	A GRanges object containing the positions of the splice sites. If it is NULL, then splice sites coordinates are calculated first based on the positions of the junctions defined from the split reads.

Details

The functions described in this file extract and count both the split and the non-spliced reads from bam files.

countRNAData is the main function that takes care of all counting steps and returns a FraserDataSet containing the counts for all samples in the fds.

getSplitReadCountsForAllSamples counts split reads for all samples and getNonSplitReadCountsForAllSamples counts non split reads overlapping splice sites for all samples. addCountsToFraserDataSet adds these counts to an existing fds.

countSplitReads calculates the split read counts for a single sample. countNonSplicedReads counts the non split reads overlapping with splice sites for a single sample.

mergeCounts merges the counts from different samples into a single count object, where the counts for junctions that are not present in a sample are set to zero.

Value

countRNAData returns a FraserDataSet.

getSplitReadCountsForAllSamples returns a GRanges object.

getNonSplitReadCountsForAllSamples returns a GRanges object.

addCountsToFraserDataSet returns a FraserDataSet.

countSplitReads returns a GRanges object.

mergeCounts returns a SummarizedExperiment object.

countNonSplicedReads returns a GRanges object.

Functions

countRNAData(): This method extracts and counts the split reads and non spliced reads from RNA bam files.
getSplitReadCountsForAllSamples(): This method creates a GRanges object containing the split read counts from all specified samples.
getNonSplitReadCountsForAllSamples(): This method creates a GRanges object containing the non split read counts at the exon-intron boundaries inferred from the GRanges object containing the positions of all the introns in this dataset.
addCountsToFraserDataSet(): This method adds the split read and non split read counts to a existing FraserDataSet containing the settings.
countSplitReads(): This method counts all split reads in a bam file for a single sample.
mergeCounts(): This method merges counts for multiple samples into one SummarizedExperiment object.
countNonSplicedReads(): This method counts non spliced reads based on the given target (acceptor/donor) regions for a single sample.

Examples

   # On Windows SNOW is the default for the parallele backend, which can be 
   # very slow for many but small tasks. Therefore, we will use 
   # for the example the SerialParam() backend.
   if(.Platform$OS.type != "unix") {
       register(SerialParam())
   }
   
  fds <- countRNAData(createTestFraserSettings())

c-mertes/FRASER documentation built on Feb. 22, 2025, 1:24 a.m.