countRNA: Count RNA-seq data

countRNAR Documentation

Count RNA-seq data

Description

The FRASER package provides multiple functions to extract and count both split and non-spliced reads from bam files. See Detail and Functions for more information.

Usage

countRNAData(
  fds,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  BPPARAM = bpparam(),
  genome = NULL,
  junctionMap = NULL,
  filter = TRUE,
  minExpressionInOneSample = 20,
  keepNonStandardChromosomes = TRUE,
  countDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds))),
  ...
)

getSplitReadCountsForAllSamples(
  fds,
  NcpuPerSample = 1,
  junctionMap = NULL,
  recount = FALSE,
  BPPARAM = bpparam(),
  genome = NULL,
  countFiles = NULL,
  keepNonStandardChromosomes = TRUE,
  outDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds)),
    "splitCounts")
)

getNonSplitReadCountsForAllSamples(
  fds,
  splitCountRanges,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  BPPARAM = bpparam(),
  longRead = FALSE,
  outDir = file.path(workingDir(fds), "savedObjects", nameNoSpace(name(fds)),
    "nonSplitCounts")
)

addCountsToFraserDataSet(fds, splitCounts, nonSplitCounts)

countSplitReads(
  sampleID,
  fds,
  NcpuPerSample = 1,
  genome = NULL,
  recount = FALSE,
  keepNonStandardChromosomes = TRUE,
  bamfile = bamFile(fds[, sampleID]),
  pairedend = pairedEnd(fds[, sampleID]),
  strandmode = strandSpecific(fds),
  cacheFile = getSplitCountCacheFile(sampleID, fds),
  scanbamparam = scanBamParam(fds),
  coldata = colData(fds)
)

mergeCounts(
  countList,
  fds,
  junctionMap = NULL,
  assumeEqual = FALSE,
  spliceSiteCoords = NULL,
  BPPARAM = SerialParam()
)

countNonSplicedReads(
  sampleID,
  splitCountRanges,
  fds,
  NcpuPerSample = 1,
  minAnchor = 5,
  recount = FALSE,
  spliceSiteCoords = NULL,
  longRead = FALSE
)

Arguments

fds

A FraserDataSet object

NcpuPerSample

A BiocParallel param object or a positive integer to configure the parallel backend of the internal loop per sample

minAnchor

Minimum overlap around the Donor/Acceptor for non spliced reads. Default to 5

recount

if TRUE the cache is ignored and the bam file is recounted.

BPPARAM

the BiocParallel parameters for the parallelization

genome

NULL (default) or a character vector specifying the names of the reference genomes that were used to align the reads for each sample. The names have to be in a way accepted by the getBSgenome function. Available genomes can be listed using the available.genomes function from the BSgenome package. If genome is of length 1, the same reference genome will be used for all samples. If genome is supplied and strandSpecific(fds) == 0L (unstranded), then the strand information will be estimated by checking the dinucleotides found at the intron boundaries (see summarizeJunctions in GenomicAlignments package for details). This can e.g. help to avoid ambiguities when adding gene names from a gene annotation to the introns in a later step.

junctionMap

A object or file containing a map of all junctions of interest across all samples

filter

If TRUE, splice sites of introns with low read support in all samples are not considered when calculating the non-split reads. This helps to speed up the subsequent steps.

minExpressionInOneSample

The minimal split read count in at least one sample that is required for an intron to pass the filter.

keepNonStandardChromosomes

Logical value indicating if non standard chromosomes should also be counted. Defaults to TRUE.

countDir

The directory in which the tsv containing the position and counts of the junctions should be placed.

...

Further parameters passed on to Rsubread::featureCounts.

countFiles

If specified, the split read counts for all samples are read from the specified files. Should be a vector of paths to files containing the split read counts for the individual samples. Reading from files is only supported for tsv(.gz) or RDS files containing GRranges objects. The order of the individual sample files should correspond to the order of the samples in the fds.

outDir

The full path to the output folder containing the merged counts. If the given folder already exists and stores a SummarizedExperiment object, the counts from this folder will be read in and used in the following (i.e. the reads are not recounted), unless the option recount=TRUE is used. If this folder doesn't exist or if recount=TRUE, then it will be created after counting has finished.

splitCountRanges

The merged GRanges object containing the positions of all the introns in the dataset over all samples.

longRead

If TRUE, then the isLongRead option of Rsubread::featureCounts is used when counting the non spliced reads overlapping splice sites.

splitCounts

The SummarizedExperiment object containing the position and counts of all the introns in the dataset for all samples.

nonSplitCounts

The SummarizedExperiment object containing the position and non split read counts of all splice sites present in the dataset for all samples.

sampleID

The ID of the sample to be counted.

bamfile

The BAM file to be used to extract the counts. Defaults to the BAM file defined in the FraserDataSet object.

pairedend

TRUE or FALSE if the BAM file is paired end. Defaults to the value specified in the FraserDataSet object.

strandmode

0 (no, default), 1 (stranded), or 2 (revers) to specify the used protocol for the RNA-seq experiment.

cacheFile

File path to the cache, where counts are stored.

scanbamparam

The ScanBamParam object which is used for loading the reads from the BAM file before counting. Defaults to the params stored in the FraserDataSet object.

coldata

The colData as given by the FraserDataSet object.

countList

A list of GRanges objects containing the counts that should be merged into one object.

assumeEqual

Logical indicating whether all objects in countList can be assumed to contain counts for the same ranges. If FALSE, merging of the ranges is performed.

spliceSiteCoords

A GRanges object containing the positions of the splice sites. If it is NULL, then splice sites coordinates are calculated first based on the positions of the junctions defined from the split reads.

Details

The functions described in this file extract and count both the split and the non-spliced reads from bam files.

countRNAData is the main function that takes care of all counting steps and returns a FraserDataSet containing the counts for all samples in the fds.

getSplitReadCountsForAllSamples counts split reads for all samples and getNonSplitReadCountsForAllSamples counts non split reads overlapping splice sites for all samples. addCountsToFraserDataSet adds these counts to an existing fds.

countSplitReads calculates the split read counts for a single sample. countNonSplicedReads counts the non split reads overlapping with splice sites for a single sample.

mergeCounts merges the counts from different samples into a single count object, where the counts for junctions that are not present in a sample are set to zero.

Value

countRNAData returns a FraserDataSet.

getSplitReadCountsForAllSamples returns a GRanges object.

getNonSplitReadCountsForAllSamples returns a GRanges object.

addCountsToFraserDataSet returns a FraserDataSet.

countSplitReads returns a GRanges object.

mergeCounts returns a SummarizedExperiment object.

countNonSplicedReads returns a GRanges object.

Functions

  • countRNAData(): This method extracts and counts the split reads and non spliced reads from RNA bam files.

  • getSplitReadCountsForAllSamples(): This method creates a GRanges object containing the split read counts from all specified samples.

  • getNonSplitReadCountsForAllSamples(): This method creates a GRanges object containing the non split read counts at the exon-intron boundaries inferred from the GRanges object containing the positions of all the introns in this dataset.

  • addCountsToFraserDataSet(): This method adds the split read and non split read counts to a existing FraserDataSet containing the settings.

  • countSplitReads(): This method counts all split reads in a bam file for a single sample.

  • mergeCounts(): This method merges counts for multiple samples into one SummarizedExperiment object.

  • countNonSplicedReads(): This method counts non spliced reads based on the given target (acceptor/donor) regions for a single sample.

Examples

   # On Windows SNOW is the default for the parallele backend, which can be 
   # very slow for many but small tasks. Therefore, we will use 
   # for the example the SerialParam() backend.
   if(.Platform$OS.type != "unix") {
       register(SerialParam())
   }
   
  fds <- countRNAData(createTestFraserSettings())


c-mertes/FRASER documentation built on April 20, 2024, 4:32 p.m.