predictTxFeaturesPerSample: Identification of splice junctions and exons from BAM file


View source: R/features-prediction.R

Description

Splice junctions and exons are predicted from genomic RNA-seq read alignments in BAM format.

Usage

predictTxFeaturesPerSample(file_bam, which, paired_end, read_length,
  frag_length, lib_size, min_junction_count, alpha, psi, beta, gamma,
  min_anchor, include_counts, retain_coverage, junctions_only, max_complexity,
  sample_name, verbose, cores)

Arguments

file_bam

BAM file with genomic RNA-seq read alignments

which

GRanges of genomic regions to be considered for feature prediction, passed to ScanBamParam

paired_end

Logical, TRUE for paired-end data, FALSE for single-end data

read_length

Read length, required for use with alpha

frag_length

Fragment length for paired-end data, required for use with alpha

lib_size

Number of aligned fragments, required for use with alpha

min_junction_count

Minimum fragment count required for a splice junction to be included. If specified, argument alpha is ignored.

alpha

Minimum FPKM required for a splice junction to be included. Internally, FPKMs are converted to counts, requiring arguments read_length, frag_length and lib_size. alpha is ignored if argument min_junction_count is specified.

psi

Minimum splice frequency required for a splice junction to be included

beta

Minimum relative coverage required for an internal exon to be included

gamma

Minimum relative coverage required for a terminal exon to be included

min_anchor

Integer specifying minimum anchor length

include_counts

Logical indicating whether counts of compatible fragments should be included in metadata column “N”

retain_coverage

Logical indicating whether coverage for each exon should be retained as an RleList in metadata column “coverage”. This allows filtering of features using more stringent criteria after the initial prediction.

junctions_only

Logical indicating whether predictions should be limited to identification of splice junctions only

max_complexity

Maximum allowed complexity. If a locus exceeds this threshold, it is skipped, resulting in a warning. Complexity is defined as the maximum number of unique predicted splice junctions overlapping a given position. High complexity regions are often due to spurious read alignments and can slow down processing. To disable this filter, set to NA.

sample_name

Sample name used in messages

verbose

If TRUE, generate messages indicating progress

cores

Number of cores available for parallel processing
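The complexity measure described under max_complexity can be illustrated with a small base R sketch. This is not the package's implementation; the helper locus_complexity and the junction coordinates are hypothetical, and junctions are assumed to be given as 1-based inclusive intervals.

```r
# Made-up junction (intron) coordinates, 1-based and inclusive.
junctions <- data.frame(
  start = c(100, 150, 150, 400),
  end   = c(300, 300, 350, 500)
)

# Hypothetical helper: maximum number of unique junctions
# overlapping any single position.
locus_complexity <- function(start, end) {
  j <- unique(data.frame(start = start, end = end))
  if (nrow(j) == 0) return(0L)
  # +1 where a junction begins, -1 just past where it ends;
  # the running sum is the number of junctions covering a position.
  pos   <- c(j$start, j$end + 1)
  delta <- c(rep(1L, nrow(j)), rep(-1L, nrow(j)))
  o <- order(pos, delta)  # at equal positions, close before opening
  max(cumsum(delta[o]))
}

complexity <- locus_complexity(junctions$start, junctions$end)

max_complexity <- 2  # illustrative threshold; NA disables the filter
skip_locus <- !is.na(max_complexity) && complexity > max_complexity
```

Here three of the four junctions overlap positions 150 to 300, so the locus would be skipped at a threshold of 2.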

Details

For spliced alignments, the direction of transcription is inferred from the XS tag in the BAM file and used to assign strand information to the read, or fragment for paired-end data.

Feature prediction is performed in two steps. First, splice junctions are identified from spliced alignments. Second, exons are identified based on regions that are flanked by splice junctions and show sufficient coverage with compatible reads.

Splice junctions implied by read alignments are filtered based on fragment count and splice frequency. The splice frequency at the splice donor (acceptor) is defined as x_J/x_D (x_J/x_A), where x_J is the number of fragments containing the splice junction, and x_D (x_A) is the number of fragments overlapping the exon/intron (intron/exon) boundary. Fragments overlapping a splice boundary can be either spliced or extend into the intron. To be included in predicted features, splice junctions must have fragment count at least min_junction_count or FPKM at least alpha, and splice frequency at both donor and acceptor at least psi.
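The junction filter described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helpers fpkm_to_count and filter_junctions, the argument eff_length_bp (an assumed effective junction length in bases) and all counts are hypothetical. The FPKM conversion uses only the generic definition of FPKM (fragments per kilobase per million fragments); how the package derives an effective length from read_length and frag_length is not shown here.

```r
# Made-up counts for three candidate junctions.
junctions <- data.frame(
  x_J = c(40, 3, 25),   # fragments containing the splice junction
  x_D = c(50, 60, 30),  # fragments overlapping the donor boundary
  x_A = c(45, 10, 250)  # fragments overlapping the acceptor boundary
)

# Hypothetical helper: convert an FPKM threshold to a count threshold.
# eff_length_bp is an assumed effective length, not a package argument.
fpkm_to_count <- function(fpkm, eff_length_bp, lib_size) {
  fpkm * (eff_length_bp / 1e3) * (lib_size / 1e6)
}

# Hypothetical helper: apply the count and splice frequency filters.
filter_junctions <- function(j, psi, min_junction_count = NULL,
                             alpha = NULL, eff_length_bp = NULL,
                             lib_size = NULL) {
  # alpha is only used when min_junction_count is not specified
  if (is.null(min_junction_count)) {
    min_junction_count <- fpkm_to_count(alpha, eff_length_bp, lib_size)
  }
  freq_D <- j$x_J / j$x_D  # splice frequency at the donor
  freq_A <- j$x_J / j$x_A  # splice frequency at the acceptor
  keep <- j$x_J >= min_junction_count & freq_D >= psi & freq_A >= psi
  cbind(j, freq_D = freq_D, freq_A = freq_A, keep = keep)
}

res <- filter_junctions(junctions, psi = 0.2, min_junction_count = 5)
```

In this example only the first junction is kept: the second fails the count threshold, and the third has an acceptor splice frequency of 0.1, below psi.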

Regions between any pair of identified splice junctions with sufficient compatible read coverage are considered candidate internal exons. Read coverage for a candidate exon is computed based on compatible fragments, i.e. fragments with matching (or missing) strand information and introns consistent with the exon under consideration. Candidate exons are included in predicted features if the minimum coverage is at least beta * number of junction-containing fragments for either flanking junction.
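The beta criterion for internal exons can be sketched as below. The helper keep_internal_exon and its require_both argument are hypothetical. The text above does not settle whether the threshold must hold for both flanking junctions or only one of them, so the sketch exposes both readings rather than asserting which one the package uses.

```r
# Hypothetical helper: does a candidate internal exon pass the beta filter?
# coverage:   per-base compatible-fragment coverage across the candidate exon
# n_5p, n_3p: junction-containing fragment counts for the flanking junctions
# require_both: TRUE  = threshold must hold for both junctions (assumption)
#               FALSE = holding for one junction is enough (assumption)
keep_internal_exon <- function(coverage, n_5p, n_3p, beta,
                               require_both = TRUE) {
  min_cov <- min(coverage)
  pass <- min_cov >= beta * c(n_5p, n_3p)
  if (require_both) all(pass) else any(pass)
}

coverage <- c(12, 10, 8, 9, 15)  # made-up coverage values

strict  <- keep_internal_exon(coverage, n_5p = 30, n_3p = 50, beta = 0.2)
lenient <- keep_internal_exon(coverage, n_5p = 30, n_3p = 50, beta = 0.2,
                              require_both = FALSE)
```

With a minimum coverage of 8 and thresholds of 6 and 10, the candidate passes under the lenient reading but not under the strict one.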

Terminal exons are regions downstream or upstream of splice junctions with compatible fragment coverage at least gamma * number of junction-containing fragments.
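A minimal sketch of the gamma criterion follows. The helper terminal_exon_length is hypothetical, and it assumes that a terminal exon extends contiguously away from the junction until coverage first drops below the threshold; the description above states only the coverage threshold, not how the region boundary is determined.

```r
# Hypothetical helper: number of bases adjacent to a junction whose
# compatible-fragment coverage stays at or above gamma * n_junction.
# coverage[1] is the position next to the junction; later elements
# move away from it (downstream or upstream, depending on orientation).
terminal_exon_length <- function(coverage, n_junction, gamma) {
  ok <- coverage >= gamma * n_junction
  if (all(ok)) return(length(ok))
  which(!ok)[1] - 1L  # bases before the first position below threshold
}

coverage <- c(20, 18, 12, 6, 3, 7, 0)  # made-up coverage values
len <- terminal_exon_length(coverage, n_junction = 40, gamma = 0.2)
```

With 40 junction-containing fragments and gamma = 0.2 the threshold is 8, so the region covers the first three positions.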

Value

TxFeatures object

Author(s)

Leonard Goldstein


ldg21/SGSeq documentation built on Oct. 14, 2020, 9:51 p.m.