regionPerReadLength: Find proportion of reads per position per read length in...

regionPerReadLength    R Documentation

Find proportion of reads per position per read length in region

Description

For a given transcript region (such as the CDS), get coverage per position, split by read length. By default only positions that have hits are returned; set drop.zero.dt to FALSE to also get all positions with 0 coverage.

Usage

regionPerReadLength(
  grl,
  reads,
  acceptedLengths = NULL,
  withFrames = TRUE,
  scoring = "transcriptNormalized",
  weight = "score",
  exclude.zero.cov.grl = TRUE,
  drop.zero.dt = TRUE,
  BPPARAM = bpparam()
)

Arguments

grl

a GRangesList object, usually of leaders (5' UTRs), CDSs, 3' UTRs or ORFs

reads

a GAlignments or GRanges object, or precomputed coverage as a covRleList (where the names of the covRle objects are read lengths), of RiboSeq, RnaSeq etc.
Weights for scoring are by default taken from the 'score' column in 'reads'. Can also be random access paths to bigWig or fstwig files. Do not use random access for more than a few genes; beyond that, loading the entire files is usually better.

acceptedLengths

an integer vector (default NULL) of accepted read lengths. NULL means all lengths are accepted.

withFrames

logical, default TRUE. Add ORF frame (frame 0, 1, 2), counted from the first position of every element of grl.

scoring

a character, default "transcriptNormalized". Which meta coverage scoring to use; one of (zscore, transcriptNormalized, mean, median, sum, sumLength, fracPos), see ?coverageScorings for more info. It decides how hits per position are scored for metacoverage etc. Set to NULL if you do not want meta coverage, but instead want raw counts per gene per position.

weight

(default: 'score'), if defined, a character name of a valid metadata column in reads. For example, GRanges("chr1", 1, "+", score = 5) would mean that the score column tells that this alignment region was found 5 times. ORFik .ofst, .bedoc and .bedo files contain a score column like this, as do CAGEr CAGE files and many other package formats. You can also assign a score column manually.

exclude.zero.cov.grl

logical, default TRUE. Do not include ranges that do not have any coverage (0 reads on them); this makes it faster to run.

drop.zero.dt

logical, default TRUE. If TRUE and as.data.table is TRUE, remove all 0 count positions. This greatly speeds up computation and, most importantly, greatly reduces memory usage. It will not change any plots, unless 0 count positions are used in some sense.

BPPARAM

how many cores/threads to use, given as a BiocParallel parameter object. Default: bpparam()
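
The withFrames, weight and drop.zero.dt arguments above can be illustrated with a rough base R sketch. This is not ORFik's implementation: the toy table, its column names (position, read_length, score, frame) and the variables hits, weighted and full are all assumptions made for illustration only.

```r
# Base R sketch only; none of these objects or column names come from ORFik.
# A toy 12 nt region with collapsed reads: each row is one hit position,
# and 'score' says how many identical reads were collapsed into that row.
region_width <- 12L
hits <- data.frame(position    = c(1L, 4L, 4L, 8L),
                   read_length = c(28L, 28L, 29L, 28L),
                   score       = c(5, 2, 1, 3))

# withFrames analogue: frame 0, 1, 2 counted from the first region position.
hits$frame <- (hits$position - 1L) %% 3L

# weight = "score" analogue: sum the score column instead of counting rows.
weighted <- aggregate(score ~ position + read_length + frame,
                      data = hits, FUN = sum)

# drop.zero.dt = FALSE analogue: expand to every position for every
# read length, and fill positions without hits with 0.
full <- expand.grid(position    = seq_len(region_width),
                    read_length = sort(unique(hits$read_length)))
full <- merge(full, weighted[, c("position", "read_length", "score")],
              all.x = TRUE)
full$score[is.na(full$score)] <- 0

nrow(weighted)  # rows with hits only
nrow(full)      # all positions for every read length
```

In this toy case the table with zero positions is several times larger than the table with hits only, while the total score is unchanged, which is the memory trade-off that drop.zero.dt describes.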

Value

a data.table with coverage per position, split by read length.

See Also

Other coverage: coverageScorings(), metaWindow(), scaledWindowPositions(), windowPerReadLength()

Examples

library(ORFik)
# Raw counts per gene per position
cds <- GRangesList(tx1 = GRanges("1", 100:129, "+"))
reads <- GRanges("1", seq(79, 129, 3), "+")
reads$size <- 28 # Set read length of reads
regionPerReadLength(cds, reads, scoring = NULL)
# Sum up reads in each frame per read length per gene
regionPerReadLength(cds, reads, scoring = "frameSumPerLG")

Roleren/ORFik documentation built on Nov. 13, 2024, 10 p.m.