In JokingHero/ORFik: Open Reading Frames in Genomics

regionPerReadLength

R Documentation

Find proportion of reads per position per read length in region

Description

Given a transcript region (such as the CDS), compute the coverage at each position, split by read length. By default only positions that have hits are returned; set drop.zero.dt to FALSE to also get all positions with 0 coverage.
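As a rough illustration of what "coverage per position per read length" means, the base-R sketch below counts reads on a single region for every (position, read length) pair and then drops the zero-count rows, similar in spirit to the default drop.zero.dt = TRUE. It is not ORFik's implementation: it assumes the reads have already been reduced to their 5' end in region coordinates (1 = first position of the region), and the data and column names are hypothetical.

```r
# Toy data (hypothetical): 5' end of each read, already in region
# coordinates (1 = first position of the region), and its read length.
reads <- data.frame(
  pos    = c(1, 1, 4, 4, 4, 7, 10),
  length = c(28, 29, 28, 28, 29, 28, 29)
)
region_width <- 12

# Count reads for every (position, read length) pair, zeros included,
# similar in spirit to drop.zero.dt = FALSE.
counts <- as.data.frame(
  table(position    = factor(reads$pos, levels = seq_len(region_width)),
        read_length = factor(reads$length)),
  responseName = "score"
)
counts$position    <- as.integer(as.character(counts$position))
counts$read_length <- as.integer(as.character(counts$read_length))

# Keep only positions with hits, like the default drop.zero.dt = TRUE.
hits <- counts[counts$score > 0, ]
print(hits)
```

With 12 positions and 2 read lengths the full table has 24 rows, but only 6 of them carry reads, which is why dropping zero positions saves so much memory on real data.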

Usage

regionPerReadLength(
  grl,
  reads,
  acceptedLengths = NULL,
  withFrames = TRUE,
  scoring = "transcriptNormalized",
  weight = "score",
  exclude.zero.cov.grl = TRUE,
  drop.zero.dt = TRUE,
  BPPARAM = bpparam()
)

Arguments

`grl`	a `GRangesList` object, usually of leaders, CDSs, 3' UTRs or ORFs
`reads`	a `GAlignments`, `GRanges`, or precomputed coverage as `covRleList` (where the names of the covRle objects are read lengths) of RiboSeq, RnaSeq etc. Weights for scoring are by default taken from the 'score' column in 'reads'. Can also be paths to bigWig or fstwig files for random access. Do not use random access for more than a few genes; beyond that, loading the entire files is usually better.
`acceptedLengths`	an integer vector of accepted read lengths. Default NULL, meaning all lengths are accepted.
`withFrames`	logical, default TRUE. Add the ORF frame (0, 1 or 2) of each position, counted from the first position of every grl.
`scoring`	a character, default "transcriptNormalized". The meta coverage scoring to use, e.g. one of (zscore, transcriptNormalized, mean, median, sum, sumLength, fracPos); see ?coverageScorings for the full list and more info. It decides how hits per position are scored for metacoverage etc. Set to NULL if you do not want meta coverage, but instead want raw counts per gene per position.
`weight`	character, default 'score'. If defined, the name of a valid metadata column in reads. For example, GRanges("chr1", 1, "+", score = 5) means that this alignment region was found 5 times. Formats that load a score column like this include bigWig, wig, ORFik ofst, collapsed BAM, bedoc and .bedo, as do CAGEr CAGE files and many other package formats. You can also assign a score column manually.
`exclude.zero.cov.grl`	logical, default TRUE. Exclude ranges that do not have any coverage (0 reads on them); this makes it faster to run.
`drop.zero.dt`	logical, default TRUE. If TRUE, remove all positions with 0 counts from the returned data.table. This greatly speeds up computation and, most importantly, greatly reduces memory usage. It will not change any plots, unless 0-count positions are used in some way.
`BPPARAM`	how many cores/threads to use. Default: bpparam()
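To make the interplay of weight and scoring more concrete, the base-R sketch below starts from hypothetical collapsed reads, weights each alignment by its 'score' column, and then contrasts a "sum"-style score with a per-gene normalized score. The normalization shown (rescale each gene's counts to sum to 1, then add across genes) is an assumption about what "transcriptNormalized" does, not a copy of ORFik's code; check ?coverageScorings for the exact definitions.

```r
# Hypothetical collapsed reads: one row per unique alignment, with 'score'
# giving how many times it was observed (cf. GRanges("chr1", 1, "+", score = 5)).
reads <- data.frame(
  gene     = c("tx1", "tx1", "tx1", "tx2", "tx2", "tx2"),
  position = c(1, 3, 3, 1, 2, 3),
  score    = c(10, 4, 6, 1, 2, 1)
)

# weight = "score": each alignment contributes its score, not 1.
# This is the kind of per-gene, per-position count that scoring = NULL returns.
raw <- aggregate(score ~ gene + position, data = reads, FUN = sum)
raw <- raw[order(raw$gene, raw$position), ]

# "sum"-style scoring: add counts across genes at each position.
sum_score <- aggregate(score ~ position, data = raw, FUN = sum)

# Assumed "transcriptNormalized"-style scoring: rescale each gene's counts
# to sum to 1, then add across genes, so one highly expressed gene does
# not dominate the metaprofile.
raw$norm <- raw$score / ave(raw$score, raw$gene, FUN = sum)
norm_score <- aggregate(norm ~ position, data = raw, FUN = sum)

print(raw)
print(sum_score)
print(norm_score)
```

In the "sum" profile tx1 dominates (11, 2, 11), while in the normalized profile both genes contribute equally (0.75, 0.5, 0.75), which is the usual motivation for normalizing per transcript before building a metaprofile.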

Value

a data.table with coverage per position per read length.
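When withFrames = TRUE, each position is also labeled with its frame (0, 1 or 2) counted from the first position of the region. Assuming 1-based positions, that frame is simply (position - 1) %% 3, as in the base-R sketch below on hypothetical per-position scores. Summing per frame then gives a quick periodicity check, roughly what the example comment for "frameSumPerLG" describes, minus the per-read-length and per-gene grouping.

```r
# Hypothetical per-position output for one region, positions 1-based
# from the first position of the region.
cov <- data.frame(position = c(1, 2, 4, 7, 9), score = c(3, 1, 5, 2, 1))

# Frame 0, 1 or 2 relative to the first position of the region.
cov$frame <- (cov$position - 1) %% 3

# Sum the scores within each frame; a strong frame 0 suggests periodicity.
frame_sums <- tapply(cov$score, cov$frame, sum)
print(cov)
print(frame_sums)
```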

Examples

# Load the packages providing GRanges/GRangesList and regionPerReadLength
library(GenomicRanges)
library(ORFik)
# Raw counts per gene per position
cds <- GRangesList(tx1 = GRanges("1", 100:129, "+"))
reads <- GRanges("1", seq(79, 129, 3), "+")
reads$size <- 28 # Set read length of all reads (stored in the 'size' column)
regionPerReadLength(cds, reads, scoring = NULL)
# Sum up reads in each frame per read length per gene
regionPerReadLength(cds, reads, scoring = "frameSumPerLG")

JokingHero/ORFik documentation built on June 9, 2025, 8:46 p.m.