reliableFeatures: Identify features (e.g., transcripts) with high quality data
In bakR: Analyze and Compare Nucleotide Recoding RNA Sequencing Datasets

reliableFeatures

R Documentation

Identify features (e.g., transcripts) with high quality data

Description

This function identifies all features (e.g., transcripts, exons, etc.) whose mutation rate is below a set threshold in the control (-s4U) sample and whose read count exceeds a set threshold in all samples. If there is no -s4U sample, only the read count cutoff is applied. Two additional filters are relevant only when working with short RNA-seq reads: one removes features with extremely low empirical U-content (i.e., the average number of Us in sequencing reads from that feature), and the other removes features for which very few reads contain at least 3 Us.
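The core of this filter can be illustrated with a small base R sketch. This is not bakR's implementation: the toy table, its column names (`feature`, `is_ctl`, `reads`, `mut_rate`), and the use of a strict `>` comparison for read counts are all assumptions made for illustration. Only the threshold names `high_p` and `totcut` come from the function's signature.

```r
# Toy per-feature, per-sample summary (hypothetical values and column names)
toy <- data.frame(
  feature  = rep(c("geneA", "geneB", "geneC"), each = 2),
  is_ctl   = rep(c(TRUE, FALSE), times = 3),   # TRUE marks the -s4U control
  reads    = c(120, 150, 8, 200, 90, 75),
  mut_rate = c(0.01, 0.05, 0.02, 0.04, 0.30, 0.06)
)

high_p <- 0.2   # maximum mutation rate tolerated in -s4U controls
totcut <- 50    # minimum read count required in every sample

# A sample passes if it has enough reads and, when it is a control,
# a mutation rate below high_p
toy$pass <- toy$reads > totcut & (!toy$is_ctl | toy$mut_rate < high_p)

# A feature is kept only if every one of its samples passes
keep <- tapply(toy$pass, toy$feature, all)
reliable <- names(keep)[keep]
reliable
```

In this toy table, geneB fails the read count cutoff in its control sample and geneC fails the control mutation rate cutoff, so only geneA is retained.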

Usage

reliableFeatures(
  obj,
  high_p = 0.2,
  totcut = 50,
  totcut_all = 10,
  Ucut = 0.25,
  AvgU = 4
)

Arguments

`obj`	Object of class bakRData
`high_p`	Highest mutation rate accepted in control (-s4U) samples
`totcut`	Numeric; transcripts with fewer than this number of sequencing reads in any replicate of all experimental conditions are filtered out
`totcut_all`	Numeric; transcripts with fewer than this number of sequencing reads in any sample are filtered out
`Ucut`	In all samples, the fraction of a feature's reads containing 2 or fewer Us must be below this cutoff
`AvgU`	The average number of Us in a feature's reads must be greater than this value
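As a rough illustration of how these arguments might be adjusted, the sketch below calls `reliableFeatures()` once with its defaults and once with a lower `totcut`. It assumes the bakR package is installed and uses the example datasets `cB_small` and `metadf` that ship with it. The object names `bakRData_obj`, `default_set`, and `relaxed_set` are placeholders, and the choice of `totcut = 25` is arbitrary.

```r
library(bakR)

# Example data shipped with bakR
data("cB_small")
data("metadf")

# Build the bakRData object (named to avoid shadowing the constructor)
bakRData_obj <- bakRData(cB_small, metadf)

# Default thresholds, as listed under Usage
default_set <- reliableFeatures(obj = bakRData_obj)

# A more permissive read count cutoff; other arguments keep their defaults
relaxed_set <- reliableFeatures(obj = bakRData_obj, totcut = 25)

# Compare how many features pass each filter
length(default_set)
length(relaxed_set)
```

Lowering `totcut` admits features with fewer reads, which can be useful for shallow sequencing runs at the cost of noisier estimates for the added features.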

Value

Vector of the names of the features (e.g., genes) that passed the reliability filter

Examples



# Attach bakR, then load the example cB data
library(bakR)
data("cB_small")

# Load metadf
data("metadf")

# Create bakRData
bakRData <- bakRData(cB_small, metadf)

# Find reliable features
features_to_keep <- reliableFeatures(obj = bakRData)

bakR documentation built on June 22, 2024, 6:55 p.m.