ODER: ODER: Optimising the Definition of Expressed Regions
In eolagbaju/ODER: Optimising the Definition of Expressed Regions

Description Usage Arguments Value Examples

The aim of ODER is to identify previously unannotated expressed regions (ERs) using RNA-sequencing data. For this purpose, ODER defines and optimises the definition of ERs, then connected these ERs to genes using junction data. In this way, ODER improves gene annotation. Gene annotation is a staple input of many bioinformatic pipelines and a more complete gene annotation can enable more accurate interpretation of disease associated variants.

Returns the optimum definition of the expressed regions by finding the ideal MCC (Mean Coverage Cutoff) and MRG (Max Region Gap). The combination of MCC and MRG that returns the expressed region with the smallest exon delta is the most ideal.

ODER(
  bw_paths,
  auc_raw,
  auc_target,
  chrs = "",
  genome = "hg38",
  mccs,
  mrgs,
  gtf = NULL,
  ucsc_chr,
  ignore.strand,
  exons_no_overlap = NULL,
  biotype = "Non-overlapping",
  bw_chr = "chr",
  file_type = "non-stranded",
  bw_pos = NULL,
  bw_neg = NULL,
  auc_raw_pos = NULL,
  auc_raw_neg = NULL
)

`bw_paths`	path(s) to bigwig file(s) with the RNA-seq data that you want the #' coverage of.
`auc_raw`	vector containing AUCs(Area Under Coverage) matching the order of bigwig path(s).
`auc_target`	total AUC to normalise all samples to e.g. 40e6 * 100 would be the estimated total auc for sample sequenced to 40 million reads of 100bp in length.
`chrs`	chromosomes to obtain mean coverage for, default is "" giving every chromosome. Can take UCSC format(chrs = "chr1") or just the chromosome i.e. chrs = c(1,X)
`genome`	the UCSC genome you want to use, the default is hg38.
`mccs`	mean coverage cut-offs to apply.
`mrgs`	max region gaps to apply.
`gtf`	Either a string containg the path to a .gtf file or a pre-imported gtf using `rtracklayer::import` .
`ucsc_chr`	logical scalar, determining whether to add "chr" prefix to the seqnames of non-overlapping exons and change "chrMT" -> "chrM". Note, if set to TRUE and seqnames already have "chr", it will not add another.
`ignore.strand`	logical value for input into `findOverlaps`, default is True.
`exons_no_overlap`	Optimum set of exons to help calculate deltas
`biotype`	Filters the GTF file passed in to what would be considered the "Gold Standard" exons. The Default is "Non-overlapping" but the options are: "Non-overlapping" (exons that don't intersect each other), "Three Prime" (3' UTR), "Five Prime" (5' UTR), "Internal" (Internal coding), "lncRNA" (Long Non-Coding RNA), "ncRNA" (Non-Coding RNA) and "Pseudogene"
`bw_chr`	specifies whether the bigwig files has the chromosomes labelled with a "chr" preceding the chromosome i.e. "chr1" vs "1". Can be either "chr" or "nochr" with "chr" being the default.
`file_type`	Describes if the BigWigs are stranded or not. Either "stranded" or non-stranded
`bw_pos`	positive strand bigwig file
`bw_neg`	negative strand bigwig file
`auc_raw_pos`	vector containing AUCs(Area Under Coverage) matching the order of the positive bigwig paths.
`auc_raw_neg`	vector containing AUCs(Area Under Coverage) matching the order of the negative bigwig paths.

list containing optimised ERs, optimal pair of MCC/MRGs and delta_df

rec_url <- recount::download_study(
    project = "SRP012682",
    type = "samples",
    download = FALSE
)

# file_cache is an internal function to download a bigwig file from a link
# if the file has been downloaded recently, it will be retrieved from a cache
bw_path <- file_cache(rec_url[1])
gtf_url <- paste0(
    "http://ftp.ensembl.org/pub/release-103/gtf/",
    "homo_sapiens/Homo_sapiens.GRCh38.103.chr.gtf.gz"
)
gtf_path <- file_cache(gtf_url)

# As of rtracklayer 1.25.16, BigWig is not supported on Windows.
data(gtex_SRP012682_SRX222703_lung_auc_1, package = "ODER")
if (!xfun::is_windows()) {
    opt_ers <- ODER(
        bw_paths = bw_path,
        auc_raw = gtex_SRP012682_SRX222703_lung_auc_1,
        auc_target = 40e6 * 100, chrs = c("chr21", "chr22"),
        genome = "hg38", mccs = c(5, 10), mrgs = c(10, 20),
        gtf = gtf_path, ucsc_chr = TRUE, ignore.strand = TRUE,
        exons_no_overlap = NULL, bw_chr = "chr"
    )

    opt_ers
}