ODER: ODER: Optimising the Definition of Expressed Regions

View source: R/ODER.R

Description

The aim of ODER is to identify previously unannotated expressed regions (ERs) using RNA-sequencing data. For this purpose, ODER defines and optimises the definition of ERs, then connects these ERs to genes using junction data. In this way, ODER improves gene annotation. Gene annotation is a staple input of many bioinformatic pipelines, and a more complete gene annotation can enable more accurate interpretation of disease-associated variants.

Returns the optimum definition of the expressed regions by finding the ideal MCC (Mean Coverage Cutoff) and MRG (Max Region Gap). Each combination of MCC and MRG produces a candidate set of ERs; the combination whose ERs have the smallest exon delta is chosen as optimal.
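
The selection step described above can be pictured as a small grid search. The following is a minimal sketch in base R, not ODER's implementation: `toy_delta` is a made-up placeholder for the real exon delta calculation, and the object names are hypothetical.

```r
# Candidate cutoffs and gaps, as they would be passed to `mccs` and `mrgs`.
mccs <- c(5, 10)
mrgs <- c(10, 20)

# Every MCC/MRG combination to evaluate.
grid <- expand.grid(mcc = mccs, mrg = mrgs)

# Placeholder scoring function (an assumption, NOT ODER's exon delta):
# it only exists so the example has something to minimise.
toy_delta <- function(mcc, mrg) abs(mcc - 8) * 10 + abs(mrg - 18)

grid$delta <- mapply(toy_delta, grid$mcc, grid$mrg)

# The optimal pair is the row with the smallest delta.
best <- grid[which.min(grid$delta), ]
best
```

With real data, the delta column would come from comparing each candidate ER set against the reference exons rather than from a toy function.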

Usage

ODER(
  bw_paths,
  auc_raw,
  auc_target,
  chrs = "",
  genome = "hg38",
  mccs,
  mrgs,
  gtf = NULL,
  ucsc_chr,
  ignore.strand,
  exons_no_overlap = NULL,
  biotype = "Non-overlapping",
  bw_chr = "chr",
  file_type = "non-stranded",
  bw_pos = NULL,
  bw_neg = NULL,
  auc_raw_pos = NULL,
  auc_raw_neg = NULL
)

Arguments

bw_paths

path(s) to bigwig file(s) with the RNA-seq data that you want the coverage of.

auc_raw

vector containing AUCs (Area Under Coverage) matching the order of the bigwig path(s).

auc_target

total AUC to normalise all samples to. For example, 40e6 * 100 would be the estimated total AUC for a sample sequenced to 40 million reads of 100 bp in length.
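
As a worked illustration of the arithmetic, the sketch below assumes that normalisation rescales raw coverage by the ratio auc_target / auc_raw. That scaling rule, the raw AUC value and the coverage vector are assumptions made for this example, not values taken from the package.

```r
# Target AUC for 40 million reads of 100 bp each.
auc_target <- 40e6 * 100

# Hypothetical raw AUC for one sample (made up for this example).
auc_raw <- 2e9

# Assumed scaling rule: multiply raw coverage by target / raw.
scale_factor <- auc_target / auc_raw

# Hypothetical per-base raw coverage.
raw_cov <- c(0, 3, 12, 7)
norm_cov <- raw_cov * scale_factor
norm_cov
```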

chrs

chromosomes to obtain mean coverage for. The default is "", which gives every chromosome. Can take UCSC format (chrs = "chr1") or just the chromosome name, i.e. chrs = c("1", "X").

genome

the UCSC genome you want to use. The default is "hg38".

mccs

mean coverage cut-offs to apply.

mrgs

max region gaps to apply.
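
To make the two parameters concrete, here is a minimal base R sketch of how a coverage cut-off and a maximum gap could define regions on a toy coverage vector. It assumes that bases with coverage strictly above the cut-off are kept and that regions separated by at most `mrg` bases are merged; ODER's actual rules may differ, and all names here are hypothetical.

```r
# Toy per-base coverage (made up for illustration).
cov <- c(0, 6, 7, 0, 0, 8, 9, 0, 0, 0, 0, 6)
mcc <- 5  # mean coverage cut-off
mrg <- 2  # max region gap, in bases

# Step 1: find runs of bases above the cut-off.
runs <- rle(cov > mcc)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
ers <- data.frame(start = starts[runs$values], end = ends[runs$values])

# Step 2: merge neighbouring regions whose gap is at most `mrg` bases.
gap <- ers$start[-1] - ers$end[-nrow(ers)] - 1
group <- cumsum(c(TRUE, gap > mrg))
merged <- data.frame(
  start = as.vector(tapply(ers$start, group, min)),
  end = as.vector(tapply(ers$end, group, max))
)
merged
```

Raising `mcc` shrinks or removes regions, while raising `mrg` joins more of them, which is why the two are tuned together.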

gtf

Either a string containing the path to a .gtf file or a pre-imported GTF using rtracklayer::import.

ucsc_chr

logical scalar determining whether to add the "chr" prefix to the seqnames of non-overlapping exons and change "chrMT" to "chrM". Note that if set to TRUE and the seqnames already have "chr", another prefix will not be added.
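
The renaming behaviour described here can be sketched in base R as follows. `add_chr_prefix` is a hypothetical helper written for this illustration, not a function exported by ODER.

```r
# Hypothetical helper: add "chr" only where it is missing,
# then rename the mitochondrial chromosome to the UCSC style.
add_chr_prefix <- function(seqnames) {
  seqnames <- as.character(seqnames)
  needs_prefix <- !startsWith(seqnames, "chr")
  seqnames[needs_prefix] <- paste0("chr", seqnames[needs_prefix])
  seqnames[seqnames == "chrMT"] <- "chrM"
  seqnames
}

add_chr_prefix(c("1", "chr2", "X", "MT"))
```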

ignore.strand

logical value passed to findOverlaps. The default is TRUE.

exons_no_overlap

Optimum set of exons used to help calculate the deltas.

biotype

Filters the GTF file passed in down to what would be considered the "Gold Standard" exons. The default is "Non-overlapping". The options are: "Non-overlapping" (exons that don't intersect each other), "Three Prime" (3' UTR), "Five Prime" (5' UTR), "Internal" (internal coding), "lncRNA" (long non-coding RNA), "ncRNA" (non-coding RNA) and "Pseudogene".

bw_chr

specifies whether the bigwig files have the chromosomes labelled with a "chr" preceding the chromosome, i.e. "chr1" vs "1". Can be either "chr" or "nochr", with "chr" being the default.

file_type

Describes whether the BigWigs are stranded or not. Either "stranded" or "non-stranded" (the default).

bw_pos

positive strand bigwig file

bw_neg

negative strand bigwig file

auc_raw_pos

vector containing AUCs (Area Under Coverage) matching the order of the positive bigwig paths.

auc_raw_neg

vector containing AUCs (Area Under Coverage) matching the order of the negative bigwig paths.

Value

list containing optimised ERs, optimal pair of MCC/MRGs and delta_df

Examples

rec_url <- recount::download_study(
    project = "SRP012682",
    type = "samples",
    download = FALSE
)

# file_cache is an internal function to download a bigwig file from a link
# if the file has been downloaded recently, it will be retrieved from a cache
bw_path <- file_cache(rec_url[1])
gtf_url <- paste0(
    "http://ftp.ensembl.org/pub/release-103/gtf/",
    "homo_sapiens/Homo_sapiens.GRCh38.103.chr.gtf.gz"
)
gtf_path <- file_cache(gtf_url)

# As of rtracklayer 1.25.16, BigWig is not supported on Windows.
data(gtex_SRP012682_SRX222703_lung_auc_1, package = "ODER")
if (!xfun::is_windows()) {
    opt_ers <- ODER(
        bw_paths = bw_path,
        auc_raw = gtex_SRP012682_SRX222703_lung_auc_1,
        auc_target = 40e6 * 100, chrs = c("chr21", "chr22"),
        genome = "hg38", mccs = c(5, 10), mrgs = c(10, 20),
        gtf = gtf_path, ucsc_chr = TRUE, ignore.strand = TRUE,
        exons_no_overlap = NULL, bw_chr = "chr"
    )

    opt_ers
}

eolagbaju/ODER documentation built on Dec. 20, 2021, 5:21 a.m.