mergeBSJunctions: Group circRNAs identified by multiple prediction tools

mergeBSJunctions    R Documentation

Group circRNAs identified by multiple prediction tools

Description

The function mergeBSJunctions() shrinks the data frame by grouping back-spliced junctions that were identified by more than one detection tool. For each grouped junction, the read counts reported in the final data frame are those of the tool that detected the highest mean read count across all samples. All the tools that detected the back-spliced junction are listed in the column "tool" of the final data frame. See getDetectionTools for more detail about the code corresponding to each circRNA detection tool.
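The grouping rule described above can be illustrated with a minimal base R sketch on toy data. This is not the package's implementation: the column names (id, s1, s2), the tool codes "toolA" and "toolB", the helper mergeToy() and the comma separator used in the "tool" column are all assumptions made for illustration.

```r
# Toy input: one row per junction per tool (hypothetical columns and codes).
bsj <- data.frame(
    id   = c("junction1", "junction1", "junction2"),
    tool = c("toolA", "toolB", "toolA"),
    s1   = c(10, 4, 7),   # read counts in sample 1
    s2   = c(6, 2, 9),    # read counts in sample 2
    stringsAsFactors = FALSE
)

# Mean read count across samples for each row.
bsj$meanCount <- rowMeans(bsj[, c("s1", "s2")])

# Hypothetical helper: keep the counts of the tool with the highest mean
# and list every tool that detected the junction.
mergeToy <- function(d) {
    best <- d[which.max(d$meanCount), ]
    best$tool <- paste(sort(unique(d$tool)), collapse = ",")
    best
}

merged <- do.call(rbind, lapply(split(bsj, bsj$id), mergeToy))
rownames(merged) <- NULL
merged$meanCount <- NULL
print(merged)
```

In this toy case junction1 keeps the counts reported by toolA, because its mean count (8) is higher than that of toolB (3), while both tools are recorded in the "tool" column.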

NOTE: Since different detection tools can report slightly different coordinates, it is possible to fix the coordinates using the GTF file before grouping the back-spliced junctions. In this way the back-spliced junction coordinates will correspond to the exon coordinates reported in the GTF file. A maximum difference of 2 nucleotides is allowed between the back-spliced junction and exon coordinates. See the parameter fixBSJsWithGTF.
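The idea behind the coordinate fix can be sketched as follows. The helper snapToExon() and its arguments are hypothetical and are not part of circRNAprofiler; the sketch only shows how a reported coordinate might be moved to the nearest annotated exon boundary when the two differ by at most 2 nucleotides, and left unchanged otherwise.

```r
# Hypothetical helper: snap each position to the nearest exon boundary
# if it lies within maxDiff nucleotides, otherwise keep it as reported.
snapToExon <- function(pos, exonBoundaries, maxDiff = 2) {
    vapply(pos, function(p) {
        d <- abs(exonBoundaries - p)
        i <- which.min(d)
        if (d[i] <= maxDiff) exonBoundaries[i] else p
    }, numeric(1))
}

exonStarts <- c(100, 500, 900)   # toy exon boundaries from an annotation
reported   <- c(101, 498, 905)   # toy coordinates reported by a tool

fixed <- snapToExon(reported, exonStarts)
print(fixed)
```

Here the first two coordinates are within 2 nucleotides of an exon boundary and are replaced by it, while the third is 5 nucleotides away and is kept as reported.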

Usage

mergeBSJunctions(
  backSplicedJunctions,
  gtf,
  pathToExperiment = NULL,
  exportAntisense = FALSE,
  fixBSJsWithGTF = FALSE
)

Arguments

backSplicedJunctions

A data frame containing back-spliced junction coordinates and counts generated with getBackSplicedJunctions.

gtf

A data frame containing genome annotation information, generated with formatGTF.

pathToExperiment

A string containing the path to the experiment.txt file, which contains the experiment design information. The file must have at least 3 columns with the following headers:
- label (1st column): unique names of the samples (short but informative).
- fileName (2nd column): names of the input files, e.g. circRNAs_X.txt, where X can be 001, 002, etc.
- group (3rd column): biological conditions, e.g. A or B, or healthy or diseased if you have only 2 conditions.

By default pathToExperiment is set to NULL and the file is searched for in the working directory. If experiment.txt is located in a different directory, then the path needs to be specified.
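As an illustration of the expected layout, the following sketch writes a small experiment.txt with the three required columns to a temporary directory and reads it back. The sample labels, file names and groups are made up, and the tab separator is an assumption; check the package vignette for the exact file format.

```r
# Toy experiment design with the three required columns.
experiment <- data.frame(
    label    = c("A1", "A2", "B1", "B2"),
    fileName = c("circRNAs_001.txt", "circRNAs_002.txt",
                 "circRNAs_003.txt", "circRNAs_004.txt"),
    group    = c("A", "A", "B", "B"),
    stringsAsFactors = FALSE
)

# Write the file outside the working directory (tab separator assumed).
pathToExperiment <- file.path(tempdir(), "experiment.txt")
write.table(experiment, pathToExperiment, sep = "\t",
    quote = FALSE, row.names = FALSE)

# Read it back to check the headers and the number of samples.
readBack <- read.table(pathToExperiment, header = TRUE, sep = "\t",
    stringsAsFactors = FALSE)
print(readBack)
```

Because the file is not in the working directory in this sketch, pathToExperiment would have to be passed explicitly to mergeBSJunctions().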

exportAntisense

A logical specifying whether to export the identified antisense circRNAs to a file named antisenseCircRNAs.txt. Default value is FALSE. A circRNA is defined as antisense if the strand reported in the prediction results is different from the strand reported in the genome annotation file. The antisense circRNAs are removed from the returned data frame.
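The antisense definition above amounts to a strand comparison, which can be sketched on toy data as follows. The data frames and column names are hypothetical and do not reflect the internal structures used by the package.

```r
# Toy prediction results and toy annotation (hypothetical columns).
calls <- data.frame(
    gene   = c("geneA", "geneB", "geneC"),
    strand = c("+", "-", "+"),
    stringsAsFactors = FALSE
)
annotation <- data.frame(
    gene   = c("geneA", "geneB", "geneC"),
    strand = c("+", "+", "+"),
    stringsAsFactors = FALSE
)

# Look up the annotated strand of each predicted circRNA's gene.
annotStrand <- annotation$strand[match(calls$gene, annotation$gene)]

# A call is antisense when its strand differs from the annotated strand.
isAntisense <- calls$strand != annotStrand

antisense <- calls[isAntisense, ]    # would be exported if requested
sense     <- calls[!isAntisense, ]   # would be kept in the result
print(antisense)
```

In this toy case only geneB is flagged, since its predicted strand ("-") differs from the annotated strand ("+").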

fixBSJsWithGTF

A logical specifying whether to fix the back-spliced junction coordinates using the GTF file. Default value is FALSE.

Value

A data frame.

Examples

# Load detected back-spliced junctions
data("backSplicedJunctions")

# Load short version of the gencode v19 annotation file
data("gtf")

pathToExperiment <- system.file("extdata", "experiment.txt",
    package = "circRNAprofiler")

# Merge commonly identified circRNAs
mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf,
    pathToExperiment)


Aufiero/circRNAprofiler documentation built on Nov. 3, 2024, 10:12 a.m.