fusionTable | R Documentation |
Determine all possible fusions of a Rearrangement object and evaluate whether each is in-frame.
fusionTable(robj, txdb, tx, cds, genome, orgdb, id = "")
robj |
A |
txdb |
A |
tx |
A |
cds |
A |
genome |
A |
orgdb |
A |
id |
A length-one character vector of the sample identifier |
Fusions are analyzed as follows. First, all CDS from genes overlapping a rearrangement are extracted (ignoring strand) using the getCDS function. In addition to extracting all the CDS, this function returns the possible gene fusions that may result from a rearrangement based on the modal rearrangement type (the modal rearrangement type is inferred from the strand and position orientation of read pairs). The genes are denoted generically by
           A          C
5+ --------------|---------
3- --------B-----|----D----
where "|" denotes a new sequence junction in a rearranged genome.
For a given fusion (say AC), we then clip the CDS from A and the CDS from C that are absent in the fused product. After clipping, we fuse the remaining CDS from genes A and C. The function tumorProtein is used to derive the amino acid sequence of the tumor protein, that is, the protein that would be formed by fusing the clipped CDS from genes A and C.
To assess whether the fusion is in-frame, we extract all known full transcripts from genes A and C and translate the DNA sequence of each transcript to an amino acid sequence. We refer to the amino acid sequences of the full CDS as the reference protein. The function referenceProtein is a wrapper for getting the reference amino acid sequences. Given the amino acid sequence of the clipped and fused transcripts (fused tumor protein) and the amino acid sequence of the full, unclipped transcripts (reference protein), we compare their sequences to assess whether the fusion is in-frame using the function inFrameFusions. The results are summarized in tabular format by the function .fusionTable.
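The comparison of tumor and reference protein sequences can be illustrated with a small base R sketch. This is a toy under stated assumptions, not the package's implementation: the sequences, the partial codon table, and the helper `translate_toy` are all hypothetical, and the test used here (the tumor protein must end with the residues that gene C contributes in its reference frame) is a simplification of whatever inFrameFusions actually does.

```r
# Partial codon table covering only the codons used in this toy example.
codons <- c(ATG = "M", GCC = "A", AAA = "K", GGT = "G", CTG = "L",
            GAA = "E", TTT = "F", TGA = "*", AGA = "R", ATT = "I", TTG = "L")

# Hypothetical helper: translate a DNA string, dropping a trailing partial codon.
translate_toy <- function(dna) {
  n <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n)
  triplets <- substring(dna, starts, starts + 2)
  paste(unname(codons[triplets]), collapse = "")
}

cds_a <- "ATGGCCAAAGGT"      # made-up reference CDS of gene A
cds_c <- "ATGCTGGAATTTTGA"   # made-up reference CDS of gene C

# Residues of gene C retained downstream of the junction, in the reference frame.
ref_c_tail <- translate_toy(substring(cds_c, 7))

# Clip A after 6 nt (codon boundary) or after 7 nt (frameshift), then fuse with C.
tumor_in  <- translate_toy(paste0(substr(cds_a, 1, 6), substring(cds_c, 7)))
tumor_out <- translate_toy(paste0(substr(cds_a, 1, 7), substring(cds_c, 7)))

# The fusion is called in-frame here if the tumor protein ends with C's reference residues.
result <- c(in_frame = endsWith(tumor_in, ref_c_tail),
            frameshift = endsWith(tumor_out, ref_c_tail))
print(result)
```

Clipping gene A one nucleotide past a codon boundary shifts every downstream codon, so the residues translated from gene C no longer match its reference protein.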
See getCDS for how the CDS from genes involved in a rearrangement are extracted, and clip and fuse for how transcripts are clipped and then fused, respectively. See referenceProtein and tumorProtein for deriving germline (unrearranged) and somatic (rearranged) amino acid sequences.
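As a rough illustration of what clipping and fusing mean at the coordinate level, the following base R sketch trims two toy CDS tables at a junction and stacks the retained pieces. Everything here is an assumption made for illustration: the coordinates, the junction positions, and the helpers `clip_upstream` and `clip_downstream` are hypothetical, both genes are assumed to lie on the plus strand, and the package's own clip and fuse functions operate on genomic range objects rather than data frames.

```r
# Toy CDS tables (made-up coordinates); both genes assumed on the + strand.
cds_a <- data.frame(start = c(100, 300, 500), end = c(199, 399, 599))
cds_c <- data.frame(start = c(1000, 1200, 1400), end = c(1099, 1299, 1499))

# Hypothetical helper: keep the part of the 5' partner at or before its breakpoint.
clip_upstream <- function(cds, junction) {
  kept <- cds[cds$start <= junction, , drop = FALSE]
  kept$end <- pmin(kept$end, junction)
  kept
}

# Hypothetical helper: keep the part of the 3' partner at or after its breakpoint.
clip_downstream <- function(cds, junction) {
  kept <- cds[cds$end >= junction, , drop = FALSE]
  kept$start <- pmax(kept$start, junction)
  kept
}

a_kept <- clip_upstream(cds_a, junction = 349)
c_kept <- clip_downstream(cds_c, junction = 1200)

# "Fuse" by stacking the retained CDS in 5' to 3' order.
fused <- rbind(cbind(gene = "A", a_kept), cbind(gene = "C", c_kept))
row.names(fused) <- NULL
fused$width <- fused$end - fused$start + 1
print(fused)
```

In this toy, the second CDS of gene A is truncated at the junction and the first CDS of gene C is dropped entirely, leaving four retained segments.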
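library(org.Hs.eg.db)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.refGene)
## annotation resources: transcript database, reference genome, transcripts
txdb <- TxDb.Hsapiens.UCSC.hg19.refGene
genome <- BSgenome.Hsapiens.UCSC.hg19
tx <- transcripts(txdb)
## suppress warnings for the remainder of the example
options(warn=-1)
## CDS grouped by transcript
cds.all <- cdsBy(txdb, "tx", use.names=TRUE)
## example rearrangements; select one by its identifier
data(rear_list)
r <- rear_list[["18557-18736"]]
fusionTable(r, txdb, tx, cds.all, genome,
            org.Hs.eg.db, id="test")
## in-frame fusion of CTNND2 and TRIO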