find_cryptic_splice_sites | R Documentation |
This function identifies potential cryptic splice sites by comparing the donor and acceptor motifs of each intron with the canonical splice site motifs. A site is flagged as cryptic when its sequence does not match any of the supplied canonical motifs (GT for donors and AG for acceptors by default). The resulting flags are useful for studying aberrant splicing events.
find_cryptic_splice_sites(input, genome, canonical_donor, canonical_acceptor, verbose)
input |
A data frame containing intron coordinates, ideally generated
by |
genome |
A BSgenome object representing the genome sequence. This is used to extract the sequence for each intron to identify splice sites. |
canonical_donor |
A character vector of canonical donor splice site motifs.
Default is |
canonical_acceptor |
A character vector of canonical acceptor splice site motifs.
Default is |
verbose |
Logical; if |
This function performs the following steps:
It assigns donor and acceptor splice sites to each intron using the assign_splice_sites function.
It compares the identified donor and acceptor splice sites against the provided canonical motifs (GT for donor and AG for acceptor by default). If the splice site sequences do not match the canonical motifs, they are flagged as cryptic.
The function returns a data frame with the same intron information, including additional columns cryptic_donor and cryptic_acceptor indicating whether the splice sites are cryptic.
If the verbose argument is set to TRUE, the function prints its progress, along with the total number of cryptic donor and acceptor sites and their respective percentages.
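The comparison step above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper flag_cryptic and the column names donor_ss and acceptor_ss are assumptions made for illustration, and the toy table stands in for the output of assign_splice_sites.

```r
# Toy intron table. The column names `donor_ss` and `acceptor_ss` are
# assumptions for this sketch, not confirmed package output.
introns <- data.frame(
  intron_id   = c("i1", "i2", "i3", "i4"),
  donor_ss    = c("GT", "GC", "GT", "AT"),
  acceptor_ss = c("AG", "AG", "AC", "AG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag sites whose motif is not among the canonical ones.
flag_cryptic <- function(df,
                         canonical_donor = "GT",
                         canonical_acceptor = "AG",
                         verbose = TRUE) {
  # A site is cryptic when its motif matches none of the canonical motifs.
  df$cryptic_donor    <- !(toupper(df$donor_ss) %in% canonical_donor)
  df$cryptic_acceptor <- !(toupper(df$acceptor_ss) %in% canonical_acceptor)

  if (verbose) {
    # Report totals and percentages, mirroring the verbose summary described above.
    message(sprintf("Cryptic donor sites: %d (%.1f%%)",
                    sum(df$cryptic_donor), 100 * mean(df$cryptic_donor)))
    message(sprintf("Cryptic acceptor sites: %d (%.1f%%)",
                    sum(df$cryptic_acceptor), 100 * mean(df$cryptic_acceptor)))
  }
  df
}

# Default motifs: only GT donors and AG acceptors count as canonical.
flagged <- flag_cryptic(introns)

# Several canonical motifs can be passed as a character vector,
# for example to also accept GC donors.
flagged_gc <- flag_cryptic(introns, canonical_donor = c("GT", "GC"),
                           verbose = FALSE)
```

Passing a longer character vector to canonical_donor or canonical_acceptor widens what counts as canonical, so fewer sites are flagged.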
The input data frame with two additional logical columns:
cryptic_donor: TRUE if the donor site is non-canonical.
cryptic_acceptor: TRUE if the acceptor site is non-canonical.
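Because both added columns are logical, they can be used directly for subsetting and tabulation. The sketch below uses a toy data frame as a stand-in for the returned value; the intron_id column is a hypothetical identifier added only for illustration.

```r
# Toy stand-in for the data frame returned by find_cryptic_splice_sites().
# `intron_id` is a hypothetical column used only to label rows.
result <- data.frame(
  intron_id        = c("i1", "i2", "i3"),
  cryptic_donor    = c(FALSE, TRUE, FALSE),
  cryptic_acceptor = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep introns where at least one splice site is non-canonical.
any_cryptic <- result[result$cryptic_donor | result$cryptic_acceptor, ]

# Cross-tabulate donor and acceptor status.
counts <- table(donor = result$cryptic_donor,
                acceptor = result$cryptic_acceptor)
```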
assign_splice_sites, extract_ss_motif
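# Load the reference genome used to extract splice site sequences
suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
# Load the example annotation shipped with the package
file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
gtf_v1 <- load_file(file_v1)
# Extract introns and assign their donor and acceptor splice sites
introns_df <- extract_introns(gtf_v1)
introns_ss <- assign_splice_sites(introns_df, genome = BSgenome.Hsapiens.UCSC.hg38)
# Flag splice sites that differ from the canonical motifs
cryptic_sites <- find_cryptic_splice_sites(introns_ss, genome = BSgenome.Hsapiens.UCSC.hg38)
head(cryptic_sites)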