knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tidyverse) library(Biostrings) library(FeatureReachR)
In this experiment, RBFOX2, a splicing factor, was knocked out. Several introns become enhanced, or preferentially retained due to the absence of RBFOX2. The fastas below contatain the sequences downstream from enhanced introns (case) or all other introns (ctrl).
case <- readDNAStringSet(system.file("extdata", "DownstreamIntron.Enhanced.fasta", package = "FeatureReachR")) ctrl <- readDNAStringSet(system.file("extdata", "DownstreamIntron.Control.fasta", package = "FeatureReachR"))
Some of the easiest sequence characteristics to investigate are length and GC content.
In reguards to length, we can see that these sequence lengths were defined as 200 nt long with some limits. Because these sequences lengths are defined, comparing them yields no interesting conclusions.
length_compare(case, ctrl) length_plot(case, ctrl)
GC content is more interesting.
We see that the case sequences have a significantly higher GC content than the control sequences. This could indicate that RBFOX2 preferentially binds higher GC sequences.
GC_compare(case,ctrl) GC_plot(case,ctrl)
Using these fastas, we can identify sixmers that are over-represented downstream of enhanced introns as compared to downstream sequence from all other introns.
There are three significantly over-represented sixmers and one under-represented sixmer downstream of enhanced introns.
#identify enriched kmers kmer_stats <- kmer_compare(case, ctrl, 6) print(kmer_stats) kmer_plot(kmer_stats) ##enriched kmers are of particular interest. enriched_kmers <- kmer_stats %>% filter(p_adj < 0.05, log2FC > 0) %>% pull(kmer) %>% as.character()
enriched kmers are of particular interest as they can reveal over-represented RBP binding sites (motifs)
With only three enriched kmers, we can easily tell that the sequence "GCATG" is present in all of them. The functions below would help understand more complex sets of enriched kmers.
kmer2logo(enriched_kmers)
Because scanning for motifs in all of the case and ctrl fastas can take a long time, we can use the enriched kmers to estimate RBP binding sites within the case seqences.
#generate RBNS motif estimates from enriched kmers RBNS_estimate <- estimate_motif_from_kmer(enriched_kmers, "RBNS") RBNS_estimate motif_plot(RBNS_estimate) cisbp_estimate <- estimate_motif_from_kmer(enriched_kmers, "CISBPRNA_hs") cisbp_estimate motif_plot(cisbp_estimate)
#code to generate comparison tables (takes time) #RBFOX2_RBNS_compare <- motif_compare(RBNS_PWM, case, ctrl) #RBFOX2_cisbpRNA_compare <- motif_compare(CISBPRNA_hs_PWM, case, ctrl) data(RBFOX2_RBNS_compare, package="FeatureReachR") data(RBFOX2_cisbpRNA_compare, package = "FeatureReachR") #PWM scanning comparison tables print(RBFOX2_RBNS_compare) motif_plot(RBFOX2_RBNS_compare) print(RBFOX2_cisbpRNA_compare) motif_plot(RBFOX2_cisbpRNA_compare) ##phyper overlap RBNS est_hits <- RBNS_estimate %>% filter(p_adj < 0.05) %>% pull(., motif) scan_hits <- RBFOX2_RBNS_compare %>% filter(p_adj < 0.05, log2FC > 0) %>% pull(., motif) phyper(sum(est_hits %in% scan_hits)-1, length(scan_hits), nrow(RBFOX2_RBNS_compare)-length(scan_hits), length(est_hits), lower.tail = FALSE) ##phyper overlap cisbpRNA est_hits <- cisbp_estimate %>% filter(p_adj < 0.05) %>% pull(., motif) scan_hits <- RBFOX2_cisbpRNA_compare %>% filter(p_adj < 0.05, log2FC > 0) %>% pull(., motif) phyper(sum(est_hits %in% scan_hits)-1, length(scan_hits), nrow(RBFOX2_cisbpRNA_compare)-length(scan_hits), length(est_hits), lower.tail = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.