In TaliaferroLab/FeatureReachR: Extract Enriched Biological Features From RNA-seq Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(tidyverse)
library(Biostrings)
library(FeatureReachR)

Starting with some Sequence

In this experiment, RBFOX2, a splicing factor, was knocked out. Several introns become enhanced, or preferentially retained due to the absence of RBFOX2. The fastas below contatain the sequences downstream from enhanced introns (case) or all other introns (ctrl).

case <- readDNAStringSet(system.file("extdata", "DownstreamIntron.Enhanced.fasta", package = "FeatureReachR"))
ctrl <- readDNAStringSet(system.file("extdata", "DownstreamIntron.Control.fasta", package = "FeatureReachR"))

Easy length and GC comparisons

Some of the easiest sequence characteristics to investigate are length and GC content.

In reguards to length, we can see that these sequence lengths were defined as 200 nt long with some limits. Because these sequences lengths are defined, comparing them yields no interesting conclusions.

length_compare(case, ctrl)
length_plot(case, ctrl)

GC content is more interesting.

We see that the case sequences have a significantly higher GC content than the control sequences. This could indicate that RBFOX2 preferentially binds higher GC sequences.

GC_compare(case,ctrl)
GC_plot(case,ctrl)

Comparing kmers can reveal sequences over-represented in the case fasta

Using these fastas, we can identify sixmers that are over-represented downstream of enhanced introns as compared to downstream sequence from all other introns.

There are three significantly over-represented sixmers and one under-represented sixmer downstream of enhanced introns.

#identify enriched kmers
kmer_stats <- kmer_compare(case, ctrl, 6) 

print(kmer_stats)

kmer_plot(kmer_stats)


##enriched kmers are of particular interest.
enriched_kmers <- kmer_stats %>% 
  filter(p_adj < 0.05, log2FC > 0) %>% 
  pull(kmer) %>% 
  as.character()

enriched kmers are of particular interest as they can reveal over-represented RBP binding sites (motifs)

Collapsing these enriched kmers into similar groups is more useful when more kmers are enriched...

With only three enriched kmers, we can easily tell that the sequence "GCATG" is present in all of them. The functions below would help understand more complex sets of enriched kmers.

kmer2logo(enriched_kmers)

We can use the enriched sixmers to estimate RBP binding sites

Because scanning for motifs in all of the case and ctrl fastas can take a long time, we can use the enriched kmers to estimate RBP binding sites within the case seqences.

#generate RBNS motif estimates from enriched kmers
RBNS_estimate <- estimate_motif_from_kmer(enriched_kmers, "RBNS") 
RBNS_estimate
motif_plot(RBNS_estimate)

cisbp_estimate <- estimate_motif_from_kmer(enriched_kmers, "CISBPRNA_hs") 
cisbp_estimate
motif_plot(cisbp_estimate)

#code to generate comparison tables (takes time)
#RBFOX2_RBNS_compare <- motif_compare(RBNS_PWM, case, ctrl)
#RBFOX2_cisbpRNA_compare <- motif_compare(CISBPRNA_hs_PWM, case, ctrl)

data(RBFOX2_RBNS_compare, package="FeatureReachR")
data(RBFOX2_cisbpRNA_compare, package = "FeatureReachR")

#PWM scanning comparison tables
print(RBFOX2_RBNS_compare)
motif_plot(RBFOX2_RBNS_compare)
print(RBFOX2_cisbpRNA_compare)
motif_plot(RBFOX2_cisbpRNA_compare)

##phyper overlap RBNS
est_hits <- RBNS_estimate %>% filter(p_adj < 0.05) %>% pull(., motif)
scan_hits <- RBFOX2_RBNS_compare %>% filter(p_adj < 0.05, log2FC > 0) %>% pull(., motif) 

phyper(sum(est_hits %in% scan_hits)-1, length(scan_hits), nrow(RBFOX2_RBNS_compare)-length(scan_hits), length(est_hits), lower.tail = FALSE)


##phyper overlap cisbpRNA
est_hits <- cisbp_estimate %>% filter(p_adj < 0.05) %>% pull(., motif)
scan_hits <- RBFOX2_cisbpRNA_compare %>% filter(p_adj < 0.05, log2FC > 0) %>% pull(., motif)

phyper(sum(est_hits %in% scan_hits)-1, length(scan_hits), nrow(RBFOX2_cisbpRNA_compare)-length(scan_hits), length(est_hits), lower.tail = FALSE)

TaliaferroLab/FeatureReachR documentation built on Aug. 15, 2021, 2:21 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TaliaferroLab/FeatureReachR
Extract Enriched Biological Features From RNA-seq Data

In TaliaferroLab/FeatureReachR: Extract Enriched Biological Features From RNA-seq Data

Starting with some Sequence

Easy length and GC comparisons

Comparing kmers can reveal sequences over-represented in the case fasta

Collapsing these enriched kmers into similar groups is more useful when more kmers are enriched...

We can use the enriched sixmers to estimate RBP binding sites

R Package Documentation

Browse R Packages

We want your feedback!

TaliaferroLab/FeatureReachR Extract Enriched Biological Features From RNA-seq Data

In TaliaferroLab/FeatureReachR: Extract Enriched Biological Features From RNA-seq Data

Starting with some Sequence

Easy length and GC comparisons

Comparing kmers can reveal sequences over-represented in the case fasta

Collapsing these enriched kmers into similar groups is more useful when more kmers are enriched...

We can use the enriched sixmers to estimate RBP binding sites

R Package Documentation

Browse R Packages

We want your feedback!

TaliaferroLab/FeatureReachR
Extract Enriched Biological Features From RNA-seq Data