# rmarkdown::render("vignettes/vignette.Rmd") library(MotifFinder)
knitr::opts_chunk$set(fig.width=10)
We create 300 DNA sequences of length 200 with the motif "ATgTT_GtCC" around the center of 50% of the sequences.
library(MotifFinder) set.seed(42) simulated_sequences <- simulate_sequences(motif="ATgTT_GtCC") str(simulated_sequences)
We run MotifFinder with a length slightly shorter than the known motif length.
motif_found <- findamotif(simulated_sequences$seqs, len=7, stranded_prior = T, seed = 42)
We can see that we have recovered the motif.
library(ggseqlogo) ggseqlogo(get_PWM(motif_found)) ggseqlogo(get_PWM(motif_found, complement=TRUE))
We can also check the location of the motifs found as well as the location it thinks the motifs are.
plot_motif_location(motif_found) plot(motif_found$prior)
It's a good idea to plot the inferred probability the motif being in each sequence (alpha parameter) for each iteration to check convergence.
plot(motif_found$alphas, ylim=c(0,1))
Other helper functions include downloading PWMs from the Jaspar or Hocomoco databases.
Alx1 <- download_PWM("ALX1_MOUSE.H11MO.0.B") str(Alx1)
To use real DNA instead of simulating sequences you will need to download the genome and extract regions that are specified using a BED file.
```{bash, getFASTA, eval=FALSE} wget ftp://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz gunzip motifs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz samtools faidx motifs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa
bedtools getfasta -s -fi Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa -bed example_locations.bed -name > example_locations.fasta
We can then load these sequences using the helper function load_sequences ```r genomic_sequences <- load_sequences("example_locations.fasta")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.