The goal of the conservedPos package is to find important conserved sites in a small target region by conducting a site-specific analysis of a set of aligned DNA sequences. The package provides functions that count nucleotide occurrences at each position independently and calculate the nucleotide frequencies in order to quantify the conservity of each position.
You can install the released version of conservedPos from GitHub with:
require("devtools")
devtools::install_github("hezijin/conservedPos", build_vignettes = TRUE)
library("conservedPos")
To run the Shiny app:
runconservedPos()
To see the tutorial on how to use the package:
browseVignettes("conservedPos")
ls("package:conservedPos")
data(package = "conservedPos")
The package contains five functions to reformat and analyze the input sequences and two functions to plot the results. createPosVec() collects the nucleotide at a given index position from every sequence. conservityTable() then loops over every index and calculates the frequency of the most conserved nucleotide at each one through findConservityFromS(). You can also use findConservedNul() to find the conserved nucleotide at a specific site and findMaxLen() to find the maximum sequence length in the set.
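The per-position calculation described above can be sketched in base R. This is only an illustration of the idea, not the package's own code: the helper names position_vector, position_conservity and most_common_nucleotide are hypothetical, and the sketch assumes that sequences shorter than the requested position contribute NA and are ignored when frequencies are computed.

```r
# Hypothetical helpers for illustration; they are NOT part of conservedPos.

# Collect the character at position `pos` from every sequence.
# Sequences shorter than `pos` contribute NA (an assumption of this sketch).
position_vector <- function(seqs, pos) {
  chars <- substr(seqs, pos, pos)
  chars[chars == ""] <- NA_character_
  chars
}

# Frequency of the most common nucleotide at one position, ignoring NA.
position_conservity <- function(chars) {
  chars <- chars[!is.na(chars)]
  if (length(chars) == 0) return(NA_real_)
  max(table(chars)) / length(chars)
}

# Most common nucleotide at one position (the first one wins on ties).
most_common_nucleotide <- function(chars) {
  counts <- table(chars)  # table() drops NA by default
  names(counts)[which.max(counts)]
}

# Toy input: the third sequence is one nucleotide shorter than the others.
seqs <- c("ACGTA", "ACGTT", "ACCT", "AGGTA")

# Longest sequence length, then one conservity value per position.
max_len <- max(nchar(seqs))
conservity <- vapply(
  seq_len(max_len),
  function(pos) position_conservity(position_vector(seqs, pos)),
  numeric(1)
)
```

With this toy input, positions 1 and 4 are fully conserved, positions 2 and 3 have a conservity of 0.75, and position 5 has a conservity of 2/3 because only three sequences reach it.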
plotOverall() plots the overall alignment’s conservity using the conservity vector generated by conservityTable(), while plotPartial() plots a region selected by the user.
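As a rough sketch of how a conservity vector can be drawn as a bar plot, either in full or for a selected region, the base R barplot() function is enough. The helper plot_conservity and its from/to arguments are hypothetical and do not reflect the actual signatures of plotOverall() or plotPartial(); the conservity values are made up for illustration.

```r
# Hypothetical helper for illustration; it is NOT the package's plotting code.
plot_conservity <- function(conservity, from = 1, to = length(conservity)) {
  stopifnot(from >= 1, to <= length(conservity), from <= to)
  positions <- from:to
  values <- conservity[positions]
  barplot(
    values,
    names.arg = positions,      # label each bar with its alignment position
    ylim = c(0, 1),             # conservity is a frequency between 0 and 1
    xlab = "Alignment position",
    ylab = "Conservity",
    main = sprintf("Conservity of positions %d to %d", from, to)
  )
  invisible(values)             # return the plotted values for inspection
}

conservity <- c(1, 0.75, 0.75, 1, 2 / 3)  # made-up example values

plot_conservity(conservity)                              # whole alignment
region <- plot_conservity(conservity, from = 2, to = 4)  # selected region
```

Fixing the y-axis to the range 0 to 1 keeps plots of different regions visually comparable.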
Overview picture for the package:
The idea for this package was inspired by PWM Viewing and Searching. The approach used to clean up and reformat sequence strings in createPosVec() draws on several websites (see the Reference section). The idea behind findConservityFromS() comes from the nucleotideFrequencyAt function in the Biostrings package, but findConservityFromS() avoids its input limitation: nucleotideFrequencyAt does not accept strings containing NA, which means the input sequences must all have the same length. findConservedNul() uses an idea from Stack Overflow (see reference 8).
The functions findMaxLen() and conservityTable() were written by me. The method for generating bar plots comes from R language tutorials (see the Reference section) and is used in plotOverall() and plotPartial().
1. Silva, A. (2020). TestingPackage: An Example R Package For BCB410H. Unpublished. https://github.com/anjalisilva/TestingPackage
2. DataMentor (2018). R Bar Plot. Getting Started in Data Science With R. https://www.datamentor.io/r-programming/bar-plot/
3. Holmes, S. and Huber, W. (2019). Lab 1: Biostrings in R. https://web.stanford.edu/class/bios221/labs/biostrings/lab_1_biostrings.html
4. Tutorialspoint (2020). R - Bar Charts. Bar Chart Labels, Title and Colors. R Tutorial. https://www.tutorialspoint.com/r/r_bar_charts.htm
5. Pagès, H., Aboyoun, P., Gentleman, R. and DebRoy, S. (2020). Biostrings: Efficient manipulation of biological strings. R package version 2.58.0. https://bioconductor.org/packages/Biostrings
6. Huber, W., Carey, V.J., Gentleman, R., …, Morgan, M. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods, 12, 115.
7. Bembom, O. and Ivanek, R. (2020). seqLogo: Sequence logos for DNA sequence alignments. R package version 1.56.0. http://www.bioconductor.org/packages/release/bioc/html/seqLogo.html
8. Antonis (2018). How to find the most repeated word in a vector with R. Stack Overflow. https://stackoverflow.com/questions/48632957/how-to-find-the-most-repeated-word-in-a-vector-with-r
This package was developed as part of an assessment for 2020 BCB410H: Applied Bioinformatics, University of Toronto, Toronto, Canada.