knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# Scale every power spectrum in the list to the length of the longest one.
evenlyScaleEnsemble <- function(spectraList) {
  spectraLengths <- lengths(spectraList)
  maxLength <- max(spectraLengths)
  # which.max() returns a single index even when several spectra tie.
  maxIndex <- which.max(spectraLengths)
  # Preallocate one slot per spectrum.
  scaledSpectra <- vector("list", length(spectraList))
  scaledSpectra[[maxIndex]] <- spectraList[[maxIndex]]
  for (i in seq_along(spectraList)) {
    if (i == maxIndex) {
      next
    }
    scaledSpectra[[i]] <- evenlyScaleSingle(spectraList[[i]], maxLength)
  }
  return(scaledSpectra)
}
This document is the package vignette for the `afgencomp` package in R. The code in this package was originally included in a GitHub package distributed under the name `YinGenomicDFTDistances`. This is the most current version of the software from the previous package, and all associated works.
This software is developed independently of the enhanced data types available within the Bioconductor software suite, as it is intended to be a lightweight standalone package that can be used to quickly produce distances between genomic sequences using alignment-free methods. For this reason, the primary data type used for storing sequences is the string.
This vignette is distributed and maintained along with the software in `afgencomp`. However, if you happen to find your way to this vignette and are looking to install the `afgencomp` R package, you may do so via one of several routes:
install.packages()
devtools::install_github()
install.packages()
The source distribution of the software is written in R and hosted on GitHub. The source code can be downloaded by using git to clone the `afgencomp` repository. This can be accomplished either by typing the following git command in a terminal window, in the directory where you would like to keep the package files, or by navigating to the GitHub page in a web browser and downloading the package manually.
```{bash eval=FALSE}
git clone https://github.com/mathornton01/afgencomp.git
```

Once the package has been downloaded it can be installed for R by navigating to the `afgencomp` directory from within R and using the `install.packages()` function.

```r
afgencomp.pkg.dir <- "full/path/to/directory/goes/here"
install.packages(afgencomp.pkg.dir, repos = NULL, type = "source")
```
Or if you would instead like to choose the file using the file-manager, you can do so by running:
install.packages(file.choose(), repos=NULL)
then selecting the top-level directory for the package source. That is the directory which contains the `R/`, `data/`, `man/`, and `vignettes/` folders.
The `devtools` library in R allows developers to quickly and easily share their packages with R users via GitHub. To install `afgencomp` this way, use the `install_github` function of the `devtools` package. Be sure to specify that you would like the package vignette (this document) to be built when you run this, so that the vignette is available via the `browseVignettes()` or `??` functions.
library(devtools); devtools::install_github("https://github.com/mathornton01/afgencomp.git",build_vignettes = TRUE);
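Whichever route is used, it can be helpful to confirm that the package and its vignette were actually installed. The following is a minimal sketch using only base R and `utils` functions; the variable names `afgencomp_installed` and `built_vignettes` are illustrative and not part of the package.

```r
# Check whether the package is installed, without attaching it.
afgencomp_installed <- requireNamespace("afgencomp", quietly = TRUE)

if (afgencomp_installed) {
  # List the vignettes that were built and installed with the package.
  built_vignettes <- vignette(package = "afgencomp")$results
  print(built_vignettes[, c("Item", "Title"), drop = FALSE])
} else {
  message("afgencomp is not installed; see the installation routes above.")
}
```

If the listing comes back empty after a GitHub install, the vignette was most likely not built, and the install can be repeated with `build_vignettes = TRUE`.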
When comparing genomic sequences, most procedures first determine a set of mutations required to transform one sequence into another. These are referred to as 'post-alignment' procedures in this work. When doing this for a large group of sequences simultaneously, it can become unwieldy to align every sequence to every other sequence, and Multiple Sequence Alignment (MSA) is frequently very slow on large datasets. This is why alignment-free procedures can be useful. In the afgencomp (previously YinGenomicDFTDistance) package, two alignment-free approaches are implemented. However, prior to actually computing distances and comparing sequences, the data must be processed in an appropriate manner.
A quickstart guide is provided here for ease of adoption and implementation, but it is expected that for most procedures the researcher may use the `?` functionality in R as usual to retrieve runnable examples for each of the functions in the package.
library(afgencomp)

# Create a character vector with some example sequences.
sequencelist.example <- c("ACCTCGCGGCGGCGCTCTCGAGAGNNCGCGTGAGAGCTCGCN",
                          "ACCTTGCGGCGGCGCTCTCCGTAGNNCGCGTGAGAGCTCGCN",
                          "ACCACGGGCGGGGGCGCGTTNNNTGAGAGTNCCCGCGCGCGG",
                          "ACCTCGCGGCGGCGCTCTCGAGAGNNCGCGTGATCGCTCGCN",
                          "ACCTCGCGGCGGCGCTCTCGAGAGNNCGCG",
                          "ACCTCGCGGCGGCGCTCTCGAGAGNNCGCGTGATCGCTCGCAGAGGAGGN")

# Encode the ensemble and create a 2D encoded genomic string ensemble.
encoded.sequences <- encodeGenomes(sequencelist.example)

# Display the first sequence signal as an example.
plot(encoded.sequences[[1]][1, ], col = 'blue', type = 'l',
     main = "Encoded Genome 1", xlab = "Genomic Loci", ylab = "Encoding")
lines(encoded.sequences[[1]][2, ], col = 'red')
Once the genomes are encoded, the power spectra can be computed, the even scaling procedure can be applied to produce spectra of equal length, and distances between them can be taken to produce phylogenies and dendrograms.
# Compute the power spectra and scale them to a common length.
power.spectra.sample <- getPowerSpectraEnsemble(encoded.sequences)
scaled.spectra <- evenlyScaleEnsemble(power.spectra.sample)

# Stack the spectra into a matrix, standardize the columns, and cluster.
library(rlist)
sspecmat <- list.rbind(scaled.spectra)
scaled.sspecmat <- scale(sspecmat)
scaled.sspecmat.dist <- dist(scaled.sspecmat)
plot(hclust(scaled.sspecmat.dist))
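To illustrate the idea behind these steps without relying on the package internals, here is a minimal base-R sketch run on toy numeric signals. It rests on several assumptions: the helpers `power_spectrum_sketch` and `evenly_scale_sketch` are hypothetical names, the power spectrum is taken as the squared modulus of the discrete Fourier transform, and the stretching step uses plain linear interpolation. The actual `getPowerSpectraEnsemble()` and `evenlyScaleSingle()` implementations in `afgencomp` may differ.

```r
# Hypothetical helpers for illustration only; not the afgencomp implementations.

# Power spectrum of a numeric signal: squared modulus of its DFT.
power_spectrum_sketch <- function(signal) {
  Mod(stats::fft(signal))^2
}

# Stretch a spectrum to target_length by linear interpolation (assumed scheme).
evenly_scale_sketch <- function(spectrum, target_length) {
  stats::approx(
    x = seq_along(spectrum),
    y = spectrum,
    xout = seq(1, length(spectrum), length.out = target_length)
  )$y
}

# Toy numeric signals of unequal length, standing in for encoded genomes.
set.seed(1)
signals <- list(a = rnorm(30), b = rnorm(42), c = rnorm(50))

# Compute spectra, then stretch each one to the length of the longest.
spectra <- lapply(signals, power_spectrum_sketch)
max_length <- max(lengths(spectra))
scaled <- lapply(spectra, evenly_scale_sketch, target_length = max_length)

# Stack into a matrix (one row per signal), standardize columns, and cluster.
spectra_matrix <- do.call(rbind, scaled)
distances <- dist(scale(spectra_matrix))
tree <- hclust(distances)
```

Calling `plot(tree)` draws the dendrogram for the toy signals. The sketch uses `do.call(rbind, ...)` in place of `rlist::list.rbind()` only to avoid the extra dependency.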
We hope that you find some use in our package! Thank you for your consideration.