rSWeeP: Functions for creating low-dimensional comparative matrices of amino acid sequence occurrences

Overview

The “Spaced Words Projection (sWeeP)” is a method for representing biological sequences as low-dimensional vectors. It uses the spaced-words concept: each sequence is scanned and the words found are converted into indices, which build a high-dimensional vector that is later projected onto a smaller, randomly oriented orthonormal base. The method is suitable for high-quality comparisons between sequences, and it allows analyses that are otherwise not feasible because of the computational limitations of traditional techniques. The method is available at sWeeP (PIERRI, 2019). Its main speed gain comes from a constant processing time per sequence: the response time grows linearly with the number of inputs, whereas in other methods it grows exponentially.
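The steps described above can be illustrated with a minimal base R sketch. It is not the rSWeeP implementation: the mask, the binary occurrence encoding, the example sequences, and the helper name occurrence_vector are all assumptions chosen for illustration, and the high-dimensional space here has only 20^3 = 8000 entries to keep the example small.

```r
# Illustrative sketch only; NOT the rSWeeP implementation.
set.seed(1)
alphabet <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # 20 amino acids
mask <- c(TRUE, TRUE, FALSE, TRUE)  # hypothetical spaced-word mask
weight <- sum(mask)                 # positions actually read per window
high_dim <- length(alphabet)^weight # 20^3 = 8000 possible spaced words

# Hypothetical helper: map one sequence to a binary occurrence vector.
occurrence_vector <- function(sequence) {
  codes <- match(strsplit(sequence, "")[[1]], alphabet) - 1L  # 0-based codes
  n_windows <- max(length(codes) - length(mask) + 1L, 0L)
  vec <- numeric(high_dim)
  for (i in seq_len(n_windows)) {
    word <- codes[i:(i + length(mask) - 1L)][mask]
    if (anyNA(word)) next  # skip windows containing unknown symbols
    # Read the spaced word as a base-20 number to get its index
    index <- sum(word * length(alphabet)^(rev(seq_along(word)) - 1L)) + 1L
    vec[index] <- 1
  }
  vec
}

# Random orthonormal base: QR decomposition of a Gaussian matrix
basis <- qr.Q(qr(matrix(rnorm(high_dim * 5), nrow = high_dim, ncol = 5)))

# Made-up example sequences
seqs <- c("MKVLAAGIVGLLLAQ", "MKVLAAGIVGLALAQ", "WWPPHHCCDDEEFFG")
high <- t(vapply(seqs, occurrence_vector, numeric(high_dim)))  # 3 x 8000
low <- high %*% basis                                          # 3 x 5
dim(low)
```

Each sequence costs one scan plus one matrix product regardless of how many other sequences are present, which is the property behind the linear growth mentioned above.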

Functions

The package has two functions: orthBase, which generates an orthonormal matrix of a chosen size, and sWeeP, which applies the sWeeP method.

Quick Start

The orthBase function creates a quasi-orthonormal matrix of any desired size. Here it is used to create the projection matrix for the sWeeP method, so it must have 160,000 rows and as many columns as the desired projection dimension.

library(rSWeeP)
baseMatrix <- orthBase(160000,10)
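To give an intuition for what "quasi-orthonormal" means at this size, the following base R sketch builds a stand-in matrix from scaled Gaussian noise and measures how far its Gram matrix is from the identity. This is an assumption made for illustration, not a description of how orthBase constructs its matrix; the names quasi, gram and max_dev are hypothetical.

```r
# Stand-in for a quasi-orthonormal matrix; NOT how orthBase works internally.
set.seed(42)
n_rows <- 160000
n_cols <- 10

# Independent Gaussian columns scaled by 1/sqrt(n_rows) have norm close to 1
# and are nearly orthogonal to each other when n_rows is large.
quasi <- matrix(rnorm(n_rows * n_cols), n_rows, n_cols) / sqrt(n_rows)

# For an exactly orthonormal matrix, t(quasi) %*% quasi is the identity.
gram <- crossprod(quasi)
max_dev <- max(abs(gram - diag(n_cols)))
max_dev  # small, but not exactly zero
```

The same check, crossprod followed by a comparison with diag, can be applied to any projection matrix to see how close it is to orthonormal.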

The exdna.fas dataset consists of three strings that simulate DNA sequences. It is used for demonstration purposes only.

path <- system.file(package = "rSWeeP", "extdata", "exdna.fas")
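As a rough picture of what such an input file might look like, the sketch below writes three made-up records to a temporary file and reads them back with base R. It assumes the file is in FASTA format (suggested by the .fas extension); the record names and sequences are invented and are not the contents of exdna.fas.

```r
# Hypothetical three-record FASTA file; NOT the contents of exdna.fas.
fasta_path <- tempfile(fileext = ".fas")
writeLines(
  c(">seq1", "ACGTACGTGA",
    ">seq2", "ACGTTCGTGA",
    ">seq3", "TTGACCATGC"),
  fasta_path
)

# Minimal parser: header lines start with ">", other lines hold sequence data
lines <- readLines(fasta_path)
is_header <- grepl("^>", lines)
record_id <- cumsum(is_header)  # which record each line belongs to
sequences <- vapply(
  split(lines[!is_header], record_id[!is_header]),
  paste, character(1), collapse = ""
)
names(sequences) <- sub("^>", "", lines[is_header])
sequences
```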

The sWeeP method is then applied to the file. It returns a matrix in which each sequence is represented as a vector, so the sequences can be compared with standard vector distances. The result can then be displayed graphically as a phylogenetic tree.

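# Project the sequences onto the low-dimensional base
projection <- sWeeP(path, baseMatrix)
# Pairwise Euclidean distances between the projected sequences
distancia <- dist(projection, method = "euclidean")
# Hierarchical clustering (Ward's method) and tree plot
tree <- hclust(distancia, method = "ward.D")
plot(tree, hang = -1, cex = 1)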
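The distance and clustering steps can also be tried without the package. The sketch below applies the same dist and hclust calls to a small stand-in matrix; the row names and values are made up for illustration and do not come from sWeeP.

```r
# Stand-in for a projected matrix: 4 hypothetical sequences in 10 dimensions.
set.seed(7)
group1 <- rnorm(10)
group2 <- rnorm(10) + 5  # shifted so the two groups are well separated
stand_in <- rbind(
  seqA = group1,
  seqB = group1 + rnorm(10, sd = 0.1),
  seqC = group2,
  seqD = group2 + rnorm(10, sd = 0.1)
)

# Same steps as in the quick start: distances, then Ward clustering
d <- dist(stand_in, method = "euclidean")
tree_demo <- hclust(d, method = "ward.D")

# Cut the tree into two clusters instead of plotting it
groups <- cutree(tree_demo, k = 2)
groups
```

Here seqA and seqB fall into one cluster and seqC and seqD into the other, because the rows within each pair differ only by small noise.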

Session information

sessionInfo()

References



rSWeeP documentation built on Nov. 8, 2020, 5:28 p.m.