SWeeP: A vectorial comparative method for amino acid sequences


Description

"Spaced Words Projection" (SWeeP) is a method for representing biological sequences as compact vectors. SWeeP applies the spaced-words concept: it scans each sequence and generates indices to build a high-dimensional matrix of occurrences, which is then projected onto a smaller, randomly oriented orthonormal base (Pierri et al., 2019). The resulting matrix preserves the information needed to compare sequences while having a user-selectable size.

Usage

sWeeP(xfas, baseMatrix)

## S4 method for signature 'character'
sWeeP(xfas, baseMatrix)

## S4 method for signature 'AAStringSet'
sWeeP(xfas, baseMatrix)

Arguments

xfas

An AAStringSet object or the path to a FASTA-format file.

baseMatrix

An orthonormal matrix with 160,000 coordinates (rows), such as the one created with orthBase(160000, 10) in the Examples.
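
To illustrate what "orthonormal" means for baseMatrix, the sketch below builds a small random matrix with orthonormal columns in base R and checks the property. It rests on assumptions: make_orth_base is a hypothetical helper, the QR construction is a generic technique and not necessarily how the package's orthBase works, and the size (400 by 5) is scaled down from 160,000 coordinates to keep the example fast.

```r
# Hypothetical helper (not part of rSWeeP): random matrix with orthonormal columns.
make_orth_base <- function(n_coord, n_out, seed = 1) {
  set.seed(seed)
  gauss <- matrix(rnorm(n_coord * n_out), nrow = n_coord, ncol = n_out)
  qr.Q(qr(gauss))  # Q factor of the QR decomposition: n_coord x n_out
}

base_small <- make_orth_base(400, 5)
dim(base_small)

# Orthonormal columns: t(B) %*% B should equal the identity matrix.
gram <- crossprod(base_small)
max(abs(gram - diag(5)))  # close to zero, up to floating-point error
```

Projecting onto such a base reduces each occurrence vector to as many numbers as the base has columns.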

Details

The SWeeP method was developed to facilitate the comparison of complete proteomic sequences and to support machine learning analyses. It is based on the concept of spaced words, which are used to scan biological sequences and project them into a matrix of occurrences that is easier to manipulate. The sWeeP function returns an n-by-m matrix, where n is the number of sequences in xfas and m is the number of columns in baseMatrix.
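
The scan-count-project idea described above can be sketched in base R as follows. This is a toy illustration under stated assumptions, not the rSWeeP implementation: count_spaced_words is a hypothetical helper, the mask "101" (two matched positions, giving 20^2 = 400 coordinates) is chosen only to keep the example small, and the random base comes from a generic QR construction. For comparison, a mask with four matched positions over the 20 standard amino acids would give 20^4 = 160,000 coordinates, which is consistent with the size expected for baseMatrix.

```r
# Toy sketch of spaced-word counting and projection (not the rSWeeP implementation).
amino <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # 20 standard amino acids
mask  <- c(TRUE, FALSE, TRUE)                        # assumed mask "101"

# Count spaced-word occurrences of one sequence in a vector of length 20^sum(mask).
count_spaced_words <- function(sequence, mask, alphabet = amino) {
  chars  <- strsplit(sequence, "")[[1]]
  codes  <- match(chars, alphabet) - 1L              # 0-based codes; NA if unknown
  counts <- numeric(length(alphabet)^sum(mask))
  n_win  <- length(codes) - length(mask) + 1L
  if (n_win < 1L) return(counts)
  for (start in seq_len(n_win)) {
    word <- codes[start:(start + length(mask) - 1L)][mask]
    if (anyNA(word)) next                            # skip non-standard residues
    # Read the word as a base-20 number to obtain its 1-based index.
    idx <- sum(word * length(alphabet)^(rev(seq_along(word)) - 1L)) + 1
    counts[idx] <- counts[idx] + 1
  }
  counts
}

# Made-up sequences, for illustration only.
seqs <- c(s1 = "MKVLAAGIVG", s2 = "MKVLAAGLVG", s3 = "WWYYHHPPCC")
occurrences <- t(sapply(seqs, count_spaced_words, mask = mask))  # n x 400

# Project onto a random orthonormal base with m = 5 columns.
set.seed(42)
base_small <- qr.Q(qr(matrix(rnorm(ncol(occurrences) * 5), ncol = 5)))
projected  <- occurrences %*% base_small                         # n x m
dim(projected)
```

The result has one row per sequence and one column per column of the base, mirroring the n-by-m shape described above.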

Value

A matrix resulting from the projection of the sequences in xfas onto baseMatrix.

Author(s)

Danrley R. Fernandes.

References

Pierri, C. R. et al. SWeeP: Representing large biological sequences data sets in compact vectors. Scientific Reports, accepted in December 2019. doi: 10.1038/s41598-019-55627-4.

Examples

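library(rSWeeP)

# Orthonormal base: 160,000 coordinates, 10 output columns
baseMatrix <- orthBase(160000, 10)
# Example FASTA file shipped with the package
path <- system.file("extdata", "exdna.fas", package = "rSWeeP")
# Project the sequences (named 'projection' to avoid masking base::return)
projection <- sWeeP(path, baseMatrix)
# Hierarchical clustering on Euclidean distances between the projected vectors
distancia <- dist(projection, method = "euclidean")
tree <- hclust(distancia, method = "ward.D")
plot(tree, hang = -1, cex = 1)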

DanrleyRF/SWeeP documentation built on Nov. 24, 2020, 5:41 a.m.