prot_vec: Converting from protein sequences to vectors or vice versa.

prot_vec    R Documentation

Converting from protein sequences to vectors or vice versa.

Description

The word2vec model is used to map amino acids to vectors of real numbers. Conceptually, this is a mathematical embedding from a sparse space with one dimension per amino acid into a continuous vector space of much lower dimension. prot2vec converts protein sequences to embedding vectors, and vec2prot converts embedding vectors back to protein sequences.
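The idea of an embedding lookup and its inverse can be sketched in base R. This is only an illustration under stated assumptions, not the GenProSeq implementation: the embedding matrix is random rather than trained with word2vec, the four-letter alphabet is a toy, and the helper names seq_to_vec and vec_to_seq are hypothetical.

```r
# Toy embedding matrix: one row per amino acid, 3 columns (embedding dimension).
# Values are random and purely illustrative; this is NOT a trained word2vec model.
set.seed(1)
amino_acids <- c("A", "C", "D", "E")
embedding_matrix <- matrix(rnorm(length(amino_acids) * 3),
                           nrow = length(amino_acids),
                           dimnames = list(amino_acids, NULL))

# Sequence -> vectors: look up the embedding row of each residue.
# (hypothetical helper, not part of GenProSeq)
seq_to_vec <- function(prot_seq, embedding_matrix) {
  residues <- strsplit(prot_seq, "")[[1]]
  embedding_matrix[residues, , drop = FALSE]
}

# Vectors -> sequence: for each vector, pick the amino acid whose
# embedding row is nearest in squared Euclidean distance.
# (hypothetical helper, not part of GenProSeq)
vec_to_seq <- function(prot_vec, embedding_matrix) {
  nearest <- apply(prot_vec, 1, function(v) {
    d <- rowSums(sweep(embedding_matrix, 2, v)^2)
    names(which.min(d))
  })
  paste(nearest, collapse = "")
}

vecs <- seq_to_vec("ACDE", embedding_matrix)
recovered <- vec_to_seq(vecs, embedding_matrix)
```

Here vecs has one row per residue and one column per embedding dimension, and recovered reproduces the input sequence because each vector lies exactly on a row of the matrix.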

Usage

prot2vec(prot_seq, embedding_dim, embedding_matrix = NULL, ...)
vec2prot(prot_vec, embedding_matrix)

Arguments

prot_seq

protein sequences

prot_vec

protein embedding vectors

embedding_dim

dimension of embedding vectors

embedding_matrix

embedding matrix (default: NULL)

...

additional arguments passed to "word2vec::word2vec", except dim, min_count and split

Value

prot_seq

protein sequences

prot_vec

protein embedding vectors

embedding_matrix

embedding matrix

Author(s)

Dongmin Jung

References

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.

Chang, M. (2020). Artificial intelligence for drug development, precision medicine, and healthcare.

See Also

word2vec::word2vec, word2vec::word2vec_similarity

Examples

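library(GenProSeq)

# Embed the first 10 example PTEN sequences with 8-dimensional vectors
prot_seq <- example_PTEN[1:10]
prot2vec_result <- prot2vec(prot_seq = prot_seq, embedding_dim = 8)
# Convert the embedding vectors back to sequences with the same embedding matrix
vec2prot_result <- vec2prot(prot_vec = prot2vec_result$prot_vec,
                            embedding_matrix = prot2vec_result$embedding_matrix)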
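
Further word2vec settings can presumably be forwarded through "...". The sketch below assumes that prot2vec passes these arguments unchanged to word2vec::word2vec; type, window and iter are arguments of word2vec::word2vec, while the values chosen here are arbitrary and only illustrative.

```r
library(GenProSeq)

prot_seq <- example_PTEN[1:10]

# Assumption: extra arguments are forwarded to word2vec::word2vec via "...".
# dim, min_count and split are set by prot2vec itself and must not be supplied.
skipgram_result <- prot2vec(prot_seq = prot_seq,
                            embedding_dim = 8,
                            type = "skip-gram",  # word2vec::word2vec training algorithm
                            window = 5L,         # context window size
                            iter = 10L)          # number of training iterations

# The result is accessed the same way as in the basic example.
embedding <- skipgram_result$embedding_matrix
```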

dongminjung/GenProSeq documentation built on May 3, 2022, 10:28 p.m.