prot_vec: Converting from protein sequences to vectors or vice versa.
In dongminjung/GenProSeq: Generating Protein Sequences with Deep Generative Models

Description

Amino acids are mapped to vectors of real numbers using the word2vec model. Conceptually, this is a mathematical embedding from a space with many dimensions per amino acid into a continuous vector space of much lower dimension. prot2vec converts protein sequences to embedding vectors, and vec2prot converts embedding vectors back to protein sequences.
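
To make the idea of an embedding lookup concrete, here is a minimal base R sketch. It does not use GenProSeq and is not confirmed to match how the package works internally: a random matrix stands in for a trained word2vec embedding, and the helper names `toy_prot2vec` and `toy_vec2prot` are hypothetical. The reverse direction is sketched as a nearest-row search, which is only one plausible way to decode vectors.

```r
# Illustrative only: a random matrix stands in for a trained word2vec embedding.
set.seed(1)
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # 20 standard residues
embedding_dim <- 8
toy_embedding <- matrix(rnorm(length(amino_acids) * embedding_dim),
                        nrow = length(amino_acids),
                        dimnames = list(amino_acids, NULL))

# Sequence -> vectors: look up one embedding row per residue.
toy_prot2vec <- function(seq_string, emb) {
  residues <- strsplit(seq_string, "")[[1]]
  emb[residues, , drop = FALSE]
}

# Vectors -> sequence: choose the residue whose embedding row is nearest
# (squared Euclidean distance) to each input vector.
toy_vec2prot <- function(vecs, emb) {
  nearest <- apply(vecs, 1, function(v) {
    which.min(colSums((t(emb) - v)^2))
  })
  paste(rownames(emb)[nearest], collapse = "")
}

vecs <- toy_prot2vec("MTAIIKEIVSR", toy_embedding)
dim(vecs)                          # 11 residues by 8 dimensions
toy_vec2prot(vecs, toy_embedding)  # recovers "MTAIIKEIVSR"
```

Each residue here goes from one of 20 categories to a dense vector of length 8, which is the dimension reduction the description refers to.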

Usage

prot2vec(prot_seq, embedding_dim, embedding_matrix = NULL, ...)
vec2prot(prot_vec, embedding_matrix)

Arguments

`prot_seq`	protein sequences
`prot_vec`	protein embedding vectors
`embedding_dim`	dimension of embedding vectors
`embedding_matrix`	embedding matrix (default: NULL)
`...`	additional arguments passed to "word2vec::word2vec", except dim, min_count and split
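
As a hedged sketch of how `...` might be used, the snippet below forwards a few `word2vec::word2vec` training options (`type`, `window`, `iter`) through prot2vec. The values are illustrative, not recommendations, and the assumption that prot2vec forwards these three options unchanged rests only on the argument table above. The call is skipped when GenProSeq is not installed.

```r
# Extra word2vec arguments to forward through `...` (illustrative values).
extra_args <- list(type = "skip-gram", window = 3L, iter = 10L)

# Per the argument table, dim, min_count and split must not be forwarded.
reserved <- c("dim", "min_count", "split")
stopifnot(!any(names(extra_args) %in% reserved))

if (requireNamespace("GenProSeq", quietly = TRUE)) {
  # Assumes example_PTEN is available as a data set of the package.
  data("example_PTEN", package = "GenProSeq")
  result <- do.call(
    GenProSeq::prot2vec,
    c(list(prot_seq = example_PTEN[1:10], embedding_dim = 8), extra_args)
  )
}
```

Building the extra arguments as a list makes it easy to check them against the reserved names before the call.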

Value

`prot_seq`	protein sequences
`prot_vec`	protein embedding vectors
`embedding_matrix`	embedding matrix

Author(s)

Dongmin Jung

References

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. arXiv:1310.4546.

Chang, M. (2020). Artificial intelligence for drug development, precision medicine, and healthcare.

Examples

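library(GenProSeq)
data("example_PTEN")  # example data set used below

# Embed the first 10 sequences with 8-dimensional vectors
prot_seq <- example_PTEN[1:10]
prot2vec_result <- prot2vec(prot_seq = prot_seq, embedding_dim = 8)
# Map the vectors back to sequences using the embedding matrix returned by prot2vec
vec2prot_result <- vec2prot(prot_vec = prot2vec_result$prot_vec,
                            embedding_matrix = prot2vec_result$embedding_matrix)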

dongminjung/GenProSeq documentation built on May 3, 2022, 10:28 p.m.