seq_preprocessing: Preprocessing for SMILES strings and amino acid sequences


View source: R/util.R

Description

Preprocessing converts SMILES strings or amino acid sequences into the input format the model expects. Depending on type, the sequences are turned into an adjacency matrix and node features ("graph"), a fingerprint ("fingerprint"), or encoded string data ("sequence"). Preprocessing is more time consuming for text data.
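To make the string case concrete, the following base R sketch integer-encodes the characters of each sequence and then pads or truncates to a fixed length. It is only an illustration of the general idea, not the package's implementation: the helper name `encode_and_pad`, the right-side padding, and the use of 0 as the padding id are all assumptions made for this example.

```r
# Hypothetical helper, not part of DeepPINCS: integer-encode characters and
# pad or truncate every sequence to length_seq.
encode_and_pad <- function(seqs, length_seq, vocab = NULL) {
    chars <- strsplit(seqs, "")
    # Build the character vocabulary from the data unless one is supplied
    if (is.null(vocab)) vocab <- unique(unlist(chars))
    rows <- lapply(chars, function(x) {
        ids <- match(x, vocab)           # 1-based ids; unknown characters give NA
        ids[is.na(ids)] <- 0L            # assumption: unknown characters share the padding id
        ids <- head(ids, length_seq)     # truncate sequences that are too long
        c(ids, rep(0L, length_seq - length(ids)))  # right-pad short sequences with 0
    })
    list(encoded = do.call(rbind, rows),
         vocab = vocab,
         num_tokens = length(vocab))
}

res <- encode_and_pad(c("CCO", "C1=CC=CC=C1"), length_seq = 5)
res$encoded      # 2 x 5 integer matrix; the first row ends in two padding zeros
res$num_tokens   # number of distinct characters seen
```

Passing a previously built `vocab` back in keeps the encoding of new data consistent with the encoding of the training data, which appears to be the role of the lenc argument described under Arguments.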

Usage

seq_preprocessing(smiles = NULL,
    AAseq = NULL,
    type,
    convert_canonical_smiles,
    max_atoms,
    length_seq,
    lenc = NULL,
    ngram_max = 1,
    ngram_min = 1)

Arguments

smiles

SMILES strings (default: NULL)

AAseq

amino acid sequences (default: NULL)

type

"graph", "fingerprint" or "sequence"

convert_canonical_smiles

SMILES strings are converted to canonical SMILES strings if TRUE

max_atoms

maximum number of atoms for compounds

length_seq

length of compound or protein sequence

lenc

encoded labels for characters of SMILES strings or amino acid sequences (default: NULL)

ngram_max

maximum size of an n-gram for protein sequences (default: 1)

ngram_min

minimum size of an n-gram for protein sequences (default: 1)
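As an illustration of what ngram_min and ngram_max control, the base R sketch below lists every contiguous n-gram of a protein sequence for each n in that range. The helper `char_ngrams` is hypothetical and is not part of DeepPINCS; how the package itself tokenizes n-grams may differ.

```r
# Hypothetical helper, not part of DeepPINCS: all contiguous n-grams of a
# sequence for n from ngram_min to ngram_max.
char_ngrams <- function(sequence, ngram_min = 1, ngram_max = 1) {
    stopifnot(ngram_min >= 1, ngram_min <= ngram_max)
    n_chars <- nchar(sequence)
    unlist(lapply(ngram_min:ngram_max, function(n) {
        if (n > n_chars) return(character(0))   # sequence too short for this n
        starts <- seq_len(n_chars - n + 1)      # start position of each window
        substring(sequence, starts, starts + n - 1)
    }))
}

grams <- char_ngrams("MKVL", ngram_min = 1, ngram_max = 2)
grams
# "M" "K" "V" "L" "MK" "KV" "VL"
```

With the defaults (both equal to 1), each token is a single amino acid character.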

Value

canonical_smiles

canonical representation of SMILES

convert_canonical_smiles

whether the canonical representation is used

A_pad

padded or truncated adjacency matrix of compounds if type is "graph"

X_pad

padded or truncated node features of compounds if type is "graph"

fp

fingerprint of compounds if type is "fingerprint"

sequences_encode_pad

encoded sequences which are padded or truncated

lenc

encoded labels for characters of SMILES strings or amino acid sequences

length_seq

length of compound or protein sequence

num_tokens

total number of characters of compounds or proteins
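The padded or truncated matrices returned for type "graph" can be pictured with a small base R sketch. It fits an adjacency matrix into a fixed max_atoms by max_atoms shape by zero-padding small molecules and dropping atoms beyond the limit. The helper `pad_adjacency` is hypothetical, and keeping the leading atoms when truncating is an assumption of this example, not a documented behavior of the package.

```r
# Hypothetical helper, not part of DeepPINCS: pad or truncate a square
# adjacency matrix to max_atoms x max_atoms.
pad_adjacency <- function(A, max_atoms) {
    n <- min(nrow(A), max_atoms)                 # number of atoms that fit
    A_fixed <- matrix(0, nrow = max_atoms, ncol = max_atoms)
    idx <- seq_len(n)
    A_fixed[idx, idx] <- A[idx, idx]             # copy the leading block, rest stays 0
    A_fixed
}

# Heavy atoms of ethanol (C-C-O) as a three-node path graph
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)

padded <- pad_adjacency(A, max_atoms = 5)      # 5 x 5, extra rows and columns are zero
truncated <- pad_adjacency(A, max_atoms = 2)   # 2 x 2, the third atom is dropped
```

A fixed shape lets compounds of different sizes be stacked into a single array for batching.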

Author(s)

Dongmin Jung

References

Dey, N., Wagh, S., Mahalle, P. N., & Pathan, M. S. (Eds.). (2019). Applied machine learning for smart data analysis. CRC Press.

Examples

seq_preprocessing(smiles = cbind(example_cpi[1, 1]),
    type = "fingerprint",
    convert_canonical_smiles = TRUE)

dongminjung/DeepPINCS documentation built on Dec. 20, 2021, 12:13 a.m.