get_seq_encode_pad: Vectorization of characters of strings


View source: R/util.R

Description

Converts the characters of each string into a vector of encoded labels. The encoded vectors are then padded or truncated to a fixed length.
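
The idea can be sketched in base R. This is a simplified stand-in, not the package's implementation: the helper name `encode_pad_sketch` is hypothetical, and reserving 0 for padding and padding or truncating on the right are assumptions made for illustration.

```r
# Hypothetical helper, not part of DeepPINCS: map each character to an
# integer code, then pad with zeros or truncate to a fixed length.
encode_pad_sketch <- function(sequences, length_seq) {
  chars <- strsplit(sequences, "")           # one character vector per string
  vocab <- sort(unique(unlist(chars)))       # distinct characters seen in the input
  encoded <- t(vapply(chars, function(x) {
    codes <- match(x, vocab)                 # integer codes start at 1
    codes <- head(codes, length_seq)         # truncate sequences that are too long
    # pad short sequences on the right with 0 (assumed padding value)
    c(codes, integer(length_seq - length(codes)))
  }, integer(length_seq)))
  list(encoded = encoded, vocab = vocab)
}

res <- encode_pad_sketch(c("CCO", "C1=CC=CC=C1"), length_seq = 5)
dim(res$encoded)   # one row per input string, length_seq columns
```

The first string is shorter than `length_seq` and ends in zeros; the second is longer and keeps only its first five characters.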

Usage

get_seq_encode_pad(sequences, length_seq, ngram_max = 1, ngram_min = 1,
    lenc = NULL)

Arguments

sequences

SMILES strings or amino acid sequences

length_seq

length to which each encoded sequence is padded or truncated

ngram_max

maximum size of an n-gram (default: 1)

ngram_min

minimum size of an n-gram (default: 1)

lenc

encoded labels for characters, LabelEncoder object fitted by "CatEncoders::LabelEncoder.fit" (default: NULL)
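
To picture what ngram_min and ngram_max control, the following base R sketch lists the contiguous character n-grams of a string for a range of sizes. The helper `char_ngrams` is hypothetical, and the package's own tokenization may differ in detail.

```r
# Hypothetical helper: all contiguous character n-grams of sizes n_min..n_max.
char_ngrams <- function(s, n_min = 1, n_max = 1) {
  len <- nchar(s)
  unlist(lapply(n_min:n_max, function(n) {
    if (n > len) return(character(0))        # string too short for this size
    starts <- seq_len(len - n + 1)           # start position of each n-gram
    substring(s, starts, starts + n - 1)
  }))
}

char_ngrams("CCO", n_min = 1, n_max = 2)
# returns "C" "C" "O" "CC" "CO"
```

With the defaults (both sizes equal to 1), each token is a single character.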

Value

sequences_encode_pad

for each input sequence (SMILES string or amino acid sequence), an encoded sequence which is padded or truncated

lenc

encoded labels for characters

num_token

total number of characters

Author(s)

Dongmin Jung

See Also

CatEncoders::LabelEncoder.fit, CatEncoders::transform, keras::pad_sequences, stringdist::qgrams, tokenizers::tokenize_ngrams

Examples
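
A minimal usage sketch, based only on the signature and return values documented above. The input strings are illustrative and not taken from the package, and the guard assumes the function needs a working keras installation.

```r
# Runs only when DeepPINCS and a working keras backend are available.
if (requireNamespace("DeepPINCS", quietly = TRUE) &&
    requireNamespace("keras", quietly = TRUE) &&
    keras::is_keras_available()) {
  smiles <- c("CCO", "C1=CC=CC=C1")          # illustrative SMILES strings

  # Fit a new encoder on the input and pad or truncate to 10 positions.
  enc <- DeepPINCS::get_seq_encode_pad(smiles, length_seq = 10)
  dim(enc$sequences_encode_pad)
  enc$num_token

  # Reuse the fitted encoder so new input receives the same labels.
  # "OCC" uses only characters already seen by the encoder.
  enc_new <- DeepPINCS::get_seq_encode_pad("OCC", length_seq = 10,
                                           lenc = enc$lenc)
}
```

Passing the returned lenc to later calls keeps the character labels consistent between, for example, training and test data.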


dongminjung/DeepPINCS documentation built on Dec. 20, 2021, 12:13 a.m.