token_vector: Vectorization of tokens

View source: R/ttgsea.R

Description

Words or tokens of text must be vectorized before they can be used in machine learning. This function converts tokens to sequences of integers, which are then padded or truncated to a fixed length.
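
The See Also entry keras::pad_sequences suggests how the padding step works; the sketch below uses hypothetical toy sequences and is only an illustration of padding and truncation, not of this function's internals. With the keras defaults, shorter sequences are left-padded with zeros and longer ones are truncated from the front.

library(keras)

# Two hypothetical integer-encoded token sequences of unequal length.
seqs <- list(c(5L, 12L, 7L), c(3L, 8L, 1L, 9L, 4L, 2L))

# Pad or truncate both to a common length of 5; keras::pad_sequences
# pads and truncates at the front ("pre") by default and fills with zeros.
pad_sequences(seqs, maxlen = 5)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    5   12    7
# [2,]    8    1    9    4    2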

Usage

token_vector(text, token, length_seq)

Arguments

text

text data

token

result of tokenization (output of "text_token")

length_seq

length of input sequences

Value

sequences of integers, padded or truncated to length_seq

Author(s)

Dongmin Jung

See Also

tm::removeWords, stopwords::stopwords, textstem::lemmatize_strings, tokenizers::tokenize_ngrams, keras::pad_sequences
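
The entries above suggest a typical text-preprocessing chain that precedes vectorization. The sketch below is a hypothetical illustration of such a chain, not necessarily the exact steps taken internally:

library(tm)         # removeWords
library(stopwords)  # stopwords
library(textstem)   # lemmatize_strings
library(tokenizers) # tokenize_ngrams

txt <- "Cells divide during the cell cycle"              # hypothetical input
txt <- removeWords(tolower(txt), stopwords::stopwords("en"))  # drop stop words
txt <- lemmatize_strings(txt)                            # reduce words to lemmas
tokenize_ngrams(txt, n = 2)                              # split into bigram tokens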

Examples

library(reticulate)
library(ttgsea)

# The example requires both keras and a working Python environment.
if (keras::is_keras_available() && reticulate::py_available()) {
  library(fgsea)
  data(examplePathways)
  data(exampleRanks)
  # Drop the leading pathway IDs and replace underscores with spaces.
  names(examplePathways) <- gsub("_", " ",
                            substr(names(examplePathways), 9, 1000))
  set.seed(1)
  fgseaRes <- fgsea(examplePathways, exampleRanks)
  # Tokenize the pathway names, keeping at most 1000 tokens.
  tokens <- text_token(data.frame(fgseaRes)[,"pathway"],
            num_tokens = 1000)
  # Vectorize the phrase "Cell Cycle" into integer sequences of length 10.
  sequences <- token_vector("Cell Cycle", tokens, 10)
}
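
When keras and a Python runtime are available, sequences holds the integer encoding of "Cell Cycle" under the tokenization learned from the pathway names, padded or truncated to the requested length of 10.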
