text_token: Tokenizing text


View source: R/ttgsea.R

Description

Text is tokenized into n-grams of configurable size. The function can also limit the total number of tokens that are kept.
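For instance, with ngram_min = 1 and ngram_max = 2, each input string is split into unigrams and bigrams. A minimal sketch with made-up input strings, assuming (per the Value section below) that the return value is a list with components token, ngram_min and ngram_max:

library(ttgsea)
result <- text_token(c("gene set enrichment",
                       "pathway enrichment analysis"),
                     ngram_min = 1, ngram_max = 2,
                     num_tokens = 100)
result$token   # tokenized text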

Usage

text_token(text, ngram_min = 1, ngram_max = 1, num_tokens)

Arguments

text

text data

ngram_min

minimum size of an n-gram (default: 1)

ngram_max

maximum size of an n-gram (default: 1)

num_tokens

maximum number of tokens

Value

token

result of tokenizing text

ngram_min

minimum size of an n-gram

ngram_max

maximum size of an n-gram

Author(s)

Dongmin Jung

See Also

tm::removeWords, stopwords::stopwords, textstem::lemmatize_strings, text2vec::create_vocabulary, text2vec::prune_vocabulary
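Together, these functions suggest the preprocessing steps applied before tokenization: stop-word removal, lemmatization, and n-gram vocabulary construction. Below is a rough sketch of such a pipeline built directly from them; it illustrates the technique only and is not the actual source of text_token (the sample strings are made up):

library(tm)
library(stopwords)
library(textstem)
library(text2vec)

txt <- c("Gene sets were analyzed", "Pathways were analyzed")  # toy corpus
txt <- tolower(txt)
txt <- tm::removeWords(txt, stopwords::stopwords("en"))  # drop common stop words
txt <- textstem::lemmatize_strings(txt)                  # reduce words to their lemmas
it <- text2vec::itoken(txt, tokenizer = text2vec::word_tokenizer)
vocab <- text2vec::create_vocabulary(it, ngram = c(1L, 2L))        # unigrams and bigrams
vocab <- text2vec::prune_vocabulary(vocab, vocab_term_max = 1000)  # keep at most 1000 terms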

Examples

library(ttgsea)
library(fgsea)
data(examplePathways)
data(exampleRanks)
# drop the first 8 characters (the pathway ID prefix) and
# replace underscores with spaces
names(examplePathways) <- gsub("_", " ",
                               substr(names(examplePathways), 9, 1000))
set.seed(1)
fgseaRes <- fgsea(examplePathways, exampleRanks)
# tokenize the pathway names, keeping at most 1000 tokens
tokens <- text_token(data.frame(fgseaRes)[, "pathway"],
                     num_tokens = 1000)
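
Per the Value section, the tokenized pathway names can then be inspected, for example (assuming a list return value):

head(tokens$token)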
