| TextAnalyzer | R Documentation |
Text analyzer for search indexing
Provides text processing pipelines:
Tokenization
Lowercasing
Stopword removal
Stemming
Synonym expansion
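The five stages listed above can be sketched in base R. This is a minimal illustration only: `sketch_analyze`, its default stopword set, its synonym list, and the naive suffix-stripping "stemmer" are all hypothetical stand-ins. The stage order is an assumption based on the order of the list, and none of this is the package's actual implementation.

```r
# Hypothetical stand-in for the pipeline; every name here is illustrative.
sketch_analyze <- function(text,
                           stopwords = c("the", "are", "a", "an", "is"),
                           synonyms = list(quick = "fast")) {
  # 1. Tokenization: extract runs of letters and digits.
  tokens <- regmatches(text, gregexpr("[a-zA-Z0-9]+", text))[[1]]
  # 2. Lowercasing.
  tokens <- tolower(tokens)
  # 3. Stopword removal.
  tokens <- tokens[!tokens %in% stopwords]
  # 4. Stemming: a deliberately naive suffix stripper, not a real stemmer.
  tokens <- sub("(ing|es|s)$", "", tokens)
  # 5. Synonym expansion: append the synonyms registered for any token.
  matched <- intersect(tokens, names(synonyms))
  extra <- unlist(synonyms[matched], use.names = FALSE)
  c(tokens, extra)
}

sketch_analyze("The quick brown foxes are jumping")
# "quick" "brown" "fox" "jump" "fast"
```

Appending synonyms after the original tokens is one possible design. An analyzer could instead replace each token with a canonical form, and the documentation does not say which approach TextAnalyzer takes.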
lowercase: Convert to lowercase
remove_stopwords: Remove stopwords
stopwords: Set of stopwords
stemmer: Stemmer object
synonyms: Synonym dictionary
min_token_length: Minimum token length
max_token_length: Maximum token length
token_pattern: Regex pattern for tokens
new(): Create a new TextAnalyzer
TextAnalyzer$new(lowercase = TRUE, remove_stopwords = FALSE, stopwords = NULL, use_stemmer = FALSE, synonyms = NULL, min_token_length = 1, max_token_length = 100, token_pattern = "[a-zA-Z0-9]+")
lowercase: Lowercase text (default: TRUE)
remove_stopwords: Remove stopwords (default: FALSE)
stopwords: Custom stopwords (default: NULL in the signature; ENGLISH_STOPWORDS is used when none are supplied)
use_stemmer: Use stemming (default: FALSE)
synonyms: Named list of synonyms (default: NULL)
min_token_length: Minimum token length (default: 1)
max_token_length: Maximum token length (default: 100)
token_pattern: Regex pattern for tokens (default: "[a-zA-Z0-9]+")
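To show how token_pattern, min_token_length, and max_token_length might interact, here is a small base R sketch. `sketch_tokenize` is a hypothetical helper, not part of the package. It assumes the pattern is applied with `gregexpr()` and that both length bounds are inclusive, neither of which the documentation confirms.

```r
# Illustrative helper: extract tokens by regex, then filter by length.
sketch_tokenize <- function(text,
                            token_pattern = "[a-zA-Z0-9]+",
                            min_token_length = 1,
                            max_token_length = 100) {
  # Every non-overlapping match of the pattern becomes one token.
  tokens <- regmatches(text, gregexpr(token_pattern, text))[[1]]
  # Keep tokens whose character count lies within the inclusive bounds.
  n <- nchar(tokens)
  tokens[n >= min_token_length & n <= max_token_length]
}

# The default pattern splits on punctuation; short tokens are then dropped.
sketch_tokenize("R 4.3 is out", min_token_length = 2)
# "is" "out"

# Adding "-" to the character class keeps hyphenated terms whole.
sketch_tokenize("state-of-the-art", token_pattern = "[a-zA-Z0-9-]+")
# "state-of-the-art"
```

With the default pattern, a version string such as "4.3" is split into "4" and "3", so raising min_token_length silently drops both pieces. Widening the pattern is the way to keep such terms intact.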
analyze(): Analyze text and return tokens
TextAnalyzer$analyze(text)
text: Input text
Returns: Character vector of tokens
analyze_query(): Analyze a query string
TextAnalyzer$analyze_query(query)
query: Query text
Returns: Character vector of tokens
clone(): The objects of this class are cloneable with this method.
TextAnalyzer$clone(deep = FALSE)
deep: Whether to make a deep clone.
## Not run:
analyzer <- TextAnalyzer$english()
tokens <- analyzer$analyze("The quick brown foxes are jumping")
# c("quick", "brown", "fox", "jump")
## End(Not run)