extract-term-ngrams: Split a text source into tokens and terms by date of...
In sdaume/topicsplorrr: A Package Supporting Topical Text Analysis, Exploration and Visualization

Description Usage Arguments Details Value

These functions transform a text source into a dataframe of individual terms and tokens with an occurrence date. These terms/tokens can be extracted as ngrams of a specified length. terms_by_date is a wrapper around the functions for specific types of ngrams.

terms_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL, tokenType = "unigram")

unigrams_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL)

bigrams_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL)

`textData`	a dataframe containing the text to be processed
`textColumn`	a character string specifying the column name in `textData` containing the text to be processed
`dateColumn`	a character string specifying the column name in `textData` specifying a publication date for the text in `textColumn`
`removeNumbers`	a Boolean indicating whether numbers should be removed from the result; default is TRUE.
`wordStemming`	a Boolean indicating whether words in the text should be reduced to the word stem; default is TRUE.
`customStopwords`	a character vector specifying additional stopwords that should be removed from the result
`tokenType`	the length of the consecutive token sequence extracted; currently only `bigram` (two-word sequence) and `unigram` (single words) are supported, with `unigram` as the default
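A minimal usage sketch based on the signatures shown above. The sample dataframe, its column names (`text`, `published`), and the custom stopword are hypothetical, and the calls assume that topicsplorrr is installed and exports these functions; whether `dateColumn` expects a `Date` or another date type is not stated here, so `Date` is an assumption.

```r
# Hypothetical sample data; the column names are illustrative only.
docs <- data.frame(
  text = c("Forests are burning in 2019",
           "Burning forests release carbon"),
  published = as.Date(c("2019-08-01", "2019-08-02")),
  stringsAsFactors = FALSE
)

# Assumption: topicsplorrr is installed; the calls are skipped otherwise.
if (requireNamespace("topicsplorrr", quietly = TRUE)) {
  # single words, using the defaults (numbers removed, stemming applied)
  unigrams <- topicsplorrr::unigrams_by_date(
    docs, textColumn = "text", dateColumn = "published"
  )

  # two-word sequences via the wrapper, without stemming and with
  # one additional (hypothetical) stopword
  bigrams <- topicsplorrr::terms_by_date(
    docs, textColumn = "text", dateColumn = "published",
    wordStemming = FALSE, customStopwords = c("carbon"),
    tokenType = "bigram"
  )
}
```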

Text input (textColumn) is split with a word tokenizer, default stopwords (see tidytext) are removed, and tokens are further processed and filtered according to the function's options. A term is the character sequence obtained after all NLP processing options this function offers have been applied, most importantly stemming; here the Porter stemmer from the SnowballC package is applied.
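The processing steps described above can be approximated directly with tidytext, dplyr and SnowballC. This is only a sketch of the general technique, not the package's actual implementation: the sample data, the regular expression used to drop numbers, and the output column names (`occur`, `token`, `term`) are assumptions made for illustration.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical sample data; the column names are illustrative only.
docs <- data.frame(
  text = c("Forests are burning in 2019",
           "Burning forests release carbon"),
  published = as.Date(c("2019-08-01", "2019-08-02")),
  stringsAsFactors = FALSE
)

terms <- docs %>%
  # split the text column into lowercase word tokens
  unnest_tokens(output = token, input = text, token = "words") %>%
  # remove the default tidytext stopwords
  anti_join(stop_words, by = c("token" = "word")) %>%
  # drop purely numeric tokens (one possible reading of removeNumbers)
  filter(!grepl("^[0-9]+$", token)) %>%
  # reduce each remaining token to its Porter stem
  mutate(term = wordStem(token, language = "porter")) %>%
  # assumed column layout: date, original token, processed term
  select(occur = published, token, term)
```

Keeping both the original token and the stemmed term in the result makes it possible to map an aggregated term back to the surface forms it was derived from.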

a dataframe with three columns listing all individual term occurrences in the provided text source, where occur is the publication date associated with an original token, which has been processed/reduced to term. If no stemming has been applied, the term and token in the result are identical.

sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.
