terms_dfm: Create a document-feature-matrix from a text source
In sdaume/topicsplorrr: A Package Supporting Topical Text Analysis, Exploration and Visualization

Description Usage Arguments Details Value

terms_dfm takes a text source with text objects associated with unique document identifiers and creates a document-feature-matrix, which can be used as input for an stm topic modeller.


terms_dfm(textData, textColumn, documentIdColumn,
  removeStopwords = FALSE, removeNumbers = FALSE,
  wordStemming = FALSE, customStopwords = NULL)

`textData`	a dataframe containing the text to be processed, with each row representing a distinct document
`textColumn`	the column name in `textData` containing the text to be processed
`documentIdColumn`	the column name in `textData` specifying a unique identifier for the document with the content given in `textColumn`
`removeStopwords`	a Boolean indicating whether standard stopwords (see `tidytext`) should be removed from the result; default is FALSE.
`removeNumbers`	a Boolean indicating whether numbers should be removed from the result; default is FALSE.
`wordStemming`	a Boolean indicating whether words in the text should be reduced to the word stem; default is FALSE. If TRUE, the Porter stemmer from the `SnowballC` package is applied.
`customStopwords`	a character vector specifying additional stopwords that should be removed from the result

Text input (textColumn) is split with a word tokenizer and tokens are further processed and filtered according to the function's options. Since the result is primarily intended as input for a topic modeller, stopwords (see tidytext) are not removed by default.
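The processing steps described above can be approximated with `tidytext`, `dplyr`, `SnowballC` and `quanteda`. This is a minimal sketch and not the package's actual implementation: the sample data, the column names `doc_id` and `text`, the number filter and the `custom_stopwords` vector are all assumptions made for illustration, and every optional step is switched on here even though `terms_dfm` leaves them off by default.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample data: one row per document.
docs <- tibble(
  doc_id = c("d1", "d2"),
  text   = c("The cats are chasing 3 mice.", "A cat chased the mouse.")
)

# Hypothetical additional stopwords (the role played by customStopwords).
custom_stopwords <- c("chased")

tokens <- docs %>%
  # split the text column into lowercased word tokens
  unnest_tokens(output = term, input = text) %>%
  # drop the standard stopwords shipped with tidytext
  anti_join(stop_words, by = c("term" = "word")) %>%
  # drop user-supplied stopwords
  filter(!term %in% custom_stopwords) %>%
  # drop tokens consisting only of digits (one possible reading of "numbers")
  filter(!grepl("^[0-9]+$", term)) %>%
  # reduce the remaining words to their Porter stem
  mutate(term = SnowballC::wordStem(term, language = "porter"))

# Count terms per document and cast the counts to a quanteda dfm.
sketch_dfm <- tokens %>%
  count(doc_id, term) %>%
  cast_dfm(document = doc_id, term = term, value = n)
```

Skipping an individual filter step in this sketch corresponds to leaving the matching option of `terms_dfm` at its default.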

a document-feature-matrix of type quanteda::dfm (similar to a document-term-matrix), where a document is identified by the value in the documentIdColumn specified in the text source (i.e. textData), and a feature or term is a character sequence obtained after tokenization and all other NLP processing options have been applied to the text associated with a document.
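To illustrate what a `quanteda::dfm` exposes, the sketch below builds a small dfm directly with `quanteda` as a stand-in for a `terms_dfm` result; the document names and texts are hypothetical. It then inspects the document identifiers and features and converts the matrix to the list format used by `stm`.

```r
library(quanteda)

# Hypothetical stand-in for a terms_dfm() result, built directly with quanteda.
example_dfm <- dfm(tokens(c(
  doc_a = "topic models need counts",
  doc_b = "counts of terms per document"
)))

docnames(example_dfm)   # document identifiers
featnames(example_dfm)  # features (terms) after tokenization

# Convert to the list structure (documents, vocab, meta) expected by stm.
stm_input <- convert(example_dfm, to = "stm")
names(stm_input)
```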

sdaume/topicsplorrr documentation built on Dec. 22, 2021, 11:11 p.m.
