dtm_stats: Gets DTM summary statistics
In text2map: R Tools for Text Matrices, Embeddings, and Networks

Description

dtm_stats() summarizes corpus-level statistics for any document-term matrix (DTM). These include (1) basic information on size (total documents, total unique terms, total tokens), (2) lexical richness, (3) distribution information, (4) central tendency, and (5) character-level information.
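To make these categories concrete, the base R sketch below computes a few of them by hand on a toy matrix. It is only an illustration of what the statistics mean, not the package's implementation: the toy data and all variable names are assumptions, and dtm_stats() may scale or label its output differently (for example, reporting proportions rather than raw counts).

```r
# Toy document-term matrix: 3 documents (rows) x 5 terms (columns).
# Data and names are hypothetical, chosen only for illustration.
dtm <- matrix(
  c(2, 1, 0, 0, 1,
    0, 1, 1, 0, 0,
    1, 0, 0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("the", "cat", "sat", "mat", "on"))
)

term_freq  <- colSums(dtm)      # corpus-wide count of each term
doc_tokens <- rowSums(dtm)      # total tokens per document
doc_types  <- rowSums(dtm > 0)  # unique terms per document

# (1) Size
n_docs   <- nrow(dtm)
n_types  <- sum(term_freq > 0)
n_tokens <- sum(dtm)

# (2) Lexical richness: terms occurring once, twice, three times, plus the
#     type-token ratio (raw counts here; the package may normalize these)
hapax <- sum(term_freq == 1)
dis   <- sum(term_freq == 2)
tris  <- sum(term_freq == 3)
ttr   <- n_types / n_tokens

# (3) Distribution and (4) central tendency of tokens per document
token_range  <- range(doc_tokens)
token_sd     <- sd(doc_tokens)
token_mean   <- mean(doc_tokens)
token_median <- median(doc_tokens)

# (5) Character lengths of the terms
term_nchar <- nchar(colnames(dtm))
char_min   <- min(term_nchar)
char_max   <- max(term_nchar)
char_mean  <- mean(term_nchar)
```

Skewness and kurtosis are left out because base R has no built-in functions for them.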

Usage

dtm_stats(
  dtm,
  richness = TRUE,
  distribution = TRUE,
  central = TRUE,
  character = TRUE,
  simplify = FALSE
)

Arguments

`dtm`	Document-term matrix with terms as columns. Works with DTMs produced by any popular text analysis package, or you can build one with the `dtm_builder()` function.
`richness`	Logical (default = TRUE), whether to include statistics about lexical richness, i.e., terms that occur once, twice, and three times (hapax, dis, tris), and the total type-token ratio.
`distribution`	Logical (default = TRUE), whether to include statistics about the distribution, i.e., min, max, st. dev., skewness, and kurtosis.
`central`	Logical (default = TRUE), whether to include statistics about the central tendencies, i.e., mean and median for types and tokens.
`character`	Logical (default = TRUE), whether to include statistics about the character lengths of terms, i.e., min, max, and mean.
`simplify`	Logical (default = FALSE), whether to return statistics as a data frame where each statistic is a column. Default returns a list of small data frames.
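
A minimal usage sketch, assuming the text2map package is installed. The toy corpus and its column names (`doc_id`, `text`) are hypothetical, and the `dtm_builder()` call assumes it takes a data frame plus unquoted text and document-id columns; check the package documentation if your version differs.

```r
# Guarded so the script still runs when text2map is not installed.
if (requireNamespace("text2map", quietly = TRUE)) {
  library(text2map)

  # Hypothetical toy corpus: one row per document.
  docs <- data.frame(
    doc_id = c("d1", "d2", "d3"),
    text   = c("the cat sat on the mat", "the dog sat", "a cat and a dog"),
    stringsAsFactors = FALSE
  )

  # Build a document-term matrix with terms as columns.
  dtm <- dtm_builder(docs, text = text, doc_id = doc_id)

  # Default: every group of statistics, returned as a list of small data frames.
  stats_list <- dtm_stats(dtm)

  # Skip the richness and character groups, and return a single data frame
  # in which each statistic is a column.
  stats_df <- dtm_stats(dtm, richness = FALSE, character = FALSE, simplify = TRUE)
}
```

Setting a group's argument to FALSE drops that group from the output, which keeps the result compact when only size, distribution, or central-tendency figures are needed.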