textstat_summary: Summarize documents
In koheiw/quanteda.core: Quantitative Analysis of Textual Data


Count the total number of characters, tokens and sentences in each document.

textstat_summary(x, cache = TRUE, ...)

`x`	corpus, tokens or dfm object to be summarized
`cache`	if `TRUE`, reuse the internally cached result from the second call onward. Not available on Solaris.
`...`	additional arguments passed through to `dfm()`

Count the total number of characters, tokens and sentences as well as special tokens such as numbers, punctuation marks, symbols, tags and emojis.

chars = number of characters; equal to nchar()
sents = number of sentences; equal to ntoken(tokens(x, what = "sentence"))
tokens = number of tokens; equal to ntoken()
types = number of unique tokens; equal to ntype()
puncts = number of punctuation marks (^\p{P}+$)
numbers = number of numeric tokens (^\p{Sc}{0,1}\p{N}+([.,]*\p{N})*\p{Sc}{0,1}$)
symbols = number of symbols (^\p{S}$)
tags = number of tags; sum of the tokens matching pattern_username and pattern_hashtag in quanteda_options()
emojis = number of emojis (^\p{Emoji_Presentation}+$)
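
The regular expressions listed above can be tried out directly on a character vector of tokens. The following is a minimal base R sketch, not the package's implementation: the helper `count_special()` and the sample tokens are hypothetical, and it assumes a PCRE build with Unicode property support (`perl = TRUE`). The tag and emoji counts are left out because the tag patterns are not given here and support for the emoji property varies across regex engines.

```r
# Patterns copied from the list above (backslashes doubled for R strings).
patterns <- c(
  puncts  = "^\\p{P}+$",
  numbers = "^\\p{Sc}{0,1}\\p{N}+([.,]*\\p{N})*\\p{Sc}{0,1}$",
  symbols = "^\\p{S}$"
)

# Hypothetical helper: count how many tokens match each pattern.
count_special <- function(tokens) {
  vapply(
    patterns,
    function(p) sum(grepl(p, tokens, perl = TRUE)),
    integer(1)
  )
}

# Illustrative tokens, as a tokenizer might produce them.
toks <- c("Costs", "rose", "to", "$1,000.50", "in", "2021", "!", "+", "...")
count_special(toks)
```

Note that a currency sign attached to digits, as in `$1,000.50`, is covered by the numbers pattern, while a lone `+` only matches the symbols pattern.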

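# summarize a corpus
corp <- data_corpus_inaugural
textstat_summary(corp, cache = TRUE)

# summarize a tokens object built from the corpus
toks <- tokens(corp)
textstat_summary(toks, cache = TRUE)

# summarize a document-feature matrix built from the tokens
dfmat <- dfm(toks)
textstat_summary(dfmat, cache = TRUE)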