ntoken: Count the number of tokens or types
In quanteda/quanteda: Quantitative Analysis of Textual Data

ntoken

R Documentation

Count the number of tokens or types

Description

Get the count of tokens (total features) or types (unique tokens) in each document.
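The token/type distinction can be sketched in base R. This is only an illustration under a simplifying assumption: it splits on single spaces instead of using quanteda's tokenizer, and the names `n_tokens` and `n_types` are hypothetical, not part of the package.

```r
# Illustrative only: a whitespace split in base R, not quanteda's tokens()
txt <- c(text1 = "this is a sentence this",
         text2 = "a word repeated repeated")

# Named list with one character vector of tokens per document
toks <- strsplit(txt, " ", fixed = TRUE)

# Tokens: every occurrence counts
n_tokens <- lengths(toks)

# Types: each distinct token counts once per document
n_types <- vapply(toks, function(t) length(unique(t)), integer(1))

n_tokens
n_types
```

Here `text1` has five tokens but four types, because "this" occurs twice.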

Usage

ntoken(x, ...)

ntype(x, ...)

Arguments

`x`	a quanteda tokens or dfm object
`...`	additional arguments passed to `tokens()`

Value

ntoken() returns a named integer vector of the counts of the total tokens per document.

ntype() returns a named integer vector of the counts of the types (unique tokens) per document. For dfm objects, ntype() will only return the count of features that occur more than zero times in the dfm.
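The dfm case can be sketched with a plain count matrix. This is an analogy under stated assumptions, not quanteda's implementation: the matrix `m` and its counts are made up for illustration, and an ordinary base R matrix stands in for a dfm object.

```r
# Hypothetical document-feature counts; a base R matrix standing in for a dfm
m <- matrix(c(2L, 1L, 0L,
              0L, 1L, 3L),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("text1", "text2"),
                            c("this", "a", "repeated")))

# Analogue of ntoken(): sum of all feature counts in each document
tokens_per_doc <- rowSums(m)

# Analogue of ntype(): number of features with a nonzero count in each document
types_per_doc <- rowSums(m > 0)

tokens_per_doc
types_per_doc
```

Features with a zero count in a document's row, such as "repeated" for `text1`, add nothing to that document's type count.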

Examples

library(quanteda)

# simple example
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")
toks <- tokens(txt)
ntoken(toks)
ntype(toks)
ntoken(tokens_tolower(toks))  # same token counts as ntoken(toks)
ntype(tokens_tolower(toks))   # fewer types, since case variants are merged

# with some real texts
toks <- tokens(corpus_subset(data_corpus_inaugural, Year < 1806))
ntoken(tokens(toks, remove_punct = TRUE))
ntype(tokens(toks, remove_punct = TRUE))
ntoken(dfm(toks))
ntype(dfm(toks))

quanteda/quanteda documentation built on April 15, 2024, 7:59 a.m.