dfm: Create a document-feature matrix
In quanteda: Quantitative Analysis of Textual Data

View source: R/dfm.R

dfm	R Documentation

Create a document-feature matrix

Description

Construct a sparse document-feature matrix from a tokens or dfm object.

Usage

dfm(
  x,
  tolower = TRUE,
  remove_padding = FALSE,
  verbose = quanteda_options("verbose"),
  ...
)

Arguments

`x`	a tokens or dfm object.
`tolower`	logical; if `TRUE`, convert all features to lowercase.
`remove_padding`	logical; if `TRUE`, remove the "pads" left as empty tokens after calling `tokens()` or `tokens_remove()` with `padding = TRUE`.
`verbose`	logical; if `TRUE`, display messages.
`...`	not used.
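
As a minimal sketch of how `remove_padding` behaves, the snippet below builds a dfm from padded tokens with and without the argument. The document names `d1` and `d2` and the object names are illustrative assumptions, not part of the package documentation.

```r
library(quanteda)

# Illustrative input: removing "b" with padding = TRUE leaves empty "" tokens
toks <- tokens(c(d1 = "a b c", d2 = "A B C D")) |>
  tokens_remove("b", padding = TRUE)

# Default (remove_padding = FALSE): the pad is kept as a feature named ""
dfmat_pad <- dfm(toks)
featnames(dfmat_pad)

# remove_padding = TRUE drops the "" feature while the dfm is constructed
dfmat_nopad <- dfm(toks, remove_padding = TRUE)
featnames(dfmat_nopad)
```

With `remove_padding = TRUE` the result should match calling `dfm_remove(pattern = "")` on the default dfm.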

Value

A dfm object.
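
The returned object can be inspected with the usual quanteda accessors. The following is a small sketch, using made-up two-document input, that also shows how `tolower` changes the number of features.

```r
library(quanteda)

# Illustrative input with the same letters in different cases
toks <- tokens(c(d1 = "a b c", d2 = "A B C D"))

dfmat <- dfm(toks)                        # tolower = TRUE by default
dfmat_case <- dfm(toks, tolower = FALSE)  # "a" and "A" stay distinct

ndoc(dfmat)        # number of documents
nfeat(dfmat)       # number of features after lowercasing
featnames(dfmat)   # feature labels
as.matrix(dfmat)   # dense view; only sensible for small objects

nfeat(dfmat_case)  # more features, because case is preserved
```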

Changes in version 4

In quanteda v4, many convenience functions formerly available in dfm() were removed. dfm() now takes only a tokens or dfm object as input.
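
Because `x` must be a tokens or dfm object, preprocessing is done on the tokens before `dfm()` is called. The sketch below shows one such pipeline; the sample texts and the choice of steps (punctuation and stopword removal) are illustrative assumptions, not a list of what was removed from `dfm()`.

```r
library(quanteda)

# Illustrative raw text; document names are made up for this sketch
txt <- c(d1 = "The quick brown fox.", d2 = "The lazy dog!")

# Tokenise and clean first, then build the dfm from the tokens object
dfmat <- corpus(txt) |>
  tokens(remove_punct = TRUE) |>
  tokens_remove(stopwords("en")) |>
  dfm()

featnames(dfmat)
```

Keeping selection and removal in `tokens_*()` functions means the dfm is built only from the tokens that survive preprocessing.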

Examples

## for a corpus
toks <- data_corpus_inaugural |>
  corpus_subset(Year > 1980) |>
  tokens()
dfm(toks)

# removal options
toks <- tokens(c("a b c", "A B C D")) |>
  tokens_remove("b", padding = TRUE)
toks
dfm(toks)
dfm(toks) |>
  dfm_remove(pattern = "") # remove "pads"

# preserving case
dfm(toks, tolower = FALSE)

quanteda documentation built on April 7, 2026, 1:06 a.m.