README.md

quanteda.tidy

CRAN_Status_Badge Lifecycle:
experimental R-CMD-check Codecov test
coverage

About

quanteda.tidy extends the quanteda package with functionality from the “tidyverse”, especially dplyr. Note that this is not the same as tidytext, which stretches tokens into data.frames. Instead, tidy functions operate only on document variables, but extends these functions (from dplyr) to work on quanteda objects as if they were tibbles or data.frames.

Installation

You can install the stable version of quanteda.tidy from CRAN:

install.packages("quanteda.tidy")

Or install the development version from GitHub:

pak::pkg_install("quanteda/quanteda.tidy")

Overview of Functions

The functions in quanteda.tidy are organized into four categories, following the dplyr documentation:

| Category | Function | Description | |:---|:---|:---| | Rows | filter() | Subset documents based on docvar conditions | | Rows | slice(), slice_head(), slice_tail() | Subset documents by position | | Rows | slice_sample() | Randomly sample documents | | Rows | slice_min(), slice_max() | Select documents with min/max docvar values | | Rows | arrange(), distinct() | Reorder documents; keep unique documents | | Columns | select() | Keep or drop docvars by name | | Columns | rename(), rename_with() | Rename docvars | | Columns | relocate() | Change docvar column order | | Columns | mutate(), transmute() | Create or modify docvars | | Columns | pull() | Extract a single docvar as a vector | | Columns | glimpse() | Get a quick overview of the corpus | | Groups of rows | add_count() | Add count by group as a docvar | | Groups of rows | add_tally() | Add total count as a docvar | | Pairs of data frames | left_join() | Join corpus with external data frame |

Example

Adding a document variable for full president name:

library("quanteda.tidy", warn.conflicts = FALSE)
## Loading required package: quanteda
## Package version: 4.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.

data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", ")) %>%
  summary(n = 5)
## Corpus consisting of 60 documents, showing 5 documents:
## 
##             Text Types Tokens Sentences Year  President FirstName
##  1789-Washington   625   1537        23 1789 Washington    George
##  1793-Washington    96    147         4 1793 Washington    George
##       1797-Adams   826   2577        37 1797      Adams      John
##   1801-Jefferson   717   1923        41 1801  Jefferson    Thomas
##   1805-Jefferson   804   2380        45 1805  Jefferson    Thomas
##                  Party           fullname
##                   none George, Washington
##                   none George, Washington
##             Federalist        John, Adams
##  Democratic-Republican  Thomas, Jefferson
##  Democratic-Republican  Thomas, Jefferson


Try the quanteda.tidy package in your browser

Any scripts or data that you put into this service are public.

quanteda.tidy documentation built on Dec. 17, 2025, 5:09 p.m.