README.md
In quanteda.tidy: 'tidyverse' Extensions for 'quanteda'

quanteda.tidy

quanteda.tidy extends the quanteda package with functionality from the “tidyverse”, especially dplyr. Note that this is not the same as tidytext, which stretches tokens into data.frames. Instead, tidy functions operate only on document variables, but extends these functions (from dplyr) to work on quanteda objects as if they were tibbles or data.frames.

You can install the stable version of quanteda.tidy from CRAN:

install.packages("quanteda.tidy")

Or install the development version from GitHub:

pak::pkg_install("quanteda/quanteda.tidy")

The functions in quanteda.tidy are organized into four categories, following the dplyr documentation:

| Category | Function | Description | |:---|:---|:---| | Rows | filter() | Subset documents based on docvar conditions | | Rows | slice(), slice_head(), slice_tail() | Subset documents by position | | Rows | slice_sample() | Randomly sample documents | | Rows | slice_min(), slice_max() | Select documents with min/max docvar values | | Rows | arrange(), distinct() | Reorder documents; keep unique documents | | Columns | select() | Keep or drop docvars by name | | Columns | rename(), rename_with() | Rename docvars | | Columns | relocate() | Change docvar column order | | Columns | mutate(), transmute() | Create or modify docvars | | Columns | pull() | Extract a single docvar as a vector | | Columns | glimpse() | Get a quick overview of the corpus | | Groups of rows | add_count() | Add count by group as a docvar | | Groups of rows | add_tally() | Add total count as a docvar | | Pairs of data frames | left_join() | Join corpus with external data frame |

Adding a document variable for full president name:

library("quanteda.tidy", warn.conflicts = FALSE)
## Loading required package: quanteda
## Package version: 4.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.

data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", ")) %>%
  summary(n = 5)
## Corpus consisting of 60 documents, showing 5 documents:
## 
##             Text Types Tokens Sentences Year  President FirstName
##  1789-Washington   625   1537        23 1789 Washington    George
##  1793-Washington    96    147         4 1793 Washington    George
##       1797-Adams   826   2577        37 1797      Adams      John
##   1801-Jefferson   717   1923        41 1801  Jefferson    Thomas
##   1805-Jefferson   804   2380        45 1805  Jefferson    Thomas
##                  Party           fullname
##                   none George, Washington
##                   none George, Washington
##             Federalist        John, Adams
##  Democratic-Republican  Thomas, Jefferson
##  Democratic-Republican  Thomas, Jefferson