cr_sample_corpus: Congressional Record sample corpus
In conText: 'a la Carte' on Text (ConText) Embedding Regression

Description

A (quanteda) corpus containing a sample of the United States Congressional Record (daily transcripts) covering the 111th to 114th Congresses. The raw corpus is first subset to speeches containing the regular expression "immig*". Then 100 documents from each party-gender pair are randomly sampled.

For the full data and pre-processing file, see: https://www.dropbox.com/sh/jsyrag7opfo7l7i/AAB1z7tumLuKihGu2-FDmhmKa?dl=0

For NOMINATE scores, see: https://voteview.com/data
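The two-step construction described above (filter by pattern, then sample within each party-gender pair) can be sketched in base R. This is only an illustration on made-up data, not the authors' pre-processing script: the `speeches` data frame, its contents, and `n_per_group` are hypothetical, and the pattern "immig" is an assumption about the intended match (read strictly as a regular expression, "immig*" also matches "immi" followed by any number of "g").

```r
# Hypothetical toy data standing in for the raw Congressional Record.
speeches <- data.frame(
  text = c(
    "We must reform immigration law.",
    "Immigrants strengthen our economy.",
    "The budget needs attention.",
    "Immigration courts are backlogged.",
    "Border security and immigration go together.",
    "Tax policy matters most.",
    "Legal immigration should be streamlined.",
    "Our immigrant communities contribute."
  ),
  party  = c("D", "D", "D", "D", "R", "R", "R", "R"),
  gender = c("F", "F", "M", "M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

set.seed(2021)    # make the random draw reproducible
n_per_group <- 1  # hypothetical; the description above samples 100 per pair

# Step 1: keep only speeches matching the pattern (case-insensitive here).
matched <- speeches[grepl("immig", speeches$text, ignore.case = TRUE), ]

# Step 2: split by party-gender pair and sample within each group.
groups <- split(matched, list(matched$party, matched$gender), drop = TRUE)
sampled <- do.call(rbind, lapply(groups, function(g) {
  k <- min(n_per_group, nrow(g))  # never ask for more rows than exist
  g[sample(nrow(g), k), , drop = FALSE]
}))

# Cross-tabulate the result to check the per-group counts.
table(party = sampled$party, gender = sampled$gender)
```

Sampling within groups rather than from the pooled speeches keeps the party-gender cells balanced, which is what makes the sample convenient for comparing groups.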

Usage

cr_sample_corpus

Format

A quanteda corpus with 200 documents and 3 docvars:

party: party of speaker, (D)emocrat or (R)epublican
gender: gender of speaker, (F)emale or (M)ale
nominate_dim1: first dimension of the speaker's NOMINATE score

...
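A short way to inspect the corpus and the docvars listed above, assuming the conText and quanteda packages are installed and that the dataset becomes available once conText is attached:

```r
library(quanteda)
library(conText)

# Number of documents in the sample corpus.
ndoc(cr_sample_corpus)

# First few rows of the document-level variables.
head(docvars(cr_sample_corpus))

# Cross-tabulate speakers' party and gender.
table(
  party  = docvars(cr_sample_corpus, "party"),
  gender = docvars(cr_sample_corpus, "gender")
)

# Distribution of the first NOMINATE dimension.
summary(docvars(cr_sample_corpus, "nominate_dim1"))
```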

Source

https://data.stanford.edu/congress_text

conText documentation built on April 12, 2026, 9:06 a.m.
