cr_sample_corpus: Congressional Record sample corpus

cr_sample_corpus R Documentation

Congressional Record sample corpus

Description

A quanteda corpus containing a sample of the United States Congressional Record (daily transcripts) covering the 111th to 114th Congresses. The raw corpus is first subset to speeches matching the regular expression "immig*". Then 100 documents from each party-gender pair are randomly sampled.

For the full data and the pre-processing file, see: https://www.dropbox.com/sh/jsyrag7opfo7l7i/AAB1z7tumLuKihGu2-FDmhmKa?dl=0

For NOMINATE scores, see: https://voteview.com/data
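
The two pre-processing steps described above (a pattern filter followed by sampling within each party-gender pair) can be sketched in base R. This is only an illustration on made-up data, not the authors' pre-processing script: the data frame speeches, its texts, and the group size n_per_group are all hypothetical. Note that, read as a regular expression, "immig*" matches "immi" followed by zero or more "g" characters.

```r
# Toy stand-in for the raw speeches; every value here is invented for illustration.
set.seed(42)
groups <- expand.grid(party = c("D", "R"), gender = c("F", "M"),
                      stringsAsFactors = FALSE)
speeches <- groups[rep(seq_len(nrow(groups)), each = 5), ]
rownames(speeches) <- NULL
speeches$text <- rep(c("a bill on immigration reform",
                       "immigrants and visas",
                       "the immigrant experience",
                       "a speech about highway funding",
                       "remarks on the budget"),
                     times = nrow(groups))

# Step 1: keep only speeches that match the pattern named in the description.
matched <- speeches[grepl("immig*", speeches$text), ]

# Step 2: draw the same number of documents from each party-gender pair.
# The documentation states 100 per pair; 2 is used here because the toy data is tiny.
n_per_group <- 2
pairs <- split(seq_len(nrow(matched)), list(matched$party, matched$gender))
picked <- unlist(lapply(pairs, function(idx) {
  idx[sample.int(length(idx), n_per_group)]  # sample positions, then map back to rows
}))
sampled <- matched[picked, ]

# Check that each pair contributes the same number of documents.
print(table(party = sampled$party, gender = sampled$gender))
```

Sampling positions with sample.int() rather than calling sample() on the row indices avoids the base R pitfall where sample(x, n) treats a single number x as the range 1:x.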

Usage

cr_sample_corpus

Format

A quanteda corpus with 200 documents and 3 docvars:

party

party of speaker, (D)emocrat or (R)epublican

gender

gender of speaker, (F)emale or (M)ale

nominate_dim1

dimension 1 of the NOMINATE score

...
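
As a rough illustration of the format described above, the following sketch inspects the corpus and its docvars with quanteda. It assumes that the conText and quanteda packages are installed and that party and gender are stored as the single letters listed above ("D"/"R" and "F"/"M"); the object names corp and dem_f are hypothetical. The block skips quietly when either package is missing.

```r
# Only run the inspection when both packages are available.
have_pkgs <- requireNamespace("conText", quietly = TRUE) &&
  requireNamespace("quanteda", quietly = TRUE)

if (have_pkgs) {
  corp <- conText::cr_sample_corpus

  # Number of documents and names of the document-level variables
  print(quanteda::ndoc(corp))
  print(names(quanteda::docvars(corp)))

  # Cross-tabulate the two categorical docvars
  print(table(party  = quanteda::docvars(corp, "party"),
              gender = quanteda::docvars(corp, "gender")))

  # Distribution of the first NOMINATE dimension
  print(summary(quanteda::docvars(corp, "nominate_dim1")))

  # Assumption: "D" and "F" are the stored codes for Democrat and Female.
  dem_f <- quanteda::corpus_subset(corp, party == "D" & gender == "F")
  print(quanteda::ndoc(dem_f))
}
```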

Source

https://data.stanford.edu/congress_text


conText documentation built on Feb. 16, 2023, 7:32 p.m.