brown_words: The Brown Corpus of Written American English (words)
In WFU-TLC/analyzr: Support Package for Data Analysis with R (FLC)

The Brown Corpus was the first computer-readable general corpus of texts prepared for linguistic research on modern English. It contains over 1 million words (500 samples of 2000+ words each) of running text from edited English prose printed in the United States during the calendar year 1961.

brown_words

A data frame with 1,004,082 rows and 4 variables:

doc_id: Original file name for each written sample
category: The writing category of each sample
word: Word tokens
tag: Part-of-speech tag for each word token

Each of the 1,004,082 rows corresponds to one tokenized word, described by the 4 variables. For more information, see http://www.helsinki.fi/varieng/CoRD/corpora/BROWN/
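
As a rough illustration of the one-row-per-token layout, here is a minimal sketch in base R. It builds a small stand-in data frame, toy_words, with the same four column names as brown_words. The name toy_words and every value in its rows are hypothetical and chosen only for illustration; they are not taken from the dataset.

```r
# Toy stand-in with the same four columns as brown_words.
# All rows are illustrative values, not copied from the dataset.
toy_words <- data.frame(
  doc_id   = c("ca01", "ca01", "ca01", "cb01", "cb01"),
  category = c("news", "news", "news", "editorial", "editorial"),
  word     = c("The", "jury", "said", "The", "vote"),
  tag      = c("at", "nn", "vbd", "at", "nn"),
  stringsAsFactors = FALSE
)

# Number of word tokens in each writing category
tokens_per_category <- table(toy_words$category)

# Frequency of each part-of-speech tag, most frequent first
tag_counts <- sort(table(toy_words$tag), decreasing = TRUE)

# Number of distinct samples (doc_id values) in each category
docs_per_category <- tapply(
  toy_words$doc_id, toy_words$category,
  function(ids) length(unique(ids))
)

print(tokens_per_category)
print(tag_counts)
print(docs_per_category)
```

Assuming the package is installed and loaded, the same calls should work with brown_words substituted for toy_words, since they rely only on the four documented columns.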

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip

WFU-TLC/analyzr documentation built on June 4, 2019, 2:27 p.m.