brown_words: The Brown Corpus of Written American English (words)

Description

The Brown Corpus was the first computer-readable general corpus of texts prepared for linguistic research on modern English. It contains over 1 million words (500 samples of 2000+ words each) of running text of edited English prose printed in the United States during the calendar year 1961.

Usage

brown_words

Format

A data frame with 1,004,082 rows and 4 variables:

doc_id

Original file name for each written sample

category

The writing category of each sample

word

Word tokens

tag

Part-of-speech tag for each word token

Details

Each of the 1,004,082 rows corresponds to one tokenized word; the 4 variables are those listed under Format. For more information, see http://www.helsinki.fi/varieng/CoRD/corpora/BROWN/
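
As a rough illustration of working with the one-row-per-token layout, the sketch below summarizes a small stand-in data frame using base R only. The object toy_words and every value in it are invented for illustration and are not taken from the corpus; only the four column names (doc_id, category, word, tag) come from the Format section.

```r
# Toy stand-in with the four documented columns.
# All rows are hypothetical examples, not actual corpus content.
toy_words <- data.frame(
  doc_id   = c("ca01", "ca01", "ca01", "cb01", "cb01"),
  category = c("news", "news", "news", "editorial", "editorial"),
  word     = c("The", "jury", "said", "The", "vote"),
  tag      = c("AT", "NN", "VBD", "AT", "NN"),
  stringsAsFactors = FALSE
)

# Number of word tokens in each writing category
tokens_per_category <- table(toy_words$category)

# Frequency of each part-of-speech tag, most frequent first
tag_counts <- sort(table(toy_words$tag), decreasing = TRUE)

# Number of distinct word forms, ignoring case
n_types <- length(unique(tolower(toy_words$word)))

print(tokens_per_category)
print(tag_counts)
print(n_types)
```

Assuming the package is loaded and brown_words has the structure described above, the same calls should work with brown_words substituted for toy_words.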

Source

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip


WFU-TLC/analyzr documentation built on June 4, 2019, 2:27 p.m.