brown: Brown Corpus

brown R Documentation

Brown Corpus

Description

A dataset containing 1,155,866 tokenized words from the Brown Corpus, a sample of American English spanning 15 genre categories.

Usage

brown

Format

A data frame with 223,506 rows and 11 variables:

document_id

ID for each corpus document

category

Label code for each of the 15 corpus categories

category_description

Description label for the corpus categories

words

Tokenized words from the corpus

pos

Part of speech label for each word in the corpus

Source

http://www.nltk.org/nltk_data/
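
Examples

A minimal sketch of how the documented columns might be summarised, using base R only. So that the snippet runs without the package installed, it builds a small stand-in data frame named brown_toy (a hypothetical name) with the five column names listed under Format. The rows, category codes, and labels in it are invented for illustration and are not taken from the dataset. With the package loaded, brown could presumably be substituted for brown_toy.

```r
# Toy stand-in for `brown`: documented column names, invented rows.
# The category codes and descriptions here are illustrative assumptions.
brown_toy <- data.frame(
  document_id = c(1L, 1L, 1L, 2L, 2L),
  category = c("ca", "ca", "ca", "cb", "cb"),
  category_description = c("news", "news", "news", "editorial", "editorial"),
  words = c("The", "jury", "said", "Assembly", "session"),
  pos = c("AT", "NN", "VBD", "NN", "NN"),
  stringsAsFactors = FALSE
)

# Number of rows recorded for each category code
rows_per_category <- table(brown_toy$category)

# Part-of-speech labels ordered from most to least frequent
pos_counts <- sort(table(brown_toy$pos), decreasing = TRUE)

# Number of distinct documents within each category
docs_per_category <- tapply(
  brown_toy$document_id,
  brown_toy$category,
  function(ids) length(unique(ids))
)

print(rows_per_category)
print(pos_counts)
print(docs_per_category)
```

The summaries count rows rather than tokens, since this page does not state whether each row holds exactly one token.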


francojc/tadr documentation built on April 26, 2022, 7:55 p.m.