brown: Brown Corpus

Description

A dataset of 1,155,866 tokenized words from the Brown Corpus, a sample of American English covering 15 genre categories.

Usage

brown

Format

A data frame with 223,506 rows and 11 variables:

document_id

ID for each corpus document

category

Label code for each of the 15 corpus categories

category_description

Description label for the corpus categories

words

Tokenized words from the corpus

pos

Part of speech label for each word in the corpus
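
As a rough illustration of how the columns above relate to one another, the sketch below builds a few toy rows and tallies them by category and by part-of-speech label. It assumes each row holds a single token with its label, which the documentation does not state explicitly. The rows, the `count_by` helper, and all values (document IDs, category codes, descriptions, tags) are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical toy rows using the five documented column names.
# All values are made up for illustration; the real codes and tags may differ.
rows = [
    {"document_id": "doc1", "category": "A", "category_description": "news",
     "words": "The", "pos": "AT"},
    {"document_id": "doc1", "category": "A", "category_description": "news",
     "words": "jury", "pos": "NN"},
    {"document_id": "doc1", "category": "A", "category_description": "news",
     "words": "said", "pos": "VBD"},
    {"document_id": "doc2", "category": "B", "category_description": "editorial",
     "words": "The", "pos": "AT"},
    {"document_id": "doc2", "category": "B", "category_description": "editorial",
     "words": "plan", "pos": "NN"},
]


def count_by(rows, column):
    """Count how many rows share each distinct value of `column`."""
    return Counter(row[column] for row in rows)


# Assuming one token per row, row counts are token counts.
tokens_per_category = count_by(rows, "category_description")
tags = count_by(rows, "pos")

print(tokens_per_category["news"])
print(tags["NN"])
```

The same grouping applies to `document_id` if per-document counts are wanted.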

Source

http://www.nltk.org/nltk_data/


francojc/langdata documentation built on May 31, 2019, 2:48 p.m.