A set of language datasets and the code that creates them. These datasets provide a starting point for data visualization, transformation and analysis.
Install from GitHub with devtools::install_github("francojc/langdata").
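The install step can be sketched as a short R session. This is a minimal sketch that assumes devtools may not yet be installed and that a network connection is available. The final call only lists whatever datasets the installed package provides, so no dataset object names are assumed.

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install langdata from GitHub, using the repository path given above.
devtools::install_github("francojc/langdata")

# List the datasets shipped with the installed package.
data(package = "langdata")
```

Listing the datasets this way shows the object names before loading any of them, which avoids guessing at names the README does not state.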
Switchboard Dialog Act Corpus
A dataset containing a corpus of 1,115 spontaneous conversations among 440 speakers of American English. The original corpus files and documentation are available from the Linguistic Data Consortium.
Brown Corpus
A dataset containing 1,155,866 tokenized words across 15 genre categories from a sample of American English. The original corpus files and documentation are available from the Natural Language Toolkit data repository.
...