rcorpora: A Collection of Small Text Corpora of Interesting Data

A collection of small text corpora of interesting data. It contains all data sets from https://github.com/dariusk/corpora. Some examples: animals (birds, dinosaurs, dogs); foods (beer categories, pizza toppings); geography (English towns, rivers, oceans); humans (authors, US presidents, occupations); science (elements, planets); words (adjectives, verbs, proverbs, US president quotes).

Install the latest version of this package by entering the following in R:
install.packages("rcorpora")
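Once installed, the data can be browsed and loaded from R. The following is a minimal sketch, assuming the two functions documented in man/categories.Rd and man/corpora.Rd behave as `categories()` and `corpora(which, category)`; the variable names are illustrative only, and the exact shape of each returned list depends on the underlying JSON file.

```r
# Load the package (assumes install.packages("rcorpora") has already been run).
library(rcorpora)

# categories() lists the top-level data categories, such as "animals" or "foods".
cats <- categories()
head(cats)

# corpora() with no arguments lists the available corpora.
all_corpora <- corpora()
head(all_corpora)

# Restrict the listing to a single category.
food_corpora <- corpora(category = "foods")

# Passing a corpus name loads the corresponding JSON file as an R list.
toppings <- corpora("foods/pizzaToppings")
str(toppings)
```

Corpus names follow the directory layout under inst/corpora/data, without the .json extension.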
Author: Darius Kazemi, Cole Willsea, Matthew Rothenberg, Karl Swedberg, Daniel D. Beck, Javier Arce, Matthew Hokanson, Casey Kolderup, Nathaniel Mitchell, Mark Sample, Nathan Lachenmyer, Aaron Marriner, Greg Kennedy, Greg Borenstein, Peter Organisciak, Rachel White, Tod Robbins, John Wiseman, M. Nowak, Alice Maz, Allison Parrish, Andrew Gorman, Colin Mitchell, David Whitten, Mary Dickson, Michael R. Bernstein, Parker Higgins, Patrick Rodriguez, Ross Barclay, Ross Binden, Ryan Freebern, Justin Alford, Brian Detweiler, Ed Lea, John Ohno, Alexandra Murray, Sean May, Will Hankinson, Brett O'Connor, Brian Jones, Casey Olson, Edward Loveall, Felix Laurie von Massenbach, Garrett Miller, Grant Williamson, Jacob Fauber, Joe Mahoney, Jordan Killpack, Kay Belardinelli, K.Adam White, Kyle McDonald, Andy Dayton, Adam Malantonio, Marcos Wright-Kuhns, Mark Wunsch, Max Bittker, Michael Dewberry, Nathan Black, Noah Kantrowitz, Noah Swartz, Ranjit Bhatnagar, Rob Huzzey, Russell Horton, Vincent Bruijn, Virginia Murdoch, Zac Moody, Scott Grant, Tariq Ali
Date of publication: 2016-05-02 06:24:49
Maintainer: Gabor Csardi <csardi.gabor@gmail.com>
License: CC0
Version: 1.2.0
https://github.com/gaborcsardi/rcorpora

Files

inst
inst/corpora
inst/corpora/data
inst/corpora/data/corporations
inst/corpora/data/corporations/cars.json
inst/corpora/data/corporations/newspapers.json
inst/corpora/data/corporations/fortune500.json
inst/corpora/data/corporations/industries.json
inst/corpora/data/corporations/djia.json
inst/corpora/data/corporations/nasdaq.json
inst/corpora/data/film-tv
inst/corpora/data/film-tv/tv_shows.json
inst/corpora/data/architecture
inst/corpora/data/architecture/passages.json
inst/corpora/data/architecture/rooms.json
inst/corpora/data/governments
inst/corpora/data/governments/us_mil_operations.json
inst/corpora/data/governments/uk_political_parties.json
inst/corpora/data/governments/nsa_projects.json
inst/corpora/data/governments/us_federal_agencies.json
inst/corpora/data/technology
inst/corpora/data/technology/computer_sciences.json
inst/corpora/data/technology/photo_sharing_websites.json
inst/corpora/data/technology/video_hosting_websites.json
inst/corpora/data/technology/lisp.json
inst/corpora/data/technology/programming_languages.json
inst/corpora/data/technology/social_networking_websites.json
inst/corpora/data/technology/fireworks.json
inst/corpora/data/technology/guns_n_rifles.json
inst/corpora/data/technology/knots.json
inst/corpora/data/technology/appliances.json
inst/corpora/data/technology/new_technologies.json
inst/corpora/data/humans
inst/corpora/data/humans/occupations.json
inst/corpora/data/humans/wrestlers.json
inst/corpora/data/humans/firstNames.json
inst/corpora/data/humans/spanishFirstNames.json
inst/corpora/data/humans/bodyParts.json
inst/corpora/data/humans/prefixes.json
inst/corpora/data/humans/spanishLastNames.json
inst/corpora/data/humans/spinalTapDrummers.json
inst/corpora/data/humans/authors.json
inst/corpora/data/humans/scientists.json
inst/corpora/data/humans/richpeople.json
inst/corpora/data/humans/suffixes.json
inst/corpora/data/humans/lastNames.json
inst/corpora/data/humans/us_presidents.json
inst/corpora/data/humans/britishActors.json
inst/corpora/data/humans/moods.json
inst/corpora/data/humans/englishHonorifics.json
inst/corpora/data/humans/famousDuos.json
inst/corpora/data/geography
inst/corpora/data/geography/oceans.json
inst/corpora/data/geography/rivers.json
inst/corpora/data/geography/canada_provinces_and_territories.json
inst/corpora/data/geography/venues.json
inst/corpora/data/geography/london_underground_stations.json
inst/corpora/data/geography/us_cities.json
inst/corpora/data/geography/countries.json
inst/corpora/data/geography/english_towns_cities.json
inst/corpora/data/objects
inst/corpora/data/objects/objects.json
inst/corpora/data/art
inst/corpora/data/art/isms.json
inst/corpora/data/religion
inst/corpora/data/religion/religions.json
inst/corpora/data/religion/christian_saints.json
inst/corpora/data/religion/fictional_religions.json
inst/corpora/data/religion/parody_religions.json
inst/corpora/data/music
inst/corpora/data/music/genres.json
inst/corpora/data/music/rock_hall_of_fame.json
inst/corpora/data/music/mtv_day_one.json
inst/corpora/data/music/bands_that_have_opened_for_tool.json
inst/corpora/data/sports
inst/corpora/data/sports/nfl_teams.json
inst/corpora/data/materials
inst/corpora/data/materials/technical-fabrics.json
inst/corpora/data/materials/fabrics.json
inst/corpora/data/materials/building-materials.json
inst/corpora/data/materials/abridged-body-fluids.json
inst/corpora/data/materials/layperson-metals.json
inst/corpora/data/materials/metals.json
inst/corpora/data/materials/natural-materials.json
inst/corpora/data/materials/plastic-brands.json
inst/corpora/data/materials/gemstones.json
inst/corpora/data/materials/packaging.json
inst/corpora/data/materials/fibers.json
inst/corpora/data/materials/carbon-allotropes.json
inst/corpora/data/materials/sculpture-materials.json
inst/corpora/data/materials/decorative-stones.json
inst/corpora/data/science
inst/corpora/data/science/toxic_chemicals.json
inst/corpora/data/science/elements.json
inst/corpora/data/science/pregnancy.json
inst/corpora/data/science/minor_planets.json
inst/corpora/data/science/hail_size.json
inst/corpora/data/science/planets.json
inst/corpora/data/games
inst/corpora/data/games/street_fighter_ii.json
inst/corpora/data/games/jeopardy_questions.json
inst/corpora/data/games/dark_souls_iii_messages.json
inst/corpora/data/games/bannedGames
inst/corpora/data/games/bannedGames/brazil
inst/corpora/data/games/bannedGames/brazil/bannedList.json
inst/corpora/data/games/bannedGames/denmark
inst/corpora/data/games/bannedGames/denmark/bannedList.json
inst/corpora/data/games/bannedGames/china
inst/corpora/data/games/bannedGames/china/bannedList.json
inst/corpora/data/games/bannedGames/argentina
inst/corpora/data/games/bannedGames/argentina/bannedList.json
inst/corpora/data/games/trivial_pursuit.json
inst/corpora/data/games/cluedo.json
inst/corpora/data/games/wrestling_moves.json
inst/corpora/data/games/pokemon.json
inst/corpora/data/games/scrabble.json
inst/corpora/data/colors
inst/corpora/data/colors/web_colors.json
inst/corpora/data/colors/paints.json
inst/corpora/data/colors/crayola.json
inst/corpora/data/instructions
inst/corpora/data/instructions/laundry_care.json
inst/corpora/data/animals
inst/corpora/data/animals/birds_north_america.json
inst/corpora/data/animals/dinosaurs.json
inst/corpora/data/animals/common.json
inst/corpora/data/animals/birds_antarctica.json
inst/corpora/data/animals/birds_uk.json
inst/corpora/data/animals/dogs.json
inst/corpora/data/words
inst/corpora/data/words/interjections.json
inst/corpora/data/words/adverbs.json
inst/corpora/data/words/encouraging_words.json
inst/corpora/data/words/spells.json
inst/corpora/data/words/resume_action_words.json
inst/corpora/data/words/literature
inst/corpora/data/words/literature/mr_men_little_miss.json
inst/corpora/data/words/literature/shakespeare_sonnets.json
inst/corpora/data/words/literature/shakespeare_phrases.json
inst/corpora/data/words/literature/shakespeare_words.json
inst/corpora/data/words/common.json
inst/corpora/data/words/prefix_root_suffix.json
inst/corpora/data/words/word_clues
inst/corpora/data/words/word_clues/clues_four.json
inst/corpora/data/words/word_clues/clues_six.json
inst/corpora/data/words/word_clues/clues_five.json
inst/corpora/data/words/oprah_quotes.json
inst/corpora/data/words/personal_nouns.json
inst/corpora/data/words/adjs.json
inst/corpora/data/words/rhymeless_words.json
inst/corpora/data/words/crash_blossoms.json
inst/corpora/data/words/emoji
inst/corpora/data/words/emoji/positive_emoji.json
inst/corpora/data/words/emoji/cute_kaomoji.json
inst/corpora/data/words/emoji/sea_emoji.json
inst/corpora/data/words/verbs.json
inst/corpora/data/words/closed_pairs.json
inst/corpora/data/words/nouns.json
inst/corpora/data/words/states_of_drunkenness.json
inst/corpora/data/words/us_president_quotes.json
inst/corpora/data/words/proverbs.json
inst/corpora/data/words/eggcorns.json
inst/corpora/data/words/stopwords
inst/corpora/data/words/stopwords/nl.json
inst/corpora/data/words/stopwords/ru.json
inst/corpora/data/words/stopwords/bg.json
inst/corpora/data/words/stopwords/en.json
inst/corpora/data/words/stopwords/es.json
inst/corpora/data/words/stopwords/fr.json
inst/corpora/data/words/stopwords/de.json
inst/corpora/data/words/stopwords/sk.json
inst/corpora/data/words/stopwords/cs.json
inst/corpora/data/words/stopwords/fi.json
inst/corpora/data/words/stopwords/it.json
inst/corpora/data/words/stopwords/pt.json
inst/corpora/data/words/stopwords/no.json
inst/corpora/data/words/stopwords/sv.json
inst/corpora/data/words/stopwords/da.json
inst/corpora/data/words/stopwords/pl.json
inst/corpora/data/words/stopwords/lv.json
inst/corpora/data/words/stopwords/jp.json
inst/corpora/data/words/stopwords/ar.json
inst/corpora/data/words/stopwords/tr.json
inst/corpora/data/words/stopwords/gr.json
inst/corpora/data/plants
inst/corpora/data/plants/flowers.json
inst/corpora/data/plants/cannabis.json
inst/corpora/data/divination
inst/corpora/data/divination/tarot_interpretations.json
inst/corpora/data/societies_and_groups
inst/corpora/data/societies_and_groups/fraternities
inst/corpora/data/societies_and_groups/fraternities/coeducational_fraternities.json
inst/corpora/data/societies_and_groups/fraternities/sororities.json
inst/corpora/data/societies_and_groups/fraternities/service.json
inst/corpora/data/societies_and_groups/fraternities/professional.json
inst/corpora/data/societies_and_groups/fraternities/fraternities.json
inst/corpora/data/societies_and_groups/fraternities/defunct.json
inst/corpora/data/societies_and_groups/animal_welfare.json
inst/corpora/data/societies_and_groups/semi_secret.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups
inst/corpora/data/societies_and_groups/designated_terrorist_groups/egypt.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/iran.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_nations.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/uae.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/australia.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/china.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/kazakhstan.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/tunisia.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/european_union.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/saudi_arabia.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_states.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_kingdom.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/israel.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/ukraine.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/russia.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/canada.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/india.json
inst/corpora/data/societies_and_groups/designated_terrorist_groups/turkey.json
inst/corpora/data/mathematics
inst/corpora/data/mathematics/trigonometry.json
inst/corpora/data/mathematics/primes.json
inst/corpora/data/mathematics/fibonnaciSequence.json
inst/corpora/data/foods
inst/corpora/data/foods/wine_descriptions.json
inst/corpora/data/foods/menuItems.json
inst/corpora/data/foods/beer_categories.json
inst/corpora/data/foods/tea.json
inst/corpora/data/foods/curds.json
inst/corpora/data/foods/fruits.json
inst/corpora/data/foods/apple_cultivars.json
inst/corpora/data/foods/vegetables.json
inst/corpora/data/foods/beer_styles.json
inst/corpora/data/foods/condiments.json
inst/corpora/data/foods/sandwiches.json
inst/corpora/data/foods/hot_peppers.json
inst/corpora/data/foods/pizzaToppings.json
inst/corpora/data/foods/herbs_n_spices.json
inst/corpora/data/foods/breads_and_pastries.json
inst/corpora/data/foods/vegetable_cooking_times.json
inst/corpora/data/foods/combine.json
inst/corpora/data/medicine
inst/corpora/data/medicine/diagnoses.json
inst/corpora/data/medicine/drugs.json
inst/corpora/data/medicine/drugNameStems.json
inst/corpora/data/archetypes
inst/corpora/data/archetypes/character.json
inst/corpora/data/archetypes/artifact.json
inst/corpora/data/archetypes/setting.json
inst/corpora/data/archetypes/event.json
inst/corpora/data/mythology
inst/corpora/data/mythology/hebrew_god.json
inst/corpora/data/mythology/greek_titans.json
inst/corpora/data/mythology/greek_monsters.json
inst/corpora/data/mythology/lovecraft.json
inst/corpora/data/mythology/greek_gods.json
inst/corpora/data/mythology/monsters.json
inst/corpora/data/mythology/norse_gods.json
inst/corpora/README.md
inst/corpora/package.json
inst/corpora/Gruntfile.js
inst/README.markdown
inst/NEWS.md
NAMESPACE
R
R/corpora.R
MD5
build
build/rcorpora.pdf
DESCRIPTION
man
man/categories.Rd
man/corpora.Rd
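
The JSON files listed above can also be read directly, without going through the package's own functions. This is a sketch under two assumptions: that the jsonlite package is available, and that, as is standard for R packages, the contents of inst/ are installed at the top level of the package directory. The variable names are illustrative.

```r
# Locate one of the bundled JSON files inside the installed package.
# Files under inst/ in the source tree are installed at the package root,
# so the "inst" component is dropped from the path.
path <- system.file("corpora", "data", "foods", "pizzaToppings.json",
                    package = "rcorpora")

# system.file() returns an empty string when the file cannot be found.
if (!nzchar(path)) stop("pizzaToppings.json not found; is rcorpora installed?")

# Parse the JSON file into an R list (assumes jsonlite is installed).
toppings_raw <- jsonlite::fromJSON(path)
str(toppings_raw)
```

Reading the file this way is mainly useful for inspecting the raw structure of a data set.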
