rcorpora: A Collection of Small Text Corpora of Interesting Data

A collection of small text corpora of interesting data, containing all data sets from https://github.com/dariusk/corpora. Examples include animals (birds, dinosaurs, dogs), foods (beer categories, pizza toppings), geography (English towns, rivers, oceans), humans (authors, US presidents, occupations), science (elements, planets), and words (adjectives, verbs, proverbs, US president quotes).

Author
Darius Kazemi, Cole Willsea, Matthew Rothenberg, Karl Swedberg, Daniel D. Beck, Javier Arce, Matthew Hokanson, Casey Kolderup, Nathaniel Mitchell, Mark Sample, Nathan Lachenmyer, Aaron Marriner, Greg Kennedy, Greg Borenstein, Peter Organisciak, Rachel White, Tod Robbins, John Wiseman, M. Nowak, Alice Maz, Allison Parrish, Andrew Gorman, Colin Mitchell, David Whitten, Mary Dickson, Michael R. Bernstein, Parker Higgins, Patrick Rodriguez, Ross Barclay, Ross Binden, Ryan Freebern, Justin Alford, Brian Detweiler, Ed Lea, John Ohno, Alexandra Murray, Sean May, Will Hankinson, Brett O'Connor, Brian Jones, Casey Olson, Edward Loveall, Felix Laurie von Massenbach, Garrett Miller, Grant Williamson, Jacob Fauber, Joe Mahoney, Jordan Killpack, Kay Belardinelli, K.Adam White, Kyle McDonald, Andy Dayton, Adam Malantonio, Marcos Wright-Kuhns, Mark Wunsch, Max Bittker, Michael Dewberry, Nathan Black, Noah Kantrowitz, Noah Swartz, Ranjit Bhatnagar, Rob Huzzey, Russell Horton, Vincent Bruijn, Virginia Murdoch, Zac Moody, Scott Grant, Tariq Ali
Date of publication
2016-05-02 06:24:49
Maintainer
Gabor Csardi <csardi.gabor@gmail.com>
License
CC0
Version
1.2.0

Man pages

categories
List data set categories in the corpora package
corpora
Load a data set from the corpora package
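
The two functions above can be combined as in the following minimal sketch. It assumes the rcorpora package is installed and that data sets are identified by a "category/name" string matching the path under inst/corpora/data without the .json extension (for example foods/pizzaToppings.json). The variable names cats, available and toppings are illustrative only, and the exact shape of each returned object depends on the underlying JSON file.

```r
# Sketch only: assumes rcorpora is installed, e.g. install.packages("rcorpora").
library(rcorpora)

# List the data set categories shipped with the package.
cats <- categories()
head(cats)

# Called without arguments, corpora() lists the available data sets.
available <- corpora()
head(available)

# Load one data set by its "category/name" identifier (assumed to mirror
# the file path inst/corpora/data/foods/pizzaToppings.json).
toppings <- corpora("foods/pizzaToppings")

# Inspect the top-level structure; the fields vary from file to file.
str(toppings, max.level = 1)
```

Because each data set is parsed from its own JSON file, inspecting the result with str() before indexing into it is a reasonable first step.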

Files in this package

rcorpora
rcorpora/inst
rcorpora/inst/corpora
rcorpora/inst/corpora/data
rcorpora/inst/corpora/data/corporations
rcorpora/inst/corpora/data/corporations/cars.json
rcorpora/inst/corpora/data/corporations/newspapers.json
rcorpora/inst/corpora/data/corporations/fortune500.json
rcorpora/inst/corpora/data/corporations/industries.json
rcorpora/inst/corpora/data/corporations/djia.json
rcorpora/inst/corpora/data/corporations/nasdaq.json
rcorpora/inst/corpora/data/film-tv
rcorpora/inst/corpora/data/film-tv/tv_shows.json
rcorpora/inst/corpora/data/architecture
rcorpora/inst/corpora/data/architecture/passages.json
rcorpora/inst/corpora/data/architecture/rooms.json
rcorpora/inst/corpora/data/governments
rcorpora/inst/corpora/data/governments/us_mil_operations.json
rcorpora/inst/corpora/data/governments/uk_political_parties.json
rcorpora/inst/corpora/data/governments/nsa_projects.json
rcorpora/inst/corpora/data/governments/us_federal_agencies.json
rcorpora/inst/corpora/data/technology
rcorpora/inst/corpora/data/technology/computer_sciences.json
rcorpora/inst/corpora/data/technology/photo_sharing_websites.json
rcorpora/inst/corpora/data/technology/video_hosting_websites.json
rcorpora/inst/corpora/data/technology/lisp.json
rcorpora/inst/corpora/data/technology/programming_languages.json
rcorpora/inst/corpora/data/technology/social_networking_websites.json
rcorpora/inst/corpora/data/technology/fireworks.json
rcorpora/inst/corpora/data/technology/guns_n_rifles.json
rcorpora/inst/corpora/data/technology/knots.json
rcorpora/inst/corpora/data/technology/appliances.json
rcorpora/inst/corpora/data/technology/new_technologies.json
rcorpora/inst/corpora/data/humans
rcorpora/inst/corpora/data/humans/occupations.json
rcorpora/inst/corpora/data/humans/wrestlers.json
rcorpora/inst/corpora/data/humans/firstNames.json
rcorpora/inst/corpora/data/humans/spanishFirstNames.json
rcorpora/inst/corpora/data/humans/bodyParts.json
rcorpora/inst/corpora/data/humans/prefixes.json
rcorpora/inst/corpora/data/humans/spanishLastNames.json
rcorpora/inst/corpora/data/humans/spinalTapDrummers.json
rcorpora/inst/corpora/data/humans/authors.json
rcorpora/inst/corpora/data/humans/scientists.json
rcorpora/inst/corpora/data/humans/richpeople.json
rcorpora/inst/corpora/data/humans/suffixes.json
rcorpora/inst/corpora/data/humans/lastNames.json
rcorpora/inst/corpora/data/humans/us_presidents.json
rcorpora/inst/corpora/data/humans/britishActors.json
rcorpora/inst/corpora/data/humans/moods.json
rcorpora/inst/corpora/data/humans/englishHonorifics.json
rcorpora/inst/corpora/data/humans/famousDuos.json
rcorpora/inst/corpora/data/geography
rcorpora/inst/corpora/data/geography/oceans.json
rcorpora/inst/corpora/data/geography/rivers.json
rcorpora/inst/corpora/data/geography/canada_provinces_and_territories.json
rcorpora/inst/corpora/data/geography/venues.json
rcorpora/inst/corpora/data/geography/london_underground_stations.json
rcorpora/inst/corpora/data/geography/us_cities.json
rcorpora/inst/corpora/data/geography/countries.json
rcorpora/inst/corpora/data/geography/english_towns_cities.json
rcorpora/inst/corpora/data/objects
rcorpora/inst/corpora/data/objects/objects.json
rcorpora/inst/corpora/data/art
rcorpora/inst/corpora/data/art/isms.json
rcorpora/inst/corpora/data/religion
rcorpora/inst/corpora/data/religion/religions.json
rcorpora/inst/corpora/data/religion/christian_saints.json
rcorpora/inst/corpora/data/religion/fictional_religions.json
rcorpora/inst/corpora/data/religion/parody_religions.json
rcorpora/inst/corpora/data/music
rcorpora/inst/corpora/data/music/genres.json
rcorpora/inst/corpora/data/music/rock_hall_of_fame.json
rcorpora/inst/corpora/data/music/mtv_day_one.json
rcorpora/inst/corpora/data/music/bands_that_have_opened_for_tool.json
rcorpora/inst/corpora/data/sports
rcorpora/inst/corpora/data/sports/nfl_teams.json
rcorpora/inst/corpora/data/materials
rcorpora/inst/corpora/data/materials/technical-fabrics.json
rcorpora/inst/corpora/data/materials/fabrics.json
rcorpora/inst/corpora/data/materials/building-materials.json
rcorpora/inst/corpora/data/materials/abridged-body-fluids.json
rcorpora/inst/corpora/data/materials/layperson-metals.json
rcorpora/inst/corpora/data/materials/metals.json
rcorpora/inst/corpora/data/materials/natural-materials.json
rcorpora/inst/corpora/data/materials/plastic-brands.json
rcorpora/inst/corpora/data/materials/gemstones.json
rcorpora/inst/corpora/data/materials/packaging.json
rcorpora/inst/corpora/data/materials/fibers.json
rcorpora/inst/corpora/data/materials/carbon-allotropes.json
rcorpora/inst/corpora/data/materials/sculpture-materials.json
rcorpora/inst/corpora/data/materials/decorative-stones.json
rcorpora/inst/corpora/data/science
rcorpora/inst/corpora/data/science/toxic_chemicals.json
rcorpora/inst/corpora/data/science/elements.json
rcorpora/inst/corpora/data/science/pregnancy.json
rcorpora/inst/corpora/data/science/minor_planets.json
rcorpora/inst/corpora/data/science/hail_size.json
rcorpora/inst/corpora/data/science/planets.json
rcorpora/inst/corpora/data/games
rcorpora/inst/corpora/data/games/street_fighter_ii.json
rcorpora/inst/corpora/data/games/jeopardy_questions.json
rcorpora/inst/corpora/data/games/dark_souls_iii_messages.json
rcorpora/inst/corpora/data/games/bannedGames
rcorpora/inst/corpora/data/games/bannedGames/brazil
rcorpora/inst/corpora/data/games/bannedGames/brazil/bannedList.json
rcorpora/inst/corpora/data/games/bannedGames/denmark
rcorpora/inst/corpora/data/games/bannedGames/denmark/bannedList.json
rcorpora/inst/corpora/data/games/bannedGames/china
rcorpora/inst/corpora/data/games/bannedGames/china/bannedList.json
rcorpora/inst/corpora/data/games/bannedGames/argentina
rcorpora/inst/corpora/data/games/bannedGames/argentina/bannedList.json
rcorpora/inst/corpora/data/games/trivial_pursuit.json
rcorpora/inst/corpora/data/games/cluedo.json
rcorpora/inst/corpora/data/games/wrestling_moves.json
rcorpora/inst/corpora/data/games/pokemon.json
rcorpora/inst/corpora/data/games/scrabble.json
rcorpora/inst/corpora/data/colors
rcorpora/inst/corpora/data/colors/web_colors.json
rcorpora/inst/corpora/data/colors/paints.json
rcorpora/inst/corpora/data/colors/crayola.json
rcorpora/inst/corpora/data/instructions
rcorpora/inst/corpora/data/instructions/laundry_care.json
rcorpora/inst/corpora/data/animals
rcorpora/inst/corpora/data/animals/birds_north_america.json
rcorpora/inst/corpora/data/animals/dinosaurs.json
rcorpora/inst/corpora/data/animals/common.json
rcorpora/inst/corpora/data/animals/birds_antarctica.json
rcorpora/inst/corpora/data/animals/birds_uk.json
rcorpora/inst/corpora/data/animals/dogs.json
rcorpora/inst/corpora/data/words
rcorpora/inst/corpora/data/words/interjections.json
rcorpora/inst/corpora/data/words/adverbs.json
rcorpora/inst/corpora/data/words/encouraging_words.json
rcorpora/inst/corpora/data/words/spells.json
rcorpora/inst/corpora/data/words/resume_action_words.json
rcorpora/inst/corpora/data/words/literature
rcorpora/inst/corpora/data/words/literature/mr_men_little_miss.json
rcorpora/inst/corpora/data/words/literature/shakespeare_sonnets.json
rcorpora/inst/corpora/data/words/literature/shakespeare_phrases.json
rcorpora/inst/corpora/data/words/literature/shakespeare_words.json
rcorpora/inst/corpora/data/words/common.json
rcorpora/inst/corpora/data/words/prefix_root_suffix.json
rcorpora/inst/corpora/data/words/word_clues
rcorpora/inst/corpora/data/words/word_clues/clues_four.json
rcorpora/inst/corpora/data/words/word_clues/clues_six.json
rcorpora/inst/corpora/data/words/word_clues/clues_five.json
rcorpora/inst/corpora/data/words/oprah_quotes.json
rcorpora/inst/corpora/data/words/personal_nouns.json
rcorpora/inst/corpora/data/words/adjs.json
rcorpora/inst/corpora/data/words/rhymeless_words.json
rcorpora/inst/corpora/data/words/crash_blossoms.json
rcorpora/inst/corpora/data/words/emoji
rcorpora/inst/corpora/data/words/emoji/positive_emoji.json
rcorpora/inst/corpora/data/words/emoji/cute_kaomoji.json
rcorpora/inst/corpora/data/words/emoji/sea_emoji.json
rcorpora/inst/corpora/data/words/verbs.json
rcorpora/inst/corpora/data/words/closed_pairs.json
rcorpora/inst/corpora/data/words/nouns.json
rcorpora/inst/corpora/data/words/states_of_drunkenness.json
rcorpora/inst/corpora/data/words/us_president_quotes.json
rcorpora/inst/corpora/data/words/proverbs.json
rcorpora/inst/corpora/data/words/eggcorns.json
rcorpora/inst/corpora/data/words/stopwords
rcorpora/inst/corpora/data/words/stopwords/nl.json
rcorpora/inst/corpora/data/words/stopwords/ru.json
rcorpora/inst/corpora/data/words/stopwords/bg.json
rcorpora/inst/corpora/data/words/stopwords/en.json
rcorpora/inst/corpora/data/words/stopwords/es.json
rcorpora/inst/corpora/data/words/stopwords/fr.json
rcorpora/inst/corpora/data/words/stopwords/de.json
rcorpora/inst/corpora/data/words/stopwords/sk.json
rcorpora/inst/corpora/data/words/stopwords/cs.json
rcorpora/inst/corpora/data/words/stopwords/fi.json
rcorpora/inst/corpora/data/words/stopwords/it.json
rcorpora/inst/corpora/data/words/stopwords/pt.json
rcorpora/inst/corpora/data/words/stopwords/no.json
rcorpora/inst/corpora/data/words/stopwords/sv.json
rcorpora/inst/corpora/data/words/stopwords/da.json
rcorpora/inst/corpora/data/words/stopwords/pl.json
rcorpora/inst/corpora/data/words/stopwords/lv.json
rcorpora/inst/corpora/data/words/stopwords/jp.json
rcorpora/inst/corpora/data/words/stopwords/ar.json
rcorpora/inst/corpora/data/words/stopwords/tr.json
rcorpora/inst/corpora/data/words/stopwords/gr.json
rcorpora/inst/corpora/data/plants
rcorpora/inst/corpora/data/plants/flowers.json
rcorpora/inst/corpora/data/plants/cannabis.json
rcorpora/inst/corpora/data/divination
rcorpora/inst/corpora/data/divination/tarot_interpretations.json
rcorpora/inst/corpora/data/societies_and_groups
rcorpora/inst/corpora/data/societies_and_groups/fraternities
rcorpora/inst/corpora/data/societies_and_groups/fraternities/coeducational_fraternities.json
rcorpora/inst/corpora/data/societies_and_groups/fraternities/sororities.json
rcorpora/inst/corpora/data/societies_and_groups/fraternities/service.json
rcorpora/inst/corpora/data/societies_and_groups/fraternities/professional.json
rcorpora/inst/corpora/data/societies_and_groups/fraternities/fraternities.json
rcorpora/inst/corpora/data/societies_and_groups/fraternities/defunct.json
rcorpora/inst/corpora/data/societies_and_groups/animal_welfare.json
rcorpora/inst/corpora/data/societies_and_groups/semi_secret.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/egypt.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/iran.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_nations.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/uae.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/australia.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/china.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/kazakhstan.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/tunisia.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/european_union.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/saudi_arabia.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_states.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/united_kingdom.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/israel.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/ukraine.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/russia.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/canada.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/india.json
rcorpora/inst/corpora/data/societies_and_groups/designated_terrorist_groups/turkey.json
rcorpora/inst/corpora/data/mathematics
rcorpora/inst/corpora/data/mathematics/trigonometry.json
rcorpora/inst/corpora/data/mathematics/primes.json
rcorpora/inst/corpora/data/mathematics/fibonnaciSequence.json
rcorpora/inst/corpora/data/foods
rcorpora/inst/corpora/data/foods/wine_descriptions.json
rcorpora/inst/corpora/data/foods/menuItems.json
rcorpora/inst/corpora/data/foods/beer_categories.json
rcorpora/inst/corpora/data/foods/tea.json
rcorpora/inst/corpora/data/foods/curds.json
rcorpora/inst/corpora/data/foods/fruits.json
rcorpora/inst/corpora/data/foods/apple_cultivars.json
rcorpora/inst/corpora/data/foods/vegetables.json
rcorpora/inst/corpora/data/foods/beer_styles.json
rcorpora/inst/corpora/data/foods/condiments.json
rcorpora/inst/corpora/data/foods/sandwiches.json
rcorpora/inst/corpora/data/foods/hot_peppers.json
rcorpora/inst/corpora/data/foods/pizzaToppings.json
rcorpora/inst/corpora/data/foods/herbs_n_spices.json
rcorpora/inst/corpora/data/foods/breads_and_pastries.json
rcorpora/inst/corpora/data/foods/vegetable_cooking_times.json
rcorpora/inst/corpora/data/foods/combine.json
rcorpora/inst/corpora/data/medicine
rcorpora/inst/corpora/data/medicine/diagnoses.json
rcorpora/inst/corpora/data/medicine/drugs.json
rcorpora/inst/corpora/data/medicine/drugNameStems.json
rcorpora/inst/corpora/data/archetypes
rcorpora/inst/corpora/data/archetypes/character.json
rcorpora/inst/corpora/data/archetypes/artifact.json
rcorpora/inst/corpora/data/archetypes/setting.json
rcorpora/inst/corpora/data/archetypes/event.json
rcorpora/inst/corpora/data/mythology
rcorpora/inst/corpora/data/mythology/hebrew_god.json
rcorpora/inst/corpora/data/mythology/greek_titans.json
rcorpora/inst/corpora/data/mythology/greek_monsters.json
rcorpora/inst/corpora/data/mythology/lovecraft.json
rcorpora/inst/corpora/data/mythology/greek_gods.json
rcorpora/inst/corpora/data/mythology/monsters.json
rcorpora/inst/corpora/data/mythology/norse_gods.json
rcorpora/inst/corpora/README.md
rcorpora/inst/corpora/package.json
rcorpora/inst/corpora/Gruntfile.js
rcorpora/inst/README.markdown
rcorpora/inst/NEWS.md
rcorpora/NAMESPACE
rcorpora/R
rcorpora/R/corpora.R
rcorpora/MD5
rcorpora/build
rcorpora/build/rcorpora.pdf
rcorpora/DESCRIPTION
rcorpora/man
rcorpora/man/categories.Rd
rcorpora/man/corpora.Rd