corpora: Load a data set from the corpora package


Description

corpora is a collection of small corpora of interesting data for the creation of bots and similar stuff.

Usage

corpora(which = NULL, category = NULL)

Arguments

which

The data set to load, a string. If not given, then all data sets in the package are listed.

category

A category name, a string. If given, which must be omitted, and the names of the data sets in that category are listed.

Details

This project is a collection of static corpora (plural of "corpus") that are potentially useful in the creation of weird internet stuff. I've found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I've been copy/pasting an adjs.json file from project to project. This is kind of awful, so I'm hoping that this project will at least help me keep everything in one place.

I would like this to help with rapid prototyping of projects. For example: you might use nouns.json to start with, just to see if an idea you had was any good. Once you've built the project quickly around the nouns collection, you can then rip it out and replace it with a more complex or exhaustive data source.
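The prototyping workflow described above can be sketched roughly as follows. The helper make_phrase and the two stand-in word vectors are hypothetical, made up for illustration so the snippet runs without the package installed; in a real prototype the vectors could be taken from data sets such as words/adjs and words/nouns.

```r
# Hypothetical helper: builds a two-word phrase from any pair of word vectors.
# It does not care where the words come from, so the data source can be
# swapped for something more exhaustive later.
make_phrase <- function(adjectives, nouns) {
  paste(sample(adjectives, 1), sample(nouns, 1))
}

# Stand-in word lists (assumptions for illustration, not package data).
adjectives <- c("quiet", "curious", "brittle")
nouns <- c("lantern", "harbor", "pigeon")

phrase <- make_phrase(adjectives, nouns)
print(phrase)
```

Because the helper only takes plain character vectors, replacing the small corpus with a larger source later means changing the two assignments and nothing else.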

I'm also hoping that this can be used as a teaching tool: maybe someone has three hours to teach how to make Twitter bots. That doesn't give the student much time to find/scrape/clean/parse interesting data. My hope is that students can be pointed to this project and they can pick and choose different interesting data sources to meld together for the creation of prototypes.

See https://github.com/dariusk/corpora

Value

If which is given, the data set, returned as a data frame or, as in the example output below, a named list. Otherwise, a character vector of data set names.
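Judging by the example output further down, a loaded data set can come back as a named list with a description element plus one or more data elements. Here is a minimal sketch of pulling the data out of such a list. The toppings object is a hand-copied, abbreviated stand-in for the result of corpora("foods/pizzaToppings"), and the variable names are assumptions for illustration.

```r
# Abbreviated stand-in for corpora("foods/pizzaToppings"), copied from the
# example output; with rcorpora installed you would call corpora() instead.
toppings <- list(
  description = "A list of pizza toppings.",
  pizzaToppings = c("anchovies", "artichoke", "bacon", "cheese")
)

# Treat every element other than the description as data.
data_fields <- setdiff(names(toppings), "description")
values <- toppings[[data_fields[1]]]

print(toppings$description)
print(head(values, 2))
```

Other data sets may be shaped differently, so it is worth checking the result with str() before relying on a particular field name.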

Data set categories

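The category of a data set is the prefix of its name before a slash, for example animals in animals/dogs. Use corpora(category = "animals") to list the data sets in one category.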
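The categories can also be recovered from the data set names themselves. The sketch below assumes that the top-level category is the text before the first slash, which is how the names in the example output appear to be organised. The dataset_names vector is a small sample copied from that output; with the package installed it could be replaced by the result of corpora().

```r
# A few data set names copied from the example output.
dataset_names <- c("animals/dogs", "foods/tea", "foods/fruits",
                   "words/stopwords/en")

# Drop everything from the first slash onward to get the top-level category.
categories <- sub("/.*$", "", dataset_names)

print(unique(categories))
print(table(categories))
```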

Data sets

Call corpora() with no arguments to list every data set in the package. The example output below shows the full listing.

Examples

corpora()
corpora(category = "animals")
corpora("foods/pizzaToppings")

Example output

  [1] "animals/birds_antarctica"                                       
  [2] "animals/birds_north_america"                                    
  [3] "animals/birds_uk"                                               
  [4] "animals/common"                                                 
  [5] "animals/dinosaurs"                                              
  [6] "animals/dogs"                                                   
  [7] "archetypes/artifact"                                            
  [8] "archetypes/character"                                           
  [9] "archetypes/event"                                               
 [10] "archetypes/setting"                                             
 [11] "architecture/passages"                                          
 [12] "architecture/rooms"                                             
 [13] "art/isms"                                                       
 [14] "colors/crayola"                                                 
 [15] "colors/paints"                                                  
 [16] "colors/web_colors"                                              
 [17] "corporations/cars"                                              
 [18] "corporations/djia"                                              
 [19] "corporations/fortune500"                                        
 [20] "corporations/industries"                                        
 [21] "corporations/nasdaq"                                            
 [22] "corporations/newspapers"                                        
 [23] "divination/tarot_interpretations"                               
 [24] "film-tv/tv_shows"                                               
 [25] "foods/apple_cultivars"                                          
 [26] "foods/beer_categories"                                          
 [27] "foods/beer_styles"                                              
 [28] "foods/breads_and_pastries"                                      
 [29] "foods/combine"                                                  
 [30] "foods/condiments"                                               
 [31] "foods/curds"                                                    
 [32] "foods/fruits"                                                   
 [33] "foods/herbs_n_spices"                                           
 [34] "foods/hot_peppers"                                              
 [35] "foods/menuItems"                                                
 [36] "foods/pizzaToppings"                                            
 [37] "foods/sandwiches"                                               
 [38] "foods/tea"                                                      
 [39] "foods/vegetable_cooking_times"                                  
 [40] "foods/vegetables"                                               
 [41] "foods/wine_descriptions"                                        
 [42] "games/bannedGames/argentina/bannedList"                         
 [43] "games/bannedGames/brazil/bannedList"                            
 [44] "games/bannedGames/china/bannedList"                             
 [45] "games/bannedGames/denmark/bannedList"                           
 [46] "games/cluedo"                                                   
 [47] "games/dark_souls_iii_messages"                                  
 [48] "games/jeopardy_questions"                                       
 [49] "games/pokemon"                                                  
 [50] "games/scrabble"                                                 
 [51] "games/street_fighter_ii"                                        
 [52] "games/trivial_pursuit"                                          
 [53] "games/wrestling_moves"                                          
 [54] "geography/canada_provinces_and_territories"                     
 [55] "geography/countries"                                            
 [56] "geography/english_towns_cities"                                 
 [57] "geography/london_underground_stations"                          
 [58] "geography/oceans"                                               
 [59] "geography/rivers"                                               
 [60] "geography/us_cities"                                            
 [61] "geography/venues"                                               
 [62] "governments/nsa_projects"                                       
 [63] "governments/uk_political_parties"                               
 [64] "governments/us_federal_agencies"                                
 [65] "governments/us_mil_operations"                                  
 [66] "humans/authors"                                                 
 [67] "humans/bodyParts"                                               
 [68] "humans/britishActors"                                           
 [69] "humans/englishHonorifics"                                       
 [70] "humans/famousDuos"                                              
 [71] "humans/firstNames"                                              
 [72] "humans/lastNames"                                               
 [73] "humans/moods"                                                   
 [74] "humans/occupations"                                             
 [75] "humans/prefixes"                                                
 [76] "humans/richpeople"                                              
 [77] "humans/scientists"                                              
 [78] "humans/spanishFirstNames"                                       
 [79] "humans/spanishLastNames"                                        
 [80] "humans/spinalTapDrummers"                                       
 [81] "humans/suffixes"                                                
 [82] "humans/us_presidents"                                           
 [83] "humans/wrestlers"                                               
 [84] "instructions/laundry_care"                                      
 [85] "materials/abridged-body-fluids"                                 
 [86] "materials/building-materials"                                   
 [87] "materials/carbon-allotropes"                                    
 [88] "materials/decorative-stones"                                    
 [89] "materials/fabrics"                                              
 [90] "materials/fibers"                                               
 [91] "materials/gemstones"                                            
 [92] "materials/layperson-metals"                                     
 [93] "materials/metals"                                               
 [94] "materials/natural-materials"                                    
 [95] "materials/packaging"                                            
 [96] "materials/plastic-brands"                                       
 [97] "materials/sculpture-materials"                                  
 [98] "materials/technical-fabrics"                                    
 [99] "mathematics/fibonnaciSequence"                                  
[100] "mathematics/primes"                                             
[101] "mathematics/trigonometry"                                       
[102] "medicine/diagnoses"                                             
[103] "medicine/drugNameStems"                                         
[104] "medicine/drugs"                                                 
[105] "music/bands_that_have_opened_for_tool"                          
[106] "music/genres"                                                   
[107] "music/mtv_day_one"                                              
[108] "music/rock_hall_of_fame"                                        
[109] "mythology/greek_gods"                                           
[110] "mythology/greek_monsters"                                       
[111] "mythology/greek_titans"                                         
[112] "mythology/hebrew_god"                                           
[113] "mythology/lovecraft"                                            
[114] "mythology/monsters"                                             
[115] "mythology/norse_gods"                                           
[116] "objects/objects"                                                
[117] "plants/cannabis"                                                
[118] "plants/flowers"                                                 
[119] "religion/christian_saints"                                      
[120] "religion/fictional_religions"                                   
[121] "religion/parody_religions"                                      
[122] "religion/religions"                                             
[123] "science/elements"                                               
[124] "science/hail_size"                                              
[125] "science/minor_planets"                                          
[126] "science/planets"                                                
[127] "science/pregnancy"                                              
[128] "science/toxic_chemicals"                                        
[129] "societies_and_groups/animal_welfare"                            
[130] "societies_and_groups/designated_terrorist_groups/australia"     
[131] "societies_and_groups/designated_terrorist_groups/canada"        
[132] "societies_and_groups/designated_terrorist_groups/china"         
[133] "societies_and_groups/designated_terrorist_groups/egypt"         
[134] "societies_and_groups/designated_terrorist_groups/european_union"
[135] "societies_and_groups/designated_terrorist_groups/india"         
[136] "societies_and_groups/designated_terrorist_groups/iran"          
[137] "societies_and_groups/designated_terrorist_groups/israel"        
[138] "societies_and_groups/designated_terrorist_groups/kazakhstan"    
[139] "societies_and_groups/designated_terrorist_groups/russia"        
[140] "societies_and_groups/designated_terrorist_groups/saudi_arabia"  
[141] "societies_and_groups/designated_terrorist_groups/tunisia"       
[142] "societies_and_groups/designated_terrorist_groups/turkey"        
[143] "societies_and_groups/designated_terrorist_groups/uae"           
[144] "societies_and_groups/designated_terrorist_groups/ukraine"       
[145] "societies_and_groups/designated_terrorist_groups/united_kingdom"
[146] "societies_and_groups/designated_terrorist_groups/united_nations"
[147] "societies_and_groups/designated_terrorist_groups/united_states" 
[148] "societies_and_groups/fraternities/coeducational_fraternities"   
[149] "societies_and_groups/fraternities/defunct"                      
[150] "societies_and_groups/fraternities/fraternities"                 
[151] "societies_and_groups/fraternities/professional"                 
[152] "societies_and_groups/fraternities/service"                      
[153] "societies_and_groups/fraternities/sororities"                   
[154] "societies_and_groups/semi_secret"                               
[155] "sports/nfl_teams"                                               
[156] "technology/appliances"                                          
[157] "technology/computer_sciences"                                   
[158] "technology/fireworks"                                           
[159] "technology/guns_n_rifles"                                       
[160] "technology/knots"                                               
[161] "technology/lisp"                                                
[162] "technology/new_technologies"                                    
[163] "technology/photo_sharing_websites"                              
[164] "technology/programming_languages"                               
[165] "technology/social_networking_websites"                          
[166] "technology/video_hosting_websites"                              
[167] "words/adjs"                                                     
[168] "words/adverbs"                                                  
[169] "words/closed_pairs"                                             
[170] "words/common"                                                   
[171] "words/crash_blossoms"                                           
[172] "words/eggcorns"                                                 
[173] "words/emoji/cute_kaomoji"                                       
[174] "words/emoji/positive_emoji"                                     
[175] "words/emoji/sea_emoji"                                          
[176] "words/encouraging_words"                                        
[177] "words/interjections"                                            
[178] "words/literature/mr_men_little_miss"                            
[179] "words/literature/shakespeare_phrases"                           
[180] "words/literature/shakespeare_sonnets"                           
[181] "words/literature/shakespeare_words"                             
[182] "words/nouns"                                                    
[183] "words/oprah_quotes"                                             
[184] "words/personal_nouns"                                           
[185] "words/prefix_root_suffix"                                       
[186] "words/proverbs"                                                 
[187] "words/resume_action_words"                                      
[188] "words/rhymeless_words"                                          
[189] "words/spells"                                                   
[190] "words/states_of_drunkenness"                                    
[191] "words/stopwords/ar"                                             
[192] "words/stopwords/bg"                                             
[193] "words/stopwords/cs"                                             
[194] "words/stopwords/da"                                             
[195] "words/stopwords/de"                                             
[196] "words/stopwords/en"                                             
[197] "words/stopwords/es"                                             
[198] "words/stopwords/fi"                                             
[199] "words/stopwords/fr"                                             
[200] "words/stopwords/gr"                                             
[201] "words/stopwords/it"                                             
[202] "words/stopwords/jp"                                             
[203] "words/stopwords/lv"                                             
[204] "words/stopwords/nl"                                             
[205] "words/stopwords/no"                                             
[206] "words/stopwords/pl"                                             
[207] "words/stopwords/pt"                                             
[208] "words/stopwords/ru"                                             
[209] "words/stopwords/sk"                                             
[210] "words/stopwords/sv"                                             
[211] "words/stopwords/tr"                                             
[212] "words/us_president_quotes"                                      
[213] "words/verbs"                                                    
[214] "words/word_clues/clues_five"                                    
[215] "words/word_clues/clues_four"                                    
[216] "words/word_clues/clues_six"                                     
[1] "birds_antarctica"    "birds_north_america" "birds_uk"           
[4] "common"              "dinosaurs"           "dogs"               
$description
[1] "A list of pizza toppings."

$pizzaToppings
 [1] "anchovies"        "artichoke"        "bacon"            "breakfast bacon" 
 [5] "Canadian bacon"   "cheese"           "chicken"          "chili peppers"   
 [9] "feta"             "garlic"           "green peppers"    "grilled onions"  
[13] "ground beef"      "ham"              "hot sauce"        "meatballs"       
[17] "mushrooms"        "olives"           "onions"           "pepperoni"       
[21] "pineapple"        "sausage"          "spinach"          "sun-dried tomato"
[25] "tomatoes"        

rcorpora documentation built on May 2, 2019, 7:23 a.m.