wikidata is a corpus (data.frame format) of Wikipedia articles scraped from a noun list search. It contains the following columns:

noun: The noun used to search for the Wikipedia article.
text: The text from the article.
element_id: The article number.
sentence_id: The sentence number within the element_id.

To load the corpus and the packages used in the examples below:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh('trinker/wikidata')
pacman::p_load(data.table, stringdist, SnowballC)

wiki_look <- function(x, method = 'osa', cutoff = 3, ...){

    # Compare the stemmed, lower-cased query against the stemmed corpus nouns
    vals <- stringdist::stringdist(
        SnowballC::wordStem(tolower(wikipedia[["noun"]])),
        SnowballC::wordStem(tolower(x)),
        method,
        ...
    )

    if (min(vals) > cutoff) stop('No nouns meet the cutoff')

    # Return all rows belonging to the closest-matching noun's article
    word <- as.character(wikipedia[which.min(vals), "noun", with=FALSE])
    wikipedia[noun %in% word,]
}
wiki_look('dog')
wiki_look('dog')[, .(text = paste(text, collapse = " ")), by = c('noun')]
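wiki_look matches a query to the closest corpus noun by stemming both sides and taking the minimum string distance. The idea can be sketched with base R alone, using adist() (generalized Levenshtein distance) and a toy noun vector in place of the package's wikipedia data; closest_noun and nouns below are illustrative names, not part of the package:

```r
# Toy stand-in for the corpus's noun column
nouns <- c("dog", "cat", "house", "computer")

# Base-R sketch of the lookup idea: pick the noun with the smallest
# edit distance to the query, failing when nothing is close enough
closest_noun <- function(x, nouns, cutoff = 3) {
    vals <- drop(adist(tolower(x), tolower(nouns)))
    if (min(vals) > cutoff) stop("No nouns meet the cutoff")
    nouns[which.min(vals)]
}

closest_noun("Dog", nouns)       # "dog" (case is ignored)
closest_noun("computr", nouns)   # "computer" (one edit away)
```

The package function additionally stems both sides with SnowballC::wordStem, so morphological variants such as "dogs" still land on the "dog" article.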
To download the development version of wikidata:

Download the zip ball or tar ball, decompress and run R CMD INSTALL on it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wikidata")
You are welcome to:
* submit suggestions and bug-reports at: https://github.com/trinker/wikidata/issues
* send a pull request on: https://github.com/trinker/wikidata/
* compose a friendly e-mail to: tyler.rinker@gmail.com