
wikidata


wikipedia data

wikidata is a corpus (data.frame format) of Wikipedia articles scraped from a noun list search.


Variables

Sample Slice

| noun | text | element_id | sentence_id |
|------|------|------------|-------------|
| Bezique | The game is derived from Piquet, possibly via Marriage (Sixty-six) and Briscan, with additional scor... | 1781 | 2 |
| Chromium | Several in vitro studies indicated that high concentrations of chromium(III) in the cell can lead to... | 3310 | 210 |
| Dengue | In severe disease, plasma leakage results in hemoconcentration (as indicated by a rising hematocrit ... | 4611 | 113 |
| Pigment | wherein they have created plastic swatches on website by 3D modelling to including various special e... | 12764 | 136 |
| Rutabaga | The roots and tops are used as winter feed for livestock. | 14368 | 76 |
| Toxicologist | Toxicologists perform many more duties including research in the academic, nonprofit and industrial ... | 16861 | 42 |
| Vinery | However, in the late 19th century, the entire species was nearly destroyed by the plant louse phyllo... | 17578 | 10 |

Lookup

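if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh('trinker/wikidata')
# SnowballC is called inside wiki_look(), so load it alongside the others
pacman::p_load(data.table, stringdist, SnowballC)

# Return every row of `wikipedia` whose noun is the closest match to `x`.
# Nouns and query are lower-cased and stemmed, then compared by string distance.
wiki_look <- function(x, method = 'osa', cutoff = 3, ...){
    vals <- stringdist::stringdist(
        SnowballC::wordStem(tolower(wikipedia[["noun"]])), 
        SnowballC::wordStem(tolower(x)), 
        method = method
    )
    # Fail when even the closest noun is farther away than the cutoff
    if (min(vals) > cutoff) stop('No nouns meet the cutoff')
    # Pull the noun straight from the column so a factor column also works
    word <- as.character(wikipedia[["noun"]][which.min(vals)])
    wikipedia[noun %in% word,]
}

# All sentences for the noun closest to 'dog'
wiki_look('dog')
# The same sentences collapsed into one text string per noun
wiki_look('dog')[, .(text = paste(text, collapse = " ")), by = c('noun')]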
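
To see the matching logic in isolation, here is a minimal sketch that follows the same steps as wiki_look() on a small made-up table, so it runs without the wikidata package. The `toy` data frame, its sentences, and the `toy_look` helper are hypothetical and exist only for illustration. It assumes the stringdist and SnowballC packages are installed.

```r
# Toy stand-in for the `wikipedia` data set (made-up rows, not real data).
toy <- data.frame(
    noun = c("Dog", "Dog", "Dengue", "Rutabaga"),
    text = c("First toy sentence.", "Second toy sentence.",
             "Third toy sentence.", "Fourth toy sentence."),
    stringsAsFactors = FALSE
)

# Hypothetical helper that mirrors the steps of wiki_look() on the toy data.
toy_look <- function(x, data = toy, method = "osa", cutoff = 3) {
    # Normalise both sides: lower-case, then stem.
    stems <- SnowballC::wordStem(tolower(data[["noun"]]))
    query <- SnowballC::wordStem(tolower(x))
    # One distance per row; smaller means closer.
    vals <- stringdist::stringdist(stems, query, method = method)
    # Give up when even the best candidate is farther than the cutoff.
    if (min(vals) > cutoff) stop("No nouns meet the cutoff")
    # Keep every row whose noun equals the closest match.
    best <- data[["noun"]][which.min(vals)]
    data[data[["noun"]] == best, , drop = FALSE]
}

# "dogs" stems to "dog", which is at distance 0 from the stemmed noun "Dog".
hit <- toy_look("dogs")

# A query far from every noun triggers the cutoff error.
miss <- tryCatch(toy_look("xylophone"), error = function(e) conditionMessage(e))
```

Because only the single closest noun is kept, a loose cutoff does not return several near matches. It only decides whether the best match is accepted at all.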

Installation

To install the development version of wikidata, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wikidata")
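
For the manual route, the R CMD INSTALL step can also be done from inside an R session. The sketch below wraps base R's install.packages() for a local source path. The helper name `install_local_source` and the example path are assumptions for illustration, not part of wikidata.

```r
# Hypothetical helper: install a package from a local source tarball or an
# unpacked source directory, the in-session counterpart of R CMD INSTALL.
install_local_source <- function(path) {
    if (!file.exists(path)) {
        stop("Source not found: ", path)
    }
    # repos = NULL tells install.packages() that `path` is a local file or
    # directory; type = "source" asks for a build from source.
    install.packages(path, repos = NULL, type = "source")
}

# Example call; the directory name is an assumption about where you
# decompressed the download.
# install_local_source("path/to/decompressed/wikidata")
```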

Contact

You are welcome to:

* submit suggestions and bug reports at: https://github.com/trinker/wikidata/issues
* send a pull request on: https://github.com/trinker/wikidata/
* compose a friendly e-mail to: tyler.rinker@gmail.com



trinker/wikidata documentation built on June 1, 2019, 1:49 a.m.