In trinker/wikidata: Wikipedia Noun Articles Data

wikidata

wikidata is a corpus, stored as a data.frame, of Wikipedia articles scraped by searching Wikipedia for each noun in a noun list.

Variables

noun: The noun used to search for the Wikipedia article.
text: The text from the article.
element_id: The article number (one per article).
sentence_id: The sentence number within the article identified by element_id.

Sample Slice

| noun | text | element_id | sentence_id |
|------|------|------------|-------------|
| Bezique | The game is derived from Piquet, possibly via Marriage (Sixty-six) and Briscan, with additional scor... | 1781 | 2 |
| Chromium | Several in vitro studies indicated that high concentrations of chromium(III) in the cell can lead to... | 3310 | 210 |
| Dengue | In severe disease, plasma leakage results in hemoconcentration (as indicated by a rising hematocrit ... | 4611 | 113 |
| Pigment | wherein they have created plastic swatches on website by 3D modelling to including various special e... | 12764 | 136 |
| Rutabaga | The roots and tops are used as winter feed for livestock. | 14368 | 76 |
| Toxicologist | Toxicologists perform many more duties including research in the academic, nonprofit and industrial ... | 16861 | 42 |
| Vinery | However, in the late 19th century, the entire species was nearly destroyed by the plant louse phyllo... | 17578 | 10 |

Lookup

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh('trinker/wikidata')
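# SnowballC is called below via SnowballC::wordStem, so load (and install if needed) it too
pacman::p_load(data.table, stringdist, SnowballC)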


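# Return every row for the noun whose stem is closest to the stem of `x`
wiki_look <- function(x, method = 'osa', cutoff = 3, ...){
    # String distance between each stemmed, lower-cased noun and the stemmed query
    vals <- stringdist::stringdist(
        SnowballC::wordStem(tolower(wikipedia[["noun"]])),
        SnowballC::wordStem(tolower(x)),
        method = method
    )
    # Give up when even the closest noun is farther away than the cutoff
    if (min(vals) > cutoff) stop('No nouns meet the cutoff')
    # Take the closest noun (the first one on ties), then all rows stored for it
    word <- as.character(wikipedia[which.min(vals), "noun", with=FALSE])
    wikipedia[noun %in% word,]
}

# All sentences for the closest match to 'dog'
wiki_look('dog')
# The same sentences collapsed into a single text string per noun
wiki_look('dog')[, .(text = paste(text, collapse = " ")), by = c('noun')]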
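
The matching idea behind wiki_look can be tried without installing the corpus. The following is only a minimal sketch under stated assumptions: the `toy` data and the `toy_look` helper are hypothetical and not part of wikidata, the stemming step is left out, and base R's `utils::adist` (generalized Levenshtein distance) stands in for the 'osa' method of stringdist.

```r
# Toy stand-in for the corpus; these rows are invented for illustration only.
toy <- data.frame(
    noun = c("Dog", "Dengue", "Rutabaga"),
    text = c("First sentence.", "Second sentence.", "Third sentence."),
    stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows for the noun closest to `x`.
toy_look <- function(x, data = toy, cutoff = 3) {
    # adist() returns a matrix (nouns by queries); column 1 is the single query
    vals <- utils::adist(tolower(data[["noun"]]), tolower(x))[, 1]
    # Same guard as wiki_look: fail when nothing is within the cutoff
    if (min(vals) > cutoff) stop("No nouns meet the cutoff")
    # Keep every row whose noun equals the closest noun (first one on ties)
    closest <- data[["noun"]][which.min(vals)]
    data[data[["noun"]] == closest, , drop = FALSE]
}

toy_look("dogs")
```

As in wiki_look, a query that is farther than `cutoff` edits from every noun raises an error instead of returning an unrelated article.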

Installation

To install the development version of wikidata, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wikidata")

Contact

You are welcome to:

submit suggestions and bug reports at: https://github.com/trinker/wikidata/issues
send a pull request on: https://github.com/trinker/wikidata/
compose a friendly e-mail to: tyler.rinker@gmail.com

trinker/wikidata documentation built on June 1, 2019, 1:49 a.m.
