desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('<a href="https://img.shields.io/badge/Version-%s-orange.svg"><img src="https://img.shields.io/badge/Version-%s-orange.svg" alt="Version"/></a></p>', ver, ver)
````

[![Build Status](https://travis-ci.org/trinker/wikidata.svg?branch=master)](https://travis-ci.org/trinker/wikidata)
[![Coverage Status](https://coveralls.io/repos/trinker/wikidata/badge.svg?branch=master)](https://coveralls.io/r/trinker/wikidata?branch=master)
`r verbadge`

<img src="inst/wikidata_logo/wiki.png" width="250" alt="wikipedia data">  


**wikidata** is a corpus (`data.frame` format) of [Wikipedia](https://www.wikipedia.org/) articles scraped from a noun list search.

## Variables


* `noun`: The noun used to search for the Wikipedia article.
* `text`:  The text from the article.
* `element_id`:  The article number.
* `sentence_id`:  The sentence within the `element_id`.

## Sample Slice

```r
set.seed(21)
qdap::htruncdf(data.frame(wikidata::wikipedia[sort(sample(1:nrow(wikidata::wikipedia), 7)),]), 7, 100) %>%
    dplyr::mutate(text = ifelse(grepl("\\.$", text), as.character(text), paste0(text, "..."))) %>%
    pander::pander(justify = c('left', 'left', 'right', 'right'))

Lookup

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh('trinker/wikidata')
pacman::p_load(data.table, stringdist)


wiki_look <- function(x, method = 'osa', cutoff = 3, ...){
    vals <- stringdist::stringdist(
        SnowballC::wordStem(tolower(wikipedia[["noun"]])), 
        SnowballC::wordStem(tolower(x)), 
        method
    )
    if (min(vals) > cutoff) stop('No nouns meet the cutoff')
    word <- as.character(wikipedia[which.min(vals), "noun", with=FALSE])
    wikipedia[noun %in% word,]
}

wiki_look('dog')
wiki_look('dog')[, .(text = paste(text, collapse = " ")), by = c('noun')]

Installation

To download the development version of wikidata:

Download the zip ball or tar ball, decompress and run R CMD INSTALL on it, or use the pacman package to install the development version:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/wikidata")

Contact

You are welcome to:
submit suggestions and bug-reports at: https://github.com/trinker/wikidata/issues
send a pull request on: https://github.com/trinker/wikidata/ * compose a friendly e-mail to: tyler.rinker@gmail.com



trinker/wikidata documentation built on June 1, 2019, 1:49 a.m.