In wri/retrieveR: Automate information retrieval and analysis of documents

Info

This vignette walks the user through applying the neural embedding NLP approach to a novel set of PDF documents. We use a sample corpus of eight peer-reviewed academic journal articles about restoration.

Installation instructions

R can be downloaded from this link. Once it is installed, open the 32-bit version (i386), since WRI computers appear to have only a 32-bit version of Java. Then install the package by running the following lines of code. Copy and paste them one at a time, pressing Enter after each.

# devtools provides install_github()
install.packages("devtools")
library(devtools)
# Install retrieveR from the wri GitHub repository
install_github("wri/retrieveR")
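
Before installing, it can help to confirm that the open R session really is the 32-bit one and to see whether devtools is already present. The following is a minimal sketch using base R only; `session_bits` and `missing_packages` are hypothetical helper names introduced here for illustration, not part of retrieveR or devtools.

```r
# Report whether the current R session is 32-bit or 64-bit.
# Pointers are 4 bytes wide in a 32-bit session and 8 bytes in a 64-bit one.
session_bits <- function() {
  8 * .Machine$sizeof.pointer
}

# Return the packages from `pkgs` that are not yet installed.
# (Hypothetical helper; requireNamespace() returns FALSE for missing packages.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

message("This R session is ", session_bits(), "-bit (", R.version$arch, ")")

to_install <- missing_packages("devtools")
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
}
```

If the message reports a 64-bit session, close it and open the i386 version before continuing.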

Downloading data

Next, we load the package into R using library. Then run either install_mac or install_windows, depending on your operating system. These functions fetch the Java dependencies used to extract text from images and install the components needed to run neural networks.

Finally, the download_example function will download the example PDFs.

library(retrieveR)
# Run only the installer that matches your operating system
install_mac()      # macOS
install_windows()  # Windows
download_example()
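
Only one of the two installers applies to a given machine. As a rough sketch, the choice could be made from the operating system name that base R reports through Sys.info(). The helper `pick_installer` below is hypothetical and only returns the name of the function to run; it does not call retrieveR itself.

```r
# Map the operating system name to the matching installer function name.
# `pick_installer` is a hypothetical helper, not part of retrieveR.
pick_installer <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Darwin  = "install_mac",       # macOS reports itself as "Darwin"
    Windows = "install_windows",
    stop("No installer is described in this vignette for: ", sysname)
  )
}

# Print a hint for the current machine without installing anything.
os <- Sys.info()[["sysname"]]
if (os %in% c("Darwin", "Windows")) {
  message("Run ", pick_installer(os), "() on this machine")
} else {
  message("This vignette only covers macOS and Windows; detected: ", os)
}
```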

Prepping documents for querying

The prep_documents function strips text from the PDFs, cleans up the results, and calculates neural weights. Each of these steps can be turned off by specifying ocr = FALSE, clean = FALSE, or weights = FALSE, respectively. The function takes a path to the folder of documents; in this case they are stored in a folder called pdfs. The path is relative to R's working directory, which can be printed with getwd() and changed with setwd().

corpus <- prep_documents("pdfs")
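
Because the folder path is resolved against the working directory, a quick check that the folder exists and contains PDFs can save a failed run. The sketch below uses base R only; `find_pdfs` is a hypothetical helper introduced for illustration, and the demonstration runs on a temporary folder rather than the real example data.

```r
# List the PDF files in a folder, stopping early if the folder is missing.
# `find_pdfs` is a hypothetical helper, not a retrieveR function.
find_pdfs <- function(folder) {
  if (!dir.exists(folder)) {
    stop("Folder not found: ", normalizePath(folder, mustWork = FALSE),
         " (working directory is ", getwd(), ")")
  }
  list.files(folder, pattern = "\\.pdf$", ignore.case = TRUE)
}

# Demonstrate on a temporary folder so the example runs anywhere.
demo_dir <- file.path(tempdir(), "pdfs")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.pdf", "b.PDF", "notes.txt")))
print(find_pdfs(demo_dir))  # only the two PDF files are listed
```

In practice one would call find_pdfs("pdfs") from the working directory before running prep_documents("pdfs").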

Querying documents for paragraphs related to land tenure

The create_report function takes the following arguments:

query: the query phrase, in quotation marks.
data: the object that holds the output of prep_documents (here, corpus).

create_report(query = "food water waste wastewater reuse", data = corpus)

create_report(query = "land tenure", data = corpus, interactive = FALSE, thresh = 0.51)
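
The vignette does not say how create_report matches a query to paragraphs or what thresh controls. As a toy illustration of the general embedding idea only, and not retrieveR's implementation, the sketch below assumes that the query and each paragraph are represented as numeric vectors, that relevance is measured by cosine similarity, and that thresh acts as a similarity cutoff. All vectors and names here are made up for the example.

```r
# Toy illustration only: NOT retrieveR's implementation.
# Cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical 3-dimensional embeddings for a query and three paragraphs.
query_vec <- c(1, 0, 1)
paragraphs <- list(
  p1 = c(1, 0, 1),  # same direction as the query
  p2 = c(1, 1, 0),  # partially related
  p3 = c(0, 1, 0)   # orthogonal to the query
)

# Score every paragraph against the query.
scores <- vapply(paragraphs, cosine_similarity, numeric(1), b = query_vec)

# Assumption: thresh is a minimum similarity; keep paragraphs at or above it.
thresh <- 0.51
selected <- names(scores)[scores >= thresh]
print(round(scores, 2))
print(selected)
```

Under these assumptions, raising the threshold returns fewer, more closely matching paragraphs, and lowering it returns more.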

Results

The create_report function saves its results to an HTML file in the working directory. I have included the land tenure results in this vignette as an example.

htmltools::includeHTML("land_tenure.html")

wri/retrieveR documentation built on July 23, 2019, 11:54 p.m.
