
scrapR

R package to extract data from PDF figures

Installation

The easiest way to install the development version of scrapR is to use the devtools package:

# install.packages("devtools")
library(devtools)
install_github("adamkucharski/scrapR")
library(scrapR)

# load dependencies
# install.packages("readr")
# install.packages("grImport")
# install.packages("magrittr")
library(grImport)
library(readr)

Note that the dependency grImport requires the Ghostscript PDF interpreter to be installed. You can check which version you have installed (if any) by running `gs -v` on the command line. If it is missing, you can install it, for example via Homebrew with `brew install ghostscript`.
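
The same check can be run from within R. This is a minimal sketch using only base R, and it assumes the Ghostscript executable is called `gs`, as in the shell command above; on other platforms the executable may have a different name.

```r
# Look up the Ghostscript executable on the PATH.
# Assumption: the binary is named "gs", as in the shell command above.
gs_path <- Sys.which("gs")

if (nzchar(gs_path)) {
  message("Ghostscript found at: ", gs_path)
} else {
  message("Ghostscript not found on the PATH; install it before using grImport.")
}
```

`Sys.which()` returns an empty string when the executable is not found, which is what the `nzchar()` test relies on.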

Example

First you need a figure to extract data from. If you want a simple test figure, you can run:

simulate_PDF_data()

to generate a simulated set of lines and output as figure1.pdf.

Next, navigate to the directory containing your PDF figure and import the data:

load_PDF_data(file_name="figure1.pdf")

This will output a raw RDS file and a figure ([FIGURENAME].guide.pdf) with the different vector components labelled with numbers.

If the data fails to import, the most likely cause is that the vector graphic contains too many surrounding features. In this case, use a vector editor such as Affinity or Illustrator to delete the unnecessary surrounding content. Make sure to keep the lines with the data you want and at least four tick marks (two on the x-axis and two on the y-axis), which are used to calibrate the scale.

Once you've run load_PDF_data(), create or edit [FIGURENAME].guide.csv. In it, match the point numbers from the guide figure to two x-axis tick marks and two y-axis tick marks, give the axis value at each tick mark, and mark the components you want to extract as data:

point | value | axis
------------- | ------------- | -------------
5 | 5 | x
10 | 30 | x
13 | 200 | y
16 | 800 | y
2 | NA | data
18 | NA | data
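
The guide file can also be written from R rather than by hand. The sketch below reproduces the example values above using base R. It rests on two assumptions that are not confirmed by this README: that the guide file is a plain comma-separated file with the header `point,value,axis`, and that for `figure1.pdf` it is named `figure1.guide.csv`. Check both against the files that load_PDF_data() produces.

```r
# Guide table: two x-axis ticks, two y-axis ticks, and two data components.
# The point numbers are the labels shown in the guide figure.
guide <- data.frame(
  point = c(5, 10, 13, 16, 2, 18),
  value = c(5, 30, 200, 800, NA, NA),   # axis value at each tick; NA for data
  axis  = c("x", "x", "y", "y", "data", "data")
)

# Assumed file name, following the [FIGURENAME].guide.csv pattern.
guide_file <- "figure1.guide.csv"

# Write to the working directory, next to the PDF figure.
write.csv(guide, guide_file, row.names = FALSE)
```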

Then extract the data using the RDS file and guide CSV file.

extract_PDF_data(file_name = "figure1.pdf")

The resulting data for the line(s) will be output as [FIGURENAME].csv, with each line grouped by index. The function also has an option to adjust for x and/or y axes that are on a logarithmic scale.
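
To illustrate why two tick marks per axis are enough to calibrate the scale, here is a sketch of a two-point calibration, including the logarithmic case. It is not scrapR's internal code: the function `calibrate_axis()`, its arguments, and the raw coordinates are hypothetical and only show the arithmetic.

```r
# Map raw figure coordinates onto the data scale using two labelled ticks.
# Hypothetical helper, not part of scrapR.
calibrate_axis <- function(raw, tick_raw, tick_value, log_scale = FALSE) {
  stopifnot(length(tick_raw) == 2, length(tick_value) == 2)
  # On a log axis, tick positions are linear in log10(value).
  if (log_scale) tick_value <- log10(tick_value)
  slope <- diff(tick_value) / diff(tick_raw)
  out <- tick_value[1] + slope * (raw - tick_raw[1])
  if (log_scale) 10^out else out
}

# Linear axis: ticks at raw positions 100 and 300 are labelled 5 and 30.
calibrate_axis(c(100, 200, 300), tick_raw = c(100, 300), tick_value = c(5, 30))
# 5.0 17.5 30.0

# Log axis: ticks at raw positions 100 and 300 are labelled 10 and 1000.
calibrate_axis(200, tick_raw = c(100, 300), tick_value = c(10, 1000),
               log_scale = TRUE)
# 100
```

The midpoint between two ticks maps to the arithmetic mean of their values on a linear axis and to the geometric mean on a logarithmic axis, which is why the scale type has to be specified before extraction.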



adamkucharski/scrapR documentation built on Feb. 4, 2024, 11:37 a.m.