
papieRmache

An R package for chewing up papers, spitting out the information you don't want, keeping the information you do.

To install:

install.packages("devtools") # only needed if the "devtools" package is not already installed
devtools::install_github("ajhelmstetter/papieRmache")
help(package=papieRmache)

Before running papieRmache, PDFs should be converted to text files using pdftotext.

Errors are less likely if special characters are removed from the PDF file names before conversion and before running papieRmache:

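# example: replace dots, commas, hyphens and spaces with underscores, and strip quotes and parentheses
# (these commands use the Perl-expression form of rename)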
rename 's/[\.,-]/_/g' *
rename 's/[\"]//g' *
rename "s/\'//g" *
rename 's/ /_/g' *
rename 's/_+/_/g' *
rename 's/\(//g' *
rename 's/\)//g' *

# restore the file suffix (the first command above turned ".pdf" into "_pdf")
rename 's/_pdf$/.pdf/' *
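
The `rename` commands above rely on the Perl-expression version of `rename`; the `rename` shipped with util-linux uses a different syntax and will not accept them. As a rough alternative, the same cleanup can be sketched with `sed` and `mv`. The helper names `sanitize_name` and `clean_pdf_names` are hypothetical and not part of papieRmache, and the order of the substitutions differs slightly from the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of papieRmache): a sketch of the same
# file-name cleanup without depending on the Perl version of `rename`.

# Print a cleaned-up version of one PDF file name.
sanitize_name() {
    local stem="${1%.pdf}"    # set the suffix aside so its dot is kept
    printf '%s\n' "$stem" | sed \
        -e "s/[\"'()]//g" \
        -e 's/[., -]/_/g' \
        -e 's/__*/_/g' \
        -e 's/$/.pdf/'
    # 1) delete quotes and parentheses
    # 2) turn dots, commas, spaces and hyphens into underscores
    # 3) collapse runs of underscores
    # 4) put the .pdf suffix back
}

# Rename every PDF in the current folder; existing files are never overwritten.
clean_pdf_names() {
    local old new
    for old in ./*.pdf; do
        [ -e "$old" ] || continue    # no PDFs present: the glob did not expand
        new="./$(sanitize_name "${old#./}")"
        [ "$old" = "$new" ] || mv -n "$old" "$new"
    done
}
```

After sourcing the block, calling `clean_pdf_names` in the folder containing the PDFs applies the renaming. Calling `sanitize_name` on a single name first is a cheap way to preview the result.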

PDFs can be converted quickly in batch as follows:

# run in the folder containing the PDFs of interest
# (only *.pdf files are processed; quoting keeps unusual file names intact)
for sample in ./*.pdf; do
    pdftotext "$sample"
done
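
For larger collections, a slightly more defensive version of the batch conversion may be useful. The sketch below assumes `pdftotext` is on the `PATH` (it ships with poppler-utils and Xpdf). The function name `convert_pdfs` is hypothetical and not part of papieRmache:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of papieRmache): convert every PDF in a
# folder to text, skip PDFs that already have a .txt file, and print the
# number of conversions that failed.
convert_pdfs() {
    local dir="${1:-.}" pdf txt failed=0

    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found (it ships with poppler-utils and Xpdf)" >&2
        return 127
    fi

    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # no PDFs present: the glob did not expand
        txt="${pdf%.pdf}.txt"
        [ -e "$txt" ] && continue      # already converted on an earlier run
        if ! pdftotext "$pdf" "$txt"; then
            echo "failed: $pdf" >&2
            failed=$((failed + 1))
        fi
    done

    echo "$failed"
}
```

Calling `convert_pdfs` with no argument works on the current folder, and `convert_pdfs path/to/pdfs` works on another one. Because existing `.txt` files are skipped, the function can be re-run after adding new PDFs.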

Example papieRmache output



ajhelmstetter/papieRmache documentation built on March 30, 2024, 9:22 p.m.