extract_text: Extract text from PDFs.


Description

Scrape the text from PDF files and count how often each word occurs.

Usage

extract_text(files=".",
              word.length.min=4,
              word.length.max=Inf,
              freq.min=10,
              freq.max=Inf)

Arguments

files

Either the name of a single PDF file (ending with ".pdf") or a directory, in which case all the PDFs it contains are scraped. The default (".") scrapes all the PDFs in the current working directory.

word.length.min

Minimum word length: keep only words that are at least this many characters long.

word.length.max

Maximum word length: keep only words that are at most this many characters long.

freq.min

Minimum frequency: keep only words that appear more than this many times.

freq.max

Maximum frequency: keep only words that appear fewer than this many times.
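The rules for the files argument can be illustrated with a small base-R sketch. list_pdfs() is a hypothetical helper, not part of neuropsychology, and matching ".pdf" case-insensitively is an assumption. The sketch only shows how the argument could be resolved into a vector of file paths; it does not read any PDF content.

```r
# Hypothetical helper (not part of neuropsychology): resolve a `files`
# value into PDF paths, following the rules described for that argument.
list_pdfs <- function(files = ".") {
  if (length(files) != 1) stop("'files' must be a single path")
  if (dir.exists(files)) {
    # A directory (including the default "."): collect every *.pdf inside it.
    # Assumption: the extension is matched case-insensitively.
    return(list.files(files, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE))
  }
  if (grepl("\\.pdf$", files, ignore.case = TRUE) && file.exists(files)) {
    # A single existing PDF file is returned unchanged
    return(files)
  }
  stop("'files' must be an existing .pdf file or a directory")
}
```

Called with no argument, list_pdfs() returns the PDFs in the working directory, mirroring the files = "." default.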

Value

data

A data frame with two columns: the words and their frequencies.
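To make the returned value concrete, here is a base-R sketch that turns already-extracted text into such a two-column data frame while applying the four filters. count_words() is hypothetical, and several details are assumptions rather than documented behavior of extract_text(): words are taken to be lower-cased runs of letters, the length bounds are inclusive, the frequency bounds are strict (as in the argument descriptions), and the columns are named word and freq.

```r
# Hypothetical sketch (not the package's code): count words in text that has
# already been pulled out of the PDFs, then apply the documented filters.
count_words <- function(text,
                        word.length.min = 4,
                        word.length.max = Inf,
                        freq.min = 10,
                        freq.max = Inf) {
  # Assumption: a word is a lower-cased run of letters
  words <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  words <- words[nchar(words) > 0]
  # Length filters (assumed inclusive)
  len <- nchar(words)
  words <- words[len >= word.length.min & len <= word.length.max]
  # Frequency filters, strict: more than freq.min, fewer than freq.max
  counts <- table(words)
  counts <- counts[counts > freq.min & counts < freq.max]
  data <- data.frame(word = names(counts),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)
  # Most frequent words first, ties broken alphabetically
  data <- data[order(-data$freq, data$word), ]
  rownames(data) <- NULL
  data
}
```

For example, count_words(c("memory and attention", "memory"), freq.min = 1) keeps only "memory": "and" is shorter than four characters and "attention" appears only once.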

Author(s)

Dominique Makowski

Examples

require(neuropsychology)

# text <- extract_text()  # Run from a folder containing some PDFs.

Example output

Loading required package: tidyverse
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
************
Welcome to neuropsychology v0.5.0 (c) Dominique Makowski.
See documentation on https://www.rdocumentation.org/packages/neuropsychology
Do not hesitate to create an issue on https://github.com/neuropsychology/neuropsychology.R/issues with questions, comments, or movie recommendations.
************

neuropsychology documentation built on May 2, 2019, 2:13 p.m.