extract_text: Extract text from PDFs and HTML pages.

Description

Scrape text from PDFs or HTML pages and count how often each word appears.

Usage

extract_text(sources=".",
              type="pdf",
              word.length.min=4,
              word.length.max=Inf,
              freq.min=10,
              freq.max=Inf)

Arguments

sources

The name of a file (ending with ".pdf"), a directory, an HTML link, or a list of links. By default, all the PDFs in the current directory are scraped.

type

"pdf" or "html".

word.length.min

Keep only words that are at least x characters long.

word.length.max

Keep only words that are at most x characters long.

freq.min

Keep only words that appear more than x times.

freq.max

Keep only words that appear fewer than x times.

Value

data

A data frame with two columns: the words and their frequencies.
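
To illustrate how the length and frequency arguments could interact, here is a minimal base R sketch. It is not the package's implementation: `filter_words` and the column names `word` and `freq` are hypothetical, the tokenisation (splitting on non-letters, lower-casing) is an assumption, and the frequency bounds are treated as strict ("more than", "fewer than") following the wording of the argument descriptions.

```r
# Hypothetical helper, NOT part of the neuropsychology package.
# It mimics the documented filters on an in-memory character string.
filter_words <- function(text,
                         word.length.min = 4,
                         word.length.max = Inf,
                         freq.min = 10,
                         freq.max = Inf) {
  # Assumption: words are runs of letters, compared case-insensitively.
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))

  # Keep words whose length lies within [word.length.min, word.length.max].
  n <- nchar(words)
  words <- words[n >= word.length.min & n <= word.length.max]

  # Count occurrences of each remaining word.
  counts <- table(words)
  data <- data.frame(word = as.character(names(counts)),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)

  # Assumption: strict bounds, i.e. more than freq.min and fewer than freq.max.
  data <- data[data$freq > freq.min & data$freq < freq.max, ]

  # Most frequent words first.
  data[order(-data$freq), ]
}

text <- "memory memory memory attention attention the the the the brain"
result <- filter_words(text, word.length.min = 4, freq.min = 1)
print(result)
```

In this toy input, "the" is dropped by the length filter and "brain" by the frequency filter, which leaves "memory" and "attention".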

Author(s)

Dominique Makowski

Examples

require(neuropsychology)

# text <- extract_text() # In a folder containing some PDFs.

neuropsychology/neuropsychology.R documentation built on May 23, 2019, 4:27 p.m.