ft_table: Collect metadata and text into a data.frame

View source: R/ft_table.R

Description

Collects metadata and full text into a tidy data.frame to facilitate downstream processing with text mining packages.

Usage

ft_table(path = NULL, type = NULL, encoding = NULL, xml_extract_text = TRUE)

Arguments

path

A directory path; the directory must exist.

type

(character) type of files to get. Default is NULL, which gets all types. Otherwise one of pdf, xml, or plain (file extensions: pdf, xml, and txt, respectively).

encoding

(character) encoding. If NULL, it is taken from getOption("encoding").

xml_extract_text

(logical) for XML files, whether to extract the text (TRUE) or return the XML as a string (FALSE). Default: TRUE

Details

You can alternatively use readtext::readtext() or similar functions to achieve a similar outcome.
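
As a rough illustration of what such an outcome looks like, the base-R sketch below gathers plain-text files from a directory into a data.frame with one row per file. It is not how ft_table() or readtext::readtext() is implemented: collect_texts() is a hypothetical helper, it handles only .txt files, and it returns no metadata beyond the file path. It mirrors the documented behavior of requiring an existing directory and falling back to getOption("encoding") when encoding is NULL.

```r
# Hypothetical helper (not part of fulltext or readtext): read every .txt file
# in a directory and return a data.frame with one row per file.
collect_texts <- function(path, encoding = NULL) {
  stopifnot(dir.exists(path))                        # the directory must exist
  if (is.null(encoding)) encoding <- getOption("encoding")
  files <- list.files(path, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(files, function(f) {
    con <- file(f, open = "r", encoding = encoding)  # decode with the chosen encoding
    on.exit(close(con))
    paste(readLines(con, warn = FALSE), collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
  data.frame(path = files, text = texts, stringsAsFactors = FALSE)
}

# Usage on a throwaway directory with two small files
dir <- tempfile("texts")
dir.create(dir)
writeLines(c("first line", "second line"), file.path(dir, "a.txt"))
writeLines("only line", file.path(dir, "b.txt"))
res <- collect_texts(dir)
res
```

Each row pairs a file path with that file's full text, which is the general shape text mining packages expect as input.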

Examples

## Not run: 
if (interactive()) {
  ## from a directory path
  x <- ft_table()
  x

  ## only xml
  ft_table(type = "xml")

  ## only pdf
  ft_table(type = "pdf")

  ## don't pull text out of xml, just give back the xml please
  x <- ft_table(xml_extract_text = FALSE)
  x
}
## End(Not run)

fulltext documentation built on June 12, 2021, 9:06 a.m.