ZipSource: ZIP File Source


View source: R/source.R

Description

Create a ZIP file source.

Usage

ZipSource(zipfile,
          pattern = NULL,
          recursive = FALSE,
          ignore.case = FALSE,
          mode = "text")

Arguments

zipfile

a character string giving the full path name of a ZIP file.

pattern

an optional regular expression. Only file names in the ZIP file which match the regular expression will be returned.

recursive

logical. Should the listing recurse into directories?

ignore.case

logical. Should pattern-matching be case-insensitive?

mode

a character string specifying if and how files should be read in. Available modes are:

""

No read. In this case getElem and pGetElem only deliver URIs.

"binary"

Files are read in binary raw mode (via readBin).

"text"

Files are read as text (via readLines).
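The three modes can be illustrated with a minimal base R sketch. The helper read_by_mode below is hypothetical and is not tm's internal code; it only mirrors the behaviour described above, using the readLines and readBin calls the documentation names.

```r
# Hypothetical helper (not part of tm): read a file according to 'mode'.
read_by_mode <- function(path, mode = "text") {
  stopifnot(mode %in% c("", "binary", "text"))
  if (mode == "text") {
    readLines(path)                                    # character vector, one element per line
  } else if (mode == "binary") {
    readBin(path, what = "raw", n = file.size(path))   # raw vector of the file's bytes
  } else {
    NULL                                               # mode "": nothing is read
  }
}

# A small temporary file to read back in each mode.
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

txt  <- read_by_mode(path, "text")
bin  <- read_by_mode(path, "binary")
none <- read_by_mode(path, "")

unlink(path)
```

With mode "" the content stays empty, which corresponds to the case where only URIs are delivered.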

Details

A ZIP file source extracts the contents of a ZIP file via unzip and interprets each extracted file as a document.
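The extract-then-list behaviour can be sketched in base R without tm. This is an illustration under assumptions, not the package's implementation: it assumes the extracted files are listed in the same way list.files does, with pattern, recursive and ignore.case carrying the meanings given under Arguments. The file names and directory layout are made up for the example, and zip() needs a zip program on the system, as in the documented example.

```r
# Create a small directory tree and zip it.
src_dir <- tempfile("zipdemo")
dir.create(file.path(src_dir, "sub"), recursive = TRUE)
writeLines("top-level text", file.path(src_dir, "a.txt"))
writeLines("nested text", file.path(src_dir, "sub", "b.TXT"))

zipfile <- tempfile(fileext = ".zip")
old_wd <- setwd(src_dir)   # zip relative paths so the archive has no absolute prefix
zip(zipfile, files = c("a.txt", file.path("sub", "b.TXT")), flags = "-r9Xq")
setwd(old_wd)

# Extract the archive into a fresh directory.
exdir <- tempfile("extracted")
unzip(zipfile, exdir = exdir)

# Non-recursive, case-sensitive listing: only the top-level a.txt matches.
top <- list.files(exdir, pattern = "\\.txt$", full.names = TRUE)

# Recursive, case-insensitive listing: sub/b.TXT is found as well.
all_files <- list.files(exdir, pattern = "\\.txt$", recursive = TRUE,
                        ignore.case = TRUE, full.names = TRUE)

# Treat each extracted file as one document (the "text" mode).
docs <- lapply(all_files, readLines)

unlink(c(src_dir, exdir, zipfile), recursive = TRUE)
```

Setting recursive = TRUE matters whenever the archive stores files under subdirectories, which is why the example in this page passes it.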

Value

An object inheriting from ZipSource, SimpleSource, and Source.

See Also

Source for basic information on the source infrastructure employed by package tm.

Examples

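library(tm)

# Archive the sample texts shipped with tm; the zip utility typically
# appends ".zip" to a name that has no extension
zipfile <- tempfile()
files <- Sys.glob(file.path(system.file("texts", "txt", package = "tm"), "*"))
zip(zipfile, files)
zipfile <- paste0(zipfile, ".zip")
# Build a corpus from the archive and show its first document
Corpus(ZipSource(zipfile, recursive = TRUE))[[1]]
# Remove the temporary archive
file.remove(zipfile)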

Example output

Loading required package: NLP
  adding: usr/lib/R/site-library/tm/texts/txt/ovid_1.txt (deflated 46%)
  adding: usr/lib/R/site-library/tm/texts/txt/ovid_2.txt (deflated 46%)
  adding: usr/lib/R/site-library/tm/texts/txt/ovid_3.txt (deflated 47%)
  adding: usr/lib/R/site-library/tm/texts/txt/ovid_4.txt (deflated 47%)
  adding: usr/lib/R/site-library/tm/texts/txt/ovid_5.txt (deflated 45%)
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 676
[1] TRUE

tm documentation built on Nov. 18, 2020, 5:07 p.m.