URISource: Uniform Resource Identifier Source


View source: R/source.R

Description

Create a uniform resource identifier source.

Usage

URISource(x, encoding = "", mode = "text")

Arguments

x

A character vector of uniform resource identifiers (URIs).

encoding

A character string describing the current encoding. It is passed to iconv to convert the input to UTF-8.

mode

A character string specifying whether and how URIs should be read. Available modes are:

""

No reading. In this case, getElem and pGetElem deliver only the URIs.

"binary"

URIs are read in binary mode as raw vectors (via readBin).

"text"

URIs are read as text (via readLines).
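To make the three modes and the encoding conversion concrete, here is a minimal base-R sketch of what each mode does to a single URI. The helper `read_uri()` is hypothetical and is not tm's implementation; it only mirrors the behaviour described above (no read, readBin, or readLines followed by iconv to UTF-8). It assumes POSIX absolute paths, as in the tm example that prefixes "file://".

```r
# Hypothetical helper, not part of tm: a sketch of how the documented
# `mode` and `encoding` arguments act on a single URI.
read_uri <- function(uri, mode = "text", encoding = "") {
  if (identical(mode, "")) {
    # No read: hand back the URI itself
    return(uri)
  }
  if (identical(mode, "binary")) {
    # Read raw bytes in chunks, since a URL may not report its size
    con <- url(uri, open = "rb")
    on.exit(close(con))
    chunks <- list()
    repeat {
      chunk <- readBin(con, what = "raw", n = 65536L)
      if (length(chunk) == 0L) break
      chunks[[length(chunks) + 1L]] <- chunk
    }
    return(if (length(chunks)) do.call(c, chunks) else raw(0))
  }
  if (identical(mode, "text")) {
    # Read lines of text, then convert from `encoding` to UTF-8 via iconv
    con <- url(uri, open = "rt")
    on.exit(close(con))
    lines <- readLines(con, warn = FALSE)
    return(iconv(lines, from = encoding, to = "UTF-8"))
  }
  stop("unknown mode: ", mode)
}

# Usage: a temporary file holding "café" followed by a newline, in Latin-1
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)
uri <- paste0("file://", normalizePath(path))
read_uri(uri, mode = "")                           # the URI string only
read_uri(uri, mode = "binary")                     # raw bytes, unconverted
read_uri(uri, mode = "text", encoding = "latin1")  # UTF-8 text
```

Passing the declared encoding matters only in text mode; in binary mode the bytes are returned untouched, which is why the Latin-1 byte 0xe9 survives as-is there.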

Details

A uniform resource identifier source interprets each URI as a document.
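Because each URI is one document, the number of documents equals the number of URIs, not the number of lines read from them. The base-R sketch below illustrates this mapping; `uris_to_documents()` is a hypothetical helper, not a tm function, and it assumes text mode with no encoding conversion and POSIX file paths.

```r
# Hypothetical sketch, not tm's implementation: map a vector of URIs to a
# list holding one document (all lines read from that URI) per URI.
uris_to_documents <- function(uris) {
  docs <- lapply(uris, function(u) readLines(u, warn = FALSE))
  names(docs) <- uris
  docs
}

# Two temporary files with different numbers of lines
a <- tempfile(fileext = ".txt")
b <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), a)
writeLines("only line", b)
uris <- paste0("file://", normalizePath(c(a, b)))

docs <- uris_to_documents(uris)
length(docs)   # 2: one document per URI, regardless of line count
lengths(docs)  # lines held by each document
```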

Value

An object inheriting from URISource, SimpleSource, and Source.

See Also

Source for basic information on the source infrastructure employed by package tm.

Encoding and iconv on encodings.

Examples

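library(tm)
# Locate two plain-text files shipped with tm
loremipsum <- system.file("texts", "loremipsum.txt", package = "tm")
ovid <- system.file("texts", "txt", "ovid_1.txt", package = "tm")
# Turn the local paths into file:// URIs; each URI becomes one document
us <- URISource(sprintf("file://%s", c(loremipsum, ovid)))
inspect(VCorpus(us))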

Example output

Loading required package: NLP
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 3163

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 676

tm documentation built on April 7, 2021, 3:01 a.m.