DirSource: Directory Source

View source: R/source.R

Description

Create a directory source.

Usage

DirSource(directory = ".",
          encoding = "",
          pattern = NULL,
          recursive = FALSE,
          ignore.case = FALSE,
          mode = "text")

Arguments

directory

A character vector of full path names; the default corresponds to the working directory getwd().

encoding

a character string describing the current encoding. It is passed to iconv to convert the input to UTF-8.

pattern

an optional regular expression. Only file names which match the regular expression will be returned.

recursive

logical. Should the listing recurse into directories?

ignore.case

logical. Should pattern-matching be case-insensitive?

mode

a character string specifying if and how files should be read in. Available modes are:

""

No read. In this case getElem and pGetElem only deliver URIs.

"binary"

Files are read in binary raw mode (via readBin).

"text"

Files are read as text (via readLines).
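
The arguments above can be combined as in the following minimal sketch. It assumes that tm is installed and uses the sample texts shipped with the package; the variable names are illustrative only, and the fields accessed with $ are the ones visible in the printed object under Example output.

```r
library(tm)

# Sample texts shipped with tm (the same directory used under Examples)
txt_dir <- system.file("texts", "txt", package = "tm")

# mode = "": only the file list is recorded, file contents are not read
src_uri <- DirSource(txt_dir, mode = "")

# mode = "text" plus a pattern that restricts which files are listed
src_one <- DirSource(txt_dir, pattern = "ovid_1", mode = "text")

src_uri$mode                # the mode stored in the source
src_one$length              # number of files that matched the pattern
basename(src_one$filelist)  # names of the matched files
```

Choosing mode = "" can be useful when a custom reader should open the files itself and only needs their locations.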

Details

A directory source acquires a list of files via dir and interprets each file as a document.
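
The listing step can be approximated with base R alone. The sketch below builds a temporary directory (a hypothetical layout, not part of tm) and shows how pattern, ignore.case, and recursive affect what dir returns; the exact arguments that DirSource passes to dir are an assumption here.

```r
# Build a throwaway directory tree to list (hypothetical layout)
root <- file.path(tempdir(), "dirsource-sketch")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "sub"), recursive = TRUE)
writeLines("first document",  file.path(root, "a.txt"))
writeLines("second document", file.path(root, "B.TXT"))
writeLines("not a text file", file.path(root, "notes.md"))
writeLines("nested document", file.path(root, "sub", "c.txt"))

# Case-sensitive match in the top-level directory only
flat <- dir(root, pattern = "\\.txt$", full.names = TRUE)

# Case-insensitive match also picks up B.TXT
nocase <- dir(root, pattern = "\\.txt$", full.names = TRUE,
              ignore.case = TRUE)

# Recursive listing descends into sub/
deep <- dir(root, pattern = "\\.txt$", full.names = TRUE,
            recursive = TRUE)

# Each listed file is then treated as one document
docs <- lapply(flat, readLines)
```

Here flat contains only a.txt, nocase adds B.TXT, and deep adds sub/c.txt; notes.md never matches the pattern.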

Value

An object inheriting from DirSource, SimpleSource, and Source.
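
As a rough illustration of the returned object, the following sketch checks the class chain and hands the source to VCorpus from tm. It assumes tm is installed; src and corpus are illustrative names.

```r
library(tm)

src <- DirSource(system.file("texts", "txt", package = "tm"))

# The class vector lists the most specific class first
class(src)
inherits(src, "Source")

# A source is typically consumed by a corpus constructor,
# which reads one document per listed file
corpus <- VCorpus(src)
length(corpus)
```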

See Also

Source for basic information on the source infrastructure employed by package tm.

Encoding and iconv on encodings.

Examples

DirSource(system.file("texts", "txt", package = "tm"))

Example output

Loading required package: NLP
$encoding
[1] ""

$length
[1] 5

$position
[1] 0

$reader
function (elem, language, id) 
{
    if (!is.null(elem$uri)) 
        id <- basename(elem$uri)
    PlainTextDocument(elem$content, id = id, language = language)
}
<environment: namespace:tm>

$mode
[1] "text"

$filelist
[1] "/usr/lib/R/site-library/tm/texts/txt/ovid_1.txt"
[2] "/usr/lib/R/site-library/tm/texts/txt/ovid_2.txt"
[3] "/usr/lib/R/site-library/tm/texts/txt/ovid_3.txt"
[4] "/usr/lib/R/site-library/tm/texts/txt/ovid_4.txt"
[5] "/usr/lib/R/site-library/tm/texts/txt/ovid_5.txt"

attr(,"class")
[1] "DirSource"    "SimpleSource" "Source"      

tm documentation built on July 12, 2020, 3 p.m.