PCorpus: Permanent Corpora

View source: R/corpus.R

PCorpus    R Documentation

Description

Create permanent corpora.

Usage

PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))

Arguments

x

A Source object.

readerControl

A named list of control parameters for reading in content from x. It has the components reader and language, described next.

reader

A function capable of reading in and processing the format delivered by x. The default is the reader associated with the source, reader(x).

language

A character string giving the language (preferably as an IETF language tag; see language in package NLP). The default language is assumed to be English ("en").

dbControl

A named list of control parameters for the underlying database storage provided by package filehash. It has the components dbName and dbType, described next.

dbName

A character string giving the file name of the database.

dbType

A character string giving the database format (see filehashOption for possible database formats).
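
As a minimal sketch of how the two control lists fit together, the call below sets both explicitly. It assumes the packages tm and filehash are installed. The variable name db_file and the temporary file location are illustrative choices, and the language tag "la" (Latin) is chosen here only because the sample texts shipped with tm are in Latin.

```r
library(tm)  # PCorpus also requires the 'filehash' package to be installed

# Sample plain-text documents shipped with tm.
txt <- system.file("texts", "txt", package = "tm")

# Illustrative database location; any writable file name would do.
db_file <- tempfile(fileext = ".db")

pc <- PCorpus(DirSource(txt),
              # readPlain is tm's reader for plain text; "la" is the IETF tag for Latin.
              readerControl = list(reader = readPlain, language = "la"),
              # File name and filehash format of the backing database.
              dbControl = list(dbName = db_file, dbType = "DB1"))

inherits(pc, "PCorpus")    # the result is a permanent corpus
meta(pc[[1]], "language")  # language recorded on the first document
```

Components left out of either list should fall back to the defaults shown under Usage, so in practice only the entries that differ need to be supplied.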

Details

A permanent corpus stores its documents outside of R in a database. Multiple PCorpus R objects backed by the same database can exist in memory at the same time, so a change made through one of them is propagated to all corresponding objects (in contrast to the default R semantics, where modifying a copy leaves the original untouched).
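
The shared-storage behaviour can be illustrated with a short sketch. It assumes tm and filehash are installed. The names pc1, pc2 and db_file are illustrative, and the replacement text is made up for the demonstration.

```r
library(tm)  # PCorpus also requires the 'filehash' package to be installed

txt <- system.file("texts", "txt", package = "tm")
db_file <- tempfile(fileext = ".db")  # illustrative database location

pc1 <- PCorpus(DirSource(txt),
               dbControl = list(dbName = db_file, dbType = "DB1"))

# A second R object that refers to the same underlying database.
pc2 <- pc1

# Replace the first document through pc1 only.
pc1[[1]] <- PlainTextDocument("replaced text")

# pc2 was never assigned to, yet it reads the updated document,
# because both objects fetch their content from the same database.
content(pc2[[1]])
```

With a VCorpus the same assignment would affect only pc1, since volatile corpora hold their documents in memory and follow ordinary R copy semantics.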

Value

An object inheriting from PCorpus and Corpus.

See Also

Corpus for basic information on the corpus infrastructure employed by package tm.

VCorpus provides an implementation with volatile storage semantics.

Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)

tm documentation built on Sept. 11, 2024, 6:47 p.m.