PCorpus: Permanent Corpora



Description

Create permanent corpora.

Usage

PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))

Arguments

x

A Source object.

readerControl

a named list of control parameters for reading in content from x.

reader

a function capable of reading in and processing the format delivered by x.

language

a character giving the language (preferably as an IETF language tag, see language in package NLP). The default language is assumed to be English ("en").

dbControl

a named list of control parameters for the underlying database storage provided by package filehash.

dbName

a character giving the filename for the database.

dbType

a character giving the database format (see filehashOption for possible database formats).

Details

A permanent corpus stores its documents outside of R in a database. Because several PCorpus R objects backed by the same underlying database can exist in memory at the same time, a change made through one of them is propagated to all the others (in contrast to R's default copy semantics).
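The shared-database behaviour described above can be illustrated with a minimal sketch. It assumes that package filehash is installed and that tm_map() on a PCorpus writes the transformed documents back to the database. The file name pcorpus_demo.db and the variables p1, p2 and db_file are hypothetical names chosen for illustration.

```r
library(tm)

# Sample texts shipped with tm (same directory as in the Examples section).
txt <- system.file("texts", "txt", package = "tm")

# Hypothetical database file, placed in a temporary directory.
db_file <- file.path(tempdir(), "pcorpus_demo.db")

p1 <- PCorpus(DirSource(txt),
              dbControl = list(dbName = db_file, dbType = "DB1"))
p2 <- p1  # a second R object backed by the same database

# Transform the documents through p1 only.
p1 <- tm_map(p1, content_transformer(toupper))

# p2 was never passed to tm_map(); if changes propagate through the
# shared database, its first document should now be upper case as well.
first_doc <- content(p2[[1]])
all(first_doc == toupper(first_doc))
```

With a VCorpus the same steps would leave p2 unchanged, since its documents live in the R object itself.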

Value

An object inheriting from PCorpus and Corpus.

See Also

Corpus for basic information on the corpus infrastructure employed by package tm.

VCorpus provides an implementation with volatile storage semantics.

Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)

Example output

Loading required package: NLP
<<PCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 5

tm documentation built on July 11, 2020, 3 a.m.