revisions: Revisions of a Distributed Corpus

Description

Each modification of the documents in the corpus results in a new stage, i.e., a new revision of the corpus. To allow fast switching between revisions, all modifications may be kept on the file system. The function setRevision() makes it possible to go back to any stage in the history of the corpus. The function keepRevisions() reports whether revisions are turned on or off; the corresponding replacement function sets the desired behavior.

Usage

getRevisions( corpus )
removeRevision( corpus, revision )
setRevision( corpus, revision )
keepRevisions( corpus )
`keepRevisions<-`( corpus, value )

Arguments

corpus

A distributed corpus of class DCorpus.

revision

The revision which is to be set as active or removed.

value

A logical indicating whether revisions should be kept or not.

Value

getRevisions() returns a list of character strings naming all available revisions. setRevision() returns the distributed corpus with the given revision marked as active. keepRevisions() returns a logical indicating whether revisions are kept or not.
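
The return values described above can be inspected as in the following minimal sketch. It assumes that the tm.plugin.dc package and the crude data set from tm are installed; the variable names (revs, kept, dc_first) are illustrative only, and no particular ordering of the revision names is assumed.

```r
library("tm.plugin.dc")

data("crude")
dc <- as.DCorpus(crude)

## getRevisions() is documented to return the names of all revisions
revs <- getRevisions(dc)
length(revs)

## keepRevisions() is documented to return a logical
kept <- keepRevisions(dc)
is.logical(kept)

## setRevision() is documented to return the corpus with the chosen
## revision marked as active, so the result is captured here
dc_first <- setRevision(dc, revs[1])
class(dc_first)
```

Capturing the result of setRevision() in a separate variable keeps the corpus before and after the switch available for comparison.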

Examples

## load the package (this also attaches DSL, tm and NLP)
library("tm.plugin.dc")
## provide data on storage
data("crude")
dc <- as.DCorpus(crude)
## do some preprocessing
dc <- tm_map(dc, content_transformer(tolower))
## retrieve available revisions
revs <- getRevisions(dc)
revs
## go back to original revision
setRevision(dc, revs[2])
keepRevisions(dc)
keepRevisions(dc) <- FALSE
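
The example above does not call removeRevision(). The following is a minimal sketch of how it might be used, assuming the same packages and data. The return value of removeRevision() is not documented on this page, and it is not stated whether the corpus object is updated in place, so the sketch simply lists the revisions again afterwards instead of relying on either. Treating revs[2] as the revision to discard is an assumption for illustration.

```r
library("tm.plugin.dc")

data("crude")
dc <- as.DCorpus(crude)
## apply one transformation so that more than one revision exists
dc <- tm_map(dc, content_transformer(tolower))

revs <- getRevisions(dc)

## assumption: revs[2] names a revision that is no longer needed
## (the example above treats revs[2] as the original revision)
removeRevision(dc, revs[2])

## list the revisions again to see which ones remain
getRevisions(dc)
```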

Example output

Loading required package: DSL
Loading required package: tm
Loading required package: NLP
[1] "DSL-20190719-142748-jnighunrcr" "DSL-20190719-142748-kpsdmhhmco"
[1] TRUE

tm.plugin.dc documentation built on Nov. 29, 2020, 5:07 p.m.