tCorpus_modify_by_reference: Modify tCorpus by reference
In corpustools: Managing, Querying and Analyzing Tokenized Text

tCorpus_modify_by_reference

R Documentation

Modify tCorpus by reference

Description

Details

If any tCorpus method is used that changes the corpus (e.g., set, subset), the change is made by reference. This is convenient when working with a large corpus, because the corpus does not have to be copied each time it is changed. Copying would be slower and use more memory.

To illustrate, for a tCorpus object named 'tc', the subset method can be called like this:

tc$subset(doc_id %in% selection)

The 'tc' object itself is now modified. The result does not have to be assigned to a name, although assigning it would be the more common R approach, like this:

tc = tc$subset(doc_id %in% selection)

The results of both lines of code are the same. The assignment in the second approach is not necessary, but it does no harm either, because tc$subset returns the modified corpus invisibly (see ?invisible if that sounds spooky).

Be aware, however, that the following does not do what you might expect!

tc2 = tc$subset(doc_id %in% selection)

In this case, tc2 does contain the subsetted corpus, but tc itself has also been subsetted, because tc and tc2 refer to the same object!
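
The pitfall can be reproduced without corpustools. The following is a minimal base R sketch, not the tCorpus implementation: make_corpus and its doc_ids field are hypothetical stand-ins, built on an environment because environments (like R6 objects) are modified by reference.

```r
# Hypothetical stand-in for a tCorpus: an environment holding document ids.
# This is NOT the corpustools implementation, only an illustration.
make_corpus <- function(doc_ids) {
  self <- new.env()
  self$doc_ids <- doc_ids
  # Mimics a by-reference subset method: it modifies self in place
  # and returns the same object invisibly.
  self$subset <- function(keep) {
    self$doc_ids <- self$doc_ids[self$doc_ids %in% keep]
    invisible(self)
  }
  self
}

tc <- make_corpus(c("a", "b", "c"))
tc2 <- tc$subset(c("a", "b"))

# tc2 is not a copy: both names point to the same, already modified object.
same_object <- identical(tc, tc2)
n_left_in_tc <- length(tc$doc_ids)
```

Here same_object is TRUE and tc holds only the two selected ids, which mirrors what happens to tc and tc2 in the line above.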

Using the R6 method for subset forces this approach on you, because it is faster and more memory efficient. If you do want to make a copy, there are several solutions.

Firstly, for some methods we provide equivalent functions that return a modified copy instead. For example, instead of the $subset() R6 method, we can use the subset() function.

tc2 = subset(tc, doc_id %in% selection)

We promise that only the R6 methods (called as tc$method()) will change the data by reference.

A second option is the copy parameter, which is available on R6 methods where copying is often useful. With copy=TRUE, the method returns a modified copy and leaves tc itself unchanged.

tc2 = tc$subset(doc_id %in% selection, copy=TRUE)

Finally, you can always make a deep copy of the entire tCorpus before modifying it, using the $copy() method.

tc2 = tc$copy()
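
The three options above can be compared in one short session. This is only a sketch: it assumes corpustools is installed, the example texts and document ids are made up, and it relies on create_tcorpus() accepting a character vector with a doc_id argument and on the $meta field holding one row per document.

```r
library(corpustools)

# Made-up example data: three short documents with explicit ids.
texts <- c("First example text", "Second example text", "Third example text")
selection <- c("d1", "d2")

tc <- create_tcorpus(texts, doc_id = c("d1", "d2", "d3"))

# Option 1: the subset() function returns a subsetted copy.
tc_fun <- subset(tc, doc_id %in% selection)

# Option 2: the R6 method with copy=TRUE also returns a copy.
tc_arg <- tc$subset(doc_id %in% selection, copy = TRUE)

# Option 3: make a deep copy first, then modify the copy by reference.
tc_cp <- tc$copy()
tc_cp$subset(doc_id %in% selection)

# Count documents: the original should be untouched, the copies subsetted.
n_original <- nrow(tc$meta)
n_copies <- c(nrow(tc_fun$meta), nrow(tc_arg$meta), nrow(tc_cp$meta))
```

If the options behave as described, n_original still counts all three documents while each entry of n_copies counts only the selected ones.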

corpustools documentation built on Aug. 8, 2025, 6:08 p.m.
