tCorpus-cash-merge: Merge the token and meta data.tables of a tCorpus with...
In corpustools: Managing, Querying and Analyzing Tokenized Text

tCorpus$merge

R Documentation

Merge the token and meta data.tables of a tCorpus with another data.frame

Description

Add columns to the tokens or meta data.table of a tCorpus by merging with a data.frame df. This is only possible for unique matches, i.e. each combination of values in the columns specified in by must occur only once in df.
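The uniqueness requirement can be checked before merging. The following is a minimal base R sketch; `is_unique_key` is a hypothetical helper written for illustration and is not part of corpustools.

```r
# Hypothetical helper (not part of corpustools): returns TRUE if the
# columns named in `by` identify each row of `df` uniquely.
is_unique_key <- function(df, by) {
  # anyDuplicated() returns 0 when no row is duplicated
  anyDuplicated(df[, by, drop = FALSE]) == 0
}

ok  <- data.frame(doc_id = c('a', 'b'), test = c('A', 'B'))
bad <- data.frame(doc_id = c('a', 'a'), test = c('A', 'B'))

is_unique_key(ok, 'doc_id')   # TRUE: each doc_id occurs once
is_unique_key(bad, 'doc_id')  # FALSE: doc_id 'a' occurs twice
```

A df like `bad` would be ambiguous as a merge source, since it offers two candidate values for the same document.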

Arguments

`df`	A data.frame (can be regular, data.table or tibble)
`by`	The columns to match on. These must exist in both tokens/meta and df. If the matching columns have different names in tokens/meta and df, use by.x and by.y instead.
`by.x`	The names of the columns used in tokens/meta
`by.y`	The names of the columns used in df
`columns`	Optionally, specify which columns from df to merge to tokens
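The arguments share their names with base R's `merge()`. As a rough illustration of the matching they describe, the sketch below uses plain data.frames and base `merge()` as a stand-in. It does not call corpustools, and the assumption that unmatched rows are kept with missing values is inferred from the description ("Add columns") rather than confirmed. The `tokens` layout and the `other` column are made up for the example.

```r
# Stand-in for tc$tokens (assumed layout, for illustration only)
tokens <- data.frame(doc_id = c('a', 'a', 'b', 'c'),
                     token  = c('This', 'is', 'oh', 'so'),
                     stringsAsFactors = FALSE)

# External data whose key column has a different name ('id'),
# plus a column ('other') that we do not want to add
df <- data.frame(id    = c('a', 'b'),
                 test  = c('A', 'B'),
                 other = c(1, 2),
                 stringsAsFactors = FALSE)

# Keep only the key and the column to add (the role of `columns`)
keep <- df[, c('id', 'test')]

# by.x names the key in the token table, by.y the key in df.
# all.x = TRUE keeps every token row; document 'c' has no match and gets NA.
merged <- merge(tokens, keep, by.x = 'doc_id', by.y = 'id', all.x = TRUE)
merged
```

With a tCorpus, the corresponding call would pass by.x and by.y to tc$merge or tc$merge_meta instead of by.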

Details

Usage:

## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

merge(df, by, by.x, by.y)

merge_meta(df, by, by.x, by.y)

Examples

library(corpustools)

## Create a small tCorpus with three documents, split into sentences
d = data.frame(text = c('This is an example. Best example ever.', 'oh my god', 'so good'),
               id = c('a','b','c'),
               source = c('aa','bb','cc'))
tc = create_tcorpus(d, doc_col='id', split_sentences = TRUE)

## Merge on doc_id: all tokens of documents 'a' and 'b' match; 'c' has no match in df
df = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge(df, by='doc_id')
tc$tokens

## Merge on doc_id and sentence: only tokens in sentence 1 of 'a' and 'b' match
df = data.frame(doc_id=c('a','b'), sentence=1, test2=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence'))
tc$tokens

## Merge on doc_id, sentence and token_id: each row of df matches at most one token
df = data.frame(doc_id=c('a','b'), sentence=1, token_id=c(3,4), test3=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence', 'token_id'))
tc$tokens

## Merge document-level columns into the meta data
meta = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge_meta(meta, by='doc_id')
tc$meta

## Merge into the meta data on a column other than doc_id
meta = data.frame(source=c('aa'), test2=c('A'))
tc$merge_meta(meta, by='source')
tc$meta

corpustools documentation built on Aug. 8, 2025, 6:08 p.m.