tCorpus$fold_rsyntax | R Documentation |
If a tCorpus has rsyntax annotations (see annotate_rsyntax), it can be convenient to aggregate tokens that have a certain semantic label.
For example, if you have a query for labeling "source" and "quote", you can add an aggregated value for the sources (such as a unique ID) as a column, and then remove the quote tokens.
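The idea of folding can be sketched in base R on a toy token table. This is only an illustration of the concept, not how corpustools implements fold_rsyntax. The table, the column names 'quote' and 'quote_id', and the choice to drop the aggregated "source" tokens afterwards are all assumptions made for this example.

```r
# Toy token table (hypothetical). 'quote' holds the semantic label and
# 'quote_id' identifies the annotation each token belongs to.
tokens <- data.frame(
  token    = c("Mary", "Smith", "said", "taxes", "will", "rise"),
  quote    = c("source", "source", NA, "quote", "quote", "quote"),
  quote_id = c(1, 1, NA, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Select the tokens carrying the label that should be aggregated
is_source <- !is.na(tokens$quote) & tokens$quote == "source"

# Aggregate those tokens per annotation id (one string per id)
source_txt <- vapply(
  split(tokens$token[is_source], tokens$quote_id[is_source]),
  paste, character(1), collapse = " "
)

# Attach the aggregated value to every token of the same annotation;
# tokens outside any annotation get NA
tokens$source <- unname(source_txt[as.character(tokens$quote_id)])

# Drop the tokens that were folded into the new column (assumed behaviour)
folded <- tokens[!is_source, ]
print(folded)
```

After this step, each remaining token of an annotation carries the concatenated source text in its own column, so the source no longer needs separate rows.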
Arguments:
annotation | The name of an rsyntax annotation column
by_label | The labels in this column for which you want to aggregate the tokens
... | Specify the new aggregated columns in name-value pairs. The name is the name of the new column, and the value should be a function over a column in $tokens. For example: subject = paste(token, collapse = ' ') would create the column 'subject', of which the values are the concatenated tokens. See examples for more.
txt | If TRUE, add _txt column with concatenated tokens for by_label
rm_by | If TRUE (default), remove the column(s) specified in by_label
copy | If TRUE, return a copy of the transformed tCorpus, instead of transforming the tCorpus by reference
Usage:
## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).
fold_rsyntax(annotation, by_label, ..., to_label=NULL, rm_by=T, copy=F)
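Examples:
## work on a copy, because tCorpus methods transform the object by reference
tc = tc_sotu_udpipe$copy()
## annotate clauses, so that the 'clause' annotation column used below exists
tc$udpipe_clauses()
## aggregate the tokens labeled 'subject' into a new 'subject' column
tc$fold_rsyntax('clause', by_label = 'subject', subject = paste(token, collapse=' '))
tc$tokens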