annotate_nodes: Annotate a tokenlist based on rsyntax queries

View source: R/annotate.r

Description


Apply queries to extract syntax patterns, and add the results as two columns to a tokenlist. One column contains the ids for each hit. The other column contains the annotations. Only nodes that are given a name in the tquery (using the 'save' parameter) will be added as annotation.
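As a rough illustration of the two-column result described above, the following base R sketch adds an annotation column and a matching id column to a toy token table. It does not call rsyntax: the `hits` table, its long format, and the column name `quote` are assumptions made for this example, and the actual structure returned by find_nodes may differ.

```r
# Toy token table; the column names are illustrative only.
tokens <- data.frame(
  token_id = 1:5,
  token = c("Mary", "said", "she", "was", "happy"),
  stringsAsFactors = FALSE
)

# Hypothetical hits in long format: one row per named node of a match.
# Both rows share quote_id 1 because they belong to the same hit.
hits <- data.frame(
  token_id = c(1L, 5L),
  quote_id = c(1L, 1L),
  quote = c("source", "quote"),
  stringsAsFactors = FALSE
)

# Left join: tokens without a hit get NA in both new columns.
annotated <- merge(tokens, hits, by = "token_id", all.x = TRUE)
annotated <- annotated[order(annotated$token_id), ]
print(annotated)
```

Here `quote` plays the role of the annotation column and `quote_id` the role of the [column]_id column.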


Usage

annotate_nodes(tokens, nodes, column, rm_dup = T, fill = F,
  fill_block = NULL, check = F, with_tquery = F, show_fill = F,
  concat_dup = T)



Arguments

tokens
A tokenIndex data.table, created with as_tokenindex, or any data.frame with the required columns (see tokenindex_columns).

nodes
A data.table, as created with find_nodes or apply_queries. Can be a list of multiple data.tables.

column
The name of the column to which the annotations are added. The unique ids are added as [column]_id.

rm_dup
If TRUE (default), duplicate nodes are removed, keeping the first match. If FALSE, duplicates are kept: when the concat_dup argument is TRUE (default), the duplicate values are concatenated; otherwise, rows in tokens are repeated for each match.

fill
If TRUE, the children for each id are added recursively (children of children, etc.). If this leads to duplicate ids (if an id in nodes is a child of another id in nodes), the most direct children are kept. For example, if 1 -> 2 -> 3, and both 1 and 2 are in 'nodes', then 3 is only added as a child of 2.

fill_block
Optionally, another data.table of nodes (as created with find_nodes) or a list of data.tables, used to block the fill process. That is, the nodes in fill_block and all their descendants are not used in fill.

check
For testing queries. If TRUE, give a warning if there are duplicates in the data (in which case the duplicates are deleted).

with_tquery
For testing queries. If TRUE, add a column that shows the name of the specific tquery that was used. This only works if 'queries' is a named list.

concat_dup
See the rm_dup argument.
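The three ways of handling duplicate matches mentioned for rm_dup and concat_dup can be sketched in base R as follows. This is one possible reading of the argument descriptions, not the package's implementation; the `matches` table and its column names are hypothetical.

```r
tokens <- data.frame(
  token_id = 1:3,
  token = c("Mary", "said", "hello"),
  stringsAsFactors = FALSE
)

# Hypothetical matches: token 2 was matched twice.
matches <- data.frame(
  token_id = c(1L, 2L, 2L),
  label = c("source", "quote", "verb"),
  stringsAsFactors = FALSE
)

# Strategy 1 (assumed to resemble rm_dup = TRUE):
# keep only the first match per token.
first_only <- matches[!duplicated(matches$token_id), ]

# Strategy 2 (assumed to resemble rm_dup = FALSE, concat_dup = TRUE):
# collapse all labels of a token into one comma-separated value.
concatenated <- aggregate(
  label ~ token_id,
  data = matches,
  FUN = function(x) paste(x, collapse = ",")
)

# Strategy 3 (assumed to resemble rm_dup = FALSE, concat_dup = FALSE):
# a left join repeats the token row once per match.
repeated <- merge(tokens, matches, by = "token_id", all.x = TRUE)

print(first_only)
print(concatenated)
print(repeated)
```

Only the third strategy changes the number of rows in the token table, which is worth keeping in mind when later code assumes one row per token.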


Details

Note that while queries only find one node for each saved component of a pattern (e.g., quote queries have one node for "source" and one node for "quote"), all children of these nodes are also annotated if fill is TRUE. If a child has multiple annotated ancestors, only the most direct ancestor is used (see the documentation for the fill argument).
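The "most direct ancestor" rule can be illustrated with a small base R sketch. It assumes a toy dependency tree encoded as a parent vector and is not how rsyntax implements fill; the helper `nearest_annotated` is a hypothetical name introduced for this example.

```r
# Toy tree: parent[i] is the parent of token i (NA marks the root).
# Structure: 1 -> 2 -> 3 and 1 -> 4 -> 5
parent <- c(NA, 1L, 2L, 1L, 4L)

# Tokens that were matched directly by a query.
annotated <- c(1L, 2L)

# Walk up from a token until an annotated node is reached.
# The first one found is the most direct annotated ancestor
# (an annotated token counts as its own owner).
nearest_annotated <- function(id, parent, annotated) {
  while (!is.na(id)) {
    if (id %in% annotated) return(id)
    id <- parent[id]
  }
  NA_integer_
}

owner <- vapply(seq_along(parent), nearest_annotated, integer(1),
                parent = parent, annotated = annotated)
print(owner)
```

Token 3 is assigned to node 2 rather than node 1, matching the 1 -> 2 -> 3 example given for the fill argument, while tokens 4 and 5 fall back to node 1 because no closer annotated ancestor exists.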

vanatteveldt/rsyntax documentation built on Aug. 7, 2018, 1:31 a.m.