annotate_nodes: Annotate a tokenlist based on rsyntax queries


View source: R/annotate.r

Description

Apply queries to extract syntax patterns, and add the results as two columns to a tokenlist. One column contains the ids for each hit. The other column contains the annotations. Only nodes that are given a name in the tquery (using the 'save' parameter) will be added as annotation.

Usage

annotate_nodes(tokens, nodes, column, rm_dup = T, fill = F,
  fill_block = NULL, check = F, with_tquery = F, show_fill = F,
  concat_dup = T)

Arguments

tokens

A tokenIndex data.table, created with as_tokenindex, or any data.frame with the required columns (see tokenindex_columns).

nodes

A data.table, as created with find_nodes or apply_queries. Can be a list of multiple data.tables.

column

The name of the column in which the annotations are added. The unique ids are added as [column]_id.

rm_dup

If TRUE (default), remove duplicate nodes, keeping the first match. If FALSE, duplicates are handled according to the concat_dup argument: if concat_dup is TRUE (default), duplicate values are concatenated; otherwise, rows in tokens are repeated for each match.
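
The three ways of handling duplicate matches can be illustrated with a small base R sketch. This is not the rsyntax implementation, only one reading of the description above: the hits table, its column names (token_id, annotation) and the "," separator are all hypothetical.

```r
# Hypothetical hits: token 2 is matched twice (not rsyntax output).
hits <- data.frame(token_id = c(1, 2, 2),
                   annotation = c("source", "source", "quote"),
                   stringsAsFactors = FALSE)

# Analogue of rm_dup = TRUE: keep only the first match per token.
first_only <- hits[!duplicated(hits$token_id), ]

# Analogue of rm_dup = FALSE with concat_dup = TRUE: one row per token,
# duplicate values pasted together (the separator is an assumption).
concatenated <- aggregate(annotation ~ token_id, data = hits,
                          FUN = paste, collapse = ",")

# Analogue of rm_dup = FALSE with concat_dup = FALSE: the hits are kept
# as they are, so token 2 appears on two rows.
repeated <- hits

print(first_only)
print(concatenated)
print(repeated)
```

In this toy data, token 2 ends up as "source" in the first case, "source,quote" in the second, and on two separate rows in the third.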

fill

If TRUE, the children for each id are added recursively (children of children etc.). If this leads to duplicate ids (if an id in nodes is a child of another id in nodes), the most direct children are kept. For example, if 1 -> 2 -> 3, and both 1 and 2 are in 'nodes', then 3 is only added as a child of 2.
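
The "most direct children" rule can be sketched in base R. This is an illustration of the rule only, not the code rsyntax uses; the parent vector, the matched ids and the helper nearest_match are hypothetical names introduced for the example.

```r
# Toy dependency tree: each token id mapped to its parent id (NA = root).
# Chain 1 -> 2 -> 3, plus token 4 as a second child of 1.
parent <- c("1" = NA, "2" = "1", "3" = "2", "4" = "1")

# Ids that the query matched (the analogue of the ids in 'nodes').
matched <- c("1", "2")

# Walk up from a token until a matched id is found. That id is the most
# direct matched ancestor; a matched token is assigned to itself.
nearest_match <- function(id, parent, matched) {
  while (!is.na(id)) {
    if (id %in% matched) return(id)
    id <- parent[[id]]
  }
  NA_character_
}

assignment <- vapply(names(parent), nearest_match, character(1),
                     parent = parent, matched = matched)
print(assignment)
```

Here token 3 is assigned to 2 rather than to 1, because 2 is its closest matched ancestor, while token 4 is assigned to 1.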

fill_block

Optionally, another data.table of nodes (as created with find_nodes) or a list of data.tables, used to block the fill process. That is, the nodes in fill_block and all their descendants are not used in fill.
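
The blocking behaviour can be sketched in base R as follows. This is a simplified illustration under assumed semantics, not the rsyntax implementation: the parent vector, the matched and blocked ids, and the helper fill_from are hypothetical, and the sketch only covers the case where a blocked node sits below a matched node.

```r
# Toy tree: 1 -> 2 -> 3 and 1 -> 4 -> 5 (NA marks the root).
parent <- c("1" = NA, "2" = "1", "3" = "2", "4" = "1", "5" = "4")

matched <- "1"   # id found by the query (analogue of 'nodes')
blocked <- "4"   # id that stops the fill (analogue of 'fill_block')

# Walk up from a token. Reaching a blocked id before a matched id means the
# token lies in a blocked subtree, so it receives no annotation.
fill_from <- function(id, parent, matched, blocked) {
  while (!is.na(id)) {
    if (id %in% blocked) return(NA_character_)
    if (id %in% matched) return(id)
    id <- parent[[id]]
  }
  NA_character_
}

filled <- vapply(names(parent), fill_from, character(1),
                 parent = parent, matched = matched, blocked = blocked)
print(filled)
```

Tokens 2 and 3 are filled from node 1, while token 4 and its child 5 are left unannotated.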

check

For testing queries. If TRUE, give a warning if there are duplicates in the data (in which case the duplicates are deleted).

with_tquery

For testing queries. If TRUE, add a column that shows the name of the specific tquery that was used. This only works if 'queries' is a named list.

concat_dup

See the rm_dup argument.

Details

Note that while queries only find 1 node for each saved component of a pattern (e.g., quote queries have 1 node for "source" and 1 node for "quote"), all children of these nodes are also annotated (if fill is TRUE). If a child has multiple ancestors, only the most direct ancestors are used (see documentation for the fill argument).


vanatteveldt/rsyntax documentation built on Aug. 7, 2018, 1:31 a.m.