annotate_nodes: Annotate a tokenlist based on rsyntaxNodes

View source: R/annotate.r

annotate_nodes    R Documentation

Description

Use rsyntaxNodes, as created with tquery and apply_queries, to annotate a tokenlist. Three columns will be added: the labels assigned in the tquery, a unique id for the query match, and the fill level (0 is a direct match, 1 is a child of a match, 2 is a grandchild, etc.).

Usage

annotate_nodes(tokens, nodes, column)

Arguments

tokens

A tokenIndex data.table, or any data.frame coercible with as_tokenindex.

nodes

An rsyntaxNodes data.table, as created with apply_queries. Can also be a list of multiple such data.tables.

column

The name of the column in which the annotations are added. The unique ids are added as [column]_id, and the fill values are added as [column]_fill.
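The naming convention described above can be sketched in base R. The helper annotation_columns is hypothetical and not part of rsyntax; it only spells out which column names the documentation says to expect for a given column argument.

```r
# Hypothetical helper (not part of rsyntax): builds the three column names
# that, according to the argument description, are added for `column`.
annotation_columns = function(column) {
  c(column,                      # labels assigned in the tquery
    paste0(column, "_id"),       # unique id of the query match
    paste0(column, "_fill"))     # fill level
}

annotation_columns("clause")
```

With column = "clause", this gives the names clause, clause_id and clause_fill.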

Details

Note that you can also directly use annotate.

Value

The tokenIndex data.table with the annotation columns added.

Examples

library(rsyntax)

## spacy tokens for: Mary loves John, and Mary was loved by John
tokens = tokens_spacy[tokens_spacy$doc_id == 'text3',]

## two simple example tqueries
passive = tquery(pos = "VERB*", label = "predicate",
                 children(relation = c("agent"), label = "subject"))
active =  tquery(pos = "VERB*", label = "predicate",
                 children(relation = c("nsubj", "nsubjpass"), label = "subject"))

nodes = apply_queries(tokens, pas=passive, act=active)
annotate_nodes(tokens, nodes, 'clause')
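
As a possible follow-up, the sketch below stores the result and inspects what was added. It repeats the setup so that it runs on its own. The name annotated is only illustrative, and the filter assumes that the fill column is numeric, as the description suggests.

```r
library(rsyntax)

## same setup as in the example above
tokens = tokens_spacy[tokens_spacy$doc_id == 'text3',]
passive = tquery(pos = "VERB*", label = "predicate",
                 children(relation = c("agent"), label = "subject"))
active =  tquery(pos = "VERB*", label = "predicate",
                 children(relation = c("nsubj", "nsubjpass"), label = "subject"))
nodes = apply_queries(tokens, pas=passive, act=active)

## keep the annotated tokens ('annotated' is an illustrative name)
annotated = annotate_nodes(tokens, nodes, 'clause')

## columns that are present after annotation but not before
setdiff(colnames(annotated), colnames(tokens))

## tokens that are direct matches of a query (fill level 0);
## %in% is used so that missing fill values are treated as non-matches
annotated[annotated$clause_fill %in% 0, ]
```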

rsyntax documentation built on June 7, 2022, 9:07 a.m.