get_nodes: Transform the nodes to long format and match with token data

View source: R/annotate.r

get_nodes    R Documentation


Description

Transform the nodes (as created with apply_queries) to long format and match them with the token data.

Usage

get_nodes(tokens, nodes, use = NULL, token_cols = c("token"))

Arguments

tokens

A tokenIndex data.table, or any data.frame coercible with as_tokenindex.

nodes

A data.table, as created with apply_queries. Can be a list of multiple data.tables.

use

Optionally, specify which columns from nodes to add. Besides being convenient, this differs slightly from subsetting the columns in 'nodes' beforehand if fill is TRUE: when the children are collected, the ids from the unused columns are still blocked (see 'block').

token_cols

A character vector specifying which columns from tokens to include in the output.
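The nodes and token_cols arguments can be combined as in the sketch below, which passes a list of two node tables (one per apply_queries call) and attaches a second token column. It reuses the example data and queries from the Examples section; the column name pos is an assumption based on the pos = "VERB*" filter used in those queries, and nodes_pas, nodes_act and long are illustrative names only.

```r
library(rsyntax)

## Example tokens shipped with rsyntax (same document as in the Examples)
tokens = tokens_spacy[tokens_spacy$doc_id == "text3", ]

passive = tquery(pos = "VERB*", label = "predicate",
                 children(relation = "agent", label = "subject"))
active = tquery(pos = "VERB*", label = "predicate",
                children(relation = c("nsubj", "nsubjpass"), label = "subject"))

## Apply the queries in separate calls, giving two node tables
nodes_pas = apply_queries(tokens, pas = passive)
nodes_act = apply_queries(tokens, act = active)

## 'nodes' may be a list of node tables; token_cols also attaches the
## part-of-speech column (assumed to be named 'pos', as in the queries above)
long = get_nodes(tokens, list(nodes_pas, nodes_act),
                 token_cols = c("token", "pos"))
print(long)
```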

Value

A data.table with the nodes in long format and the specified token_cols attached.
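To make "long format" concrete, the base R sketch below reshapes a small hypothetical wide table (one row per match, one column per label, each cell holding a token_id) into one row per (label, token) pair and then joins the token text. The tables, the match_id column and the token ids are invented for illustration; this shows the general reshape-then-join idea, not how get_nodes is implemented or the exact structure apply_queries returns.

```r
## Hypothetical wide node table: one row per match, one column per label,
## each cell holding a token_id (NOT rsyntax's internal structure)
wide = data.frame(
  doc_id    = c("text3", "text3"),
  match_id  = c(1L, 2L),
  predicate = c(2L, 8L),
  subject   = c(1L, 10L),
  stringsAsFactors = FALSE
)

## Hypothetical token table, one row per token
toks = data.frame(
  doc_id   = "text3",
  token_id = 1:10,
  token    = c("Mary", "loves", "John", ",", "and",
               "Mary", "was", "loved", "by", "John"),
  stringsAsFactors = FALSE
)

## Wide -> long: stack each label column into (label, token_id) rows
label_cols = c("predicate", "subject")
long = do.call(rbind, lapply(label_cols, function(lab) {
  data.frame(doc_id = wide$doc_id, match_id = wide$match_id,
             label = lab, token_id = wide[[lab]],
             stringsAsFactors = FALSE)
}))

## Attach the token data by joining on document and token id
long = merge(long, toks, by = c("doc_id", "token_id"))
long = long[order(long$match_id, long$token_id), ]
print(long)
```

Each match now spans several rows, one per labelled token, which is what makes it easy to attach token-level columns.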

Examples

library(rsyntax)

## spaCy tokens for: Mary loves John, and Mary was loved by John
tokens = tokens_spacy[tokens_spacy$doc_id == 'text3',]

## two simple example tqueries
passive = tquery(pos = "VERB*", label = "predicate",
                 children(relation = "agent", label = "subject"))
active = tquery(pos = "VERB*", label = "predicate",
                children(relation = c("nsubj", "nsubjpass"), label = "subject"))

nodes = apply_queries(tokens, pas = passive, act = active)
get_nodes(tokens, nodes)

rsyntax documentation built on June 7, 2022, 9:07 a.m.