as_tokenindex: Prepare a tokenIndex
In rsyntax: Extract Semantic Relations from Text by Querying and Reshaping Syntax

Description

Creates a tokenIndex data.table. Accepts any data.frame, provided that the required columns (doc_id, sentence, token_id, parent, relation) are present. The name of each of these columns must be one of the candidate values specified in the corresponding argument.

The data in the data.frame will not be changed, with three exceptions. First, the column names will be changed to the default names if they differ from them. Second, if a token has itself as its parent (which some parsers use to indicate the root), the parent is set to NA (as other parsers do) to prevent infinite cycles. Third, the data will be sorted by doc_id, sentence, token_id.
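The three changes can be illustrated with a small base R sketch. This is not the package's implementation: `normalize_tokens` and the toy data are hypothetical and only mimic the documented behaviour on a plain data.frame.

```r
# Toy parser output with non-default column names (made up for illustration).
tokens <- data.frame(
  document_id   = c("d1", "d1", "d1"),
  sentence_id   = c(1, 1, 1),
  token_id      = c(2, 1, 3),
  head_token_id = c(2, 2, 2),   # token 2 is its own parent, i.e. the root
  dep_rel       = c("ROOT", "nsubj", "dobj"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the three documented changes.
normalize_tokens <- function(x) {
  # 1. Rename non-default column names to the default names.
  rename <- c(document_id = "doc_id", sentence_id = "sentence",
              head_token_id = "parent", dep_rel = "relation")
  hit <- names(x) %in% names(rename)
  names(x)[hit] <- unname(rename[names(x)[hit]])

  # 2. A token that is its own parent becomes a root with parent NA.
  own_parent <- !is.na(x$parent) & x$parent == x$token_id
  x$parent[own_parent] <- NA

  # 3. Sort by doc_id, sentence, token_id.
  x <- x[order(x$doc_id, x$sentence, x$token_id), ]
  rownames(x) <- NULL
  x
}

normalized <- normalize_tokens(tokens)
print(normalized)
```

All other columns and values are left as they were, which matches the statement that the data is otherwise unchanged.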

Usage

as_tokenindex(
  tokens,
  doc_id = c("doc_id", "document_id"),
  sentence = c("sentence", "sentence_id"),
  token_id = c("token_id"),
  parent = c("parent", "head_token_id"),
  relation = c("relation", "dep_rel"),
  paragraph = NULL
)
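As a minimal usage sketch, the call below passes column names that are not among the defaults. The data.frame and its column names (`doc`, `sent`, `id`, `head`, `deprel`, `token`) are made up for illustration, and the call is only run if rsyntax is installed.

```r
# Toy token data with parser-specific column names (hypothetical).
tokens <- data.frame(
  doc    = c("d1", "d1", "d1"),
  sent   = c(1, 1, 1),
  id     = c(1, 2, 3),
  head   = c(2, NA, 2),          # NA marks the root
  deprel = c("nsubj", "ROOT", "dobj"),
  token  = c("Mary", "wrote", "code"),
  stringsAsFactors = FALSE
)

# Tell as_tokenindex which column plays which role.
if (requireNamespace("rsyntax", quietly = TRUE)) {
  tokenindex <- rsyntax::as_tokenindex(
    tokens,
    doc_id   = "doc",
    sentence = "sent",
    token_id = "id",
    parent   = "head",
    relation = "deprel"
  )
  print(tokenindex)
}
```

If the data already uses one of the default candidate names for every required column, `as_tokenindex(tokens)` without further arguments should be sufficient.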

Arguments

`tokens`	A data.frame, data.table, or tokenIndex.
`doc_id`	candidate names for the document id column
`sentence`	candidate names for the sentence (id/index) column
`token_id`	candidate names for the token id column. Has to be numeric (some parsers return token ids as numbers with a prefix, such as t_1 or w_1; such values need to be converted to numbers first)
`parent`	candidate names for the parent id column. Has to be numeric
`relation`	candidate names for the relation column
`paragraph`	Optionally, the name of a column with paragraph ids. This is only necessary if sentences are numbered per paragraph, and therefore not unique within documents. If given, sentences are re-indexed to be unique within documents.
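The two data requirements above (numeric ids, and sentence ids that are unique within a document) can be sketched in base R. This only illustrates the idea and is not how the package implements it. The toy data and the `strip_prefix` helper are hypothetical, and the re-indexing step shows what the `paragraph` argument is described as doing, so it is not needed when that argument is used.

```r
# Toy data: prefixed token ids, and sentences numbered per paragraph.
tokens <- data.frame(
  doc_id    = c("d1", "d1", "d1", "d1", "d2"),
  paragraph = c(1, 1, 2, 2, 1),
  sentence  = c(1, 2, 1, 1, 1),
  token_id  = c("t_1", "t_1", "t_1", "t_2", "t_1"),
  parent    = c(NA, NA, "t_2", NA, NA),
  relation  = c("ROOT", "ROOT", "nsubj", "ROOT", "ROOT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop any leading non-digit prefix and convert to numeric.
strip_prefix <- function(x) as.numeric(sub("^[^0-9]*", "", x))
tokens$token_id <- strip_prefix(tokens$token_id)
tokens$parent   <- strip_prefix(tokens$parent)

# Re-index sentences so that they are unique within each document.
tokens <- tokens[order(tokens$doc_id, tokens$paragraph,
                       tokens$sentence, tokens$token_id), ]
tokens$sentence <- ave(
  seq_len(nrow(tokens)), tokens$doc_id,
  FUN = function(i) {
    # Each distinct (paragraph, sentence) pair gets the next running number.
    key <- paste(tokens$paragraph[i], tokens$sentence[i], sep = "_")
    match(key, unique(key))
  }
)
print(tokens)
```

In this toy data, the first sentence of paragraph 2 in document d1 ends up as sentence 3, while document d2 starts again at sentence 1.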