View source: R/nlp_dependency_parsing.R
cbind_dependencies | R Documentation |
Annotated results of udpipe_annotate
contain dependency parsing results which indicate
how each word is linked to another word and the relation between these two words.
This information is available in the fields token_id, head_token_id and dep_rel, which indicate how each token
is linked to its parent. The types of relation (dep_rel) are defined at
https://universaldependencies.org/u/dep/index.html.
For example, in the text 'The economy is weak but the outlook is bright', the term economy is linked to weak
because economy is the nominal subject of weak.
This function adds the parent or child information to the annotated data.frame.
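The link between token_id and head_token_id can be illustrated without running a model. The sketch below uses base R only and a hand-written annotation of the example sentence; the head_token_id and dep_rel values are an assumed parse, not actual udpipe output, and the lookup with match() is only one way to resolve the parent, not necessarily how cbind_dependencies does it internally.

```r
# Hand-written annotation of one sentence. The head_token_id and dep_rel values
# are an assumed parse for illustration, not actual udpipe output.
x <- data.frame(
  token_id      = as.character(1:9),
  token         = c("The", "economy", "is", "weak", "but",
                    "the", "outlook", "is", "bright"),
  head_token_id = c("2", "4", "4", "0", "9", "7", "9", "9", "4"),
  dep_rel       = c("det", "nsubj", "cop", "root", "cc",
                    "det", "nsubj", "cop", "conj"),
  stringsAsFactors = FALSE
)

# Row position of each token's parent. The root (head_token_id "0") has no
# matching token_id and therefore yields NA.
parent_row <- match(x$head_token_id, x$token_id)

# Attach the parent's token, named token_parent as in the example on this page.
x$token_parent <- x$token[parent_row]

# Nominal subjects together with the word they depend on.
x[x$dep_rel == "nsubj", c("dep_rel", "token", "token_parent")]
```

With several sentences or documents, the same lookup would have to be done within each combination of doc_id, paragraph_id and sentence_id, since token_id restarts in every sentence.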
cbind_dependencies(
  x,
  type = c("parent", "child", "parent_rowid", "child_rowid"),
  recursive = FALSE
)
x |
a data.frame or data.table as returned by |
type |
one of 'parent', 'child', 'parent_rowid', 'child_rowid'. See the return value section for how these options differ. Defaults to 'parent', which adds the information of the head_token_id to the dataset |
recursive |
in case when |
Note that the output of this function is experimental and may change in subsequent releases.
a data.frame/data.table in the same order as x,
with extra information added, namely:
In case type is set to 'parent': the token/lemma/upos/xpos/feats information of the parent (head dependency) is added to the data.frame. See the examples.
In case type is set to 'child': the token/lemma/upos/xpos/feats/dep_rel information of all the children is put into a column called 'children' which is added to the data.frame. This is a list column where each list element is a data.table with these columns: token/lemma/upos/xpos/dep_rel. See the examples.
In case type is set to 'parent_rowid': a new list column is added to x containing the row numbers within each combination of doc_id, paragraph_id, sentence_id which are parents of the token. In case recursive is set to TRUE the new column which is added to the data.frame is called parent_rowids, otherwise it is called parent_rowid. See the examples.
In case type is set to 'child_rowid': a new list column is added to x containing the row numbers within each combination of doc_id, paragraph_id, sentence_id which are children of the token. In case recursive is set to TRUE the new column which is added to the data.frame is called child_rowids, otherwise it is called child_rowid. See the examples.
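To make the difference between the direct and the recursive row ids concrete, here is a minimal base R sketch for a single sentence. The head positions are a hand-written, assumed parse of the example sentence, the helper names head_row, ancestors and descendants are hypothetical, and representing the root's parent as an empty vector is a choice made for this sketch; the actual output of cbind_dependencies may differ in these details.

```r
# Assumed head of each token in one sentence, given as a row number
# (0 marks the root). Hand-written for illustration, not actual udpipe output.
token    <- c("The", "economy", "is", "weak", "but",
              "the", "outlook", "is", "bright")
head_row <- c(2L, 4L, 4L, 0L, 9L, 7L, 9L, 9L, 4L)

# Direct parent of each row; the root gets an empty vector (a sketch choice).
parent_rowid <- lapply(head_row, function(h) if (h == 0L) integer(0) else h)

# Direct children of each row.
child_rowid <- lapply(seq_along(head_row), function(i) which(head_row == i))

# All ancestors of row i: follow the head links until the root is reached.
ancestors <- function(i) {
  out <- integer(0)
  while (head_row[i] != 0L) {
    i <- head_row[i]
    out <- c(out, i)
  }
  out
}

# All descendants of row i: direct children plus, recursively, their descendants.
descendants <- function(i) {
  kids <- which(head_row == i)
  c(kids, unlist(lapply(kids, descendants)))
}

parent_rowids <- lapply(seq_along(head_row), ancestors)
child_rowids  <- lapply(seq_along(head_row), descendants)

token[child_rowid[[4]]]         # direct children of "weak"
token[sort(child_rowids[[4]])]  # the full subtree below "weak"
```

Sorting the recursive child row ids, as in the last line, returns the tokens of a subtree in sentence order, which is the same idea as the lapply call at the end of the examples.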
## Not run: 
# Download and load an English model, then annotate one sentence
udmodel <- udpipe_download_model(language = "english-ewt")
udmodel <- udpipe_load_model(file = udmodel$file_model)
x <- udpipe_annotate(udmodel, x = "The economy is weak but the outlook is bright")
x <- as.data.frame(x)
x[, c("token_id", "token", "head_token_id", "dep_rel")]

# Add the information of the parent and look at the nominal subjects
x <- cbind_dependencies(x, type = "parent")
nominalsubject <- subset(x, dep_rel %in% c("nsubj"))
nominalsubject <- nominalsubject[, c("dep_rel", "token", "token_parent")]
nominalsubject

# Add the children and the row ids of parents and children
x <- cbind_dependencies(x, type = "child")
x <- cbind_dependencies(x, type = "parent_rowid")
x <- cbind_dependencies(x, type = "parent_rowid", recursive = TRUE)
x <- cbind_dependencies(x, type = "child_rowid")
x <- cbind_dependencies(x, type = "child_rowid", recursive = TRUE)
x

# Rows of the direct children of each token, in sentence order
lapply(x$child_rowid, FUN=function(i) x[sort(i), ])

## End(Not run)