knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
options(rmarkdown.html_vignette.check_title = FALSE)
library(rezonateR)

Getting started

This tutorial discusses trees in rezonateR. In the first tutorial vignette("import_save_basics"), we took a quick glimpse at trees when discussing the node map, but now it's time to dive a bit deeper! And if you haven't read the previous vignettes, as always, that's fine because each tutorial is self-contained. I won't assume you've read the bit about trees in vignette("import_save_basics").

We will be using the same Santa Barbara Corpus annotation as before:

library(rezonateR)
path = system.file("extdata", "rez007_tree.Rdata", package = "rezonateR", mustWork = T)
rez007 = rez_load(path)

The file contains predicate-argument structures for the first fifth of the text or so. In practice, trees are often the most time-consuming data structure to process if you have annotated them throughout the whole text. Each tree is a predicate-argument structure with a tree as the root and all its arguments as leaves. In cases of IUs with more than one verb, multiple trees are created, one for each verb.

The structure of trees: A deeper look

There are three tree-related entities in Rezonator (and rezonateR):

This section will examine each of these in detail.

Let's first take a quick look at our treeDF:

head(rez007$treeDF$default)

The main column of interest that you might not understand is maxLevel. You can actually see maxLevel by looking at Rezonator. The root of a tree has level 0, whereas the next one down has a level of 1. Because of our policy of marking only one predicate-arguments structure per tree, the maxLevel always comes out as 1.

Now onto our treeEntrys:

head(rez007$treeEntryDF$default)

Here are some explanations of the unfamiliar entries:

The column Relation (and subtype) at the end are automatically generated from treeLinkDF:

head(rez007$treeLinkDF)

Here, source is the parent entry, goal is the child entry, type is always treeLink, subtype is always tree. Relation is an annotation, manually annotated in Rezonator. Subjects are annotated as Subj and others are left blank.

Linking things up: getAllTreeCorrespondences()

The function getAllTreeCorrespondences() adds a column treeEntry to other rezrDFs like chunkDF and trackDF to indicate corresponding treeEntrys. The parameter entity determines which entity you're adding the information to. As always, if you set entity to a 'higher' entity like track, treeEntry is added to 'lower' entities like tokens and chunks too. When there are multiple tree entries corresponding to something, the first tree will take precedence. Let's do this to track:

rez007 = getAllTreeCorrespondences(rez007, entity = "track")
head(rez007$tokenDF %>% select(id, text, treeEntry))
head(rez007$chunkDF$refexpr %>% select(id, text, treeEntry))
head(rez007$trackDF$default %>% select(id, text, treeEntry))

Once this step is done, you can use the results to link up information from the trees to other tables using addFieldForeign() or rez_left_join(). In anticpiation of the coming tutorial, let's merge the Relation field's information into trackDF:

rez007 = rez007 %>%
  addFieldForeign("track", "default", "treeEntry", "default", "treeEntry", "gramRelation", "Relation", fieldaccess = "foreign")
head(rez007$track$default %>% select(id, chain, text, treeEntry, gramRelation))

Merging chunks with tree entries

The current Rezonator interface only supports chunks within an intonation unit. However, there are often reasons to have chunks that span across intonation units. For example, people often start a new intonation unit within a noun phrase after a filler like uh.... Or we might want to treat an entire subordinate clause and its dependents as a single chunk.

There are two ways to merge chunks, but one way to do it is through tree entries. The steps are as follows:

  1. Create constituent chunks, one per each unit that the chunk spans so that, taken together, the chunks span the desired multiline chunk
  2. Create a treeEntry that contains all tokens in the merged chunk, and put the leaf in a tree.
  3. Use the mergeChunksWithTree() command in rezonateR to merge them.

Steps 1 and 2 were already done in Rezonator. Take the following example of a subordinate clause and its dependent clauses:

Figure 1: An image of the Rezonator interface with the four lines 'when Ron gets home from work , (...) I wanna <0> spend time with Ron, because Ron, (...) usually does n't get home till nine or ten' each entered into a chunk in the same trail, and combined into a single treeEntry.{width=100%}

The function mergeChunksWithTree() helps you create the desired multiline chunks. It has a few arguments, only the first of which is obligatory;

rez007 = mergeChunksWithTree(rez007)
#Relevant rows only
rez007$chunkDF$refexpr %>%
  filter(combinedChunk != "") %>%
  select(id, name, text, combinedChunk) #Showing only combined chunks and their members

After you call this command, the merged chunks will be added to the bottom of the correponding chunk rezrDF. Chunk tags are taken from the first constituent chunk of each merger by default; see the manual for setting custom conditions. There will in addition be a column called combinedChunk that tells you whether a chunk is a combined chunk, a member of a combined chunk, or neither.

There are three possible types of values in combinedChunk:

If you didn't set addToTrack = T, the function mergeChunksToTrack has the same effect:

rez007 = mergedChunksToTrack(rez007, "default")

Looking for family

In many applications, we may want to navigate tree using a chunk (or token, track, or rez) as a starting point, such as finding the arguments of a verb (its children), or the verb that a referential expression is an argument of (its parent). Finding the parent is easy, because you can just look up the parent column in treeEntryDF. rezonateR provides some functions for finding siblings and children as well.

addPositionAmongSiblings(): Finding the relative position of an element among its siblings

One common thing we may want to do when investigating word order is to find the position of an element among its siblings. It has four :

Here is how we would apply this function to referential expressions:

rez007$chunkDF$refexpr = addPositionAmongSiblings(rez007$chunkDF$refexpr, rez007)

Advanced: Finding siblings and children of a chunk

For more advanced operations, there are several ways of finding the children and siblings of an entry or chunk:

Because rezrDFs discourage columns where each cell is a vector, and you can have multiple siblings or children, it is not encouraged to use these functions directly in rez_mutate() or addField(), unless you can be confident that the result is unique (e.g. if you are only looking for the subject of a clause, and there can only be one subject in the data you're working with).

The arguments of these functions are relatively straightforward: treeEntry: A treeEntry ID that you're starting from. chunkID: A chunk ID that you're starting from. treeEntryDF: A treeEntryDF from a rezrObj. Must be a rezrDF, not a list; combineLayers() can be used to combine multiple rezrDFs. chunkDF: A chunkDF from a rezrObj, * cond: A condition using columns from the chunkDF.

The only tricky thing about using these is the two rezrDF arguments. They must be a single rezrDF, not a list, and the chunkDF must contain both parents and children (for child-related functions) and all relevant siblings (for sibling-related functions). And of course, the chunkDF must have a treeEntry column from getAllTreeCorrespondences(). The functions combineLayers(), combineChunks() or combineTokenChunk() that we have discussed before in vignette("import_save_basics") can be used to combine multiple rezrDFs.

Here are two examples: First to find the siblings of the chunk 33EAD4C986974 mutual r respect, and then to find the subject of the verb 388D9A04E5559 do train:

siblings = getSiblingsOfChunk(chunkID = "33EAD4C986974",
                   chunkDF = rez007$chunkDF$refexpr,
                   treeEntryDF = rez007$treeEntryDF$default)
rez007$chunkDF$refexpr %>% filter(id %in% siblings)
rez007 = rez007 %>%
  addFieldForeign("chunk", "refexpr", "treeEntry", "default", "treeEntry", "gramRelation", "Relation", fieldaccess = "foreign") %>%
  addFieldForeign("chunk", "verb", "treeEntry", "default", "treeEntry", "gramRelation", "Relation", fieldaccess = "foreign") %>%
  addFieldForeign("token", "", "treeEntry", "default", "treeEntry", "gramRelation", "Relation", fieldaccess = "foreign")
getChildrenOfChunkIf(chunkID = "388D9A04E5559",
                     chunkDF = combineTokenChunk(rez007),
                     treeEntryDF = rez007$treeEntryDF$default,
                     cond = (gramRelation == "Subj"))

Speaking of tracks ...

As usual, let's not forget:

savePath = "rez007.Rdata"
rez_save(rez007, savePath)


johnwdubois/rezonateR documentation built on Nov. 19, 2024, 11:17 p.m.