knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
Sys.setlocale("LC_CTYPE", locale="Tibetan")
options(rmarkdown.html_vignette.check_title = FALSE)

Are you already familiar with Rezonator and want to do quantitative analyses with it, but don't yet know how to use rezonateR? If so, then this overview is for you. It will not go through all the details of rezonateR, but it provides a snapshot of the most important functions in the package that will help you in your annotation and analysis. If you aren't very familiar with the companion tool Rezonator and want to know what can be done with it, the toy example vignette("sample_proj") gives a concrete example of a mini-research project using Rezonator and rezonateR, and will give you a feel for what sorts of projects are possible with Rezonator + rezonateR. It would also be helpful to check out the official Rezonator guides and try some simple annotations in Rezonator to get a feel for how it works. If you already know how rezonateR works, you can start from vignette("import_save_basics") to learn the nitty-gritty of coding in rezonateR.

Preliminaries

Why use rezonateR?

If you're reading this, you're probably already using Rezonator to do your daily work. Rezonator is a powerful tool for annotating and visualising the dynamics of human engagement. The purpose of rezonateR is to add a series of additional tools that enhance the functionality of Rezonator and increase your productivity! My goal is to minimise the time you spend on coding and annotating so you have more time for thinking.

Rezonator has many cool features, but due to technical restrictions, there are certain things it can't do, such as:

rezonateR can do all these, and more!

Moreover, rezonateR is geared toward people of all skill levels in R. I do a lot of the heavy lifting for you. As long as you have some familiarity with base R, you can quickly pick up the basic functions, but more advanced users can also extend it as they see fit.

The rezonateR engine is based heavily on Tidyverse packages, particularly rlang and dplyr. Although you don't need to be familiar with Tidyverse to use the basic functionality, Tidyverse users will be happy to see a wide range of functions that mimic Tidyverse functions in appearance, but with additional fields to support the wide range of functionality in rezonateR.

If you're wondering when you should start learning rezonateR, the best time is now! Although you can get started with the Rezonator GUI relatively easily, if there are certain things you know you will use in rezonateR - such as multi-line chunks - you probably want to have the rezonateR post-processing in mind, even when you're annotating in Rezonator.

install.packages("devtools")
library(devtools)
install_github("rezonators/rezonateR")

Some folks have reported that this does not work. You may want to try adding the code options(download.file.method = "auto") before running install_github() if this is the case.

A quick import

Let's start by loading the package.

library(rezonateR)

Now let's import our first file, a short spoken text in Lhasa Tibetan (you can find the original video here: https://av.mandala.library.virginia.edu/video/couple-must-part-threes-company-02). This file contains a number of chunks, track chains, as well as trees, and we will deal with them in this vignette:

path = system.file("extdata", "virginia-library-20766.rez", package = "rezonateR")

layerRegex = list(
  track = list(field = "trailLayer", regex = c("clausearg", "discdeix"), names = c("clausearg", "discdeix", "refexpr")),
  chunk = list(field = "chunkLayer", regex = c("verb", "adv", "predadj"), names = c("verb", "adv", "predadj", "refexpr")))

myRez = importRez(path, layerRegex = layerRegex, concatFields = c("word", "wordWylie"))

The layerRegex object is a series of instructions to tell importRez how to divide chunks and track chains into different layers. In this case, I placed a field called 'trailLayer' on track chains, which has three possible values: clausearg, discdeix, and nothing. These are captured in the regex field. The first two regexes correspond to the two names 'clausearg' and 'discdeix', and the default case, where neither of the first two regexes is detected, is 'refexpr'. I have done the same thing to chunks, as you can see above. If you don't want to use layers, you don't have to specify layerRegex. In that case, I will create a single layer called 'default' for you.
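
As a rough illustration of the matching logic just described, here is a minimal base R sketch. The helper assignLayer() is hypothetical and not part of rezonateR; it only mimics the idea that each regex is paired with a name and that the last name is the fallback. How importRez() treats a value that matches several regexes is not stated in this vignette, so the first-match-wins rule below is an assumption.

```r
# Hypothetical helper (NOT a rezonateR function): map field values to layer names.
# 'names' has one more element than 'regex'; the extra, last element is the default.
assignLayer <- function(values, regex, names) {
  stopifnot(length(names) == length(regex) + 1)
  # Start by giving every value the default (last) name.
  layer <- rep(names[length(names)], length(values))
  # Walk the regexes from last to first so that earlier regexes win
  # (assumed priority rule).
  for (i in rev(seq_along(regex))) {
    layer[grepl(regex[i], values)] <- names[i]
  }
  layer
}

# Toy values of a 'trailLayer'-like field, including empty and missing ones.
trailLayer <- c("clausearg", "", "discdeix", NA, "clausearg")
layers <- assignLayer(trailLayer,
                      regex = c("clausearg", "discdeix"),
                      names = c("clausearg", "discdeix", "refexpr"))
print(layers)
```

Empty and missing values fall through to the default name, which is the behaviour the paragraph above describes for 'refexpr'.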

The other mysterious field in the importRez function is concatFields. These are fields belonging to tokens that you would like to concatenate for higher-level entities like chunks and units. For example, if tokens 1 and 2 are 'happy' and 'person', and you have a chunk that contains these two tokens, you would want the whole string 'happy person' to be associated with the chunk. Typically, you should specify at least one field for doing this. In this case, we will concatenate the fields 'word' and 'wordWylie' (Wylie is the most common Romanisation system for Tibetan). It is important not to overdo it by specifying too many fields to concatenate, as this step can slow down your import considerably.
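
The concatenation idea can be sketched in base R as follows. This is only a toy illustration, not the code importRez() runs: the table, the 'chunk' column and the space separator are all assumptions made for the example.

```r
# Toy token table: each token records which (hypothetical) chunk it belongs to.
tokens <- data.frame(
  chunk = c("c1", "c1", "c2"),
  word  = c("happy", "person", "runs"),
  stringsAsFactors = FALSE
)

# Concatenate the 'word' field of all tokens inside each chunk.
# The space separator is an assumption for this example.
chunkWord <- vapply(split(tokens$word, tokens$chunk),
                    paste, character(1), collapse = " ")
print(chunkWord)
```

Every extra field listed in concatFields means one more pass of this kind over every higher-level entity, which gives some intuition for why long lists of fields slow the import down.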

The result of the import is an object called rezrObj, which we will discuss below. When you import a more substantial file, say around 30 minutes, the import speed can be rather slow. Please be patient! The good news is that rezonateR contains functionality for saving and loading rezrObj objects, so you don't have to import each time you work on a file in R.

savePath = "myRez.Rdata"
rez_save(myRez, savePath)
myRez = rez_load(savePath)

Introduction to rezrObjs and nodeMaps

rezrObjs

There are three main kinds of objects in rezonateR that you will interact with directly, namely rezrObj, nodeMap and rezrDF. rezrObjs and nodeMaps will be covered in this section. rezrDFs are relatively complex, and will form the bulk of our discussion in this vignette.

rezrObj objects contain one single nodeMap and several rezrDFs:

print("Items in myRez:")
names(myRez)

The relationships between the various entities are as follows:

This has some ramifications for updating, which we'll come to later.

nodeMaps

The nodeMap is similar to the internal representation of the file used in Rezonator-generated .rez files. The node map in a .rez file is a disorganised list of nodes, each of which corresponds to an entity inside Rezonator: units, tokens, and so on. The rezonateR nodeMap is similar. The major difference is that nodes are organised into sub-categories, according to the type of entity that the node is encoding. Let's have a sneak peek at these categories:

print("Items in the nodeMap:")
names(myRez$nodeMap)

In practice, you will not interact with most of these nodeMaps except token. Most of the time, you will only be dealing with rezrDFs, which are much easier to work with. Let's take a look at them.

Introduction to rezrDFs

A rezrDF is like a normal data frame that you know from base R. Here's the beginning of the unit table:

print(head(myRez$unitDF %>% select(id, unitSeq, srtLineBo)))

chunkDF, trackDF, rezDF, stackDF, etc., are divided into layers. If you directly access the 'chunkDF' and 'trackDF' components of myRez, you will get a list of rezrDFs, one for each layer. Here are the names of our chunk layers that you might remember from the introduction, along with the beginning of one of the associated rezrDFs:

print("Component DFs of chunkDF:")
names(myRez$chunkDF)
print(head(myRez$chunkDF$refexpr) %>% select(id, word))

rezrDF inherits from the Tidyverse 'tibble' structure, so all tibble-related functions can be used with it. However, using classic Tidyverse functions on rezrDFs is often dangerous, as rezrDFs have additional functionality that goes beyond classic tibbles.

There are three main differences that make rezrDFs special:

Perk 1: Field access labels

Field access labels prevent you from accidentally changing things that you shouldn't be changing. Let's look at the field access values of the unitDF:

print("fieldaccess:")
fieldaccess(myRez$unitDF)

There are five possible field access values:

Perk 2: Reloads

The reload() function is one of the core features of rezonateR that makes it so convenient to use. The reload() feature is based on updateFunctions. You can access the updateFunctions of a table using updateFunct():

print("updateFunct of unitDF:")
updateFunct(myRez$unitDF)

There are three reload functions: reloadLocal(), reloadForeign() and reload(). reloadLocal() only takes a rezrDF, and only updates auto fields. reloadForeign() and reload() take a rezrDF and a rezrObj, and update the rezrDF using the rezrObj (which may or may not contain the rezrDF).
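
To make the idea of an updateFunction more concrete, here is a toy sketch in base R. It is not how rezonateR stores or runs its updateFunctions; the attribute name and the helper toyReloadLocal() are made up for illustration. The point is only that a table can carry the recipe for a derived column, so that the column can be recomputed after its source changes.

```r
# Toy table with a derived column.
tokens <- data.frame(word = c("cat", "<0>", "dog"), stringsAsFactors = FALSE)
tokens$orthoLength <- nchar(tokens$word)

# Attach the recipe for the derived column to the table itself
# (the attribute name is made up for this sketch).
attr(tokens, "toyUpdateFunct") <- list(
  orthoLength = function(df) nchar(df$word)
)

# Hypothetical helper: recompute every column that has a stored recipe.
toyReloadLocal <- function(df) {
  recipes <- attr(df, "toyUpdateFunct")
  for (field in names(recipes)) {
    df[[field]] <- recipes[[field]](df)
  }
  df
}

tokens$word[2] <- "horse"        # edit the source column; orthoLength is now stale
tokens <- toyReloadLocal(tokens) # recompute from the stored recipe
print(tokens)
```

This corresponds loosely to the local case, where everything needed for the update lives in the same table. The foreign case additionally needs access to other tables, which is why reloadForeign() and reload() take a rezrObj.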

Let's take a look at reload(), the most useful of the three. In the original data, zero mentions are written as <0> only in the orthographic representation; the Wylie romanisation is a blank string. I want the Wylie romanisation to contain <0> as well. I do that by first using the rez_mutate() function on the tokenDF (don't worry about what that means yet; we'll cover it later). After that, I reload the entryDF, and then reload the unitDF. (Recall that units depend on entries, which in turn depend on tokens; that's why we can't just reload the unitDF directly.) After the update, <0> appears in the Wylie romanisation in the unitDF:

print("Before the update")
myRez$unitDF %>% filter(str_detect(word, "<0>")) %>% rez_select(id, word, wordWylie) %>% head

#Change something in the token rezrDF that is significant for the unit rezrDF
myRez$tokenDF = myRez$tokenDF %>% rez_mutate(
  wordWylie = case_when(word == "<0>" ~ "<0>", TRUE ~ wordWylie))
myRez$entryDF = myRez$entryDF %>% reload(myRez)
myRez$unitDF = myRez$unitDF %>% reload(myRez)

print("After the update")
myRez$unitDF %>% filter(str_detect(word, "<0>")) %>% rez_select(id, word, wordWylie) %>% head

You might be wondering how to reload an entire rezrObj. Because tables often depend on each other (for example, field A in table X relies on field B in table Y which in turn relies on field C in table X), this is technically difficult, but I plan to add this function before the 1.0 release. Stay tuned!

Perk 3: Correspondences to nodeMaps

rezrDFs encode information about whether a field is in the nodeMap or not:

print("inNodeMap:")
inNodeMap(myRez$unitDF)

This doesn't do so much yet, since you're not yet allowed to push a field created in a rezrDF back to a nodeMap. This will be available in the 1.0 release.

Editing rezrDFs

One of the core features of rezonateR is to facilitate the automatic and semi-automatic creation of fields, which is currently not supported in Rezonator. There are also other operations you may want to perform on rezrDFs.

To cater to users of different habits and skill levels, I have introduced three different levels of functions for editing rezrDFs:

  1. EasyEdit can be quickly picked up by everyone, including base R users, and covers the most basic operations you would want to do to a rezrDF (e.g. addField(), changeFieldForeign())

  2. TidyRez is easy to pick up for tidyverse users, though there is some learning curve for others (e.g. rez_mutate(), rez_left_join())

  3. Core engine: These are mostly functions that I use within rezonateR under the hood. Users who want maximum flexibility may also use them (e.g. lowerToHigher(), createLeftJoinUpdate()), but do be aware that I may make changes to these without notice, since I will assume that most users have little use for them.

Crucially, while syntax is very consistent within EasyEdit and within TidyRez, functions within the core engine are a lot more divergent, and EasyEdit and TidyRez also differ considerably from each other in their syntax. So if you are comfortable with TidyRez, minimising the use of EasyEdit may make your code look more consistent, and vice versa.

EasyEdit

EasyEdit consists of four commonly used functions, addFieldLocal(), addFieldForeign(), changeFieldLocal() and changeFieldForeign(). There are also some less useful functions not covered in this vignette, but which you can find in the references, like addRow().

All of the four basic functions can be applied to both rezrDFs and rezrObjs. In this vignette, we will mainly apply them to rezrObjs. If you want to deal with emancipated rezrDFs, i.e. rezrDFs that are not part of a rezrObj, you will want to use the versions that apply to rezrDFs, but those are simpler than the rezrObj versions, so you should be able to pick them up quickly using the manual.

Let's start by looking at addFieldLocal(). addField() is a shortcut for addFieldLocal(), and we will be using this shortcut name throughout. Our first example is very simple. In our tokenDF, let's add a field that automatically calculates the length of a word in characters. Here, 'entity' specifies the name of the entity you would like to change, 'layer' specifies the layer within that entity (which is an empty string since there are no token layers), fieldName is the name of the field we're adding, expression is the R expression with which we calculate the new field, and fieldaccess tells rezonateR to make this an auto field with an updateFunction that will be attached to the table:

myRez = addField(myRez, entity = "token", layer = "",
                 fieldName = "orthoLength",
                 expression = nchar(word),
                 fieldaccess = "auto")
print("A fragment of the updated table:")
head(myRez$tokenDF %>% rez_select(id, word, orthoLength))
print("The updateFunction:")
updateFunct(myRez$tokenDF, "orthoLength")

Now let's spice this up a bit by adding a complex field. A complex field takes information from multiple rows of a table. In this case, we are working with the tokenDF, but want the new column to be the length of the longest word in the unit that the token comes from. So the groupField is 'unit', and we specify the field type as 'complex'. The expression uses longestLength(), a rezonateR function that returns the length of the longest word in a series of words.

myRez = addField(myRez, entity = "token", layer = "",
                 fieldName = "longestWordInUnit",
                 expression = longestLength(word),
                 type = "complex",
                 groupField = "unit",
                 fieldaccess = "auto")
head(myRez$tokenDF %>% select(id, word, longestWordInUnit))
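
For readers who think in base R, the grouped computation behind a complex field can be sketched like this. The toy table and its values are invented, and the sketch does not attach any updateFunction; it only shows the "compute per group, then write the result back to every row of the group" pattern.

```r
# Toy token table: 'unit' plays the role of the groupField.
tokens <- data.frame(
  unit = c(1, 1, 1, 2, 2),
  word = c("a", "bcd", "ef", "ghij", "k"),
  stringsAsFactors = FALSE
)

# For every token, find the length of the longest word in the same unit.
# ave() applies FUN within each group and returns one value per row.
tokens$longestWordInUnit <- ave(nchar(tokens$word), tokens$unit, FUN = max)
print(tokens)
```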

Now let's add a simple foreign field. Let's say when we look at the tokenDF, we also want to know what the whole unit's words are. The source is the 'word' field of units, and we are creating a new field for tokens called 'unitWord'. The foreign key is the field that contains IDs of the source table inside the target table, in this case the 'unit' field of tokenDF:

myRez = addFieldForeign(myRez,
                targetEntity = "token", targetLayer = "",
                sourceEntity = "unit", sourceLayer = "",
                targetForeignKeyName = "unit",
                targetFieldName = "unitWord", sourceFieldName = "word",
                fieldaccess = "foreign")
head(myRez$tokenDF %>% select(id, word, unitWord))
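
Conceptually, a simple foreign field is a lookup through a foreign key. The base R sketch below uses invented toy tables to show that lookup. Unlike addFieldForeign(), it records nothing about how to redo the lookup later.

```r
# Toy source table (units) and target table (tokens).
units <- data.frame(
  id   = c("u1", "u2"),
  word = c("happy person", "big house"),
  stringsAsFactors = FALSE
)
tokens <- data.frame(
  id   = c("t1", "t2", "t3"),
  unit = c("u1", "u1", "u2"),   # foreign key: ID of the unit the token belongs to
  word = c("happy", "person", "big"),
  stringsAsFactors = FALSE
)

# Look up each token's unit in the source table and copy over the 'word' field.
tokens$unitWord <- units$word[match(tokens$unit, units$id)]
print(tokens)
```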

Now let's wrap it up with a complex foreign field. Here, we're going to add a field in the unitDF that tells us the length of the shortest word within the unit. We're going to base this off the entryDF.

However, because the entries that correspond to units are given in the nodeMap, you also need to supply the list of entries inside the unit nodeMap - here it's called entryList. Chunks and tree entries are built on tokens instead, so they have a list called tokenList. Instead of 'expression', complex foreign fields have a field called complexAction, which is a function performed on the source field of the source table:

myRez = addFieldForeign(myRez,
                targetEntity = "unit", targetLayer = "",
                sourceEntity = "entry", sourceLayer = "",
                targetForeignKeyName = "entryList",
                targetFieldName = "shortestWordLength",
                sourceFieldName = "word",
                type = "complex",
                complexAction = shortestLength,
                fieldaccess = "foreign")
head(myRez$unitDF %>% select(id, word, shortestWordLength))

(Because of the technicality that punctuation counts as a character, most of these values are 1. There are ways we can fix this using isWord conditions, as we'll discuss below.)
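
The following base R sketch mirrors the logic of a complex foreign field on invented data: each unit owns a list of entry IDs, and a summary function is applied to the 'word' values of those entries. The object names are made up for the example, and min(nchar(...)) stands in for what shortestLength is described as doing.

```r
# Toy source table of entries.
entries <- data.frame(
  id   = c("e1", "e2", "e3", "e4"),
  word = c("happy", ",", "large", "house"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the entryList stored on each unit node.
entryList <- list(u1 = c("e1", "e2"), u2 = c("e3", "e4"))

# For each unit, fetch the words of its entries and summarise them.
shortestWordLength <- vapply(entryList, function(ids) {
  words <- entries$word[match(ids, entries$id)]
  min(nchar(words))
}, integer(1))
print(shortestWordLength)
```

The first unit gets a value of 1 because the comma counts as a one-character word, the same effect noted in the parenthetical above.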

So far we've only looked at addField(), but the good news is that changeField() works in exactly the same way! Here's an example, changing our orthoLength field to depend on the Romanisation instead of the original Tibetan script:

myRez = changeField(myRez, entity = "token", layer = "",
                 fieldName = "orthoLength",
                 expression = nchar(wordWylie),
                 fieldaccess = "auto")
print("A fragment of the updated table:")
head(myRez$tokenDF %>% rez_select(id, word, orthoLength))

Note that if you don't specify the field access value, I will automatically change it to flex, even in changeField(). This is to force you to remember that you are not only changing the value of the field itself, but also how it will be updated in the future. If you don't supply a field access value and it's originally an auto or foreign field, I will warn you about this change, so you can run changeField() again if you want to change your mind.

TidyRez

In general, TidyRez functions are named by adding 'rez_' in front of a dplyr function name, such as rez_group_by() or rez_mutate(). Using TidyRez functions allows you to keep and/or update your field access values, inNodeMap values, and updateFunctions. Using base R or classic dplyr functions with rezrDFs may cause reloads to fail, unless supplemented by core engine functions, which are not covered in this vignette.

A few dplyr functions are completely safe to use in rezonateR, mostly those that focus on selecting rows of a table, such as filter(), arrange() or slice(). Currently implemented TidyRez functions include:

A few other planned ones include rez_bind_cols() and rez_outer_join(), which will be especially useful for the calculation of inter-annotator agreement.

Because TidyRez is relatively straightforward for Tidyverse users, this vignette will focus on what TidyRez adds on top of Tidyverse. If you want to learn about basic Tidyverse, there are many existing tutorials on the Internet.

To see the power of TidyRez, let's try creating an emancipated rezrDF with only a subset of the original columns. Here, we take trackDF$refexpr, the table of referential expressions. We then damage one of the fields using a classic dplyr function. As you can see here, the emancipated rezrDF can still be updated using the rezrObj, effectively overriding the damage:

refTable = myRez$trackDF$refexpr %>% rez_select(id, token, chain, name, word, tokenOrderLast)
print("Before:")
head(refTable %>% select(id, tokenOrderLast))
refTable = refTable %>% mutate(tokenOrderLast = 1) #Damage refTable with a classic dplyr function
print("After:")
refTable = refTable %>% reload(myRez)
head(refTable %>% select(id, tokenOrderLast))

A warning is in order: TidyRez only updates the current table. If other tables have references to the table you're editing, they will not be updated. You must bear this in mind when using rez_select() and rez_rename(). No problems will arise if you use these functions on emancipated rezrDFs. However, if you use these functions on rezrDFs within rezrObjs, you should manually update any fields in other rezrDFs that refer to the field you've deleted or added. I plan to add a rename feature to EasyEdit in the near future that will update references from other rezrDFs.

The syntax of most TidyRez functions deviates from dplyr only minimally, in ways that you can read about in the documentation. However, rez_left_join() is worth a quick mention. In addition to the fieldaccess and rezrObj arguments, which are self-explanatory, there are the fkey and df2Address arguments. fkey is the name of the field in the first data frame that corresponds to IDs of the second data frame. df2Address is a string that tells rez_left_join() how to find the source rezrDF next time. If the source rezrDF doesn't belong to a layer, e.g. tokenDF, just give its name. If the source rezrDF belongs to a layer, put a '/' between the table and the layer, e.g. 'trackDF/refexpr'.
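
To see why an address string is enough to find a table again later, consider this small base R sketch. resolveAddress() is a hypothetical helper written for this example, not a rezonateR function, and toyObj is a plain nested list standing in for a rezrObj.

```r
# Hypothetical helper: follow an address like "trackDF/refexpr" into a nested list.
resolveAddress <- function(obj, address) {
  parts <- strsplit(address, "/", fixed = TRUE)[[1]]
  for (part in parts) {
    obj <- obj[[part]]   # descend one level per address component
  }
  obj
}

# Plain nested list standing in for a rezrObj.
toyObj <- list(
  tokenDF = data.frame(id = 1:2),
  trackDF = list(refexpr = data.frame(id = 3:5))
)

nrow(resolveAddress(toyObj, "tokenDF"))          # table without layers
nrow(resolveAddress(toyObj, "trackDF/refexpr"))  # table inside a layer
```

Because the address is stored as a string rather than as a copy of the table, a later reload can pick up whatever the source table looks like at that point.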

An interlude: Time and sequence

Before we continue our adventure, let's look at a couple of ways we can upgrade our rezrObj to contain even more information.

The first thing we can do, which we hinted at before, is to set certain tokens as non-words. You can do this with the addIsWordField() function. One immediate benefit of this is that we get new sequence values. The tokenOrder and docTokenSeq fields in the original rezrDF count all tokens, whereas wordOrder and docWordSeq only count tokens that qualify as words according to some criterion. Let's set our criterion to !str_detect(wordWylie, "/"), i.e. the token must not contain the main punctuation mark in Tibetan. Notice that wordOrder is generally slightly lower than tokenOrder:

myRez = addIsWordField(myRez, !str_detect(wordWylie, "/"))
head(myRez$tokenDF %>% select(id, tokenOrder, docTokenSeq, wordOrder, docWordSeq))
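
The difference between the two kinds of sequence values can be sketched in base R on invented data. What value addIsWordField() assigns to non-words is not stated in this vignette; the 0 used below is purely an assumption for the sake of the example.

```r
# Toy token table; "/" stands for the punctuation mark used in the criterion above.
tokens <- data.frame(wordWylie = c("one", "two", "/", "three", "/"),
                     stringsAsFactors = FALSE)

# The criterion: a token is a word if it does not contain "/".
isWord <- !grepl("/", tokens$wordWylie, fixed = TRUE)

# Token sequence counts everything; word sequence counts only words.
docTokenSeq <- seq_len(nrow(tokens))
docWordSeq  <- ifelse(isWord, cumsum(isWord), 0L)  # 0 for non-words is an assumption

print(data.frame(tokens, isWord, docTokenSeq, docWordSeq))
```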

By default, unitSeq information is not available to rezrDFs other than unitDF. You can change this using the addUnitSeq() feature, which can add unitSeq information up to track chains:

myRez = addUnitSeq(myRez, "track")

This adds unitSeqFirst and unitSeqLast fields to chunks and track chain entries, and a unitSeq field to tokens.

Updating rezonateR using external information

Some annotation actions are easier in a spreadsheet than in Rezonator, so one action you will frequently perform is to do annotations in a spreadsheet programme and then integrate that information back into a rezrObj. Fortunately, rezonateR contains functionality that can facilitate this and minimise errors generated in the process.

Let's say we want to annotate the person of the referential expressions inside trackDF$refexpr. Before we start annotating manually, I wrote some simple rules that guess the person correctly in most situations, so we will only have to correct from this baseline:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>% rez_mutate(person = case_when(word == "ང" | str_starts(word, "ང་") ~ 1,
str_starts(word, "(ཁྱེད|ཁྱོད|ཇོ་ལགས|ཨ་ཅག་ལགས|རྒན་ལགས)") ~ 2,
str_ends(word, "(ལགས|<0>)") ~ 0, #Multiple likely scenarios
TRUE ~ 3))

Before we export this as a CSV for annotation, I would like to add a column inside the rezrDF that gives us the text of the entire unit. (Since this document currently does not have multi-unit track entries, it will suffice to use either unitSeqLast or unitSeqFirst.) It will be useful to be able to see this column while making manual annotations:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>%
  rez_left_join(myRez$unitDF %>% rez_select(unitSeq, word), by = c(unitSeqLast = "unitSeq"), suffix = c("", "_unit"), df2key = "unitSeq", df2Address = "unitDF", fkey = "unitSeqLast") %>%
  rez_rename(unitLastWord = word_unit)

The next step is to write the CSV file. rez_write_csv() allows us to do this easily. The third argument of rez_write_csv() is a vector of field names that we want to export. It is advisable to keep the number of exported fields small to make the spreadsheet more manageable and require less scrolling:

rez_write_csv(myRez$trackDF$refexpr, "refexpr.csv", c("id", "name", "unitLastWord", "unitSeqLast", "word", "docTokenSeqLast", "entityType", "roleType", "person"))

After editing the CSV in a spreadsheet program, let's import it back using rez_read_csv(). (I've renamed the edited CSV - in general, I recommend doing this to avoid accidentally overwriting your edited file by running the export code again.) The origDF argument tells rezonateR to look in the original rezrDF that produced the CSV, and determine the data types accordingly:

changeDF = rez_read_csv("refexpr_edited.csv", origDF = myRez$trackDF$refexpr)

Finally, the updateFromDF() function allows us to update the original rezrDF using information from the new rezrDF. There are many fancy options you can choose from, such as deciding whether to delete rows, add rows, add columns, etc. We will only use the most vanilla options, and update the 'person' column:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>% updateFromDF(changeDF, changeCols = 'person')
head(myRez$trackDF$refexpr %>% select(id, word, person))
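
The vanilla case of updating one column from an edited table boils down to matching rows by ID. Here is a base R sketch on invented data. It is not the implementation of updateFromDF(), and it assumes every ID in the edited table already exists in the original.

```r
# Original table with guessed values, and an edited table that comes back from
# the spreadsheet (possibly in a different row order, possibly with fewer rows).
origDF   <- data.frame(id = c("a", "b", "c"), person = c(1, 0, 3),
                       stringsAsFactors = FALSE)
changeDF <- data.frame(id = c("c", "b"), person = c(3, 2),
                       stringsAsFactors = FALSE)

# Find where each edited row lives in the original table.
idx <- match(changeDF$id, origDF$id)
stopifnot(!anyNA(idx))   # assumption: no new IDs were added in the spreadsheet

# Overwrite only the 'person' column, only for the matched rows.
origDF$person[idx] <- changeDF$person
print(origDF)
```

Matching on ID rather than on row position is what makes the round trip robust to sorting or filtering done in the spreadsheet.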

Analysing track chains with EasyTrack

Now that we've looked at an example of semi-automatic annotation, let's move on to some full automation! We will be looking in particular at coreference chains. rezonateR contains a suite of functions for generating features useful for analysing the choice of referential forms, reference comprehension, and similar topics.

Anaphoric and cataphoric distance

Let's first find out how many units we are from the previous mention of something. This is equivalent to the gapUnits column that Rezonator already generates automatically:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>%
  rez_mutate(unitsToLastMention = unitsToLastMention(unitSeqLast))
myRez$trackDF$refexpr %>% select(id, gapUnits, unitsToLastMention) %>% slice(10:16)
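
As a toy picture of what such a distance measure computes, here is a base R sketch on invented data. It is not the rezonateR implementation: it assumes the rows are already in document order, takes the chain as an explicit argument, and returns NA for first mentions, all of which are assumptions for the example.

```r
# Toy mentions, in document order: which chain each belongs to and where it ends.
mentions <- data.frame(
  chain       = c("A", "B", "A", "A", "B"),
  unitSeqLast = c(1, 2, 4, 5, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within each chain, subtract the previous mention's unit
# from the current one; first mentions have no predecessor and get NA.
toyUnitsToLastMention <- function(unitSeq, chain) {
  ave(unitSeq, chain, FUN = function(x) c(NA, diff(x)))
}

mentions$gap <- toyUnitsToLastMention(mentions$unitSeqLast, mentions$chain)
print(mentions)
```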

Now let's count the tokens from the last mention using the tokensToLastMention() function. This one has a couple of complications. The first one is which seq to count. In the interlude, we mentioned that in addition to docTokenSeq, we have a sequence value called docWordSeq that excludes non-words. We will use that value in counting. The second complication is how we will treat zero mentions. Zeros are not actually pronounced, so they have no time of their own to speak of. The 'zeroProtocol' argument is set to 'unitFinal', telling rezonateR to treat a zero as if it were the last word of whatever unit it comes from. Finally, since we're dealing with units, we need to pass the unitDF to ensure that tokensToLastMention() has access to unit information:

myRez$trackDF$refexpr =  myRez$trackDF$refexpr %>%
  rez_mutate(wordsToLastMention = tokensToLastMention(
    docWordSeqLast, #What seq to use
    zeroProtocol = "unitFinal", #How to treat zeroes
    zeroCond = (word == "<0>"),
    unitDF = myRez$unitDF)) #Additional argument for unitFinal protocol
myRez$trackDF$refexpr %>% select(id, wordsToLastMention) %>% slice(10:16)

Note that unitsToNextMention and tokensToNextMention work in the same way.

Tallying preceding and following mentions

We can also count how many previous mentions of something there were within a window of units. Most people use a window of 5 or 20 units. Let's try this with 5. The countPrevMentions() function allows us to do this (countNextMentions() does the same for the succeeding context):

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>% rez_mutate(noPrevMentionsIn5 = countPrevMentions(5))
myRez$trackDF$refexpr %>% select(id, noPrevMentionsIn5)  %>% slice(10:16)
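
The windowed count can be pictured with the following base R sketch on invented data. It is a toy, not the rezonateR implementation; in particular, counting only mentions in strictly earlier units, and treating the window as inclusive at its far edge, are assumptions made here.

```r
# Toy mentions in document order.
chain   <- c("A", "B", "A", "A", "B")
unitSeq <- c(1, 2, 4, 5, 9)

# Hypothetical helper: for each mention, count mentions of the same chain
# that lie in an earlier unit no more than 'window' units back.
toyCountPrevMentions <- function(unitSeq, chain, window) {
  vapply(seq_along(unitSeq), function(i) {
    sum(chain == chain[i] &
          unitSeq < unitSeq[i] &
          unitSeq >= unitSeq[i] - window)
  }, integer(1))
}

noPrevMentionsIn5 <- toyCountPrevMentions(unitSeq, chain, window = 5)
print(noPrevMentionsIn5)
```

The last mention of chain B gets 0 because its only predecessor is seven units back, outside the window of 5.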

Sometimes, we may want to count mentions conditionally, e.g. only count subject mentions or zero mentions. The functions countPrevMentionsIf() and countNextMentionsIf() allow us to define such a condition. Let's try counting the number of coming zero mentions. Here, we use the condition word == "<0>", i.e. the word is a zero, and the window is Inf, i.e. there's no limit on how far in the future we look:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>% rez_mutate(noComingZeroes = countNextMentionsIf(Inf, word == "<0>"))
myRez$trackDF$refexpr %>% select(id, noComingZeroes)  %>% slice(10:16)

Counting competitors

We may also want to count competing mentions, that is, recent mentions not coreferential to the current mention. countCompetitors() tallies the number of competitors intervening between the previous and current mention, possibly within a window. Here is one example with no window:

myRez$trackDF$refexpr = myRez$trackDF$refexpr %>% rez_mutate(noCompetitors = countCompetitors())
myRez$trackDF$refexpr %>% select(id, noCompetitors)  %>% slice(10:16)

All of the functions introduced in this section have additional fields that allow for further customisation. Please feel free to refer to the manual for more information.

Adding tree information

Now let's add some information from trees. The first thing to do is to run the getAllTreeCorrespondences() function, which adds a treeEntry column to non-tree tables. If you select entity = "track", this column will be added to tokenDF, chunkDF and trackDF.

myRez = getAllTreeCorrespondences(myRez, entity = "track")
myRez$trackDF$refexpr %>% select(id, treeEntry) %>% slice(10:16)

The best thing that trees do for us is connecting verb information, stored in chunks, to track chain entry (i.e. referential expression) information. We can do this in two steps. First we add a treeParent column to trackDF$refexpr that takes the value of the 'parent' column of treeEntryDF; in simple terms, this means we're getting the parent tree entry's ID into trackDF$refexpr. We then use this parent tree entry's ID to find the corresponding verb chunk, and with this, we have successfully put the verb on the trackDF$refexpr table.

myRez = myRez %>% addFieldForeign("track", "refexpr", "treeEntry", "default", "treeEntry", "treeParent", "parent", fieldaccess = "foreign")
myRez$trackDF$refexpr = myRez$trackDF$refexpr %>%
  rez_left_join(myRez$chunkDF$verb %>% select(id, word, treeEntry),
                by = c(treeParent = "treeEntry"), suffix = c("", "_verb"),
                df2Address = "chunkDF/verb", fkey = "treeParent", df2key = "treeEntry",
                rezrObj = myRez) %>%
  rename(verbID = id_verb, verbWord = word_verb)
myRez$trackDF$refexpr %>% select(id, treeParent, verbID, verbWord) %>% slice(10:16)
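
The two-step logic (tree entry to parent, then parent to verb chunk) can be followed on a small invented example in base R. The tables below are toy stand-ins, and nothing here sets up the update machinery that addFieldForeign() and rez_left_join() provide.

```r
# Toy tree entries: t2 hangs under t1, t4 hangs under t3.
treeEntryDF <- data.frame(
  id     = c("t1", "t2", "t3", "t4"),
  parent = c("", "t1", "", "t3"),
  stringsAsFactors = FALSE
)
# Toy verb chunks and referential expressions, each linked to a tree entry.
verbs <- data.frame(id = c("v1", "v2"), word = c("go", "see"),
                    treeEntry = c("t1", "t3"), stringsAsFactors = FALSE)
refs  <- data.frame(id = c("r1", "r2"), treeEntry = c("t2", "t4"),
                    stringsAsFactors = FALSE)

# Step 1: bring the parent tree entry's ID into the referential expression table.
refs$treeParent <- treeEntryDF$parent[match(refs$treeEntry, treeEntryDF$id)]

# Step 2: use the parent's ID to find the verb chunk sitting on that tree entry.
hit <- match(refs$treeParent, verbs$treeEntry)
refs$verbID   <- verbs$id[hit]
refs$verbWord <- verbs$word[hit]
print(refs)
```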

Advanced: Chunk mergers

The last topic to cover is merging chunks, most useful for creating multi-line chunks. There are several steps to merging chunks:

  1. Create constituent chunks that span the entire merged chunk

  2. Create a tree leaf that contains all tokens in the merged chunk, and put the leaf in a tree.

  3. Use the mergeChunksWithTree() command in rezonateR to merge them.

mergeChunksWithTree() is very easy to use. After you call this command, the merged chunks will be added to the bottom of the corresponding chunk rezrDF. Chunk tags are taken from the first constituent chunk of each merger by default; see the manual for setting custom conditions. In addition, there will be a column called combinedChunk that tells you whether a chunk is a combined chunk, a member of a combined chunk, or neither.

myRez = mergeChunksWithTree(myRez)
myRez$chunkDF$refexpr %>% filter(combinedChunk != "") %>% select(id, name, word, combinedChunk) #Showing only combined chunks and their members

You may also augment the trackDF with the merged chunks; the combinedChunk column works similarly:

myRez = mergedChunksToTrack(myRez, "refexpr")

Where to go from here?

Now that you've seen the bare-bones basics of using rezonateR, if you want to dive in and start using it, you can proceed to our sequence of detailed tutorials starting from vignette("import_save_basics"). If you want to see a concrete example of a mini-project, take a look at vignette("sample_proj").



johnwdubois/rezonateR documentation built on Nov. 19, 2024, 11:17 p.m.