Using the glossr package"
In glossr: Use Interlinear Glosses in R Markdown

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE
)

Introduction

The {glossr} package offers useful functions to recreate interlinear glosses in R Markdown texts. The immediate solution for a LaTeX output is to use a specific library, such as {gb4e} (the one I knew when this package was born) or {expex} (the one this package uses now). If PDF output is enough for you, you can still use this package to automatically print them in an R-chunk, minimizing typos^[Most of my code is designed to avoid typos. Let's just say that this package would have taken a few hours less if I didn't constantly write leizpig instead of leipzig.], and even generate them automatically from a dataframe with your examples! But chances are, PDF is not enough for you, and you would also like a nice rendering (or at least some rendering) of your interlinear glosses in HTML as well... and why not, MS Word! This offers some challenges, because you would need to figure out how to render them to begin with, and neither the way to print them nor the way to reference them are compatible across output formats. I took that pain and packaged it so you don't need to feel it.

You can start using glossr in an R Markdown file by calling the library and then use_glossr() to activate some background stuff. Mainly, this function informs all the other functions whether you are using LaTeX, HTML^[You can also choose between the default HTML implementation, with leipzig.js, or a "legacy" implementation with simple tooltips and up to two glossing lines; do so by running use_glossr("tooltip").] or neither, in which case it assumes you have Word output. This vignette has been run in all three formats by changing the output format to bookdown::pdf_document2, which renders a PDF file; bookdown::html_document2 and rmdformats::readthedown, which render HTML files; and officedown::rdocx_document, which renders an MS Word file [@R-bookdown; @R-officedown; @R-rmdformats]. You can also use glossr in Quarto documents and create slides!

As you can see in vignette("styling"), use_glossr() also takes some variables to set up document-wide styling options for specific parts of your glosses. The code below sets the name of the source to render in boldface and the first line of each gloss in italics.

library(glossr)
use_glossr(styling = list(
  source = "b",
  first = "",
  font_family = "Cambria"
))

Basic usage

When you want to include an example, create a gloss with as_gloss() and call it inside a normal chunk. There are currently four named, optional arguments that will be treated specially:

label will be interpreted as the label for cross-references;
source will be interpreted as text for a non-aligned first line, e.g. a reference to the source of your example;
translation will be interpreted as text for a free translation;
trans_glosses indicates what character should surround the translation, by default ". (See vignette("styling").)

All other values will be interpreted as lines to be aligned and reproduced in the order given, but only up to 3 lines are allowed.[^chinese_extra]

[^chinese_extra]: Note that if you use Chinese characters you will need to add some LaTeX packages (namely {fontspec} and {xeCJK}). You can do that either by adding them to the header-includes section or to the extra_dependencies list inside the pdf_document output section in your YAML. Thanks to Thomas Van Hoey for offering me the example and pointing this out.

my_gloss <- as_gloss(
  "她 哇的一聲 大 哭起來，",
  "tā wā=de-yì-shēng dà kū-qǐlái,",
  "TSG waa.IDEO-LINK-one-sound big cry-inch",
  translation = "Waaaaa, she began to wail.",
  label = "my-label",
  source = "ASBC (nº 100622)"
)
my_gloss

The label given to as_gloss() allows you to cross-reference the example: in PDF this looks like example (\ref{my-label}), whereas in HTML and Word you would use example (@my-label). What should YOU do? gloss() can be used inline to generate a reference for either PDF or HTML, depending on the output of your file: r gloss("my-label") in this case.

If you have many examples, you might want to keep them in their own file, if you don't have them like that already. glossr offers a small dataset for testing, called data(glosses).

library(dplyr) # for select() and filter()
data(glosses)
glosses <- glosses |> 
  select(original, parsed, translation, label, source) |> 
  mutate(source = paste0("(", source, ")"))
glosses
glosses$label

Assuming you have them in a table with columns matching the arguments of as_gloss(), you can give it to gloss_df() directly and it will do the job. That is: columns named "translation", "source", "label" and "trans_glosses" will be interpreted as those arguments, and all the others will be read as lines to align regardless of their column names. This table has more columns than we need, so we will only select the right ones and print the glosses of the first three rows. Note that the values in the "label" column will be used as labels: `r knitr::inline_expr('gloss("feel-icelandic")')` will return r gloss("feel-icelandic").

gloss_df(head(glosses, 3))

PDF-only features

This package also offers a few extensions when working on PDF output. On the one hand, gloss_list() allows you to nest a list of glosses and have both a reference for the list and for each individual item. This will not work in HTML or Word, which will just keep the numbering on the top level. But on PDF, given the function on some examples from @shindo_2015, we can use `r knitr::inline_expr('gloss("jp")')` to reference r gloss("jp"), or `r knitr::inline_expr('gloss("heartwarming-jp")')` and `r knitr::inline_expr('gloss("languid-jp")')` to reference r gloss("heartwarming-jp") and r gloss("languid-jp").

filter(glosses, endsWith(label, "jp")) |> 
  gloss_df() |> 
  gloss_list(listlabel = "jp")

Finally, it might be the case that you want to apply LaTeX formatting to a long string of elements for your first lines of glosses, e.g. set half of your example in italics. In order to facilitate applying the same formatting to each individual element, this package offers you gloss_format_words(), which you can implement to the strings given to as_gloss().

Internally, glossr will try to parse LaTeX formatting into HTML but currently it doesn't parse it to Word or read HTML/markdown tags. (But see vignette("styling").)

gloss_format_words("A long piece of text", "textit")

my_gloss <- as_gloss(
  original = gloss_format_words("Hace calor/frío", "textbf"),
  parsed = "make.3SG.PRS heat/cold.N.A",
  translation = "'It is hot/cold'",
  label = "formatted"
)
my_gloss

About the formats

The Latex output writes your glosses with the format required by the {expex} package. The default HTML rendering uses leipzig.js 0.8.0 (and, of course, {htmltools} [@R-htmltools] to read it with R). The Word output computes the width of each aligned piece in pixels based on the family font and size using systemfonts::string_width() [@R-systemfonts] and tries to align them programmatically via spaces.

If you are familiar with these tools and would like to suggests expansions or contribute to the package, go ahead, I would love to hear from you!