library(lingglosses) # in order to have the same list of glosses through the whole document options("lingglosses.refresh_glosses_list" = FALSE)
The list of abbreviations is an obligatory part of linguistic articles that nobody reads. These lists contain definitions of abbreviations used in the article (e.g. the names of corpora or sign languages), but also a list of linguistic glosses --- abbreviations used in interlinear glossed examples. There is a document proposing standardized glossing rules [@comrie08], which ends with a list of 84 standard abbreviations. A much bigger list of standard abbreviations is present on Wikipedia. However, researchers can deviate from the proposed abbreviations and use their own instead.
The following list of abbreviations, which I came across in a published article, makes it clear that there is room for improvement in compiling such lists:
NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular
Besides the obvious errors, this list contains more problems that I would like to point out:
r add_gloss("SBJV")
, r add_gloss("IMP")
) are absent in the list.The main goal of the lingglosses
R package is to provide an option for creating:
.html
output of rmarkdown
[@xie18][^latex];[^latex]: If you want to render a .pdf
version you can either use latex and multiple linguistic packages developed for it (see e. g. gb4e
, langsci
, expex
, philex
), or you can render .html
first and convert it to .pdf
afterwards.
You can install the stable version of the package from CRAN:
install.packages("lingglosses")
You can also install the development version of lingglosses
from GitHub with:
# install.packages("remotes") remotes::install_github("agricolamz/lingglosses")
In order to use the package you need to load it with the library()
call:
library(lingglosses)
You can go through the examples in this tutorial or you can create a lingglosses example from the rmarkdown template (File > New File > R Markdown... > From Template > lingglosses Document).
gloss_example()
The main function of the lingglosses
package is gloss_example()
. This package has the following arguments:
transliteration
;glosses
;free_translation
;comment
;grammaticality
;annotation
[^orth];line_length
.[^orth]: I used annotation
for representing orthography, but it also possible to use this tier for the annotation of words, like here:
gloss_example(transliteration = "eze a za a", glosses = "NP PRFX ROOT SFX", free_translation = "Eze swept... (Igbo, from [@goldsmith79: 209])", annotation = "HL H L H")
All arguments except the last one are self-explanatory.
gloss_example(transliteration = "bur-e-**ri** c'in-ne-sːu-w", glosses = "fly-NPST-**INF** know-HAB-NEG-M", free_translation = "I cannot fly. (Zilo Andi, East Caucasian)", comment = "(lit. do not know how to)", annotation = "Бурери цIиннессу.", grammaticality = "*")
In this first example you can see that:
italic_transliteration = FALSE
)[^ital-all];**a**
for bold);[^ital-all]: Sometimes it is make sense to set this option ones for the whole document using the following code options("lingglosses.italic_transliteration" = FALSE)
.
Since the function arguments' names are optional in R, users can omit them as long as they follow the order of the arguments (you can always find the correct order in ?gloss_example
):
gloss_example("bur-e-**ri** c'in-ne-sːu-w", "fly-NPST-**INF** know-HAB-NEG-M", "I cannot fly. (Zilo Andi, East Caucasian)", "(lit. do not know how to)", "Бурери цIиннессу.", "*")
It is possible to number and call your examples using the standard rmarkdown
tool for generating lists (@)
:
(@) my first example (@) my second example (@) my third example
renders as:
(@) my first example (@) my second example (@) my third example
In order to reference examples in the text you need to give them names:
(@my_ex) example for referencing
(@my_ex) example for referencing
With the names settled you can reference the example (@my_ex) in the text using the following code (@my_ex)
.
So this kind of example referencing can be used with lingglosses
examples like in (@lingglosses1) and (@lingglosses2). The only important details are:
echo = FALSE
(or specify it for all code chunks with the following comand in the begining of the document knitr::opts_chunk$set(echo = FALSE")
);(@...)
) and the code chunk with lingglosses
code.(@lingglosses1)
gloss_example("bur-e-**ri** c'in-ne-sːu", "fly-NPST-**INF** know-HAB-NEG", "I cannot fly. (Zilo Andi, East Caucasian)", "(lit. do not know how to)")
(@lingglosses2) Zilo Andi, East Caucasian
gloss_example("bur-e-**ri** c'in-ne-sːu", "fly-NPST-**INF** know-HAB-NEG", "I cannot fly.", "(lit. do not know how to)")
Sometimes people gloss morpheme by morpheme (this is especially useful for polysynthetic languages). It is also possible in lingglosses
. You can annotate slots with the annotation
argument, see footnote 2 for the details.
(@) Abaza, West Caucasian [@arkadiev20: example 5.2]
gloss_example("s- z- á- la- nəq'wa -wa -dzə -j -ɕa -t'", "1SG.ABS POT 3SG.N.IO LOC pass IPF LOC 3SG.M.IO seem(AOR) DCL", "It seemed to him that I would be able to pass there.")
The glossing extraction algorithm implemented in lingglosses
is case sensitive, so if you want to escape it you can use curly brackets:
(@) Kvankhidatli Andi, [@verhees19: 203]
gloss_example("den=no he.ʃː-qi hartʃ'on-k'o w-uʁi w-uk'o.", "{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR", "And I stood there, watching him.")
In the example above {I}
is just the English word I that will be escaped and will not appear in the gloss list as marker of class I.
It make sense to avoid to use single quotes for the quotation, since it can cause some troubles for the package's functions and use escape slash for quotations, like in the following example:
(@) Kunbzang Japhug, [@jacques21: 1143]
gloss_example("\"a-pi ɲɯ-ɕpaʁ-a\" ti ɲɯ-ŋu", "1SG.POSS-elder.sibling SENS-be.thirsty-1SG say:FACT SENS-be", "She said: \"Sister, I am thirsty.\"")
After a while I was asked to make it possible to add sole line examples:
gloss_example("Learn to value yourself, which means: to fight for your happiness. (Ayn Rand)", line_length = 100)
Sometimes examples are too long and do not fit onto the page. In that case you need to add the argument results='asis'
to your chunk. gloss_example()
will then automatically split your example into multiple rows.
(@tsa_ex) Mishlesh Tsakhur, East Caucasian [@maisak07: 386]
gloss_example('za-s jaːluʁ **wo-b** **qa-b-ɨ**; turs-ubɨ qal-es-di ǯiqj-eː jaːluʁ-**o-b** **qa-b-ɨ**', '1SG.OBL-DAT shawl.3 AUX-3 PRF-3-bring.PFV woolen_sock-PL NPL.bring-PL-A.OBL place-IN shawl.3-AUX-3 PRF-3-bring.PFV', '(they) **brought** me a shawl; instead of (lit. in place of bringing) woolen socks, (they) **brought** a shawl.', '(Woolen socks are considered to be more valuable than a shawl.)')
If you are not satisfied with the result of the automatic split you can change the value of the line_length
argument (the default value is 70
, that means 70 characters of the longest line).
It is possible to add a soundtrack to the example using an audio_path
argument. It can be both: a path to the file or an URL.
(@) Abaza, West Caucasian (my field recording)
gloss_example("á-ɕa", "DEF-brother", "This brother", audio_path = "abaza_brother.wav")
You can hear the recording if you click on the note icon above. If you do not like the icon, you can change it to any text using an audio_label
argument.
Adding video is also possible:
(@) Ukrainian Sign Language (video from https://www.spreadthesign.com)
gloss_example("PIECE", "piece", video_path = "USL_piece.mp4")
There are additional arguments video_width
and video_hight
for width and hight.
When an example is small, the author may not want to put it in a separate paragraph, but prefer to display it as part of the running text. This is possible to achieve using the standard for rmarkdown
inline code. The result of the R code can be inserted into the rmarkdown document using the backtick symbol and the small r, for example `r 2+2`
will be rendered as r 2+2
. Currently lingglosses
can not automatically detect whether code was provided via code chunk or inline. So if you want to use an in-text glossed example and want the glosses to appear in list, it is possible to write them using the gloss_example()
with the intext = TRUE
argument. Here is a Turkish example from (@delancey97): r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)
that was produced with the following inline code:
`r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)`
In the third section I show how you can create a semi-automatically compiled list of abbreviations for your document. As an example I provide the list for this exact document. Even though the r add_gloss("MIR")
gloss appears only in this exact section in the in-text example above, it appears in the lists presented in the third section.
add_gloss()
Sometimes glosses are used in other environments besides examples, e.g. in a table or in the text. So if you want to use in-text glosses and want them to appear in the glosses list, it is possible to add them using the add_gloss()
function. As an example I adapted part of the verbal inflection paradigm of Andi (East Caucasian) from Table 2 [@verhees19: 199]:
| | r add_gloss("AFF")
| r add_gloss("NEG")
|
|----------------------|----------------------|----------------------|
| r add_gloss("AOR")
| -∅ | -sːu |
| r add_gloss("MSD")
| -r | -sːu-r |
| r add_gloss("HAB")
| -do | -do-sːu |
| r add_gloss("FUT")
| -dja | -do-sːja |
| r add_gloss("INF")
| -du | -du-sːu |
that is generated using the folowing markdown[^poortable] code[^tablesgenerator]:
[^poortable]: The table generated with markdown is visually poor. There are a lot of other ways to generate a table in R: kable()
from knitr
; kableExtra
package, DT
package and many others.
[^tablesgenerator]: It is easier to generate Markdown or Latex tables with Libre Office or MS Excel and then use an online table generator website like https://www.tablesgenerator.com/.
| | `r add_gloss("AFF")` | `r add_gloss("NEG")` | |----------------------|----------------------|----------------------| | `r add_gloss("AOR")` | -∅ | *-sːu* | | `r add_gloss("MSD")` | *-r* | *-sːu-r* | | `r add_gloss("HAB")` | *-do* | *-do-sːu* | | `r add_gloss("FUT")` | *-dja* | *-do-sːja* | | `r add_gloss("INF")` | *-du* | *-du-sːu* |
In the third section I show you how to create a semi-automatically compiled abbreviation list for your document. As an example I provide the list of abbreviations for this exact document. Even though the r add_gloss("FUT")
and r add_gloss("MSD")
glosses appears only in this exact section in the table above, it appears in the lists presented in the third section.
Unfortunately, gloss extraction implemented in lingglosses
is case sensitive. That makes it hard to use for the glossing of Sign Languages, because:
1) Sign linguists gloss lexical items with capitalized English translations; 2) Sign language glosses are sometimes split into two lines, each of which is associated with one hand (or even more if you want to account for non-manual markers); 3) Sign language glosses should be somehow aligned with video/pictures (see the fascinating signglossR by Calle Börstell); 4) There can be empty space in glosses; 5) There can be some placeholders that corresponds to an utterance by one articulator (e.g. a hand), which are held stationary in the signing space during the articulation made by another articulator.
I will illustrate these problems with an example from Russian Sign Language [@kimmelman12: 421]:
gloss_example(glosses = c("LH: {CHAIR} ________", "RH: {} CL:{SIT}.{ON}"), free_translation = "The cat sits on the chair", comment = "[RSL; Eks3–12]", drop_transliteration = TRUE)
The capitalization that is not used for morphemic glossing is embraced with curly brackets, so that lingglosses
does not treat these items as glosses. Two separate gloss lines for different hands are provided with a vector with two elements (see c()
function for the vector creation). It is important to provide the drop_transliteration = TRUE
argument, otherwise internal tests within the gloss_example()
function will fail.
It is also possible to use pictures in a transliteration line, see an example from Kazakh-Russian Sign Language [@kuznetsova21: 51] (pictures are used with the permission of the author Anna Kuznetsova):
gloss_example("![](when.png) ![](mom.png) ![](tired.png)", c("br_raise_______ {} {}", "chin_up_______ {} {}", "{WHEN} {MOM} {TIRED}"), "When was mom tired?")
The first line corresponds to pictures in markdown format that should be located in the same folder (otherwise you need to specify the path to them, e.g. ![](images/your_plot.png)
). The next three lines correspond to different lines in the example with some non-manual articulation: as before, all glossing lines are stored as a vector of strings. The user can replace {}
with _______
in order to show the scope of non-manual articulation.
After you finished your text, it is possible to call the make_gloss_list()
function in order to automatically create a list of abbreviations.
make_gloss_list()
This function works with the built-in dataset glosses_df
that is compiled from Leipzig Glosses, Wikipedia page and articles from the open access journal Glossa[^glossa]. Everybody can download and change this dataset for their own purposes. I would be grateful if you leave your proposals for changes to the dataset for this list in the issue tracker on GitHub.
[^glossa]: The script for collecting glosses is available here. The list was manually corrected and merged with glosses from other sources. This kind of glosses are marked in the glosses_df
dataset as lingglosses
in the source
column.
It is possible that the user is not satisfied with the result of the make_gloss_list()
function. In this case there are two possible strategies. The first strategy is to copy the result of the make_gloss_list()
, modify it and paste it into your rmarkdown
document. Sometimes you work on some volume dedicated to a particular group of languages and you want to assure that glosses are the same across all articles. Then you can compile your own table with the columns gloss
and definition_en
and use it within the make_gloss_list
function. As you can see, all glosses specified in the my_abbreviations
dataset changed their values in the output below:
my_abbreviations <- data.frame(gloss = c("NPST", "HAB", "INF", "NEG"), definition_en = c("non-past tense", "habitual aspect", "infinitive", "negation marker")) make_gloss_list(my_abbreviations)
Unfortunately, some glosses can have multiple meanings in different traditions (e.g. r add_gloss("ASS")
can be either an associative plural or assertive mood). By default make_gloss_list()
shows only some entries that were chosen by the author of this package. You can see all the possibilities if you add the argument all_possible_variants = TRUE
. As you can see, there are multiple possible values for r add_gloss("AFF")
, r add_gloss("ASS")
, r add_gloss("CL")
, r add_gloss("IMP")
, r add_gloss("IN")
, r add_gloss("INS")
, and r add_gloss("PRF")
:
make_gloss_list(all_possible_variants = TRUE)
You can notice that problematic glosses (those which lack a definition or are duplicated) are colored. This can be switched off adding the argument annotate_problematic = FALSE
:
make_gloss_list(all_possible_variants = TRUE, annotate_problematic = FALSE)
In case you want to remove some glosses from the list, you can use the argument remove_glosses
:
make_gloss_list(remove_glosses = c("1SG", "3SG"))
It is really important that one should not treat the results of the make_gloss_list()
function as carved in stone: once it is compiled you can copy, modify and paste it in your document. You can try to spend time improving the output of the function, but at the final stage it is probably faster to correct it manually.
Right now there is no direct way of knitting lingglosses
to .docx
format. You can knit by adding an argument always_allow_html: true
to your yaml file, however the result will be not ideal. You can work around this by copying and pasting from the .html
version:
knitr::include_graphics("for_word_users.gif")
Both kniting to .pdf
and .docx
outputs are possible, but there are some known restrictions:
So if you want to avoid these problems, the best solution is to use one of the latex glossing packages listed in the first footnote and the package glossaries
for automatic compilation of glosses.
glosses_df
datasetAs mentioned above, the make_gloss_list()
function's definitions are based on the glosses_df
dataset.
str(glosses_df)
Most definitions are too general on purpose: r add_gloss("ASC")
, for example, is defined as associative
, which can be associative case, associative plural, associative mood, or associated motion. Since the user can easily replace the output with their own definitions, it is not a problem for the lingglosses
package. However, it will make things easier, and more comparable and reproducible, if linguists would create a unified database of glosses, similar to the concepticon [@list21a] (Max Ionov mentioned to me that this could be Ontologies of Linguistic Annotation). If you think that some glosses and definitions should be changed, do not hesitate to open an issue on the GitHub page of the lingglosses
project.
There have been several alternative to lingglosses
infrastructure for interlinear glossed examples that might be interesting for the reader:
Xigt
[@goodman15];pyigt
[@list21b].Only several of them (ODIN
, Xigt
, scription
and pyigt
) are attempts towards creating a standard for the databases of interlinear glossed examples. I also wanted to mention paper by [@round20], where authors provided a script for the automated identification and parsing of interlinear glossed text from scanned page images. The motivation for creating cross-linguistic database of interlinear glossed examples is the following:
The lingglosses
package make an attempt for going in this direction and provide an ability to extract examples in table format that can be further transformed into other formats. Each interlinear glossed example could be easily represented as a table using the convert_to_df()
function.
convert_to_df(transliteration = "bur-e-**ri** c'in-ne-sːu", glosses = "fly-NPST-**INF** know-HAB-NEG", free_translation = "I cannot fly.", comment = "(lit. do not know how to)", annotation = "Бурери цIиннессу.")
This table lists all the parameters that could be useful for a database, and has the following columns:
id
--- unique identifier through the whole table;example_id
--- unique identifier of particular examples;word_id
--- unique identifier of the word in the example (delimited with spaces and other punctuation);morpheme_id
--- unique identifier of the morpheme within the word (delimited with -
or =
);transliteration
--- language material;gloss
--- glosses;delimiter
--- delimiters: space, -
or =
transliteration_orig
--- original string with transliteration;glosses_orig
--- original string with glosses;free_translation
--- original string with the free translation;comment
--- original string with a comment;When you use the gloss_example()
function, a table of the structure described above is added to the database, so in the end you can extract it by saving the output of the get_examples_db()
function to the file:
get_examples_db()
Of course one can just use a subset of some columns:
unique(get_examples_db()[, c("example_id", "transliteration_orig", "glosses_orig")])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.