# Only purl (extract) code chunks when not running on CRAN
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")

knitr::opts_chunk$set(
  purl = NOT_CRAN,
  collapse = TRUE,
  comment = "#>"
)

# Store the access key in the system keyring under the service "multilex",
# reading its value from the ML_KEY environment variable
keyring::key_set_with_value(
  "multilex",
  "gonzalo.garciadecastro@upf.edu",
  Sys.getenv("ML_KEY")
)
library(multilex)
my_email <- "gonzalo.garciadecastro@upf.edu"
# Authenticate with the stored credentials
ml_connect(google_email = my_email)
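If you prefer not to keep the key in an environment variable, you can also store it interactively. This is a minimal sketch using keyring's interactive prompt; it assumes the same service name ("multilex") and username as the chunk above:

# Prompts for the key and stores it under the same service and username
keyring::key_set(service = "multilex", username = my_email)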

The ml_vocabulary() function allows you to extract vocabulary sizes for individual responses to any of the questionnaires:

ml_connect()
p <- ml_participants() # participant-level information
r <- ml_responses()    # item-level responses to the questionnaires
ml_vocabulary(participants = p, responses = r)

Vocabulary sizes are, by default, computed in two different scales: counts and proportions. Four modalities of vocabulary size are also computed by default. In addition, vocabulary sizes are computed in two types: comprehensive and productive.

Vocabulary size as counts

This is what the default output looks like:

library(multilex)
ml_connect()
p <- ml_participants()
r <- ml_responses(update = FALSE)
ml_vocabulary(participants = p, responses = r)

This data frame includes two rows per response (one for comprehensive vocabulary and one for productive vocabulary), along with columns describing each response.
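For instance, we can keep only the productive vocabulary rows by filtering the output. This is a minimal sketch using dplyr; the column name type and the label "prod" are assumptions and may differ in your output, so check the column names first:

library(dplyr)

voc <- ml_vocabulary(participants = p, responses = r)

# NOTE: the column name `type` and the value "prod" are assumptions; inspect
# names(voc) and its contents before filtering your own data
voc_prod <- filter(voc, type == "prod")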

Vocabulary size as proportions

This is what the output looks like when scale = "prop":

ml_vocabulary(p, r, scale = "prop")

This data frame follows a similar structure to the one returned by ml_vocabulary() when run with default arguments, but vocabulary sizes are now expressed as proportions.

We can also ask for vocabulary sizes expressed in both scales (counts and proportions):

ml_vocabulary(p, r, scale = c("count", "prop"))
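A quick way to check which columns each call returns is to inspect the output directly. This is a minimal sketch using dplyr's glimpse(); it makes no assumptions about column names:

library(dplyr)

voc_both <- ml_vocabulary(p, r, scale = c("count", "prop"))

# Inspect the columns and their types without printing the whole data frame
glimpse(voc_both)

# Quick summary of every column
summary(voc_both)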

Conditional vocabulary size: the by argument

We can also compute vocabulary sizes conditional on some variables at the item or participant level, such as semantic/functional category (category), cognate status (cognate), or language profile (lp), using the argument by. Just take a look at the variables included in the data frames returned by ml_participants() or in the pool of items. You can use this argument as follows:

ml_vocabulary(p, r, by = "dominance")

This data frame follows a similar structure to the ones above, but preserves a column for each variable passed to by (here, dominance). The value of this argument is passed to dplyr's group_by() under the hood. As with group_by(), you can compute vocabulary sizes for combinations of variables:

ml_vocabulary(p, r, by = c("dominance", "lp"))
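Because the grouping columns are preserved in the output, we can keep summarising it with dplyr. This is a minimal sketch that averages vocabulary size within each dominance and lp cell; the column name vocab_prop is a hypothetical placeholder, so replace it with the actual vocabulary-size column in your output:

library(dplyr)

voc_by <- ml_vocabulary(p, r, by = c("dominance", "lp"))

# NOTE: `vocab_prop` is a placeholder column name; replace it with the actual
# vocabulary-size column in your output (see names(voc_by))
voc_by %>%
  group_by(dominance, lp) %>%
  summarise(mean_vocab = mean(vocab_prop, na.rm = TRUE), .groups = "drop")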

