In gongcastro/multilex: Multilingual lexical assessment using online surveys

ml_vocabulary

R Documentation

Generate participant information and progress for each response

Description

This function generates a data frame with the vocabulary of each participant, keeping longitudinal data from the same participant in separate rows. Comprehensive and productive vocabulary sizes are computed as raw counts (vocab_count) and as proportions (vocab_prop), the latter calculated over the total number of items the participant filled in for that response (vocab_n).
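As a rough illustration of how vocab_count, vocab_n and vocab_prop relate to each other, the following base R sketch computes them for a single toy response. The column names, the "No" response label, and the assumption that comprehension also counts items marked Understands & Says are illustrative only and do not reflect the package's internal implementation.

```r
# Toy responses: one row per item answered in a single questionnaire response.
# Column names and response labels are illustrative assumptions.
responses <- data.frame(
  id = "p01",
  item = c("cat_dog", "cat_cat", "spa_dog", "spa_cat", "spa_house"),
  response = c("Understands & Says", "Understands", "No", "Understands", "No"),
  stringsAsFactors = FALSE
)

# Total number of items the participant filled in for this response
vocab_n <- nrow(responses)

# Assumption: comprehension counts items marked Understands or Understands & Says,
# production counts only items marked Understands & Says.
understands <- responses$response %in% c("Understands", "Understands & Says")
produces <- responses$response == "Understands & Says"

vocab <- data.frame(
  id = "p01",
  type = c("understands", "produces"),
  vocab_count = c(sum(understands), sum(produces)),
  vocab_n = vocab_n
)

# Proportions are counts divided by the number of items answered
vocab$vocab_prop <- vocab$vocab_count / vocab$vocab_n
print(vocab)
```

Each row of the result corresponds to one vocabulary type, so a participant with several responses would contribute one such pair of rows per response.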

Usage

ml_vocabulary(
  participants = NULL,
  responses = NULL,
  by = NULL,
  scale = "count"
)

Arguments

`participants`	Participants data frame, as generated by `ml_participants`. If NULL (default), `ml_participants` is run.
`responses`	Responses data frame, as generated by `ml_responses`. If NULL (default), `ml_responses` is run.
`by`	A character vector with the name(s) of the variable(s) to group data by. Vocabulary metrics are calculated by aggregating responses within the groups that result from crossing the variables provided in `by`. These variables can refer to item properties (see `pool`, e.g., "category") or to participant properties (see `ml_logs()`, e.g., "dominance").
`scale`	A character vector that takes the values "count" and/or "prop". If "count" (default), vocabulary metrics are reported as counts (number of words). If "prop", vocabulary metrics are reported as proportions.
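The grouping behaviour described for `by` and the column selection described for `scale` can be sketched in base R as follows. This is a minimal illustration on made-up data, assuming one row per item with a logical `understands` column; the toy column names and the way `scale` is mapped onto output columns are assumptions, not the package's actual code.

```r
# Toy item-level data for two participants (all values are made up)
responses <- data.frame(
  id = c("p01", "p01", "p01", "p02", "p02", "p02"),
  dominance = c("Catalan", "Catalan", "Catalan", "Spanish", "Spanish", "Spanish"),
  category = c("animals", "animals", "food", "animals", "food", "food"),
  understands = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

by <- c("category", "dominance")
scale <- c("count", "prop")

# Groups result from crossing the participant identifier with the `by` variables
groups <- responses[c("id", by)]

# Number of known items and number of answered items within each group
counts <- aggregate(list(vocab_count = responses$understands), by = groups, FUN = sum)
totals <- aggregate(list(vocab_n = responses$understands), by = groups, FUN = length)

vocab <- merge(counts, totals, by = c("id", by))
vocab$vocab_prop <- vocab$vocab_count / vocab$vocab_n

# Keep only the metrics requested through `scale`
result <- vocab[c("id", by, paste0("vocab_", scale))]
print(result)
```

Passing a single variable in `by` gives one row per participant and level of that variable; passing several gives one row per combination that actually occurs in the data.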

Value

A tibble with each participant's comprehensive and/or productive vocabulary size in each language. This data frame contains the following variables:

id: a character string indicating a participant's identifier. This value is always the same for each participant, so that different responses from the same participant share the same id.
time: a numeric value indicating how many times a given participant has been sent the questionnaire, regardless of whether they completed it or not.
age: a numeric value indicating the number of months elapsed between the participant's birth date and the moment they filled in the last item of their questionnaire response.
type: a character string indicating the vocabulary type computed: "understands" if option Understands was selected, and "produces" if option Understands & Says was selected.
vocab_count_total: non-negative integer indicating the number of items selected as Understands or Understands and Says across both languages.
vocab_count_dominance_l1: non-negative integer indicating the number of items selected as Understands or Understands and Says in the dominant language (L1).
vocab_count_dominance_l2: non-negative integer indicating the number of items selected as Understands or Understands and Says in the non-dominant language (L2).
vocab_count_conceptual: non-negative integer indicating the number of translation equivalents (a.k.a. cross-language synonyms or doublets) in which at least one of the items was selected as Understands or Understands and Says. This is a measure of the number of lexicalised concepts.
vocab_count_te: non-negative integer indicating the number of translation equivalents (out of the total number of items the participant answered) in which both items were selected as Understands or Understands and Says. This is a measure of the number of concepts lexicalised in both languages.
vocab_prop_total: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as Understands or Understands and Says across both languages.
vocab_prop_dominance_l1: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as Understands or Understands and Says in the dominant language (L1).
vocab_prop_dominance_l2: numeric value ranging from 0 to 1 (both included) indicating the proportion of items selected as Understands or Understands and Says in the non-dominant language (L2).
vocab_prop_conceptual: numeric value ranging from 0 to 1 (both included) indicating the proportion of translation equivalents (a.k.a. cross-language synonyms or doublets) in which at least one of the items was selected as Understands or Understands and Says. This is a measure of the proportion of lexicalised concepts.
vocab_prop_te: numeric value ranging from 0 to 1 (both included) indicating the proportion of translation equivalents (a.k.a. cross-language synonyms or doublets) in which both items were selected as Understands or Understands and Says. This is a measure of the proportion of concepts lexicalised in both languages.
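To make the difference between the total, L1, L2, conceptual and TE measures concrete, here is a small base R sketch on made-up data. It assumes that each translation equivalent pairs exactly one L1 item with one L2 item, and that proportions use the number of items (or of translation equivalents) answered as denominator; the `te`, `language` and `understands` columns are hypothetical names chosen for the example.

```r
# Toy data: four translation equivalents, each with one L1 and one L2 item
items <- data.frame(
  te = c(1, 1, 2, 2, 3, 3, 4, 4),
  language = rep(c("L1", "L2"), times = 4),
  understands = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Number of known items within each translation equivalent (0, 1 or 2)
known_per_te <- tapply(items$understands, items$te, sum)

# Counts
vocab_count_total <- sum(items$understands)
vocab_count_dominance_l1 <- sum(items$understands[items$language == "L1"])
vocab_count_dominance_l2 <- sum(items$understands[items$language == "L2"])
vocab_count_conceptual <- sum(known_per_te >= 1)  # at least one item known
vocab_count_te <- sum(known_per_te == 2)          # both items known

# Proportions (assumed denominators: items answered, or pairs answered)
n_items <- nrow(items)
n_te <- length(known_per_te)
vocab_prop_total <- vocab_count_total / n_items
vocab_prop_conceptual <- vocab_count_conceptual / n_te
vocab_prop_te <- vocab_count_te / n_te
```

Under this one-to-one pairing assumption, the total count equals the conceptual count plus the TE count, because every concept known in both languages contributes two items to the total but only one concept.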