plosives: Spanish intervocalic plosives.
In CDEager/nauf: Regression with NA Values in Unordered Factors


A dataset containing measures of plosive strength for instances of intervocalic Spanish /p/, /t/, /k/, /b/, /d/ and /g/. The data come from read speech and informal interviews with 30 speakers from Cuzco, Peru, and 8 speakers from Lima, Peru, and from the task dialogues of 18 speakers from Valladolid, Spain, in the Spanish portion of the Glissando Corpus (Garrido et al. 2013). If you analyze the plosives dataset in a publication, please cite Eager (2017) from the references section below.

plosives

A data frame with 5281 rows and 21 variables:

cdur: Total plosive duration, measured from preceding vowel intensity maximum to following vowel intensity maximum, in milliseconds. Set to 0 for elided plosives.
vdur: Duration of the period of voicelessness in the vowel-consonant-vowel sequence in milliseconds. Set to 0 for fully voiced plosives and elided plosives.
vpct: Percentage of the consonant duration which was voiceless. For non-elided plosives, vpct = vdur / cdur, and for elided plosives, vpct = 0.
intdiff: The maximum intensity in the following vowel minus the minimum intensity in the plosive, in decibels. Set to 0 for elided plosives.
intvel: The maximum rising velocity of the intensity contour between the consonant minimum intensity and following vowel maximum intensity, in decibels per millisecond. Set to 0 for elided plosives.
voicing: The underlying voicing of the plosive (Voiced or Voiceless).
place: Place of articulation (Bilabial, Dental, or Velar).
stress: Syllabic stress context (Tonic, Post-Tonic, or Unstressed).
prevowel: Preceding vowel phoneme identity (a, e, i, o, or u).
posvowel: Following vowel phoneme identity (a, e, i, o, or u).
wordpos: Position of the plosive in the word (Initial or Medial).
wordfreq: Number of times the word containing the plosive occurs in the CREA corpus (Real Academia Española).
speechrate: Local speech rate around the consonant, in syllable nuclei per second (nuclei located with the Praat script of De Jong and Wempe (2009)).
spont: Whether the speech was spontaneous (TRUE) or read (FALSE).
dialect: The city where the speaker is from (Cuzco, Lima, or Valladolid).
sex: The speaker's sex (Female or Male).
age: The speaker's age group at the time of recording: Older if over 40 years old, Younger if 40 years old or younger.
ed: The speaker's highest level of education (Secondary or University).
ling: The speaker's language background: Bilingual if they spoke both Spanish and Quechua, Monolingual if they spoke only Spanish.
speaker: Speaker identifier (s01 through s56).
item: Read speech item identifier (i01 through i54). Set to NA for spontaneous speech.
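Several variables above follow simple rules: vpct is vdur / cdur for non-elided plosives and 0 for elided ones (which have cdur set to 0), and age is a two-level grouping of the speaker's age with a cutoff at 40. The sketch below illustrates those rules on a few hypothetical toy rows; the values and the `toy` and `age_years` names are illustrative assumptions, not part of the dataset (plosives stores only the resulting age group, not ages in years).

```r
# Hypothetical toy rows; real values are in the plosives data frame.
toy <- data.frame(
  cdur = c(80, 60, 0),  # row 3: elided plosive, so cdur is 0
  vdur = c(40, 0, 0)    # row 2: fully voiced plosive, so vdur is 0
)

# vpct = vdur / cdur for non-elided plosives (cdur > 0), and 0 for elided ones.
# ifelse() still evaluates 0 / 0 = NaN for row 3, but that value is discarded.
toy$vpct <- ifelse(toy$cdur > 0, toy$vdur / toy$cdur, 0)
toy$vpct
#> [1] 0.5 0.0 0.0

# Hypothetical ages in years: over 40 is Older, 40 or under is Younger.
age_years <- c(25, 40, 41, 63)
age <- factor(ifelse(age_years > 40, "Older", "Younger"),
              levels = c("Older", "Younger"))
as.character(age)  # gives "Younger" "Younger" "Older" "Older"
```

Note that a speaker aged exactly 40 falls in the Younger group, matching the "40 years old or younger" wording.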

The ptk dataset in the standardize package is a subset of the plosives dataset: the voiceless plosives from Valladolid, with fewer columns and the speakers renumbered. The following code reproduces it:

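library(nauf)         # provides plosives
library(standardize)  # provides ptk

# Keep the voiceless plosives from Valladolid and drop unused factor levels
d <- droplevels(subset(plosives,
  dialect == "Valladolid" & voicing == "Voiceless"))

levels(d$speaker)    # s39 to s56
levels(ptk$speaker)  # s01 to s18

# Renumber the speakers, keep only ptk's columns, and reset the row names
levels(d$speaker) <- levels(ptk$speaker)
d <- d[, colnames(ptk)]
rownames(d) <- NULL

all.equal(d, ptk)  # TRUE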
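The renumbering step works because assigning to levels() renames factor levels by position, so the first level of d$speaker (s39) becomes the first level of ptk$speaker (s01), and so on. A minimal sketch on a hypothetical three-speaker factor (`spk` is an illustrative name, not part of either package):

```r
# Hypothetical factor; levels are sorted, so they are "s39" "s40" "s41"
spk <- factor(c("s40", "s39", "s41", "s39"))
levels(spk)
#> [1] "s39" "s40" "s41"

# Positional relabeling: s39 -> s01, s40 -> s02, s41 -> s03
levels(spk) <- c("s01", "s02", "s03")
as.character(spk)
#> [1] "s02" "s01" "s03" "s01"
```

The underlying integer codes are unchanged; only the labels differ, which is why each observation keeps its original speaker.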

Eager, C. D. (2017). Contrast preservation and constraints on individual phonetic variation. Doctoral thesis, University of Illinois at Urbana-Champaign.

Garrido, J. M., Escudero, D., Aguilar, L., Cardeñoso, V., Rodero, E., de-la-Mota, C., ... Bonafonte, A. (2013). Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan. Language Resources and Evaluation, 47(4), 945-971.

Real Academia Española. Corpus de referencia del español actual (CREA). Banco de Datos. http://www.rae.es

De Jong, N. H., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385-390.

CDEager/nauf documentation built on May 6, 2019, 9:24 a.m.