A dataset containing measures of plosive strength
for instances of intervocalic Spanish /p/, /t/, /k/, /b/, /d/ and /g/.
The data come from read speech and informal interviews with 30 speakers in
Cuzco, Peru and 8 speakers in Lima, Peru, and from the task dialogues of
18 speakers from Valladolid, Spain in the Spanish portion of the
Glissando Corpus (Garrido et al. 2013). If you analyze the
dataset in a publication, please cite Eager (2017), listed in the references section below.
A data frame with 5281 rows and 21 variables:
Total plosive duration, measured from the preceding vowel's intensity maximum to the following vowel's intensity maximum, in milliseconds. Set to 0 for elided plosives.
Duration of the period of voicelessness in the
vowel-consonant-vowel sequence in milliseconds. Set to
fully voiced plosives and elided plosives.
Percentage of the consonant duration which was voiceless. For non-elided plosives, vpct = vdur / cdur, and for elided plosives, vpct = 0.
The maximum intensity in the following vowel minus the
minimum intensity in the plosive, in decibels. Set to
The maximum rising velocity of the intensity contour between the consonant's minimum intensity and the following vowel's maximum intensity, in decibels per millisecond. Set to 0 for elided plosives.
The underlying voicing of the plosive (Voiced or Voiceless).
Place of articulation (Bilabial, Dental, or Velar).
Syllabic stress context (Tonic, Post-Tonic, or Unstressed).
Preceding vowel phoneme identity (a, e, i, o, or u).
Following vowel phoneme identity (a, e, i, o, or u).
Position of the plosive in the word (Initial or Medial).
Number of times the word containing the plosive occurs in the CREA corpus (Real Academia Española).
Local speech rate around the consonant in nuclei per second (nuclei located using De Jong and Wempe's (2009) script).
Whether the speech was spontaneous (TRUE) or read (FALSE).
The city where the speaker is from (Cuzco, Lima, or Valladolid).
The speaker's sex (Female or Male).
The speaker's age group at the time of recording: Older (over 40 years old) or Younger (40 years old or younger).
The speaker's highest level of education (Secondary or University).
The speaker's language background: Bilingual (spoke both Spanish and Quechua) or Monolingual (spoke only Spanish).
Speaker identifier (s01 through s56).
Read speech item identifier (i01 through i54). Set to
for spontaneous speech.
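The relationship between the three duration measures described above can be sketched in R on a small made-up data frame. This is only an illustration: the values are invented, the column names cdur, vdur, and vpct are taken from the formula in the variable description, and assigning vdur = 0 to the elided token is an assumption made for the toy data. Note that the formula as written yields a ratio between 0 and 1 rather than a value out of 100.

```r
# Toy measurements (made-up values, NOT rows from the dataset).
toy <- data.frame(
  cdur = c(80, 65, 0),  # total plosive duration in ms; 0 marks an elided plosive
  vdur = c(40, 0, 0)    # voiceless period in ms (0 for the elided row is an assumption)
)

# vpct = vdur / cdur for non-elided plosives, and 0 for elided plosives.
# Guarding on cdur > 0 avoids the NaN that 0 / 0 would otherwise produce.
toy$vpct <- ifelse(toy$cdur > 0, toy$vdur / toy$cdur, 0)

print(toy)
```

The guard matters because plain division would leave NaN in every elided row, which would then propagate through summaries such as mean().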
The ptk dataset in the standardize package is a subset of the plosives dataset, but with the speakers renumbered.
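A minimal sketch of how the ptk subset might be inspected, assuming the standardize package is installed. The guard around the call is an addition for illustration, so that the snippet does nothing rather than failing when the package is missing; no particular column names are assumed.

```r
# Assumption: the standardize package may or may not be installed.
ptk_available <- requireNamespace("standardize", quietly = TRUE)

if (ptk_available) {
  # Load ptk into the current environment without attaching the package.
  data("ptk", package = "standardize", envir = environment())
  str(ptk)          # column names and types
  print(nrow(ptk))  # number of plosive tokens in the subset
}
```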
Eager, Christopher D. (2017). Contrast preservation and constraints on individual phonetic variation. Doctoral thesis. University of Illinois at Urbana-Champaign.
Garrido, J. M., Escudero, D., Aguilar, L., Cardeñoso, V., Rodero, E., de-la-Mota, C., ... Bonafonte, A. (2013). Glissando: a corpus for multidisciplinary prosodic studies in Spanish and Catalan. Language Resources and Evaluation, 47(4), 945-971.
Real Academia Española. Corpus de referencia del español actual (CREA). Banco de Datos. http://www.rae.es
De Jong, N. H., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385-390.