buckeye: TD Deletion data from the Buckeye Corpus
In JoFrhwld/languageVariationAndChangeData: Language Variation and Change Data


TD Deletion data from the Buckeye Corpus

buckeye

a data frame with 26 columns

Speaker: Speaker ID from the Buckeye corpus metadata
Recording: The recording id
Word: The word in which the token appeared
WordBegin: Time stamp for the word onset in the recording
WordEnd: Time stamp for the word offset in the recording
POS: The part of speech tag from the Buckeye data
seg: Whether the token in question was a /t/ or a /d/
SegTrans: How the final phone of the word was phonetically transcribed
PreSegTrans: The preceding segment in the canonical transcription
FolSegTrans: The following segment in the canonical transcription
DictNSyl: The number of syllables in a maximum 8-word window surrounding the target word, based on the dictionary entries
NSyl: The number of syllables in a maximum 8-word window surrounding the target word, based on the phonetic transcription
PreWindow: Number of preceding words included in the window
PostWindow: Number of following words in the window
WindowBegin: Time stamp of the contextual window onset
WindowEnd: Time stamp of the contextual window offset
DictRate: Number of syllables per second, based on the number of syllables in the dictionary entries
Rate: Number of syllables per second, based on the phonetic transcription
FolWord: The word following the token
Context: The broader context in which the token was found
Gram: A finer grained coding of grammatical class
Gram2: A coarser grained coding of grammatical class
PreSeg: A coding of the preceding segment
FolSeg: A coding of the following segment
DepVar: A finer grained coding of the realization of the /t/ or /d/
td: A coarse grained coding of the /t/ or /d/ into a 1 or 0
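
The two rate columns can be related to the window columns with a small calculation. The sketch below is illustrative only: it assumes that the time stamps are in seconds and that each rate is the syllable count divided by the window duration (WindowEnd minus WindowBegin), which the descriptions above suggest but do not state outright. The function name and the example values are hypothetical and are not taken from the corpus.

```python
def syllables_per_second(n_syllables, window_begin, window_end):
    """Return syllables per second over a window whose bounds are in seconds."""
    duration = window_end - window_begin
    if duration <= 0:
        raise ValueError("window_end must be later than window_begin")
    return n_syllables / duration


# Hypothetical row: these values are invented for illustration.
row = {"DictNSyl": 12, "NSyl": 10, "WindowBegin": 41.0, "WindowEnd": 43.5}

# Assumed analogue of DictRate: dictionary syllable count over window duration.
dict_rate = syllables_per_second(row["DictNSyl"], row["WindowBegin"], row["WindowEnd"])

# Assumed analogue of Rate: phonetic syllable count over window duration.
rate = syllables_per_second(row["NSyl"], row["WindowBegin"], row["WindowEnd"])
```

If the assumption holds, DictRate and Rate differ for a token only when the dictionary and phonetic syllable counts differ, since both share the same window.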

This data was automatically generated from the Buckeye corpus by comparing the canonical transcription for each word to its phonetic transcription in the corpus. It includes estimates for rate of speech (syllables per second), as well as two different sets of morphological coding.
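
As a rough illustration of what comparing a canonical transcription to a phonetic one can look like, the sketch below checks whether a word ends in canonical /t/ or /d/ and reports how its final phone was transcribed. It is a simplification under stated assumptions, not the procedure used to build this data set: the function name, the returned keys, and the phone labels are hypothetical, and a real comparison would need to handle realizations such as flaps or glottal stops, which this sketch simply treats as non-matching.

```python
def compare_final_td(canonical, phonetic):
    """Compare the final phone of a word across two transcriptions.

    canonical, phonetic: lists of phone labels for one word.
    Returns None when the canonical form does not end in "t" or "d".
    """
    if not canonical or canonical[-1] not in ("t", "d"):
        return None
    seg = canonical[-1]
    # The last phone actually transcribed, or None if nothing was transcribed.
    seg_trans = phonetic[-1] if phonetic else None
    return {
        "seg": seg,                  # canonical target, "t" or "d"
        "seg_trans": seg_trans,      # how the word-final phone was transcribed
        "same": seg_trans == seg,    # True only for an exact label match
    }


# Hypothetical transcriptions of "west", with and without a final stop.
kept = compare_final_td(["w", "eh", "s", "t"], ["w", "eh", "s", "t"])
dropped = compare_final_td(["w", "eh", "s", "t"], ["w", "eh", "s"])
not_a_token = compare_final_td(["s", "iy"], ["s", "iy"])
```

The exact-match flag is deliberately crude; the DepVar and td columns described above encode the realization in their own finer and coarser schemes.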

The coding scheme for Gram is as follows:

and: The word "and"
justT: Past tense and participial forms that differ from the stem only by a final [d] -> [t] change. Specifically, "built", "sent", and "spent".
mono: Any word that doesn't have verbal morphology, and isn't a contraction.
nochange: No-change past tense forms. Specifically, "cost", "burst", "cast" and its contractions.
nt: Contractions with "not", e.g. "don't".
past: Regular past tense
semiweakD: Verbs that have a stem change and add /d/. Specifically, "heard", "sold", "told", and "unheard".
semiweakT: Verbs that have a stem change and add /t/, e.g. "felt", "kept".
stemchange: Verbs that have a stem change and no apparent affix. Specifically, "bound", "found", and "held".
went: The word "went"

The coding scheme for Gram2 is identical, except that it collapses the semiweakD and semiweakT categories from Gram.
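
The relationship between the two codings can be written as a simple mapping. This is a minimal sketch: the label given to the merged category in Gram2 is not stated here, so "semiweak" below is a placeholder assumption, and the function name is hypothetical.

```python
# Gram categories as listed in the coding scheme above.
GRAM_LEVELS = [
    "and", "justT", "mono", "nochange", "nt",
    "past", "semiweakD", "semiweakT", "stemchange", "went",
]


def collapse_gram(gram, merged_label="semiweak"):
    """Map a Gram code to a coarser code by merging the two semiweak classes.

    merged_label is an assumed name; check the actual Gram2 levels in the data.
    """
    if gram in ("semiweakD", "semiweakT"):
        return merged_label
    return gram


# Collapsing the ten Gram levels leaves nine distinct codes.
collapsed_levels = sorted({collapse_gram(g) for g in GRAM_LEVELS})
```

All other categories pass through unchanged, so only tokens coded semiweakD or semiweakT differ between the two columns.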

Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., & Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Columbus, OH. Retrieved from www.buckeyecorpus.osu.edu

JoFrhwld/languageVariationAndChangeData documentation built on May 7, 2019, 10:53 a.m.