buckeye: TD Deletion data from the Buckeye Corpus


Description

TD Deletion data from the Buckeye Corpus

Usage


Format

A data frame with 26 columns:

Speaker

Speaker ID from the Buckeye corpus metadata

Recording

The recording id

Word

The word in which the token appeared

WordBegin

Time stamp for the word onset in the recording

WordEnd

Time stamp for the word offset in the recording

POS

The part of speech tag from the Buckeye data

seg

Whether the token in question was a /t/ or a /d/

SegTrans

How the final phone of the word was phonetically transcribed

PreSegTrans

The preceding segment in the canonical transcription

FolSegTrans

The following segment in the canonical transcription

DictNSyl

The number of syllables in a maximum 8-word window surrounding the target word, based on the dictionary entries

NSyl

The number of syllables in a maximum 8-word window surrounding the target word, based on the phonetic transcription

PreWindow

Number of preceding words included in the window

PostWindow

Number of following words in the window

WindowBegin

Time stamp of the contextual window onset

WindowEnd

Time stamp of the contextual window offset

DictRate

Number of syllables per second, based on the number of syllables in the dictionary entries

Rate

Number of syllables per second, based on the phonetic transcription

FolWord

The word following the token

Context

The broader context in which the token was found

Gram

A finer-grained coding of grammatical class

Gram2

A coarser-grained coding of grammatical class

PreSeg

A coding of the preceding segment

FolSeg

A coding of the following segment

DepVar

A finer-grained coding of the realization of the /t/ or /d/

td

A coarse-grained binary coding (1 or 0) of the realization of the /t/ or /d/
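
The DictRate and Rate columns listed above are described as syllables per second over the contextual window. The sketch below shows one way such rates could be derived from the syllable counts and the window time stamps. It assumes that WindowBegin and WindowEnd are in seconds and that each rate is simply the syllable count divided by the window duration; neither assumption is confirmed by this documentation, and the values in `toy_rate` are made up for illustration.

```r
# Toy rows with invented values; column names follow the Format list.
toy_rate <- data.frame(
  NSyl        = c(10, 12),   # syllables in the phonetic transcription
  DictNSyl    = c(12, 12),   # syllables in the dictionary entries
  WindowBegin = c(1.0, 5.5), # assumed to be seconds
  WindowEnd   = c(3.0, 8.5)  # assumed to be seconds
)

# Assumed definition: rate = syllables / window duration.
window_dur <- toy_rate$WindowEnd - toy_rate$WindowBegin
toy_rate$Rate     <- toy_rate$NSyl / window_dur
toy_rate$DictRate <- toy_rate$DictNSyl / window_dur

toy_rate
```

Comparing a result like this against the Rate and DictRate columns shipped with the data would be a quick way to check whether the assumed definition matches.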

Details

This data was automatically generated from the Buckeye corpus by comparing the canonical transcription for each word to its phonetic transcription in the corpus. It includes estimates for rate of speech (syllables per second), as well as two different sets of morphological coding.
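
To illustrate the kind of comparison described above, the sketch below checks whether a word whose canonical transcription ends in /t/ or /d/ keeps that final phone in its phonetic transcription. The phone labels, the space-separated transcription format, the column names `canonical` and `phonetic`, and the helper `final_phone` are all hypothetical; they are not taken from the corpus or from the code that generated this data.

```r
# Hypothetical helper: last phone of a space-separated transcription string.
final_phone <- function(trans) {
  phones <- strsplit(trans, " ", fixed = TRUE)
  vapply(phones, function(p) p[length(p)], character(1))
}

# Invented example tokens, not corpus data.
toy_trans <- data.frame(
  Word      = c("west", "told", "kept"),
  canonical = c("w eh s t", "t ow l d", "k eh p t"),
  phonetic  = c("w eh s", "t ow l d", "k eh p"),
  stringsAsFactors = FALSE
)

canon_final <- final_phone(toy_trans$canonical)
phon_final  <- final_phone(toy_trans$phonetic)

# A token is a candidate if the canonical form ends in t or d;
# it counts as retained if the phonetic form ends in the same phone.
toy_trans$is_td          <- canon_final %in% c("t", "d")
toy_trans$final_retained <- toy_trans$is_td & phon_final == canon_final

toy_trans
```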

The coding scheme for Gram is as follows:

and

The word "and"

justT

Past tense and participial forms that just have a final [d] -> [t] change. Specifically, "built", "sent", and "spent".

mono

Any word that doesn't have verbal morphology, and isn't a contraction.

nochange

No-change past tense forms. Specifically, "cost", "burst", "cast" and its contractions

nt

Contractions with "not", e.g. "don't".

past

Regular past tense

semiweakD

Verbs that have a stem change and add /d/. Specifically, "heard", "sold", "told", and "unheard".

semiweakT

Verbs that have a stem change and add /t/, e.g. "felt", "kept".

stemchange

Verbs that have a stem change and no apparent affix. Specifically, "bound", "found", and "held".

went

The word "went"

The coding scheme for Gram2 is identical, except that it collapses the semiweakD and semiweakT categories from Gram.
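
The relationship between the two codings can be sketched as a simple recode. The example below collapses semiweakD and semiweakT into a single category; the merged label "semiweak" is an assumption made for illustration, since this documentation does not state what the collapsed level is called in Gram2, and the `gram` vector is invented rather than drawn from the data.

```r
# Invented Gram values using category names from the coding scheme above.
gram <- c("and", "semiweakD", "past", "semiweakT", "mono")

# Collapse the two semiweak categories; "semiweak" is an assumed label.
gram2 <- ifelse(gram %in% c("semiweakD", "semiweakT"), "semiweak", gram)

table(gram2)
```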

Source

Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., & Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Columbus, OH. Retrieved from www.buckeyecorpus.osu.edu


JoFrhwld/languageVariationAndChangeData documentation built on May 7, 2019, 10:53 a.m.