sbc: Santa Barbara Corpus of Spoken American English


Description

A dataset containing 15,475 utterances by 44 speakers of American English.

Usage

sbc

Format

A data frame with 15,475 rows and 13 variables:

id

ID for each speaker

name

Name of each speaker

gender

Gender of the speaker

age

Age of the speaker at recording

dialect

Dialect self-assessment for each speaker

dialect_state

State where each speaker was raised

current_state

State of residence for each speaker at recording

highest_edu

Highest educational degree obtained

years_edu

Number of years of formal education

occupation

Occupation of the speaker at recording

ethnicity

Ethnicity self-assessment for each speaker

utterance

Annotated transcription of a speaker's utterance

utterrance_clean

Simplified transcription of a speaker's utterance

Source

http://www.linguistics.ucsb.edu/research/santa-barbara-corpus
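
Examples

Because the data frame described under Format holds one row per utterance, a per-speaker utterance count can be obtained by tabulating the id column. The following is a minimal sketch rather than code from the package: `count_utterances` is a hypothetical helper, and `toy` is an invented stand-in with placeholder values (not rows from the corpus) so that the snippet runs without the package installed.

```r
# Toy stand-in using three of the documented columns.
# All values are invented placeholders, not corpus data.
toy <- data.frame(
  id = c("S1", "S1", "S2"),
  gender = c("F", "F", "M"),
  utterance = c("placeholder one", "placeholder two", "placeholder three"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows (utterances) per speaker id.
count_utterances <- function(df) {
  counts <- table(df$id)
  data.frame(
    id = names(counts),
    n_utterances = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

per_speaker <- count_utterances(toy)
print(per_speaker)
```

With the package installed, the same helper could presumably be applied to the real data as `count_utterances(analyzr::sbc)`, assuming the dataset is exported under that name.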


WFU-TLC/analyzr documentation built on June 4, 2019, 2:27 p.m.