cmu_ipa: IPA
In antdurrant/word.lists: A Collection Of Word Lists For ESL

Description

A dataset containing the Carnegie Mellon Pronouncing Dictionary (CMUDict). Because CMUDict was designed to train text-to-speech systems, it also includes variant forms of each word, such as forms followed by 's. Many of the entries are proper nouns, but all have been converted to lower case, because CMUDict provides them in upper case only.

CMUDict uses ARPABET for its transcriptions, so conversions to two flavours of IPA are provided. These were produced by quick-and-dirty phoneme-by-phoneme translation, not by batch downloads from an API or similar. New Oxford translations DO NOT include stress markers, because the translation was made at the phoneme level, whereas New Oxford adds stress at the syllable level. Wisdom ja-en translations include stress markers, but will contain more secondary stresses than the real dictionary, because that is how CMUDict behaves.

This uses the most recent CMUDict that I could find: 0.7b, released in 2014.
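To make the phoneme-level approach concrete, here is a minimal sketch of how such a translation might work. It is not the code used to build this dataset: the mapping table `arpabet_to_ipa`, the function `translate_arpabet()`, and the choice to place stress marks directly before the stressed vowel are all assumptions made for illustration, and the table covers only four phonemes.

```r
# Hypothetical, deliberately incomplete ARPABET-to-IPA table (illustration only).
arpabet_to_ipa <- c(
  K  = "k",
  AE = "æ",
  T  = "t",
  S  = "s"
)

# Translate one space-separated ARPABET transcription, phoneme by phoneme.
# keep_stress = FALSE drops the stress digits entirely.
# keep_stress = TRUE puts a stress mark directly before each stressed vowel,
# which is all a phoneme-level translation can do without syllable boundaries.
translate_arpabet <- function(transcription, keep_stress = FALSE) {
  phonemes <- strsplit(transcription, " ", fixed = TRUE)[[1]]
  stress   <- sub("^[A-Z]+", "", phonemes)  # "1", "2", "0", or ""
  base     <- sub("[0-9]$", "", phonemes)   # phoneme without its stress digit
  ipa      <- unname(arpabet_to_ipa[base])
  if (anyNA(ipa)) {
    stop("No mapping for: ", paste(base[is.na(ipa)], collapse = ", "))
  }
  if (keep_stress) {
    marks <- c("1" = "ˈ", "2" = "ˌ")        # primary and secondary stress
    mark  <- unname(marks[stress])
    mark[is.na(mark)] <- ""                 # unstressed vowels and consonants
    ipa   <- paste0(mark, ipa)
  }
  paste(ipa, collapse = "")
}

translate_arpabet("K AE1 T")                      # "kæt"
translate_arpabet("K AE1 T", keep_stress = TRUE)  # "kˈæt"
```

The second call shows why a phoneme-level translation cannot reproduce syllable-level stress: the mark lands before the vowel, not at the start of the syllable.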

Usage

cmu_ipa

Format

A data frame with 133854 observations and 4 variables:

token: the word
carnegie_mellon: the verbatim CMUDict
new_oxford_american: autotranslated IPA in the style of the New Oxford American Dictionary (built into macOS)
wisdom: autotranslated IPA in the style of ウィズダム英和辞典 (the Wisdom English-Japanese Dictionary)
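
As a rough illustration of working with a data frame of this shape, the sketch below looks up a word by its token. The stand-in `cmu_ipa_demo` and the helper `lookup_token()` are hypothetical and not part of the package. The stand-in carries only two of the four documented columns, and it assumes that `carnegie_mellon` holds space-separated ARPABET symbols.

```r
# Stand-in with two of the four documented columns (illustration only).
cmu_ipa_demo <- data.frame(
  token           = c("cat", "dog"),
  carnegie_mellon = c("K AE1 T", "D AO1 G"),
  stringsAsFactors = FALSE
)

# Return every row whose token matches `word`.
# The input is lower-cased first, because tokens are stored in lower case.
lookup_token <- function(data, word) {
  data[data$token == tolower(word), , drop = FALSE]
}

lookup_token(cmu_ipa_demo, "Cat")
```

With the package loaded, the same helper should accept `cmu_ipa` in place of the stand-in, since it relies only on the `token` column.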