cmu_ipa: IPA

cmu_ipa R Documentation

IPA

Description

A dataset containing the Carnegie Mellon Pronouncing Dictionary (CMUDict). CMUDict also includes variant forms of each word (for example, the word followed by 's), as it was designed to train text-to-speech systems. A great many of the words in this dictionary are proper nouns, but all have been converted to lower case, as CMUDict provides them in upper case only. CMUDict uses ARPABET for its transcriptions, so conversions to a couple of flavours of IPA are provided through quick-and-dirty phoneme-by-phoneme translation, not through batch downloads from an API or similar. New Oxford translations DO NOT include stress markers, as the translation was made at the phoneme level, whereas New Oxford adds stress at the syllable level. Wisdom ja-en translations include stress markers, but will include more secondary stresses than the real dictionary, as that is how CMUDict behaves. This uses the most recent CMUDict that I could find: 0.7b, released in 2014.
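The phoneme-level translation described above can be sketched roughly as follows. This is only an illustration: the mapping table `arpabet_to_ipa` and the function `translate_arpabet()` are hypothetical names, the table covers just a handful of ARPABET symbols, and neither is taken from the package's actual conversion code. The sketch drops the stress digits that CMUDict attaches to vowels, which shows why a purely phoneme-by-phoneme conversion can end up without stress markers.

```r
# Hypothetical, partial ARPABET-to-IPA table (an assumption for illustration,
# not the mapping used to build cmu_ipa).
arpabet_to_ipa <- c(
  AE = "æ", AO = "ɔ", IY = "i",
  K = "k", T = "t", D = "d", G = "ɡ", S = "s"
)

# Translate one space-separated ARPABET transcription, one phoneme at a time.
translate_arpabet <- function(transcription, map = arpabet_to_ipa) {
  phonemes <- strsplit(transcription, " ", fixed = TRUE)[[1]]
  bare <- sub("[0-2]$", "", phonemes)  # strip CMUDict stress digits (0, 1, 2)
  ipa <- unname(map[bare])
  if (anyNA(ipa)) {
    stop("no mapping for: ", paste(bare[is.na(ipa)], collapse = ", "))
  }
  paste(ipa, collapse = "")
}

translate_arpabet("K AE1 T")  # "kæt"
```

A style that keeps stress would need to turn the stripped digits into stress marks instead of discarding them.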

Usage

cmu_ipa

Format

A data frame with 133854 observations and 4 variables:

token

the word

carnegie_mellon

the verbatim CMUDict (ARPABET) transcription

new_oxford_american

autotranslated IPA in the style of the New Oxford American Dictionary (built into macOS)

wisdom

autotranslated IPA in the style of ウィズダム英和辞典 (the Wisdom English-Japanese Dictionary)
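Because every token is stored in lower case, a lookup should lower-case the query first. The sketch below shows this on a small stand-in data frame that borrows only the `token` and `carnegie_mellon` column names from the format above. The helper `lookup_token()` and the `toy` data frame are hypothetical and not part of the package; the two IPA columns are left out so that their exact formatting is not guessed at.

```r
# Stand-in for cmu_ipa with two of its four columns (illustrative rows only).
toy <- data.frame(
  token = c("cat", "dog"),
  carnegie_mellon = c("K AE1 T", "D AO1 G"),
  stringsAsFactors = FALSE
)

# Return the rows whose token matches the query, ignoring the query's case.
lookup_token <- function(data, word) {
  data[data$token == tolower(word), , drop = FALSE]
}

lookup_token(toy, "Cat")$carnegie_mellon  # "K AE1 T"
```

With the package installed, the same helper should presumably work on the real data, for example `lookup_token(cmu_ipa, "Cat")`.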

Source

http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/

http://www.speech.cs.cmu.edu/cgi-bin/cmudict/


antdurrant/word.lists documentation built on July 20, 2023, 3:57 p.m.