The goal of text2speech is to harmonize various text-to-speech engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
With the exception of Coqui TTS, all of these engines are accessible through their own R packages.
Coqui TTS has no separate R package yet because, at this time, text2speech incorporates its functionality directly. An R wrapper for Coqui is under development.
You can install the released version of this package from CRAN, or the development version from GitHub, with:
# Install from CRAN
install.packages("text2speech")
# or the development version from GitHub
# install.packages("devtools")
devtools::install_github("jhudsl/text2speech")
Check whether each service is authenticated. Any service that is not already authenticated must be configured individually.
library(text2speech)
# Amazon Polly
tts_auth("amazon")
#> [1] TRUE
# Coqui TTS
tts_auth("coqui")
#> [1] TRUE
# Google Cloud Text-to-Speech API
tts_auth("google")
#> [1] TRUE
# Microsoft Cognitive Services Text to Speech REST API
tts_auth("microsoft")
#> [1] TRUE
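The four checks above can be gathered into one call. The following is a minimal sketch, not part of text2speech: `check_services()` and its `auth_fn` argument are hypothetical names introduced here, and the sketch assumes only that `tts_auth()` returns `TRUE` for an authenticated service, as the output above shows. An error raised during a check is treated as "not authenticated".

```r
# Hypothetical helper (not part of text2speech): report which services
# are ready to use. `auth_fn` defaults to text2speech::tts_auth and can be
# swapped for a stub when testing without credentials.
check_services <- function(services = c("amazon", "coqui", "google", "microsoft"),
                           auth_fn = text2speech::tts_auth) {
  vapply(
    services,
    function(service) {
      # Treat an error during the check as "not authenticated"
      isTRUE(tryCatch(auth_fn(service), error = function(e) FALSE))
    },
    logical(1)
  )
}

# Usage (requires text2speech to be installed):
# status <- check_services()
# names(status)[status]   # services that are ready to use
```

The result is a named logical vector, so it can be used directly to pick a service before calling `tts()`.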
List different voice options for each service.
# Amazon Polly
voices_amazon <- tts_amazon_voices()
head(voices_amazon)
#> voice language language_code gender service
#> 1 Zeina Arabic arb Female amazon
#> 2 Zhiyu Chinese Mandarin cmn-CN Female amazon
#> 3 Naja Danish da-DK Female amazon
#> 4 Mads Danish da-DK Male amazon
#> 5 Ruben Dutch nl-NL Male amazon
#> 6 Lotte Dutch nl-NL Female amazon
# Coqui TTS
voices_coqui <- tts_coqui_voices()
#> ℹ Test out different voices on the CoquiTTS Demo (<https://huggingface.co/spaces/coqui/CoquiTTS>)
head(voices_coqui)
#> # A tibble: 6 × 5
#> type language dataset model_name service
#> <chr> <chr> <chr> <chr> <chr>
#> 1 tts_models multilingual multi-dataset your_tts coqui
#> 2 tts_models multilingual multi-dataset bark coqui
#> 3 tts_models bg cv vits coqui
#> 4 tts_models cs cv vits coqui
#> 5 tts_models da cv vits coqui
#> 6 tts_models et cv vits coqui
# Google Cloud Text-to-Speech API
voices_google <- tts_google_voices()
head(voices_google)
#> voice language language_code gender service
#> 1 af-ZA-Standard-A <NA> af-ZA FEMALE google
#> 2 af-ZA-Standard-A <NA> af-ZA FEMALE google
#> 3 ar-XA-Wavenet-C Arabic ar-XA MALE google
#> 4 ar-XA-Standard-C Arabic ar-XA MALE google
#> 5 ar-XA-Standard-D Arabic ar-XA FEMALE google
#> 6 ar-XA-Wavenet-A Arabic ar-XA FEMALE google
# Microsoft Cognitive Services Text to Speech REST API
voices_microsoft <- tts_microsoft_voices()
head(voices_microsoft)
#> voice
#> 1 Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)
#> 2 Microsoft Server Speech Text to Speech Voice (af-ZA, WillemNeural)
#> 3 Microsoft Server Speech Text to Speech Voice (am-ET, MekdesNeural)
#> 4 Microsoft Server Speech Text to Speech Voice (am-ET, AmehaNeural)
#> 5 Microsoft Server Speech Text to Speech Voice (ar-AE, FatimaNeural)
#> 6 Microsoft Server Speech Text to Speech Voice (ar-AE, HamdanNeural)
#> language language_code gender service
#> 1 Afrikaans (South Africa) af-ZA Female microsoft
#> 2 Afrikaans (South Africa) af-ZA Male microsoft
#> 3 Amharic (Ethiopia) am-ET Female microsoft
#> 4 Amharic (Ethiopia) am-ET Male microsoft
#> 5 Arabic (United Arab Emirates) ar-AE Female microsoft
#> 6 Arabic (United Arab Emirates) ar-AE Male microsoft
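The Amazon, Google, and Microsoft listings share the columns `voice`, `language`, `language_code`, `gender`, and `service`, so one filter can serve all three. Below is a rough sketch with a hypothetical helper, `filter_voices()`, which is not part of text2speech. It compares gender case-insensitively because the outputs above show both `Female` and `FEMALE`. The Coqui listing has different columns and is not covered. The demo data frame copies three rows from the Amazon output above so the example runs without credentials.

```r
# Hypothetical helper (not part of text2speech): subset a voice listing
# by language code and/or gender.
filter_voices <- function(voices, language_code = NULL, gender = NULL) {
  keep <- rep(TRUE, nrow(voices))
  if (!is.null(language_code)) {
    keep <- keep & voices$language_code %in% language_code
  }
  if (!is.null(gender)) {
    # Services differ in capitalisation ("Female" vs "FEMALE")
    keep <- keep & tolower(voices$gender) %in% tolower(gender)
  }
  voices[keep, , drop = FALSE]
}

# Stand-in for tts_amazon_voices(), using rows shown in the output above
demo_voices <- data.frame(
  voice = c("Naja", "Mads", "Ruben"),
  language = c("Danish", "Danish", "Dutch"),
  language_code = c("da-DK", "da-DK", "nl-NL"),
  gender = c("Female", "Male", "Male"),
  service = "amazon",
  stringsAsFactors = FALSE
)

danish_female <- filter_voices(demo_voices, language_code = "da-DK", gender = "female")
```

With real data, the same call would look like `filter_voices(voices_google, language_code = "ar-XA", gender = "male")`.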
Synthesize speech with tts(text = "TEXT", service = "ENGINE"):
# Amazon Polly
tts("Hello world!", service = "amazon")
# Coqui TTS
tts("Hello world!", service = "coqui")
# Google Cloud Text-to-Speech API
tts("Hello world!", service = "google")
# Microsoft Cognitive Services Text to Speech REST API
tts("Hello world!", service = "microsoft")
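Because every engine is reached through the same `tts(text, service)` call, it is straightforward to try several engines in order and keep the first one that works. The sketch below illustrates that idea. `tts_with_fallback()` and its `tts_fn` argument are hypothetical names, not part of text2speech, and the sketch assumes that a failed synthesis surfaces as an R error.

```r
# Hypothetical helper (not part of text2speech): try each service in turn
# and return the first successful result together with the service name.
tts_with_fallback <- function(text,
                              services = c("amazon", "google", "microsoft", "coqui"),
                              tts_fn = text2speech::tts) {
  for (service in services) {
    result <- tryCatch(
      tts_fn(text, service = service),
      error = function(e) NULL  # assumption: failures are signalled as errors
    )
    if (!is.null(result)) {
      return(list(service = service, result = result))
    }
  }
  stop("No text-to-speech service succeeded: ", paste(services, collapse = ", "))
}

# Usage (requires text2speech and at least one configured service):
# out <- tts_with_fallback("Hello world!")
# out$service
```

Passing the synthesis function as an argument keeps the helper testable with a stub, without network access or credentials.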
The resulting output is a standardized tibble with the following columns:
index: Sequential identifier number
original_text: The text input provided by the user
text: If original_text exceeds the character limit, text holds the result of splitting original_text. Otherwise, text is the same as original_text.
wav: Wave object (S4 class)
file: File path to the audio file
audio_type: The audio format, either mp3 or wav
duration: The duration of the audio file
service: The text-to-speech engine used
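To make the relationship between original_text and text concrete, the sketch below splits a long input at word boundaries so that each piece fits a character limit, then lays the pieces out with index, original_text, and text columns. This is an illustration only: `split_by_limit()` is a hypothetical helper, the limit is arbitrary, and the splitting rule text2speech actually applies may differ.

```r
# Hypothetical illustration (not text2speech's actual splitting logic):
# greedily pack whole words into chunks of at most `limit` characters.
# A single word longer than `limit` is kept intact as its own chunk.
split_by_limit <- function(x, limit = 50) {
  words <- strsplit(trimws(x), "\\s+")[[1]]
  chunks <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    if (nchar(candidate) > limit && nzchar(current)) {
      chunks <- c(chunks, current)  # close the full chunk
      current <- w                  # start a new one with this word
    } else {
      current <- candidate
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

original <- "one two three four"
pieces <- split_by_limit(original, limit = 9)

# Mirror the index / original_text / text columns described above
layout <- data.frame(
  index = seq_along(pieces),
  original_text = original,
  text = pieces,
  stringsAsFactors = FALSE
)
```

Each row of `layout` repeats the full input in original_text and carries one piece in text, which is the shape the column descriptions above suggest for inputs that exceed the limit.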