knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of text2speech is to harmonize various text-to-speech engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.
With the exception of Coqui TTS, all these engines are accessible as R packages:
You might notice Coqui TTS doesn't have its own R package. This is because, at this time, text2speech directly incorporates the functionality of Coqui TTS. The R wrapper of Coqui is under development.
You can install this package from CRAN or the development version from GitHub with:
# Install from CRAN install.packages("text2speech") # or the development version from GitHub # install.packages("devtools") devtools::install_github("jhudsl/text2speech")
Check for authentication. If not already authenticated, users must individually configure it for each service.
library(text2speech) # Amazon Polly tts_auth("amazon") # Coqui TTS tts_auth("coqui") # Google Cloud Text-to-Speech API tts_auth("google") # Microsoft Cognitive Services Text to Speech REST API tts_auth("microsoft")
List different voice options for each service.
# Amazon Polly voices_amazon <- tts_amazon_voices() head(voices_amazon) # Coqui TTS voices_coqui <- tts_coqui_voices() head(voices_coqui) # Google Cloud Text-to-Speech API voices_google <- tts_google_voices() head(voices_google) # Microsoft Cognitive Services Text to Speech REST API voices_microsoft <- tts_microsoft_voices() head(voices_microsoft)
Synthesize speech with tts(text = "TEXT", service = "ENGINE")
# Amazon Polly tts("Hello world!", service = "amazon") # Coqui TTS tts("Hello world!", service = "coqui") # Google Cloud Text-to-Speech API tts("Hello world!", service = "google") # Microsoft Cognitive Services Text to Speech REST API tts("Hello world!", service = "microsoft")
The resulting output will consist of a standardized tibble featuring the following columns:
index
: Sequential identifier numberoriginal_text
: The text input provided by the usertext
: In case original_text
exceeds the character limit, text
represents the outcome of splitting original_text
. Otherwise, text
remains the same as original_text
.wav
: Wave object (S4 class)file
: File path to the audio fileaudio_type
: The audio format, either mp3 or wavduration
: The duration of the audio fileservice
: The text-to-speech engine usedAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.