jhudsl/text2speech: Text to Speech Conversion

Overview

The goal of text2speech is to harmonize various text-to-speech engines, including Amazon Polly, Coqui TTS, Google Cloud Text-to-Speech API, and Microsoft Cognitive Services Text to Speech REST API.

With the exception of Coqui TTS, all these engines are accessible as R packages:

aws.polly is a client for Amazon Polly
googleLanguageR is a client for the Google Cloud Text-to-Speech API
conrad is a client for the Microsoft Cognitive Services Text to Speech REST API

You might notice that Coqui TTS doesn't have its own R package. This is because, at this time, text2speech incorporates the functionality of Coqui TTS directly. An R wrapper for Coqui is under development.

Installation

You can install the released version of text2speech from CRAN, or the development version from GitHub:

# Install from CRAN
install.packages("text2speech")

# or the development version from GitHub
# install.packages("devtools")
devtools::install_github("jhudsl/text2speech")

Authentication

Use tts_auth() to check whether you are authenticated with a service. If you are not, you must configure authentication separately for each service.

library(text2speech)

# Amazon Polly
tts_auth("amazon")
# Coqui TTS
tts_auth("coqui")
# Google Cloud Text-to-Speech API 
tts_auth("google")
# Microsoft Cognitive Services Text to Speech REST API
tts_auth("microsoft")
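
To check every service in one pass, the calls above can be wrapped in a small helper. This is a minimal sketch, not part of text2speech: check_services() is a hypothetical name, and it assumes that tts_auth() returns a single TRUE or FALSE for a service. The check function is passed in as an argument so the helper can be tried without any credentials.

```r
# Hypothetical helper (not part of text2speech): run an authentication
# check for each service and collect the results in a named logical vector.
# `auth_fun` is assumed to return TRUE or FALSE for a service name.
check_services <- function(services, auth_fun) {
  vapply(
    services,
    function(service) {
      # Treat errors (e.g. missing credentials) as "not authenticated"
      isTRUE(tryCatch(auth_fun(service), error = function(e) FALSE))
    },
    logical(1)
  )
}

services <- c("amazon", "coqui", "google", "microsoft")

# With text2speech installed, use tts_auth() as the check
if (requireNamespace("text2speech", quietly = TRUE)) {
  print(check_services(services, text2speech::tts_auth))
}
```

The result is named by service, so it can be used to choose which engines to call later in a script.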

Voices

List different voice options for each service.

# Amazon Polly
voices_amazon <- tts_amazon_voices()
head(voices_amazon)

# Coqui TTS
voices_coqui <- tts_coqui_voices()
head(voices_coqui)

# Google Cloud Text-to-Speech API 
voices_google <- tts_google_voices()
head(voices_google)

# Microsoft Cognitive Services Text to Speech REST API
voices_microsoft <- tts_microsoft_voices()
head(voices_microsoft)
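
The returned tables can be filtered like any data frame to pick a voice. The sketch below uses a small made-up table so that it runs without credentials. The column names voice, language_code, and gender, the rows, and the helper pick_voices() are assumptions for illustration; check names(voices_amazon) for the columns a service actually returns.

```r
# Made-up stand-in for a voices table; real column names and rows may differ.
voices_example <- data.frame(
  voice = c("Joanna", "Matthew", "Celine"),
  language_code = c("en-US", "en-US", "fr-FR"),
  gender = c("Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the voices for one language code
pick_voices <- function(voices, code) {
  voices[voices$language_code == code, , drop = FALSE]
}

en_us <- pick_voices(voices_example, "en-US")
en_us$voice
```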

Convert text to speech

Synthesize speech with tts(text = "TEXT", service = "ENGINE"), where ENGINE is one of "amazon", "coqui", "google", or "microsoft":

# Amazon Polly
tts("Hello world!", service = "amazon")

# Coqui TTS
tts("Hello world!", service = "coqui")

# Google Cloud Text-to-Speech API 
tts("Hello world!", service = "google")

# Microsoft Cognitive Services Text to Speech REST API
tts("Hello world!", service = "microsoft")

The output is a standardized tibble with the following columns:

index: Sequential identifier number
original_text: The text input provided by the user
text: If original_text exceeds the character limit, text holds the result of splitting original_text; otherwise, text is the same as original_text
wav: Wave object (S4 class)
file: File path to the audio file
audio_type: The audio format, either mp3 or wav
duration: The duration of the audio file
service: The text-to-speech engine used
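
Because every service returns the same columns, downstream code can treat the result as an ordinary data frame. The sketch below uses a made-up stand-in for a tts() result so that it runs without credentials: the file paths and durations are invented, and the wav column is left out because it holds S4 Wave objects.

```r
# Made-up stand-in for a tts() result (wav column omitted).
result_example <- data.frame(
  index = 1:2,
  original_text = c("Hello world!", "Goodbye."),
  text = c("Hello world!", "Goodbye."),
  file = c("hello.mp3", "goodbye.mp3"),
  audio_type = c("mp3", "mp3"),
  duration = c(1.2, 0.8),
  service = c("amazon", "amazon"),
  stringsAsFactors = FALSE
)

# Rows where text differs from original_text came from splitting
was_split <- result_example$text != result_example$original_text

# Total audio length and the audio files, in input order
by_index <- result_example[order(result_example$index), ]
total_duration <- sum(by_index$duration)
audio_files <- by_index$file
```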