knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Coqui TTS is a text-to-speech (TTS) library that converts regular text into speech and is completely free to use. This is not true of the other text-to-speech engines used by text2speech.
Coqui TTS provides pre-trained TTS and vocoder models as part of its package. To get a sense of the best TTS and vocoder models, take a look at this GitHub Discussion post. In the Coqui TTS Hugging Face Space, you can experiment with a few of these models by entering text and listening to the resulting audio.
The underlying technology of text-to-speech is highly intricate and will not be the focus of this vignette. However, if you're interested in delving deeper into the subject, here are some recommended talks:
Coqui TTS includes pre-trained models such as spectrogram models (for example, Tacotron2 and FastSpeech2), end-to-end models (including VITS and YourTTS), and vocoder models (such as MelGAN and WaveGrad).
To install Coqui TTS, you will need to enter the following command in the terminal:
$ pip install TTS
Note: If you are using a Mac with an M1 chip, first run the following command in the terminal:
$ brew install mecab
Afterward, you can proceed to install TTS by executing the following command:
$ pip install TTS
library(text2speech)
To use Coqui TTS, text2speech needs to know the correct path to the Coqui TTS executable. This path can be obtained through two methods: manual and automatic.
You can manually specify the path to the Coqui TTS executable in R by setting a global option with the set_coqui_path() function:
set_coqui_path("your/path/to/tts")
To determine the location of the Coqui TTS executable, you can enter the command which tts in the terminal.
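The same lookup can also be attempted from within R. The following is a minimal sketch, assuming the tts executable is on the PATH seen by your R session (which may differ from the PATH of your terminal). Sys.which() is a base R function that returns an empty string when the program is not found.

```r
# Look up the `tts` executable on the PATH visible to this R session.
# Sys.which() returns "" when the program cannot be found.
tts_path <- unname(Sys.which("tts"))

if (nzchar(tts_path) && requireNamespace("text2speech", quietly = TRUE)) {
  # Register the discovered path with text2speech
  text2speech::set_coqui_path(tts_path)
} else {
  message("tts not found on PATH (or text2speech not installed); ",
          "set the path manually.")
}
```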
Internally, the set_coqui_path() function runs options("path_to_coqui" = path) to set the provided path as the value for the path_to_coqui global option, as long as the Coqui TTS executable exists at that location.
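The behaviour described above can be sketched in a few lines of base R. This is an illustration only, not the package source: set_path_sketch() is a hypothetical name, and the empty temporary file is just a placeholder for a real executable.

```r
# Hypothetical stand-in for illustration; not the text2speech source code.
set_path_sketch <- function(path) {
  stopifnot(is.character(path), length(path) == 1L)
  if (!file.exists(path)) {
    stop("No file found at: ", path)
  }
  # Store the path in the global option named in the text above
  options("path_to_coqui" = path)
  invisible(path)
}

# Use an empty temporary file as a placeholder executable so the example
# runs without Coqui TTS installed.
fake_tts <- tempfile("tts")
file.create(fake_tts)

set_path_sketch(fake_tts)
getOption("path_to_coqui")
```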
The functions tts_auth(service = "coqui"), tts_voices(service = "coqui"), and tts(service = "coqui") search through a predetermined list of known locations for the Coqui TTS executable. If none of these paths yields a valid TTS executable, an error is raised that directs you to use set_coqui_path() to set the correct path manually.
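A search of this kind might look roughly like the sketch below. find_tts_sketch() and the candidate paths are hypothetical; the actual list of locations checked by text2speech is not reproduced here.

```r
# Hypothetical helper; the candidate paths used by text2speech may differ.
find_tts_sketch <- function(candidates) {
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0L) {
    stop("Coqui TTS executable not found; use set_coqui_path() to set it.")
  }
  # Return the first candidate that exists
  found[[1L]]
}

# Demonstration with one missing path and one placeholder file
placeholder <- tempfile("tts")
file.create(placeholder)
candidates <- c(file.path(tempdir(), "does-not-exist", "tts"), placeholder)

tts_found <- find_tts_sketch(candidates)
```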
The function tts_voices(service = "coqui") is a wrapper for the system command tts --list_models, which lists the released Coqui TTS models.
tts_voices(service = "coqui")
The result is a tibble with the following columns: language, dataset, model_name, and service.
The language column contains the language code associated with the speaker.
The dataset column indicates the specific dataset on which the text-to-speech model, denoted by model_name, was trained.
The model_name column refers to the name of the text-to-speech model.
The service column refers to the specific TTS service used (Amazon, Google, Microsoft, or Coqui TTS).
You can find a list of papers associated with some of the implemented models for Coqui TTS here.
By providing the values from this tibble (language, dataset, and model_name) in tts(), you can select the specific voice you want for text-to-speech synthesis.
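As a rough illustration of picking a voice, the sketch below filters a small hand-written table shaped like the one described. The rows are illustrative only and are not real output; in practice you would filter the result of tts_voices(service = "coqui") instead.

```r
# Illustrative rows only, shaped like the tibble described above;
# real output from tts_voices(service = "coqui") has many more rows.
voices <- data.frame(
  language   = c("en", "en"),
  dataset    = c("ljspeech", "ljspeech"),
  model_name = c("tacotron2-DDC_ph", "fast_pitch"),
  service    = c("coqui", "coqui"),
  stringsAsFactors = FALSE
)

# Keep English voices trained on the ljspeech dataset
en_lj <- subset(voices, language == "en" & dataset == "ljspeech")

# Pick one model name to pass to tts() via the model_name argument
chosen_model <- en_lj$model_name[en_lj$model_name == "fast_pitch"]
chosen_model
```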
To convert text to speech, you can use the function tts(text = "Hello world!", service = "coqui").
tts(text = "Hello world!", service = "coqui")
The result is a tibble with the following columns: index, original_text, text, wav, file, audio_type, duration, and service. Some of the noteworthy ones are:
text: If original_text exceeds the character limit, text represents the outcome of splitting original_text. Otherwise, text remains the same as original_text.
file: The location where the audio output is saved.
audio_type: The format of the audio file, either mp3 or wav.
By default, the function tts(service = "coqui") uses the tacotron2-DDC_ph model and the ljspeech/univnet vocoder. You can specify a different model with the argument model_name, or a different vocoder with the argument vocoder_name.
tts(text = "Hello world, using a different voice!", service = "coqui", model_name = "fast_pitch", vocoder_name = "ljspeech/hifigan_v2")
Another default is that tts(service = "coqui") saves the audio output in a temporary folder, whose path is shown in the file column of the resulting tibble. However, a temporary directory lasts only as long as the current R session, so that path will no longer exist after you restart R.
A more sustainable workflow is to save the audio output in a local folder. To do so, set the arguments save_local = TRUE and save_local_dest = "/full/path/to/local/folder". Make sure to provide the full path to the local folder.
tts(text = "Hello world! I am saving the audio output in a local folder", service = "coqui", save_local = TRUE, save_local_dest = "/full/path/to/local/folder")
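Because save_local_dest expects a full path, it can help to create the folder and expand its path before calling tts(). The following is a minimal base R sketch; the folder under tempdir() is only a stand-in so the example runs anywhere, and the tts() call is left commented out because it requires Coqui TTS to be installed.

```r
# Stand-in destination under tempdir() so the example runs anywhere;
# in practice, point this at a persistent folder of your choice.
dest <- file.path(tempdir(), "tts_audio")
dir.create(dest, recursive = TRUE, showWarnings = FALSE)

# Expand to a full path, since save_local_dest expects one
dest_full <- normalizePath(dest, mustWork = TRUE)

# Then pass it on (requires Coqui TTS to be installed):
# tts(text = "Hello world!", service = "coqui",
#     save_local = TRUE, save_local_dest = dest_full)
```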
sessionInfo()