model_wavernn | R Documentation |
WaveRNN model based on the implementation from fatchord. The original implementation was introduced in "Efficient Neural Audio Synthesis". Calling the model passes the input through the WaveRNN layers.
model_wavernn(
  upsample_scales,
  n_classes,
  hop_length,
  n_res_block = 10,
  n_rnn = 512,
  n_fc = 512,
  kernel_size = 5,
  n_freq = 128,
  n_hidden = 128,
  n_output = 128
)
upsample_scales |
the list of upsample scales. |
n_classes |
the number of output classes. |
hop_length |
the number of samples between the starts of consecutive frames. |
n_res_block |
the number of ResBlocks in the stack. (Default: 10) |
n_rnn |
the dimension of the RNN layer. (Default: 512) |
n_fc |
the dimension of the fully connected layer. (Default: 512) |
kernel_size |
the kernel size of the first Conv1d layer. (Default: 5) |
n_freq |
the number of bins in a spectrogram. (Default: 128) |
n_hidden |
the number of hidden dimensions of ResBlock. (Default: 128) |
n_output |
the number of output dimensions of MelResNet. (Default: 128) |
Forward parameters:
waveform: the input waveform to the WaveRNN layer, of shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length)
specgram: the input spectrogram to the WaveRNN layer, of shape (n_batch, 1, n_freq, n_time)
The input channels of waveform and spectrogram have to be 1. The product of upsample_scales must equal hop_length.
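As a quick sanity check (a sketch using only base R; the scale values are taken from the example below), the constraint between upsample_scales and hop_length can be verified before constructing the model:

```r
upsample_scales <- c(2, 2, 3)
hop_length <- 12

# model_wavernn() requires prod(upsample_scales) == hop_length
stopifnot(prod(upsample_scales) == hop_length)
```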
Tensor shape: (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
if (torch::torch_is_installed()) {
  wavernn <- model_wavernn(upsample_scales = c(2, 2, 3), n_classes = 5, hop_length = 12)
  # waveform shape: (n_batch, n_channel, (n_time - kernel_size + 1) * hop_length)
  waveform <- torch::torch_rand(3, 1, (10 - 5 + 1) * 12)
  spectrogram <- torch::torch_rand(3, 1, 128, 10)
  output <- wavernn(waveform, spectrogram)
}