layer_mel_spectrogram | R Documentation |
This layer takes float32
/float64
single or batched audio signal as
inputs and computes the Mel spectrogram using Short-Time Fourier Transform
and Mel scaling. The input should be a 1D (unbatched) or 2D (batched) tensor
representing audio signals. The output will be a 2D or 3D tensor
representing Mel spectrograms.
A spectrogram is an image-like representation that shows the frequency spectrum of a signal over time. It uses x-axis to represent time, y-axis to represent frequency, and each pixel to represent intensity. Mel spectrograms are a special type of spectrogram that use the mel scale, which approximates how humans perceive sound. They are commonly used in speech and music processing tasks like speech recognition, speaker identification, and music genre classification.
layer_mel_spectrogram(
object,
fft_length = 2048L,
sequence_stride = 512L,
sequence_length = NULL,
window = "hann",
sampling_rate = 16000L,
num_mel_bins = 128L,
min_freq = 20,
max_freq = NULL,
power_to_db = TRUE,
top_db = 80,
mag_exp = 2,
min_power = 1e-10,
ref_power = 1,
...
)
object |
Object to compose the layer with. A tensor, array, or sequential model. |
fft_length |
Integer, size of the FFT window. |
sequence_stride |
Integer, number of samples between successive STFT columns. |
sequence_length |
Integer, size of the window used for applying
|
window |
String, name of the window function to use. Available values
are |
sampling_rate |
Integer, sample rate of the input signal. |
num_mel_bins |
Integer, number of mel bins to generate. |
min_freq |
Float, minimum frequency of the mel bins. |
max_freq |
Float, maximum frequency of the mel bins.
If |
power_to_db |
If TRUE, convert the power spectrogram to decibels. |
top_db |
Float, minimum negative cut-off |
mag_exp |
Float, exponent for the magnitude spectrogram. 1 for magnitude, 2 for power, etc. Default is 2. |
min_power |
Float, minimum value for power and |
ref_power |
Float, the power is scaled relative to it
|
... |
For forward/backward compatability. |
The return value depends on the value provided for the first argument.
If object
is:
a keras_model_sequential()
, then the layer is added to the sequential model
(which is modified in place). To enable piping, the sequential model is also
returned, invisibly.
a keras_input()
, then the output tensor from calling layer(input)
is returned.
NULL
or missing, then a Layer
instance is returned.
Unbatched audio signal
layer <- layer_mel_spectrogram( num_mel_bins = 64, sampling_rate = 8000, sequence_stride = 256, fft_length = 2048 ) layer(random_uniform(shape = c(16000))) |> shape()
Batched audio signal
layer <- layer_mel_spectrogram( num_mel_bins = 80, sampling_rate = 8000, sequence_stride = 128, fft_length = 2048 ) layer(random_uniform(shape = c(2, 16000))) |> shape()
1D (unbatched) or 2D (batched) tensor with shape:(..., samples)
.
2D (unbatched) or 3D (batched) tensor with
shape:(..., num_mel_bins, time)
.
Other audio preprocessing layers:
layer_stft_spectrogram()
Other preprocessing layers:
layer_auto_contrast()
layer_category_encoding()
layer_center_crop()
layer_discretization()
layer_equalization()
layer_feature_space()
layer_hashed_crossing()
layer_hashing()
layer_integer_lookup()
layer_max_num_bounding_boxes()
layer_mix_up()
layer_normalization()
layer_rand_augment()
layer_random_brightness()
layer_random_color_degeneration()
layer_random_color_jitter()
layer_random_contrast()
layer_random_crop()
layer_random_flip()
layer_random_grayscale()
layer_random_hue()
layer_random_posterization()
layer_random_rotation()
layer_random_saturation()
layer_random_sharpness()
layer_random_shear()
layer_random_translation()
layer_random_zoom()
layer_rescaling()
layer_resizing()
layer_solarization()
layer_stft_spectrogram()
layer_string_lookup()
layer_text_vectorization()
Other layers:
Layer()
layer_activation()
layer_activation_elu()
layer_activation_leaky_relu()
layer_activation_parametric_relu()
layer_activation_relu()
layer_activation_softmax()
layer_activity_regularization()
layer_add()
layer_additive_attention()
layer_alpha_dropout()
layer_attention()
layer_auto_contrast()
layer_average()
layer_average_pooling_1d()
layer_average_pooling_2d()
layer_average_pooling_3d()
layer_batch_normalization()
layer_bidirectional()
layer_category_encoding()
layer_center_crop()
layer_concatenate()
layer_conv_1d()
layer_conv_1d_transpose()
layer_conv_2d()
layer_conv_2d_transpose()
layer_conv_3d()
layer_conv_3d_transpose()
layer_conv_lstm_1d()
layer_conv_lstm_2d()
layer_conv_lstm_3d()
layer_cropping_1d()
layer_cropping_2d()
layer_cropping_3d()
layer_dense()
layer_depthwise_conv_1d()
layer_depthwise_conv_2d()
layer_discretization()
layer_dot()
layer_dropout()
layer_einsum_dense()
layer_embedding()
layer_equalization()
layer_feature_space()
layer_flatten()
layer_flax_module_wrapper()
layer_gaussian_dropout()
layer_gaussian_noise()
layer_global_average_pooling_1d()
layer_global_average_pooling_2d()
layer_global_average_pooling_3d()
layer_global_max_pooling_1d()
layer_global_max_pooling_2d()
layer_global_max_pooling_3d()
layer_group_normalization()
layer_group_query_attention()
layer_gru()
layer_hashed_crossing()
layer_hashing()
layer_identity()
layer_integer_lookup()
layer_jax_model_wrapper()
layer_lambda()
layer_layer_normalization()
layer_lstm()
layer_masking()
layer_max_num_bounding_boxes()
layer_max_pooling_1d()
layer_max_pooling_2d()
layer_max_pooling_3d()
layer_maximum()
layer_minimum()
layer_mix_up()
layer_multi_head_attention()
layer_multiply()
layer_normalization()
layer_permute()
layer_rand_augment()
layer_random_brightness()
layer_random_color_degeneration()
layer_random_color_jitter()
layer_random_contrast()
layer_random_crop()
layer_random_flip()
layer_random_grayscale()
layer_random_hue()
layer_random_posterization()
layer_random_rotation()
layer_random_saturation()
layer_random_sharpness()
layer_random_shear()
layer_random_translation()
layer_random_zoom()
layer_repeat_vector()
layer_rescaling()
layer_reshape()
layer_resizing()
layer_rnn()
layer_separable_conv_1d()
layer_separable_conv_2d()
layer_simple_rnn()
layer_solarization()
layer_spatial_dropout_1d()
layer_spatial_dropout_2d()
layer_spatial_dropout_3d()
layer_spectral_normalization()
layer_stft_spectrogram()
layer_string_lookup()
layer_subtract()
layer_text_vectorization()
layer_tfsm()
layer_time_distributed()
layer_torch_module_wrapper()
layer_unit_normalization()
layer_upsampling_1d()
layer_upsampling_2d()
layer_upsampling_3d()
layer_zero_padding_1d()
layer_zero_padding_2d()
layer_zero_padding_3d()
rnn_cell_gru()
rnn_cell_lstm()
rnn_cell_simple()
rnn_cells_stack()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.