nn_ctc_loss
Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence: it must be ≤ the input length.
nn_ctc_loss(blank = 0, reduction = "mean", zero_infinity = FALSE)
blank: (int, optional) Blank label. Default: 0.
reduction: (string, optional) Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction is applied; 'mean': the output losses are divided by the target lengths and then the mean over the batch is taken; 'sum': the output losses are summed. Default: 'mean'.
zero_infinity: (bool, optional) Whether to zero infinite losses and the associated gradients. Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Default: FALSE.
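As a quick illustration of the reduction argument, here is a minimal sketch (not from this page; names like T_len are ours, chosen to avoid clashing with R's T shorthand for TRUE):

library(torch)
T_len <- 30 # input sequence length
C <- 10     # number of classes (including blank)
N <- 4      # batch size
S <- 5      # target sequence length
input <- torch_randn(T_len, N, C)$log_softmax(3)
target <- torch_randint(low = 1, high = C, size = c(N, S), dtype = torch_long())
input_lengths <- torch_full(size = c(N), fill_value = T_len, dtype = torch_long())
target_lengths <- torch_full(size = c(N), fill_value = S, dtype = torch_long())
nn_ctc_loss(reduction = "none")(input, target, input_lengths, target_lengths) # per-sample losses, shape (N)
nn_ctc_loss(reduction = "mean")(input, target, input_lengths, target_lengths) # single scalar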
log_probs: Tensor of size (T, N, C), where T = input length, N = batch size, and C = number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with nnf_log_softmax()).
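For example, a minimal sketch of producing log_probs from raw network outputs (the shapes here are illustrative):

library(torch)
logits <- torch_randn(50, 16, 20) # raw outputs of shape (T, N, C)
log_probs <- nnf_log_softmax(logits, dim = 3) # normalize over the class dimension (1-based dim 3)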
targets: Tensor of size (N, S) or (sum(target_lengths)), where N = batch size and S = max target length (if the shape is (N, S)). It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default = 0). In the (N, S) form, targets are padded to the length of the longest sequence and stacked. In the (sum(target_lengths)) form, the targets are assumed to be un-padded and concatenated within one dimension.
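To make the two layouts concrete, here is a small hand-written sketch (the values are arbitrary and do not come from this page):

library(torch)
# Padded (N, S) form: N = 2 sequences padded to S = 5
targets_padded <- torch_tensor(rbind(c(3, 5, 2, 0, 0),
                                     c(4, 1, 0, 0, 0)), dtype = torch_long())
target_lengths <- torch_tensor(c(3, 2), dtype = torch_long())
# Equivalent concatenated 1d form: sum(target_lengths) = 5 elements, no padding
targets_concat <- torch_tensor(c(3, 5, 2, 4, 1), dtype = torch_long())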
input_lengths: Tuple or tensor of size (N), where N = batch size. It represents the lengths of the inputs (each must be ≤ T). Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
target_lengths: Tuple or tensor of size (N), where N = batch size. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is (N, S), target_lengths are effectively the stop indices s_n for each target sequence, such that target_n = targets[n, 0:s_n] for each target in a batch. Lengths must each be ≤ S. If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
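Continuing the small sketch above, the stop-index relationship reads as follows in R's 1-based indexing (the 0:s_n above is 0-based notation):

n <- 1
s_n <- as.integer(target_lengths[n])
target_n <- targets_padded[n, 1:s_n] # the first s_n entries of row n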
output: scalar. If reduction is 'none', then (N), where N = batch size.
In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, blank = 0, target_lengths ≤ 256, and the integer arguments must be of dtype torch_int32. The regular implementation uses the (more common in PyTorch) torch_long dtype.
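A hedged sketch of arguments shaped to satisfy those conditions (whether the CuDNN kernel is actually selected additionally requires a CUDA device with CuDNN; T_len and the other names are ours):

library(torch)
T_len <- 50; C <- 20; N <- 16
input <- torch_randn(T_len, N, C)$log_softmax(3)
# all input lengths equal to T, as torch_int32
input_lengths <- torch_full(size = c(N), fill_value = T_len, dtype = torch_int32())
# target lengths <= 256, as torch_int32
target_lengths <- torch_randint(low = 1, high = 20, size = c(N), dtype = torch_int32())
# targets in concatenated (1d) format
target <- torch_randint(low = 1, high = C,
                        size = as.integer(sum(target_lengths)),
                        dtype = torch_int32())
loss <- nn_ctc_loss(blank = 0)(input, target, input_lengths, target_lengths)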
In some circumstances when using the CUDA backend with CuDNN, this operator
may select a nondeterministic algorithm to increase performance. If this is
undesirable, you can try to make the operation deterministic (potentially at
a performance cost) by setting torch.backends.cudnn.deterministic = TRUE
.
A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf
if (torch_is_installed()) {
# Targets are to be padded
T <- 50 # Input sequence length
C <- 20 # Number of classes (including blank)
N <- 16 # Batch size
S <- 30 # Target sequence length of longest target in batch (padding length)
S_min <- 10 # Minimum target length, for demonstration purposes
# Initialize random batch of input vectors, of size (T, N, C);
# log-softmax over the class dimension (1-based dim 3)
input <- torch_randn(T, N, C)$log_softmax(3)$detach()$requires_grad_()
# Initialize random batch of targets (0 = blank, 1:C = classes)
target <- torch_randint(low = 1, high = C, size = c(N, S), dtype = torch_long())
input_lengths <- torch_full(size = c(N), fill_value = T, dtype = torch_long()) # T is the input length defined above, not TRUE
target_lengths <- torch_randint(low = S_min, high = S, size = c(N), dtype = torch_long())
ctc_loss <- nn_ctc_loss()
loss <- ctc_loss(input, target, input_lengths, target_lengths)
loss$backward()
# Targets are to be un-padded
T <- 50 # Input sequence length
C <- 20 # Number of classes (including blank)
N <- 16 # Batch size
# Initialize random batch of input vectors, of size (T, N, C);
# log-softmax over the class dimension (1-based dim 3)
input <- torch_randn(T, N, C)$log_softmax(3)$detach()$requires_grad_()
input_lengths <- torch_full(size = c(N), fill_value = T, dtype = torch_long()) # T is the input length defined above, not TRUE
# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths <- torch_randint(low = 1, high = T, size = c(N), dtype = torch_long())
target <- torch_randint(
low = 1, high = C, size = as.integer(sum(target_lengths)),
dtype = torch_long()
)
ctc_loss <- nn_ctc_loss()
loss <- ctc_loss(input, target, input_lengths, target_lengths)
loss$backward()
}