View source: R/dataset_methods.R
dataset_bucket_by_sequence_length | R Documentation |
A transformation that buckets elements in a Dataset by length
dataset_bucket_by_sequence_length(
dataset,
element_length_func,
bucket_boundaries,
bucket_batch_sizes,
padded_shapes = NULL,
padding_values = NULL,
pad_to_bucket_boundary = FALSE,
no_padding = FALSE,
drop_remainder = FALSE,
name = NULL
)
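dataset | A TensorFlow dataset. |
element_length_func | function from element in the Dataset to an integer tensor (tf$int32) giving the length of the element, which determines the bucket it goes into. |
bucket_boundaries | integers, upper length boundaries of the buckets. |
bucket_batch_sizes | integers, batch size per bucket. Length should be length(bucket_boundaries) + 1. |
padded_shapes | Nested structure of tf.TensorShape to pass to tf.data.Dataset.padded_batch. If not provided, variable-length dimensions are padded out to the maximum length in each batch. |
padding_values | Values to pad with, passed to tf.data.Dataset.padded_batch. Defaults to padding with 0. |
pad_to_bucket_boundary | bool, if FALSE, will pad dimensions with unknown size to the maximum length in the batch. If TRUE, will pad dimensions with unknown size to the bucket boundary minus 1 (i.e., the maximum length in each bucket), and the caller must ensure that the source Dataset does not contain any elements with length longer than max(bucket_boundaries). |
no_padding | boolean, indicates whether to pad the batch features (features need to be either of type tf.sparse.SparseTensor or of the same shape). |
drop_remainder | (Optional.) A logical scalar, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch. |
name | (Optional.) A name for the tf.data operation. |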
Elements of the Dataset are grouped together by length and then padded and batched.
This is useful for sequence tasks in which the elements have variable length. Grouping together elements that have similar lengths reduces the total fraction of padding in a batch, which increases training step efficiency.
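To make the padding argument concrete, here is a minimal base R sketch that does not use TensorFlow at all. The sequence lengths, the hand-picked grouping, and the helper name padding_fraction are assumptions chosen for illustration; the sketch only compares how many padded positions result when each batch is padded to its longest member.

```r
# Hypothetical sequence lengths (assumption, for illustration only)
seq_lengths <- c(1, 4, 3, 5, 8, 2)

# Fraction of padded positions when every batch is padded
# to the length of its longest member (hypothetical helper)
padding_fraction <- function(batches) {
  padded_total <- sum(sapply(batches, function(b) max(b) * length(b)))
  1 - sum(unlist(batches)) / padded_total
}

# Batches of 2 taken in arrival order, ignoring length
in_order <- split(seq_lengths, ceiling(seq_along(seq_lengths) / 2))

# Batches of 2 after grouping elements of similar length together
by_length <- list(c(1, 2), c(3, 4), c(5, 8))

padding_fraction(in_order)   # 11/34, about 0.32
padding_fraction(by_length)  # 5/28, about 0.18
```

With these made-up lengths, grouping by length roughly halves the share of padded positions. The actual saving depends on the length distribution and the bucket boundaries chosen.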
Below is an example that bucketizes the input data into the 3 buckets "[0, 3), [3, 5), [5, Inf)" based on sequence length, with a batch size of 2 for each bucket.
## Not run:
library(tensorflow)
library(tfdatasets)

# Build a dataset of six variable-length int32 vectors
dataset <- list(c(0),
                c(1, 2, 3, 4),
                c(5, 6, 7),
                c(7, 8, 9, 10, 11),
                c(13, 14, 15, 16, 17, 18, 19, 20),
                c(21, 22)) %>%
  lapply(as.array) %>% lapply(as_tensor, "int32") %>%
  lapply(tensors_dataset) %>%
  Reduce(dataset_concatenate, .)

# Bucket by length, then pad and batch within each bucket
dataset %>%
  dataset_bucket_by_sequence_length(
    element_length_func = function(elem) tf$shape(elem)[1],
    bucket_boundaries = c(3, 5),
    bucket_batch_sizes = c(2, 2, 2)
  ) %>%
  as_array_iterator() %>%
  iterate(print)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    0
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    7    8    9   10   11    0    0    0
# [2,]   13   14   15   16   17   18   19   20
#      [,1] [,2]
# [1,]    0    0
# [2,]   21   22
## End(Not run)
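The bucket each element of the example falls into can be checked without TensorFlow. The following is a sketch in base R that uses findInterval() as a stand-in for the bucketing rule. It is not how the tf.data operation is implemented, and the variable names are chosen here for illustration.

```r
# Lengths of the six elements in the example above
seq_lengths <- c(1, 4, 3, 5, 8, 2)
bucket_boundaries <- c(3, 5)

# findInterval() counts how many boundaries are <= each length,
# so adding 1 gives a 1-based bucket index:
# 1 = [0, 3), 2 = [3, 5), 3 = [5, Inf)
bucket_id <- findInterval(seq_lengths, bucket_boundaries) + 1
bucket_id
# [1] 1 2 2 3 3 1

# Lengths grouped per bucket; each group holds one batch of size 2 here
split(seq_lengths, bucket_id)
```

This matches the printed batches: lengths 4 and 3 share the [3, 5) bucket, 5 and 8 share [5, Inf), and 1 and 2 share [0, 3).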