kafka_records_class: R6 Class for Kafka Records

kafka_records_class    R Documentation

R6 Class for Kafka Records

Description

R6 Class for Kafka Records

Details

This class handles Kafka records. It manages polling for new messages, retrieval of messages from the JVM, local storage of message batches, and iteration over and forwarding of messages or message batches for consumption.

It abstracts storage, polling, and forwarding into an iterable interface in which messages are accessed via next_record() and next_record_batch().

Message consumption is not trivial, for three reasons: (1) The R interface has to stay in sync with the Java side, where a separate records object exists. (2) Kafka fetches messages in batches. A batch may hold as few as 0 or 1 messages, but the default is to consume messages in batches of up to 500. This makes consuming single messages non-trivial, because the batch size depends on how the consumer options were set (e.g. timeouts and maximum fetch sizes) and on the number of messages available on the topic, all of which lie outside the specific call of the poll method that executes data retrieval. (3) Extra processing is needed to translate records from Java into R.
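The batching behaviour in point (2) can be illustrated with a small base-R simulation. This is only a sketch: fake_poll() and max_poll_records are hypothetical names that stand in for a real consumer poll and its record cap, and nothing here calls Kafka or the package itself.

```r
# Hypothetical stand-in for a consumer poll (not part of kafkaesque).
# The batch size is decided by what is available on the topic and by the
# configured cap, not by the caller asking for "one message".
fake_poll <- function(available, max_poll_records = 500L) {
  n <- min(available, max_poll_records)
  data.frame(
    offset = seq_len(n) - 1L,
    value  = sprintf("message-%d", seq_len(n)),
    stringsAsFactors = FALSE
  )
}

# Number of records returned for different amounts of available data
batch_sizes <- vapply(
  c(0L, 1L, 120L, 2000L),
  function(available) nrow(fake_poll(available)),
  integer(1)
)
batch_sizes
```

Under these assumptions a single poll can yield anything from an empty batch up to the cap, which is why the class keeps a local batch and hands records out from it.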

Methods

Public methods


Method new()

Create a new instance of the class.

Usage
kafka_records_class$new(parent)
Arguments
parent

enclosing consumer object


Method next_record()

Returns the next record ready for consumption. If the last poll returned a batch of messages, these are returned one after another. Once all of them have been returned, a new poll is initiated.

If a poll does not return any records, polling is repeated until data is returned.

Usage
kafka_records_class$next_record(timeout_ms = Inf)
Arguments
timeout_ms

Defaults to Inf. Time for which the poll will wait for data. Passed through to kafka_consumer$poll().
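The behaviour described for next_record() can be modelled in plain R as a locally stored batch plus a pointer. The following is a minimal sketch, not the package's implementation: make_record_store(), fake_poll(), and the max_polls cap are hypothetical, and the cap exists only so that the sketch cannot loop forever.

```r
# Hypothetical model of the record store; all names are illustrative only.
make_record_store <- function(poll_fun) {
  records <- data.frame()   # batch from the last poll, held locally
  pointer <- 0L             # index of the last record handed out
  polls   <- 0L             # number of polls issued so far

  next_record <- function(max_polls = 100L) {
    attempts <- 0L
    # Poll again while the local batch is empty or fully consumed
    while (pointer >= nrow(records)) {
      if (attempts >= max_polls) stop("no data after ", max_polls, " polls")
      records <<- poll_fun()
      pointer <<- 0L
      polls   <<- polls + 1L
      attempts <- attempts + 1L
    }
    pointer <<- pointer + 1L
    records[pointer, , drop = FALSE]
  }

  list(next_record = next_record, poll_count = function() polls)
}

# Fake poll: the first call returns an empty batch, later calls return two messages.
counter <- 0L
fake_poll <- function() {
  counter <<- counter + 1L
  if (counter == 1L) {
    return(data.frame(value = character(0), stringsAsFactors = FALSE))
  }
  data.frame(value = paste0("msg-", counter, c("a", "b")), stringsAsFactors = FALSE)
}

store  <- make_record_store(fake_poll)
first  <- store$next_record()$value
second <- store$next_record()$value
third  <- store$next_record()$value
```

In this sketch the first call polls twice, because the first batch is empty. The second call is served from local storage without polling, and the third call triggers a new poll because the stored batch is exhausted.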


Method next_record_batch()

Returns all available, unconsumed messages. If no unconsumed messages are available, it polls for a new batch and returns it.

If a poll does not return any records, polling is repeated until data is returned.

Usage
kafka_records_class$next_record_batch(timeout_ms = Inf)
Arguments
timeout_ms

Defaults to Inf. Time for which the poll will wait for data. Passed through to kafka_consumer$poll().
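How next_record_batch() relates to records already consumed one at a time can be sketched in base R as follows. This is an illustrative model under stated assumptions, not the package's code: make_batch_store(), refill(), fake_poll(), and max_polls are hypothetical names.

```r
# Hypothetical model: a stored batch, a pointer, and two accessors.
make_batch_store <- function(poll_fun, max_polls = 100L) {
  records <- data.frame()   # batch from the last poll
  pointer <- 0L             # index of the last record handed out

  # Poll until there is at least one unconsumed record in local storage
  refill <- function() {
    attempts <- 0L
    while (pointer >= nrow(records)) {
      if (attempts >= max_polls) stop("no data after ", max_polls, " polls")
      records <<- poll_fun()
      pointer <<- 0L
      attempts <- attempts + 1L
    }
  }

  list(
    next_record = function() {
      refill()
      pointer <<- pointer + 1L
      records[pointer, , drop = FALSE]
    },
    next_record_batch = function() {
      refill()
      rest <- records[seq(pointer + 1L, nrow(records)), , drop = FALSE]
      pointer <<- nrow(records)   # mark the whole batch as consumed
      rest
    }
  )
}

# Fake poll: every call returns a fresh batch of three messages.
batch_no <- 0L
fake_poll <- function() {
  batch_no <<- batch_no + 1L
  data.frame(value = paste0("b", batch_no, "-", 1:3), stringsAsFactors = FALSE)
}

store <- make_batch_store(fake_poll)
one   <- store$next_record()$value        # first record of batch 1
rest  <- store$next_record_batch()$value  # remaining records of batch 1
full  <- store$next_record_batch()$value  # batch 2, fetched by a new poll
```

The point of the sketch is that a batch call after a single-record call returns only what is left of the current batch, and a new poll happens only once nothing unconsumed remains.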

Reference to consumer object that serves as parent.

Holds a batch of messages received from the Kafka consumer as a data.frame or data.table.

Tracks which message in the locally stored records is to be consumed next.

Uses the poll method of the Kafka consumer to get new messages.


Method clone()

The objects of this class are cloneable with this method.

Usage
kafka_records_class$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


petermeissner/kafkaesque documentation built on Sept. 28, 2022, 4:30 a.m.