RStream: Main function to run a stream.

Description Usage Arguments Value Additional Info Author(s) References See Also Examples

Description

RStorm provides the main functionality of the RStorm package. The RStorm function is used to run a stream defined using a Topology. The Topology defines the spout (the data-source for the stream) and the order of processing units (bolts). See example below and in the main package description for examples of the usage of RStorm.

More details of the package, examples of streaming algorithms, and examples of the use of RStorm can be found at http://software.mauritskaptein.com/RStorm

Usage

1
RStorm(topology, .verbose = TRUE, .debug = FALSE, .batches = 100, ...)

Arguments

topology

a topology object specified using Topology. The topology contains all the necessary information (the definition of the spouts, the bolts, and the order of processing) for the stream to run in full.

.verbose

a logical indicator for verbose output. Default is TRUE.

.debug

a logical indicator for debug mode. If in debug mode all objects stored during the running of the stream will persist in memory and can be accessed using standard calls to ls(). Default is FALSE.

.batches

a number. Determines the size of batches processed by a stream. While RStorm simulates streaming processing, in actuality the rows of the data.frame defined in the spout are iterated through in batches to prevent memory overflow when the spout contains a large number of rows. This argument sets the size of these batches and with it limits the size of memory allocated to emitted data during the stream. Default batch size is 100.

...

additional arguments to pass to (e.g.) bolts or plotting functions.

Value

An object of type RStorm which is a list containing the following elements:

data

a list containing all the hashmaps stored during the stream using SetHash. Can be accessed by passing the result object to GetHash.

track

a list containing all the data.frames stored using the TrackRow functionality. Can be accessed by passing the result object of an RStorm to GetTrack.

finalize

the result of the finalize function. If a finalize function is added to the Topology this field will contain whatever was returned by the finalize function and can be accessed directly using ShowFinalize. If no finalize function was added to the topology or the finalize function does not return anything the value of finalize will be false

Additional Info

The is.RStorm function checks whether an object is of Type RStorm and is used internally.

Author(s)

Maurits Kaptein

References

http://software.mauritskaptein.com/RStorm

See Also

See Also: Topology, Bolt, Tuple, Emit, TrackRow, SetHash, GetHash, GetTrack

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
	# Run a simple RStorm. First, create some data:
	x <- seq(1, 1000)

	# Second, we start defining the topology
	topology <- Topology(data.frame(x=x))

	# Third, we define a bolt. 
	# This bolt computes the sum of a number stored 
	#   in a local Hashmap and the Tuple (x) that is received
	computeSum <- function(x, boltID, ...){
		sum <- GetHash(paste("sum", boltID))
		if(is.data.frame(sum)){
			x <- sum + (x[1])
		}
		SetHash(paste("sum", boltID), x)
		Emit(Tuple(x=x), ...)
	}

	# Add the bolts to the topology. 
	# Here the first bolt computes the sum of the sequence
	#   and the second bolt computes the sum of summed elements
	topology <- AddBolt(topology, Bolt(computeSum, listen=0, boltID=1))
	topology <- AddBolt(topology, Bolt(computeSum, listen=1, boltID=2))
	result <- RStorm(topology)
	print(GetHash("sum 1", result))
	print(GetHash("sum 2", result))

RStorm documentation built on May 2, 2019, 9:14 a.m.

Related to RStream in RStorm...