iterator.build.idx: Optimize single meter lookup in a data.frame containing many...


View source: R/iterator.R

Description

This function quickly finds the first and last row indices of each meter's data in a data.frame containing many meters and caches the resulting indices. These indices can then be used to access the data for an individual meter very quickly.

Usage

iterator.build.idx(ctx)

Arguments

ctx

The ctx environment that configures a feature run. This function builds an index for ctx$RAW_DATA, if present, and saves the results as ctx$idxLookup. This allows RAW_DATA to be set as a data.frame holding data from a large number of meters, loaded all at once and in advance of the feature extraction runs, to reduce runtimes.

Details

Standard methods of searching for all data for a given meterId use boolean expressions like ctx$RAW_DATA[ctx$RAW_DATA$meterId == 'some_id',]. This is inefficient for large data.frames because it evaluates the comparison for every row and generates a logical value for every row before subsetting. Direct numerical indexing avoids this overhead, and the indices can be computed quickly using match, relying on the fact that all data for each meter must be returned from the data source in contiguous rows. Finally, the constructor for a MeterDataClass checks for ctx$idxLookup if ctx$RAW_DATA is found and uses the cached indices to pull the subset of data associated with the meterId passed in to that constructor.
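
The idea can be illustrated with a minimal sketch in base R. This is not the package's implementation: the helper names build_idx_sketch and get_meter_sketch, the toy data, and the layout of the lookup table are all assumptions made for illustration. It only shows how match can locate the first and last row of each meter when rows are contiguous.

```r
# Toy data: rows for each meter are contiguous, as the text above assumes.
raw <- data.frame(
  meterId = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
  kwh     = c(1.0, 1.2, 0.9, 3.1, 2.8, 0.4, 0.5, 0.6, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): build a first/last lookup.
build_idx_sketch <- function(df) {
  ids   <- unique(df$meterId)
  # match() returns the position of the first occurrence of each id.
  first <- match(ids, df$meterId)
  # Matching against the reversed column gives the last occurrence,
  # which is then converted back to a position in the original order.
  last  <- nrow(df) + 1L - match(ids, rev(df$meterId))
  data.frame(meterId = ids, first = first, last = last,
             stringsAsFactors = FALSE)
}

idx <- build_idx_sketch(raw)

# Hypothetical helper: pull one meter's rows by numeric range
# instead of comparing meterId against every row.
get_meter_sketch <- function(df, idx, id) {
  i <- match(id, idx$meterId)
  df[idx$first[i]:idx$last[i], ]
}

sub_b <- get_meter_sketch(raw, idx, "b")
```

The lookup is built once and reused for every meter, so each later subset costs only a small match against the index table plus a numeric range, rather than a comparison over every row of the data.frame. The sketch is only valid when each meter's rows are contiguous; interleaved rows would make the first:last range include other meters' data.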


ConvergenceDA/visdom documentation built on May 6, 2019, 12:51 p.m.