library(collection)
library(knitr)

knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval=FALSE)

General Remarks

Collections can be used in both interactive and programmatic mode. In the interactive mode a number of heuristic and (hopefully) smart guesses are used to ease the use of the package. In the programmatic mode none such guesses are present to make the API simple and predictable.

Following the convention introduced in the lazyeval package, functions indenteded for interactive use have names that end with a letter while their programmatic counterparts' names have an underscore appended to the original name. For example:

restore() # is intended for interactive use
restore_() # is intended for programmatic use

Basics

First we create a collection and store an object in it.

C <- collection('first collection')
store(C, iris)

Let's see the summary of C and then list objects stored there:

summary(C)
print(C)
show(C)

Finally, let's read that object back:

restore(C, name == 'iris')
restore(C, 'id')

Multiple Objects

What are the use cases when working with multiple objects? In order to illustrate them we will use a handy collection generator that comes with the collection package.

We create n = 12 time series objects, each len = 96 observations long. We also provide the random seed.

ts_col <- sample_time_series(n = 12, len = 96, seed = 1)

Now we can list the collection:

TODO should show the no tag

print(ts_col)

Let's now apply a function on each object in this collection:

do(ts_col, function (obj) {

}) # TODO should return result wrapped in a pretty-printer; a clist?

What if we only want to apply this function on a subset of objects?

filter(ts_col, no < 7) %>%
  do(function (obj) {

  })

Exploring with a single object

Let's pick a single time series data set and see what kind of model we can fit.

ts <- restore(ts_col, no == 1)
summary(ts)
lm(x ~ a + b + c, data = ts)

Fit model to all objects in a collection

ts_col %>%
  do(function (obj) {
    lm(x ~ a + b + c, data = obj)
  }) %>%
  store(C, name = 'models') # TODO store method for clist; differs from store.default

Multiple objects - a Table

We have now seen a way to process multiple objects using a collection. Since there are no limits to what objects are stored in a colletion, one can look at them as parts of a table, where tags are just indices used to group that table into sub-tables.

Let's store the iris data set a three separate groups/subsets with Species being the grouping attribute/column/tag.

tables < collection('tables')

iris %>%
  group_by(Species) %>%
  do({
    obj <- select(., -Species)
    tag <- .$Species[1]
    store(tables, obj, name = 'iris', species = tag)
  })


lbartnik/collection documentation built on May 20, 2019, 8:27 p.m.