RC.read.table: Read and write tables into column families in Cassandra
In RCassandra: R/Cassandra interface

RC.read.table reads the contents of a column family into a data frame

RC.write.table writes the contents of a data frame into a column family

RC.read.table(conn, c.family, convert = TRUE, na.strings = "NA",
              as.is = FALSE, dec = ".")
RC.write.table(conn, c.family, df)

`conn`	connection handle as obtained from `RC.connect`
`c.family`	column family name (string)
`convert`	logical, if `TRUE` the resulting data frame is processed using `type.convert`, otherwise all columns will be character vectors
`na.strings`	passed to `type.convert`
`as.is`	passed to `type.convert`
`dec`	passed to `type.convert`
`df`	data frame - it must have both row and column names
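
As a rough illustration of what `convert`, `na.strings`, `as.is` and `dec` control, the sketch below applies base R's `type.convert` to character vectors of the kind a string-based column family would yield. The sample values are invented, and the code shows only `type.convert` itself, not the internals of RC.read.table.

```r
# Character data as it might look with convert = FALSE (values are invented).
raw <- data.frame(count = c("1", "2", "NA"),
                  price = c("1,5", "2,25", "3"),
                  label = c("a", "b", "a"),
                  stringsAsFactors = FALSE)

# These settings mirror the RC.read.table defaults:
# na.strings = "NA", as.is = FALSE, dec = "."
count <- type.convert(raw$count, na.strings = "NA", as.is = FALSE, dec = ".")

# as.is decides whether non-numeric strings become factors or stay character.
label_factor <- type.convert(raw$label, as.is = FALSE)
label_char   <- type.convert(raw$label, as.is = TRUE)

# Values written with a decimal comma need dec = "," to be parsed as numbers.
price <- type.convert(raw$price, as.is = TRUE, dec = ",")
```

With `convert = FALSE` none of this happens and every column stays a character vector.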

Cassandra is a key/value store with dynamic columns, so tables are not its native format. Row names are used as keys and columns are treated as fixed. RC.read.table is really just a wrapper for RC.get.range.slices(conn, c.family, fixed=TRUE). RC.write.table uses the same facility as RC.mutate but without actually creating the mutation object on the R side.

Note that all updates in Cassandra are "upserts", i.e., RC.write.table updates any existing row key/column name combinations (update) or creates new ones where not present (insert). Additional columns (or even keys) may still exist in the column family and they will not be touched.
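
The upsert behaviour can be sketched with two consecutive writes. This is a minimal example under stated assumptions: it needs a reachable Cassandra node with an existing keyspace and a string-typed column family, and the host, port, keyspace and column family names (`demo_ks`, `demo_cf`) as well as the function name `upsert_demo` are placeholders, not values taken from this documentation.

```r
# Sketch only: wrapped in a function so that nothing contacts a server
# until upsert_demo() is actually called. All default values are placeholders.
upsert_demo <- function(host = "localhost", port = 9160L,
                        keyspace = "demo_ks", c.family = "demo_cf") {
  conn <- RCassandra::RC.connect(host, port)
  on.exit(RCassandra::RC.close(conn))
  RCassandra::RC.use(conn, keyspace)

  # First write: keys "k1" and "k2", columns "a" and "b".
  first <- data.frame(a = c("1", "2"), b = c("x", "y"),
                      row.names = c("k1", "k2"), stringsAsFactors = FALSE)
  RCassandra::RC.write.table(conn, c.family, first)

  # Second write: replaces the value at k1/a and adds key "k3".
  # k1/b and everything under k2 are not part of this data frame,
  # so they are left as they were.
  second <- data.frame(a = c("10", "3"),
                       row.names = c("k1", "k3"), stringsAsFactors = FALSE)
  RCassandra::RC.write.table(conn, c.family, second)

  # Read back the combined state of the column family.
  RCassandra::RC.read.table(conn, c.family)
}
```

Calling `upsert_demo()` against a test keyspace should return a data frame covering all three keys, with the first write's untouched values still present.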

RC.read.table creates a data frame containing every column that is encountered in at least one key. Where a key lacks one of those columns, the corresponding value is filled with NA.
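
The NA filling can be pictured with a small simulation in plain R. The `rows` list is invented data standing in for per-key column/value pairs; the code illustrates the resulting shape only and is not how RC.read.table is implemented.

```r
# Hypothetical per-key column/value pairs: "k2" has no "b", "k3" has no "a".
rows <- list(k1 = c(a = "1", b = "x"),
             k2 = c(a = "2"),
             k3 = c(b = "z", c = "9"))

# Union of columns, in the order they are first encountered across the keys.
cols <- unique(unlist(lapply(rows, names)))

# Start from an all-NA table, then copy in the values each key actually has.
m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
            dimnames = list(names(rows), cols))
for (key in names(rows)) m[key, names(rows[[key]])] <- rows[[key]]

df <- as.data.frame(m, stringsAsFactors = FALSE)
```

Here `df` has columns `a`, `b` and `c`, and the cells for which a key had no value (for example `k2`/`b`) remain NA.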

RC.read.table returns the resulting data frame

RC.write.table returns conn

IMPORTANT: Cassandra does NOT preserve the order of keys and columns. Internally, keys are ordered by their hash value and columns are ordered lexicographically (treated as bytes). However, because columns are dynamic, the order of columns in the data frame will vary if keys have different columns: columns are added to the data frame in the sequence they are encountered as the keys are loaded. For tables with automatic row names, you may want to use df <- df[order(as.integer(row.names(df))),] on the result of RC.read.table to obtain the original order of rows.
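
The reordering idiom can be tried without a server. In this sketch `scrambled` is a made-up stand-in for an RC.read.table result whose automatic row names came back in arbitrary order.

```r
# Stand-in for a table with automatic row names "1".."12", returned out of order.
original  <- data.frame(value = letters[1:12], stringsAsFactors = FALSE)
scrambled <- original[c(7, 12, 3, 10, 1, 5, 9, 2, 11, 4, 8, 6), , drop = FALSE]

# Row names are character strings, so convert them before ordering;
# ordering the strings directly would place "10" before "2".
restored <- scrambled[order(as.integer(row.names(scrambled))), , drop = FALSE]
```

The `drop = FALSE` is only needed here because the example has a single column; it keeps the result a data frame.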

RC.read.table is more efficient than RC.get.range.slices because it can store columns into vectors and can pre-allocate the whole structure in advance.

Note that the current implementation of tables (RC.read.table and RC.write.table) supports only string-based representation of columns and values ("UTF8Type", "AsciiType" or similar).

Simon Urbanek

RC.connect, RC.use, RC.get

RCassandra documentation built on May 2, 2019, 10:10 a.m.