RC.read.table: Read and write tables into column families in Cassandra



RC.read.table reads the contents of a column family into a data frame

RC.write.table writes the contents of a data frame into a column family


RC.read.table(conn, c.family, convert = TRUE, na.strings = "NA",
              as.is = FALSE, dec = ".")
RC.write.table(conn, c.family, df)



conn: connection handle as obtained from RC.connect

c.family: column family name (string)

convert: logical, if TRUE the resulting data frame is processed using type.convert, otherwise all columns will be character vectors

na.strings: passed to type.convert

as.is: passed to type.convert

dec: passed to type.convert

df: data frame; it must have both row and column names
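The convert, na.strings, as.is and dec arguments are all about type.convert, so a short base-R sketch of what that function does with character input may help. The sample vectors below are made up for illustration, and type.convert is called directly here rather than through RC.read.table.

```r
# Character vectors, similar to what a string-based column family would yield.
raw_int  <- c("1", "2", "NA")
raw_dec  <- c("1,5", "2,25")
raw_text <- c("a", "b", "a")

# na.strings: the string "NA" becomes a missing value; the rest parse as integer.
ints <- type.convert(raw_int, na.strings = "NA", as.is = TRUE)

# dec: treat "," as the decimal mark when parsing numbers.
nums <- type.convert(raw_dec, dec = ",", as.is = TRUE)

# as.is: FALSE turns non-numeric strings into factors, TRUE keeps them as character.
as_factor <- type.convert(raw_text, as.is = FALSE)
as_char   <- type.convert(raw_text, as.is = TRUE)
```

With convert = FALSE none of this happens and every column stays a character vector.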


Cassandra is a key/value store with dynamic columns, so tables are not the native format. Row names are used as keys and columns are treated as fixed. RC.read.table is essentially a wrapper for RC.get.range.slices(conn, c.family, fixed=TRUE). RC.write.table uses the same facility as RC.mutate but without actually creating the mutation object on the R side.

Note that all updates in Cassandra are "upserts", i.e., RC.write.table updates any existing row key/column name combinations (update) or creates new ones where not present (insert). Additional columns (or even keys) may still exist in the column family and they will not be touched.
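The upsert behaviour can be illustrated without a Cassandra server. The sketch below models a column family as a nested R list (key, then column, then value); upsert_table is a hypothetical helper written only for this illustration and is not part of RCassandra.

```r
# Hypothetical in-memory stand-in for a column family: key -> column -> value.
upsert_table <- function(store, df) {
  for (key in row.names(df)) {
    # Create the key if it does not exist yet (insert).
    if (is.null(store[[key]])) store[[key]] <- list()
    for (col in names(df)) {
      # Overwrite or create exactly the key/column pairs present in df.
      store[[key]][[col]] <- as.character(df[key, col])
    }
  }
  store
}

# Existing contents: key k1 has an extra column z, key k9 is not in df at all.
store <- list(k1 = list(a = "1", z = "old"), k9 = list(a = "9"))
df <- data.frame(a = c("10", "20"), b = c("x", "y"),
                 row.names = c("k1", "k2"), stringsAsFactors = FALSE)

store <- upsert_table(store, df)
```

After the call, k1/a is updated, k1/b and the whole of k2 are newly created, while k1/z and k9 are left as they were.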

RC.read.table creates a data frame containing every column that is encountered in at least one key. Where a key has no value for a column, the corresponding entry is NA.
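As a rough base-R sketch of that shape (not the package's actual implementation), the following builds a data frame from keys that carry different columns. The rows list is invented sample data.

```r
# Two keys with partly different columns, as named character vectors.
rows <- list(
  k1 = c(a = "1", b = "x"),
  k2 = c(b = "y", c = "3.5")
)

# Collect column names in the order in which they are first encountered.
cols <- unique(unlist(lapply(rows, names), use.names = FALSE))

# Pre-allocate a character matrix filled with NA, then copy each key's values.
m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
            dimnames = list(names(rows), cols))
for (key in names(rows)) {
  m[key, names(rows[[key]])] <- rows[[key]]
}

df <- as.data.frame(m, stringsAsFactors = FALSE)
```

Here k1 has no column c and k2 has no column a, so those two cells are NA in df.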


RC.read.table returns the resulting data frame

RC.write.table returns conn
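A minimal round-trip sketch, under several assumptions: a Cassandra server is reachable at the given host, the keyspace and column family already exist with string-based types, and RC.close is available to close the connection. The function name roundtrip_table and the default names "localhost", "demo_ks" and "demo_cf" are placeholders, not values taken from this page.

```r
# Write a data frame to a column family and read it back.
# host, keyspace and c.family defaults are hypothetical placeholders.
roundtrip_table <- function(df, host = "localhost",
                            keyspace = "demo_ks", c.family = "demo_cf") {
  if (!requireNamespace("RCassandra", quietly = TRUE)) {
    stop("the RCassandra package is required")
  }
  conn <- RCassandra::RC.connect(host)
  # Assumption: RC.close closes the handle returned by RC.connect.
  on.exit(RCassandra::RC.close(conn))
  RCassandra::RC.use(conn, keyspace)

  # Row names of df become keys, column names become column names.
  RCassandra::RC.write.table(conn, c.family, df)

  # Returns a data frame; columns are run through type.convert by default.
  RCassandra::RC.read.table(conn, c.family)
}
```

Because the column family may already hold other keys and columns, the returned data frame can contain more rows and columns than df, and their order is not guaranteed to match.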


IMPORTANT: Cassandra does NOT preserve the order of keys and columns. Internally, keys are ordered by their hash value and columns are ordered lexicographically (treated as bytes). However, because columns are dynamic, the order of columns in the data frame will vary if keys have different columns: columns are added to the data frame in the sequence in which they are encountered as the keys are loaded. For tables with automatic row names, you may want to use df <- df[order(as.integer(row.names(df))),] on the result of RC.read.table to obtain the original order of rows.
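The reordering idiom can be tried on a plain data frame. In this sketch the scrambled key order is simulated with a fixed permutation; no Cassandra connection is involved.

```r
# A data frame with automatic row names "1".."4".
original <- data.frame(x = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Simulate rows coming back in an arbitrary order (row names travel with rows).
shuffled <- original[c(3, 1, 4, 2), , drop = FALSE]

# Restore the original order by sorting on the numeric value of the row names.
restored <- shuffled[order(as.integer(row.names(shuffled))), , drop = FALSE]
```

The as.integer conversion matters because row names are strings: sorting them as text would place "10" before "2".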

RC.read.table is more efficient than RC.get.range.slices because it can store columns into vectors and can pre-allocate the whole structure in advance.

Note that the current implementation of tables (RC.read.table and RC.write.table) supports only string-based representation of columns and values ("UTF8Type", "AsciiType" or similar).


Simon Urbanek

See Also

RC.connect, RC.use, RC.get

RCassandra documentation built on May 2, 2019, 10:10 a.m.