RC.use
selects the keyspace (aka database) to use for all
subsequent operations. All functions described below require keyspace
to be set using this function.
RC.get
queries one key and a fixed list of columns
RC.get.range
queries one key and multiple columns
RC.mget.range
queries multiple keys and multiple columns
RC.get.range.slices
queries a range of keys (or tokens) and a
range of columns
RC.consistency
sets the desired consistency level for all query
operations
RC.use(conn, keyspace, cache.def = TRUE)
RC.get(conn, c.family, key, c.names,
       comparator = NULL, validator = NULL)
RC.get.range(conn, c.family, key, first = "", last = "",
             reverse = FALSE, limit = 1e+07,
             comparator = NULL, validator = NULL)
RC.mget.range(conn, c.family, keys, first = "", last = "",
              reverse = FALSE, limit = 1e+07,
              comparator = NULL, validator = NULL)
RC.get.range.slices(conn, c.family, k.start = "", k.end = "",
                    first = "", last = "", reverse = FALSE,
                    limit = 1e+07, k.limit = 1e+07,
                    tokens = FALSE, fixed = FALSE,
                    comparator = NULL, validator = NULL)
RC.consistency(conn, level = c("one", "quorum", "local.quorum",
               "each.quorum", "all", "any", "two", "three"))
conn |
connection handle as returned by RC.connect |
keyspace |
name of the keyspace to use |
cache.def |
if |
c.family |
column family (aka table) name |
key |
row key |
c.names |
vector of column names |
comparator |
string, type of the column keys (comparator in Cassandra speak) or NULL to rely on cached schema definitions |
validator |
string, type of the values (validator in Cassandra speak) or NULL to rely on cached schema definitions |
first |
starting column name |
last |
ending column name |
reverse |
if |
limit |
return at most as many columns per key |
keys |
row keys (character vector) |
k.start |
start key (or token) |
k.end |
end key (or token) |
k.limit |
return at most as many keys (rows) |
tokens |
if |
fixed |
if |
level |
the desired consistency level for query operations on
this connection. |
The nomenclature can be a bit confusing; it comes from the
literature and the Cassandra API. Put simply, a keyspace is
comparable to a database, and a column family is somewhat
comparable to a table. However, a column family may have a different
number of columns for each row, so it can be used to create a flexible
two-dimensional query structure. A row is defined by a (row)
key. A query is performed by first finding out which row(s)
will be fetched according to the key (RC.get, RC.get.range),
keys (RC.mget.range) or key range (RC.get.range.slices), then
selecting the columns of interest. An empty string ("") can be used
to denote an unspecified range (so the default is to fetch all columns).
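As a rough illustration of the four row-selection styles described above, the following sketch wraps one call of each kind in a function. The function name query_examples is hypothetical, conn is assumed to be an open handle with the keyspace already selected by RC.use, and the "iris" column family and row keys are placeholders borrowed from the Examples.

```r
# Sketch only: nothing here contacts Cassandra until query_examples() is
# called with a live connection. "iris" and the row keys are placeholders.
query_examples <- function(conn) {
  list(
    # one row key, an explicit list of column names
    one_key_named_columns = RCassandra::RC.get(
      conn, "iris", "1", c("Sepal.Length", "Species")),
    # one row key, all columns (first and last default to "")
    one_key_all_columns = RCassandra::RC.get.range(conn, "iris", "1"),
    # several row keys, all columns
    several_keys = RCassandra::RC.mget.range(conn, "iris", c("1", "2", "3")),
    # a key range, capped at 10 rows
    key_range = RCassandra::RC.get.range.slices(conn, "iris", k.limit = 10)
  )
}
```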
comparator and validator specify the types of column
keys and values respectively. Every key or value in Cassandra is
simply a byte string, so it can deal with arbitrary values, but
sometimes it is convenient to impose some structure on that content
by declaring what is represented by that byte string. Unfortunately
Cassandra does not include that information in the results, so the
user has to define how column names and values are to be
interpreted. The default interpretation is simply as a UTF-8 encoded
string, but RCassandra also supports the following conversions:
"UTF8Type", "AsciiType" (stored as character vectors), "BytesType"
(opaque stream of bytes, stored as raw vector),
"LongType" (8-byte integer, stored as real vector in R), "DateType"
(8-byte integer, stored as POSIXct in R), "BooleanType" (one
byte, logical vector in R), "FloatType" (4-byte float, real vector
in R), "DoubleType" (8-byte float, real vector in R) and "UUIDType"
(16 bytes, stored as UUID-formatted string). No other conversions
are supported at this point. If comparator or validator is NULL
then RCassandra attempts to guess the proper type by taking into
account the schema definition obtained by
RC.use(..., cache.def=TRUE), otherwise it falls back to
"UTF8Type". You can always get the raw form using "BytesType" and
decode the values in R.
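Decoding a raw value in R might look like the sketch below. It uses only base R and builds its own byte strings locally as stand-ins for values fetched with validator = "BytesType"; the big-endian byte order for the 8-byte float is an assumption about how the value was stored, and all variable names are illustrative.

```r
# Stand-in for an 8-byte float fetched as raw bytes (assumed big-endian).
double_bytes <- writeBin(3.5, raw(), size = 8, endian = "big")
decoded_double <- readBin(double_bytes, what = "double", size = 8,
                          endian = "big")

# Stand-in for a text value fetched as raw bytes.
text_bytes <- charToRaw("setosa")
decoded_text <- rawToChar(text_bytes)
```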
The comparator also determines how the values of first
and last will be interpreted. Regardless of the comparator, it
is always possible to pass either NULL, "" (both
denoting 0-length value) or a raw vector. Other supported types must
match the comparator.
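A column slice bounded by first and last could be requested as in this sketch. The function names are hypothetical, the "iris" column family and its column names come from the Examples, and the explicit "UTF8Type" comparator is an assumption about that schema; with a string comparator the bounds are passed as strings.

```r
# Sketch only: conn is assumed to be an open handle with a keyspace selected.
column_slice <- function(conn) {
  # columns of row "1" whose names sort between the two bounds
  RCassandra::RC.get.range(conn, "iris", "1",
                           first = "Petal.Length", last = "Sepal.Length",
                           comparator = "UTF8Type")
}

all_columns <- function(conn) {
  # "" on both ends leaves the range unspecified, so every column is fetched
  RCassandra::RC.get.range(conn, "iris", "1", first = "", last = "")
}
```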
Most users will be happy with the default settings, but if you want to
save every nanosecond you can, call
RC.use(..., cache.def = FALSE) (which saves one extra
RC.describe.keyspace request to the Cassandra instance)
and always specify both comparator and validator (even
if it is just "UTF8Type").
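The low-overhead setup described above might be written as follows. This is a sketch, not a benchmark: fast_get is a hypothetical name, "testdb" and "iris" are placeholders from the Examples, and the "UTF8Type" strings are assumptions that must be replaced by whatever types the column family actually uses.

```r
# Sketch only: skips the schema lookup and spells out both types by hand.
fast_get <- function(conn) {
  RCassandra::RC.use(conn, "testdb", cache.def = FALSE)
  RCassandra::RC.get(conn, "iris", "1", c("Sepal.Length", "Species"),
                     comparator = "UTF8Type", validator = "UTF8Type")
}
```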
Cassandra collects results in memory so key (k.limit) and
column (limit) limits are mandatory. Future versions of
RCassandra may abstract this limitation away (by using a limit and
repeating queries with a new start key/column based on the last result
row), but not at this point.
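Until then, the repeat-with-a-new-start idea can be done by hand. The sketch below is one possible paging loop in plain R; fetch_all_columns and the fetch_page callback are hypothetical names, and the loop assumes that the start column is inclusive, so each later page repeats the last column already seen and that first row is dropped. An in-memory mock stands in for Cassandra so the logic can be tried without a server.

```r
# fetch_page(first, limit) is a hypothetical callback returning a data frame
# with a `key` column, sorted by key, starting at `first` (inclusive) and
# holding at most `limit` rows.
fetch_all_columns <- function(fetch_page, page_size = 1000L) {
  stopifnot(page_size >= 2L)  # room for the repeated row plus one new row
  pages <- list()
  first <- ""
  is_first_page <- TRUE
  repeat {
    page <- fetch_page(first, page_size)
    n <- nrow(page)
    # later pages start with the column that ended the previous page
    fresh <- if (is_first_page || n == 0L) page else page[-1L, , drop = FALSE]
    if (nrow(fresh) > 0L) pages[[length(pages) + 1L]] <- fresh
    if (n < page_size) break  # a short page means nothing is left
    first <- as.character(page$key[n])
    is_first_page <- FALSE
  }
  if (length(pages) == 0L) return(page)
  do.call(rbind, pages)
}

# In-memory mock with seven columns, fetched three at a time.
all_cols <- data.frame(key = sprintf("col%02d", 1:7),
                       value = as.character(1:7),
                       stringsAsFactors = FALSE)
mock_fetch <- function(first, limit) {
  head(all_cols[all_cols$key >= first, , drop = FALSE], limit)
}
result <- fetch_all_columns(mock_fetch, page_size = 3L)
```

With a live connection the callback could be something like function(first, limit) RC.get.range(conn, "iris", "1", first = first, limit = limit), again assuming an inclusive start column.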
Note that in Cassandra keys are typically hashed, so key range may be counter-intuitive as it is based on the hash and not on the actual value. Columns are always sorted by their name (=key).
The result of queries may also be counter-intuitive, especially when
querying fixed-column tables, as it is not returned in the form that
would be expected from a relational database. See
RC.read.table and RC.write.table for
retrieving and storing relational structures in rectangular tables
(column families with fixed columns). But keep in
mind that Cassandra is essentially key/key/value storage (row key,
column key, value) with partitioning on row keys and sorting of column
keys, so designing the correct schema for a task needs some
thought. Dynamic columns are what makes it so powerful.
RC.use and RC.consistency return conn
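Because both functions hand back the connection, the setup calls can be nested. The helper below is a sketch with a hypothetical name (open_session); the host, keyspace and default level are placeholders to be supplied by the caller.

```r
# Sketch only: connect, select the keyspace, set the consistency level and
# return the handle in one expression.
open_session <- function(host, keyspace, level = "quorum") {
  conn <- RCassandra::RC.connect(host)
  RCassandra::RC.consistency(RCassandra::RC.use(conn, keyspace), level)
}
```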
RC.get and RC.get.range return a data frame with
columns key (column name), value (value in that column)
and ts (timestamp).
RC.mget.range and RC.get.range.slices return a named
list of data frames as described in RC.get.range with names
being the row keys, except if fixed=TRUE in which case the
result is a data frame with row names as keys and values as elements
(timestamps are not retrieved in that case).
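To make the list-of-data-frames shape concrete, the sketch below builds a small mock with the structure described above (the values and timestamps are invented for illustration) and stacks it into one long data frame with the row key as an extra column. The names slices and long are illustrative only.

```r
# Mock of a named list as described above: one data frame per row key.
slices <- list(
  "1" = data.frame(key = c("Sepal.Length", "Species"),
                   value = c("5.1", "setosa"),
                   ts = c(1, 1), stringsAsFactors = FALSE),
  "2" = data.frame(key = c("Sepal.Length", "Species"),
                   value = c("4.9", "setosa"),
                   ts = c(2, 2), stringsAsFactors = FALSE)
)

# Prepend the row key to each data frame, then stack them.
long <- do.call(rbind, unname(Map(
  function(row_key, df) data.frame(row = row_key, df,
                                   stringsAsFactors = FALSE),
  names(slices), slices)))
rownames(long) <- NULL
```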
Simon Urbanek
RC.connect, RC.read.table, RC.write.table
## Not run:
c <- RC.connect("cassandra-host")
RC.use(c, "testdb")
## you will have to use cassandra-cli to create the schema for the "iris" CF
RC.write.table(c, "iris", iris)
RC.get(c, "iris", "1", c("Sepal.Length", "Species"))
RC.get.range(c, "iris", "1")
## list of 150 data frames
r <- RC.get.range.slices(c, "iris")
## use limit=0 to obtain all row keys without pulling any data
rk <- RC.get.range.slices(c, "iris", limit=0)
y <- RC.read.table(c, "iris")
y <- y[order(as.integer(row.names(y))),]
RC.close(c)
## End(Not run)