knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
R package nodbi
provides a single interface for several NoSQL databases and SQL databases with JSON functionality, with the same function parameters and return values across all of:
Package nodbi
has been designed to use any specific SQL functions a database may have for JSON
and has added functionality tested for performance to enable switching databases without changing user code.
library(nodbi)
First, a connection to a database is opened. In the example, no additional parameters are used such as database file or server; see the help page for the respective database.
"Container" is used as term to indicate where conceptually the database holds the data (e.g. a collection in MongoDB, a table in DuckDB). The key
parameter of nodbi
functions holds the name of the relevant container.
# name of container key <- "my_container" # nodbi can connect any of these databases if (FALSE) { src <- src_duckdb() src <- src_mongo(collection = key) src <- src_sqlite() src <- src_postgres() src <- src_elastic() src <- src_couchdb( user = Sys.getenv("COUCHDB_TEST_USER"), pwd = Sys.getenv("COUCHDB_TEST_PWD") ) } # this example is run with src <- src_sqlite() # note additional parameters can be specified, # for example for local or remote MongoDb: help("src_mongo")
Create a container if it does not yet exist and fill with value
. The return value is the number of created documents. "Documents" refers to the rows in a data frame such as mtcars
, or the number of NDJSON
lines, or the number of list items, or the number of objects in an JSON
array.
The parameter value
in any nodbi
function can take a data frame, a list, a JSON string, or a file name or URL pointing to NDJSON
.
# check if container already exists docdb_exists(src = src, key = key) # load data from a data frame with row names into # the container specified in "key" parameter docdb_create(src = src, key = key, value = mtcars) # load additionally 98 NDJSON records docdb_create(src, key, "https://httpbin.org/stream/98") # load additionally mapdata as list docdb_create(src, key, jsonlite::fromJSON(mapdata, simplifyVector = FALSE)) # show JSON structure of contacts jsonlite::minify(contacts) # load additionally contacts JSON data docdb_create(src, key, contacts)
Check and list any other containers exist in the database:
docdb_list(src = src)
The unique document identifier is its _id
, corresponding to a primary index with a constraint to be unique in SQL databases.
The _id
's of an input value
are either the row names of a data frame (such as mtcars
) or top-level elements with the name _id
such as in contacts
shown just above.
Thus, expect a warning when trying to create documents with _id
's that already exist in the container.
The return value can be 0
when no documents could newly be created, or the number of the subset of documents in value
that did not yet exist and could newly be created.
# zero new documents created docdb_create(src, key, value = mtcars)
For updating existing documents, see below function docdb_update()
.
All documents in a container can now be retrieved with docdb_get()
.
# load library for more # readable print output library(tibble) # get all documents, irrespective of schema as_tibble(docdb_get(src, key)) # get just 2 documents using limit and note that # only fields for these documents are returned as_tibble(docdb_get(src, key, limit = 2L))
One of the most powerful functions of nodbi
is docdb_query()
because it permits to combine a query to select documents and to filter for fields of interest.
# query for some documents docdb_query(src = src, key = key, query = '{"mpg": {"$gte": 30}}')
Both parameters query
(obligatory) and fields
(optional) use, across all databases, MongoDB syntax such as documented for queries and fields.
# query some fields from some documents; 'query' is a mandatory # parameter and is used here in its position in the signature docdb_query(src, key, '{"mpg": {"$gte": 30}}', fields = '{"wt": 1, "mpg": 1}')
Unless fields
specifies "_id": 0
, the _id
field is always included in the output of docdb_query()
.
# query some fields from some documents, limit return to one document docdb_query(src, key, '{"mpg": {"$gte": 30}}', fields = '{"_id": 0, "mpg": 1}', limit = 1L)
Queries can be more complex such as in this example, showing a dot notation of a sub-field and an example operator (regular expression).
# query some subitem fields from some documents str(docdb_query( src, key, query = '{"$or": [{"age": {"$gt": 21}}, {"friends.name": {"$regex": "^B[a-z]{3,9}.*"}}]}', fields = '{"age": 1, "friends.name": 1}' ))
Queries work across documents of different structure such as here.
# query with results across documents docdb_query( src, key, query = '{"$or": [{"age": {"$gt": 21}}, {"mpg": {"$gte": 30}}]}', fields = '{"name": 1, "disp": 1}' )
The JSON
data handled by package nodbi
may have a large number of field included nested fields in objects (see for example name
within array friends
above). Thus, an argument is provided for docdb_query()
so that the function returns only the comprehensive list of all field names in documents selected with a query (or in all documents in the container if query = "{}"
is specified).
docdb_query(src, key, query = '{"_id": {"$regex": "^[0-9]"}}', listfields = TRUE)
The dot notation is a path from a root field to the nested field, and this notation can be used in query
and fields
parameters of docdb_query()
.
Queries can also be used for updating (patching) selected documents with a new value
. The return value of docdb_update()
corresponds to the number of documents that were updated.
This is another powerful function because value
can come from a data frame, a list, a JSON string, or a file name or URL pointing to NDJSON
, and if value
includes row names or _id
's, these are used to identify the documents to be updated.
# number of documents corresponding to query nrow(docdb_query(src, key, query = '{"carb": 3}')) # update all documents using JSON, replacing the previously existing values docdb_update(src, key, value = '{"vs": 9, "xy": [1, 2]}', query = '{"carb": 3}') # update with value that includes _id's docdb_update(src, key, value = '{"_id": "Merc 450SLC", "xy": 33}', query = "{}") # show updated values docdb_query(src, key, query = '{"carb": 3}', fields = '{"xy": 1}')
Documents and containers can be deleted with docdb_delete()
. Its return value corresponds to the success of the delete operation.
# number of documents corresponding to query nrow(docdb_query(src, key, query = '{"age": {"$lte": 23}}')) # to delete selected documents, specify a query parameter docdb_delete(src, key, query = '{"age": {"$lte": 23}}') # this deletes the complete container from database docdb_delete(src, key) # check if still exists docdb_exists(src, key)
Package nodbi
includes an automatic mechanism for shutting down, at the time of quit()
or session restart, those databases that require it (SQLite, DuckDB, PostgreSQL).
Nevertheless, it is good practice to manually disconnect and shut down connections as specific to the database, for example for SQLite:
src # shutdown DBI::dbDisconnect(src$con, shutdown = TRUE) rm(src)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.