AzureCosmosR: Interface to Azure Cosmos DB"

AzureCosmosR

An interface to Azure Cosmos DB, a NoSQL database service from Microsoft.

Azure Cosmos DB is a fully managed NoSQL database for modern app development. Single-digit millisecond response times, and automatic and instant scalability, guarantee speed at any scale. Business continuity is assured with SLA-backed availability and enterprise-grade security. App development is faster and more productive thanks to turnkey multi region data distribution anywhere in the world, open source APIs and SDKs for popular languages. As a fully managed service, Azure Cosmos DB takes database administration off your hands with automatic management, updates and patching. It also handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs to match capacity with demand.

On the Resource Manager side, AzureCosmosR extends the AzureRMR class framework to allow creating and managing Cosmos DB accounts. On the client side, it provides a comprehensive interface to the Cosmos DB SQL/core API as well as bridges to the MongoDB and table storage APIs.

SQL interface

AzureCosmosR provides a suite of methods to work with databases, containers (tables) and documents (rows) using the SQL API.

library(dplyr)
library(AzureCosmosR)

endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey")

list_cosmos_databases(endp)

db <- get_cosmos_database(endp, "mydatabase")

# create a new container and upload the Star Wars dataset from dplyr
cont <- create_cosmos_container(db, "mycontainer", partition_key="sex")
bulk_import(cont, starwars)

query_documents(cont, "select * from mycontainer")

# remove document metadata cruft
query_documents(cont, "select * from mycontainer", metadata=FALSE)

# an array select: all characters who appear in ANH
query_documents(cont,
    "select c.name
        from mycontainer c
        where array_contains(c.films, 'A New Hope')")

You can easily create and execute stored procedures and user-defined functions:

proc <- create_stored_procedure(
    cont,
    "helloworld",
    'function () {
        var context = getContext();
        var response = context.getResponse();
        response.setBody("Hello, World");
    }'
)

exec_stored_procedure(proc)

create_udf(cont, "times2", "function(x) { return 2*x; }")

query_documents(cont, "select udf.times2(c.height) from cont c")

Aggregates take some extra work, as the Cosmos DB REST API only has limited support for cross-partition queries. Set by_pkrange=TRUE in the query_documents call, which will run the query on each partition key range (pkrange) and return a list of data frames. You can then process the list to obtain an overall result.

# average height by sex, by pkrange
df_lst <- query_documents(cont,
    "select c.gender, count(1) n, avg(c.height) height
        from mycontainer c
        group by c.gender",
    by_pkrange=TRUE
)

# combine pkrange results
df_lst %>%
    bind_rows(.id="pkrange") %>%
    group_by(gender) %>%
    summarise(height=weighted.mean(height, n))

Full support for cross-partition queries, including aggregates, may come in a future version of AzureCosmosR.

Other client interfaces

MongoDB

You can query data in a MongoDB-enabled Cosmos DB instance using the mongolite package. AzureCosmosR provides a simple bridge to facilitate this.

endp <- cosmos_mongo_endpoint("https://myaccount.mongo.cosmos.azure.com:443/", key="mykey")

# a mongolite::mongo object
conn <- cosmos_mongo_connection(endp, "mycollection", "mydatabase")
conn$find("{}")

For more information on working with MongoDB, see the mongolite documentation.

Table storage

You can work with data in a table storage-enabled Cosmos DB instance using the AzureTableStor package.

endp <- AzureTableStor::table_endpoint("https://myaccount.table.cosmos.azure.com:443/", key="mykey")

tab <- AzureTableStor::storage_table(endp, "mytable")
AzureTableStor::list_table_entities(tab, filter="firstname eq 'Satya'")

ODBC (SQL interface)

As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like array_contains() cannot be used, and attempts to reference arrays and objects may return incorrect results.

conn <- DBI::dbConnect(
    odbc::odbc(),
    driver="Microsoft Azure DocumentDB ODBC Driver",
    host="https://myaccount.documents.azure.com:443/",
    authenticationkey="mykey",
    RESTAPIversion="2018-12-31"  # for large partition key support
)

DBI::dbListTables(conn)

DBI::dbGetQuery(conn, "select * from mycontainer where gender = 'masculine'")

Azure Resource Manager interface

On the ARM side, AzureCosmosR extends the AzureRMR class framework with a new az_cosmosdb class representing a Cosmos DB account resource, and methods for the az_resource_group resource group class.

rg <- AzureRMR::get_azure_login()$
    get_subscription("sub_id")$
    get_resource_group("rgname")

rg$create_cosmosdb_account("mycosmosdb", interface="sql", free_tier=TRUE)
rg$list_cosmosdb_accounts()
cosmos <- rg$get_cosmosdb_account("mycosmosdb")

# access keys (passwords) for this account
cosmos$list_keys()

# get an endpoint object -- detects which API this account uses
endp <- cosmos$get_endpoint()

# API-specific endpoints
cosmos$get_sql_endpoint()
cosmos$get_mongo_endpoint()
cosmos$get_table_endpoint()

Further information



Try the AzureCosmosR package in your browser

Any scripts or data that you put into this service are public.

AzureCosmosR documentation built on Jan. 19, 2021, 1:07 a.m.