The dplyr package abstracts away database connection details, and a fair amount of dialect-specific SQL, behind its own src objects or its use of DBI connection objects. However, code that uses dplyr must construct an appropriate src or connection, and use it to create any tbl instances that are used in a join operation. This creates two problems for code that one might want to distribute. First, the connection-building src_whatever or dbConnect statement typically contains credentials used to authenticate to the database server. This creates a security risk, as sending around personal credentials isn't a good idea for a variety of reasons, and using the same "service credentials" for many users makes tracking usage more difficult. It also makes it harder to reuse code, since it has to be edited to accommodate different users or different databases.
Argos offers some help in this area through the src_argos function. As its documentation explains, src_argos isn't a new type of dplyr data source, but an adapter that lets you create a data source of a type known to dplyr or DBI using configuration data passed to src_argos, or, more importantly, supplied in a configuration file. This behavior is not OHDSI-specific, and can be used to set up any dplyr or DBI data source.
Argos configuration files provide a simple way to represent in JSON the information needed to construct a data source. The JSON must define a single object (or hash), which is translated into a list structure within R. Two keys from the object are meaningful. The src_name key must name either a dplyr data source constructor, such as src_postgres or src_mysql, or a DBI driver, such as SQLite or PostgreSQL (N.B. the initial R in the package name is not included). The src_args key itself points to a list, where the keys are names of arguments to the constructor, and the corresponding elements are the argument values. Here's a typical example:
{
  "src_name" : "src_postgres",
  "src_args" : {
    "host" : "my.database.server",
    "port" : 5432,
    "dbname" : "project_db",
    "username" : "my_credential",
    "password" : "DontLook",
    "options" : "-c search_path=schema0,schema1,schema2"
  }
}
If you're deriving the configuration information programmatically, you can pass it directly to src_argos via the config argument, but it's perhaps more common that configuration remains the same for a given situation. For these cases, Argos encourages separation of configuration from code.
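As a minimal sketch of the programmatic route, the JSON example above can be written as a nested R list and handed to src_argos through config. This assumes the package is installed under the namespace Argos and that config accepts a list shaped like the parsed JSON object; the helper name connect_with is hypothetical, and the values are the placeholders from the example rather than working credentials.

```r
# The same structure as the JSON example, expressed as a nested R list.
# All values are the placeholder values from that example.
config <- list(
  src_name = "src_postgres",
  src_args = list(
    host     = "my.database.server",
    port     = 5432,
    dbname   = "project_db",
    username = "my_credential",
    password = "DontLook",
    options  = "-c search_path=schema0,schema1,schema2"
  )
)

# Hypothetical helper: passes the list to src_argos via its config argument.
# Assumes the package namespace is `Argos`; nothing connects until it is called.
connect_with <- function(cfg) {
  Argos::src_argos(config = cfg)
}

# Usage, once a reachable database and real credentials are in place:
# src <- connect_with(config)
```

Keeping the list in one place and passing it in makes it easy to swap in a file-based configuration later without touching the calling code.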
Once the connection is established, there may be additional work to do. For example, session settings may need to be altered, or schemas to search specified, or authorization roles changed. There are two options that Argos provides to address this: post_connect_sql and post_connect_fun. In both cases, this makes it possible to execute additional code specified in the configuration file. Since this creates the possibility that unknown code may be executed and produce unwanted effects, you need to opt in to each option, using the allow_post_connect_sql or allow_post_connect_fun parameters to src_argos.
post_connect_sql
This option allows you to pass a series of SQL statements to the newly-established database session for execution. In this way, you can change database session settings to match the intended use of the connection. While these statements can make any changes the database server will permit, they cannot alter the R environment directly.
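A rough sketch of how this might look when the configuration is built in R. Two things here are assumptions rather than documented facts: that post_connect_sql sits at the top level of the configuration beside src_name and src_args, and that it holds one SQL statement per element. The statements are illustrative PostgreSQL, and connect_with_sql is a hypothetical helper.

```r
config <- list(
  src_name = "src_postgres",
  src_args = list(host = "my.database.server", dbname = "project_db"),
  # Assumed shape: one SQL statement per element, run in order after connecting.
  post_connect_sql = c(
    "SET search_path TO schema0, schema1, schema2",
    "SET statement_timeout TO '5min'"
  )
)

# Hypothetical helper. The explicit opt-in flag is what permits the SQL to run;
# assumes the package namespace is `Argos`.
connect_with_sql <- function(cfg) {
  Argos::src_argos(config = cfg, allow_post_connect_sql = TRUE)
}

# src <- connect_with_sql(config)
```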
post_connect_fun
This option gives you the most freedom to make changes. It lets you write an R function that takes the newly-established database connection as its single parameter. It may perform computation in the database, change connection settings, or even replace the connection. The value it returns will be passed back as the return value of src_argos.
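The shape of such a function might look like the sketch below. It assumes the object it receives is something DBI::dbExecute accepts, and the role name is a placeholder. How the function is written into a configuration file is not covered here.

```r
# Takes the new connection as its only parameter and returns the object
# that src_argos should hand back to the caller.
set_session_role <- function(db) {
  # Assumption: `db` is a DBI connection; `project_reader` is a placeholder role.
  DBI::dbExecute(db, "SET ROLE project_reader")
  # Return the (possibly altered) connection so the caller still receives one.
  db
}
```

Returning db at the end matters: since the return value becomes the result of src_argos, a function that ends on the dbExecute call would hand back a row count instead of a connection.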
Argos tries to provide you with a lot of flexibility in the way you deploy your code and configuration, by letting you get the latter to src_argos in a variety of ways. The first option that returns valid JSON is used, and later options aren't checked; src_argos does not try to merge data from more than one source.
Telling src_argos what to read
If you know where your configuration data lives, you can point src_argos directly to it, using the paths argument. This is a vector of paths to check, so you can provide a series of places to look, and src_argos will use the first one it finds. Each place can be a path to a local file, or a URL that returns JSON. (As an implementation detail, since src_json uses jsonlite::fromJSON under the hood, paths can also contain a JSON string rather than a pointer to outside configuration data. We don't make any promises about this, as jsonlite::fromJSON might change someday, but it can be a handy way to provide fallback configuration information after having src_argos check for an outside resource.)
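A short sketch of the paths argument with a fallback. The file path and URL are hypothetical, the final element relies on the unguaranteed JSON-string behavior described above, and the SQLite settings in it are an assumption about what that driver accepts. open_db is a hypothetical helper, and the package namespace is assumed to be Argos.

```r
# Hypothetical locations, checked in order. The last element is a literal
# JSON string acting as a fallback if neither outside resource is found.
config_paths <- c(
  "/etc/my_app/db_config.json",
  "https://config.example.org/my_app/db.json",
  '{"src_name": "SQLite", "src_args": {"dbname": ":memory:"}}'
)

# Hypothetical helper; nothing is read or connected until it is called.
open_db <- function(paths = config_paths) {
  Argos::src_argos(paths = paths)
}

# src <- open_db()
```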
If you need to specify where to look at runtime, you can use the environment variable BASENAME_CONFIG to point to a configuration file, where BASENAME is one of the basenames src_argos would usually check (see below). One note: src_argos will only pay attention to this environment variable if it points to an actual file, not a URL or JSON string. This is construed as a feature, in that it may limit the damage someone can inflict by fiddling with the environment. If you trust the environment, you can be more permissive by writing something like:
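my.paths <- c()
# my.basenames: character vector of basenames to check, supplied by the caller
for (bn in my.basenames) {
  my.info <- Sys.getenv(paste0(toupper(bn), '_CONFIG'))
  if (my.info != '') my.paths <- c(my.paths, my.info)
}
# Use any environment-supplied locations; otherwise fall back to other arguments
src <- if (length(my.paths) > 0) src_argos(paths = my.paths) else src_argos(other.args)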
Argos tries to support a number of common deployment styles through its use of default search locations for configuration files. For those who prefer per-user config files, it will look in your home directory. If you prefer to deploy configuration data with your application, you can put the configuration file in the same directory as your main application program. Finally, you can put the configuration file in the same directory as library code that calls src_argos either directly or through one intermediate call.
Similarly, src_argos will try to find files with the same basename as your application, or as the library file(s) making the call to src_argos. Optionally, the file can have a "type" (i.e. suffix) of .json or .conf, or none at all. Whatever the suffix, though, the contents must be JSON. If these options don't suit your deployment strategy, you can provide explicit hints to src_argos using the dirs, basenames, and suffices arguments.
Finally, to accommodate convention on Unix-like systems, Argos first checks for a "hidden" file with a leading . before checking for the plain basename.
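To make the search rules concrete, here is an illustrative sketch that enumerates candidate file names from directories, basenames, and suffixes, with the hidden name tried before the plain one. It is not Argos's implementation: the function candidate_config_paths is hypothetical, and the nesting order of the loops (directories, then basenames, then suffixes) is an assumption. The second helper shows explicit hints being passed to src_argos, assuming the three arguments take character vectors and the package namespace is Argos.

```r
# Illustrative only: list candidate config files in one plausible order.
candidate_config_paths <- function(dirs, basenames,
                                   suffices = c(".json", ".conf", "")) {
  out <- character(0)
  for (d in dirs) {
    for (bn in basenames) {
      for (sfx in suffices) {
        # The hidden (dot-prefixed) name is listed before the plain name.
        out <- c(out,
                 file.path(d, paste0(".", bn, sfx)),
                 file.path(d, paste0(bn, sfx)))
      }
    }
  }
  out
}

# Home directory first, then the current directory, for one basename.
candidates <- candidate_config_paths(dirs = c("~", "."), basenames = "my_app")
print(candidates)

# Hypothetical helper passing explicit hints instead of relying on defaults.
open_project_db <- function() {
  Argos::src_argos(dirs = c("~", "."),
                   basenames = "my_app",
                   suffices = c(".json", ".conf"))
}
```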
Each time dplyr sets up a new tbl it requires a data source. Since the OHDSI-specific parts of Argos frequently reference the database, especially vocabulary tables, and since dplyr's src constructors aren't idempotent, we need to keep track of the active data source. You can pass the current data source to Argos functions using the named argument src. If you don't, Argos will try to use the return value of ohdsi_default_src() as a data source. You can define this function yourself, or you can use the set_ohdsi_default_src function to set it.
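If you define ohdsi_default_src yourself, one way to avoid building a new data source on every call is to cache the first one. The sketch below is an assumption about how you might do that, not something Argos prescribes: make_default_src is a hypothetical helper, and the stand-in constructor returns a plain list where real code would call src_argos.

```r
# Hypothetical helper: wraps a constructor so it runs only on first use.
make_default_src <- function(connect) {
  cached <- NULL
  function() {
    if (is.null(cached)) cached <<- connect()
    cached
  }
}

# Stand-in constructor that counts how often it is invoked. In real code
# this would be something like function() src_argos(...).
connections_opened <- 0
ohdsi_default_src <- make_default_src(function() {
  connections_opened <<- connections_opened + 1
  list(id = connections_opened)
})

a <- ohdsi_default_src()
b <- ohdsi_default_src()
identical(a, b)  # both calls return the cached object
```

Because the constructor runs once, every caller that falls back to ohdsi_default_src() shares the same data source.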
Whether you're an adherent of DRY or a devotee of no-action-at-a-distance, we've got you covered.
Enjoy!