r-exasol is an R library providing access to an Exasol DB. This document describes the inner workings of r-exasol.
As described in the README.md:
It is optimised for fast reading & writing from and to a multinode cluster.
This document describes, from a high-level perspective, how the r-exasol library reads data from and writes data to the database. The focus is on the functions that leverage the fast data transfer; the other functions use standard DBI / RODBC functions.
When the client calls dbConnect(), the library connects to the Exasol DB via RODBC. This happens in EXADBI.R:
EXANewConnection <- function(...)
...
odbcDriverConnect(...)
Once the connection is ready, the client can read or manipulate data using essentially two functions:
exa.readData()
exa.writeData()
Some of the standard SQL functions, for example dbGetQuery(), internally use one of those two methods, but that is not always the case.
The following diagram shows the internal data flow when the client invokes exa.readData():
When the client invokes exa.readData(), the library reads the server info (hostname/port) from the ODBC connection and establishes a TCP socket (A1). When the server receives a connection request on the listening port, it creates another TCP port (A2), which is used for the data transfer. As the last step within asyncRODBCIOStart(), the client reads the information about the new port (A3), which it needs to prepare the export SQL statements.
Then the library calls asyncRODBCQueryStart(), which internally creates a new thread that sends the export statements via ODBC to the server (B1). The server then starts to write the data in CSV format to the previously created TCP socket. Thus, the client receives the data from the server via the socket (not via ODBC).
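The split between the control path (ODBC, on a separate thread) and the data path (socket, on the calling thread) can be illustrated with a minimal C++ sketch. Everything in it is a stand-in and not the library's code: DataChannel is an in-memory queue that replaces the TCP socket, fake_server_export replaces the server, build_export_statement is a hypothetical helper, and the statement text, host and port are placeholders rather than the SQL r-exasol actually generates.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// In-memory stand-in for the TCP data socket (A1/A2). Hypothetical.
class DataChannel {
public:
    void write(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lines_.push(std::move(line));
        }
        ready_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until a line arrives; returns false once closed and drained.
    bool read(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !lines_.empty() || closed_; });
        if (lines_.empty()) return false;
        out = std::move(lines_.front());
        lines_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// Hypothetical helper: embeds the port learned in step A3 into the statement.
// The statement shape is an assumption made for illustration only.
std::string build_export_statement(const std::string& query,
                                   const std::string& host, int port) {
    return "EXPORT (" + query + ") INTO CSV AT 'http://" + host + ":" +
           std::to_string(port) + "' FILE 'data.csv'";
}

// Stand-in for the server handling the statement (B1): it writes CSV rows
// to the data channel, not back through the control connection.
void fake_server_export(const std::string& statement, DataChannel& channel) {
    assert(!statement.empty());
    channel.write("1,alice");
    channel.write("2,bob");
    channel.close();
}

std::vector<std::string> read_data_sketch(const std::string& query) {
    DataChannel channel;
    // Placeholder host and port; the real values come from the server.
    const std::string statement =
        build_export_statement(query, "192.0.2.10", 40000);
    // The statement is sent on its own thread so that the caller can
    // consume rows while the export is still running.
    std::thread executor([&statement, &channel] {
        fake_server_export(statement, channel);
    });
    std::vector<std::string> rows;
    std::string line;
    while (channel.read(line)) rows.push_back(line);
    executor.join();
    return rows;
}
```

Calling read_data_sketch("SELECT 1") returns the two rows written by the stand-in server. The point of the design is that the statement call blocks for the duration of the export, so it has to run on another thread while the caller drains the data channel.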
The following diagram describes the high level architecture of the C level package.
R uses the plain C interface, which delegates all calls to the class ConnectionContext.
When initiating a new connection, ConnectionContext instantiates ConnectionController, which opens the socket and reads the metadata.
After receiving asyncRODBCQueryStart, the ConnectionController triggers the asynchronous ODBC call (ODBCAsyncExecutor) and creates an HttpChunkReader or HttpChunkWriter instance, which is returned to the ConnectionContext class.
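The name HttpChunkReader suggests that the data on the socket is framed with HTTP chunked transfer encoding; this document does not spell out the wire format, so treat that as an assumption. Under that assumption, a minimal decoder for an in-memory buffer could look like the following sketch. decode_chunked is a hypothetical function and not part of r-exasol.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes an HTTP/1.1 chunked body held in memory. Each chunk has the form
// "<hex size>\r\n<data>\r\n"; a chunk of size zero ends the body.
// Trailer fields after the last chunk are not parsed in this sketch.
std::string decode_chunked(const std::string& body) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t eol = body.find("\r\n", pos);
        if (eol == std::string::npos) {
            throw std::runtime_error("missing chunk size line");
        }
        // std::stoul stops at the first non-hex character, so chunk
        // extensions after ';' are ignored.
        const std::size_t size =
            std::stoul(body.substr(pos, eol - pos), nullptr, 16);
        pos = eol + 2;
        if (size == 0) break;  // last chunk
        if (pos + size + 2 > body.size()) {
            throw std::runtime_error("truncated chunk");
        }
        out.append(body, pos, size);
        if (body.compare(pos + size, 2, "\r\n") != 0) {
            throw std::runtime_error("missing CRLF after chunk data");
        }
        pos += size + 2;
    }
    return out;
}
```

A real reader would decode incrementally while bytes arrive on the socket instead of requiring the whole body up front; the framing logic is the same.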
ConnectionContext then creates a specific rconnection class, RWriterConnection or RReaderConnection, and passes the reader/writer obtained from ConnectionController to it.
Finally, ConnectionContext returns the SEXP instance (which encapsulates the rconnection data structure) to the R client, which can then start reading/writing. For more information please check r-source.
The communication with ODBC requires access to the connection handle. We get access to this handle by reading the "handle_ptr" attribute in readData.R/writeData.R.
On Linux we assume that OpenSSL has been installed (for example via sudo apt-get install libssl-dev).
The build script (configure) checks if the header files and libraries are correctly installed.
The unit tests use CMake, which also finds the OpenSSL package on the system.
On macOS we also assume that OpenSSL has been installed (for example via brew install openssl).
However, the package is not exposed publicly, so the build script (configure) will usually fail to find OpenSSL using pkg-info. For that case, configure checks again for the predefined path where brew installs OpenSSL.
On Windows we leverage the OpenSSL rwin-repository for building the package. It provides a static library for OpenSSL; this version is downloaded, unzipped and used for compilation.