db.virtualtable: Create a virtual table module

View source: R/db-vtable.R

db.virtualtableR Documentation

Create a virtual table module

Description

Create a virtual table module

Usage

db.virtualtable(name, methods, env = new.env())

Arguments

name

The name of the virtual table module

methods

A named list of functions implementing the virtual table

env

An environment in which virtuabl table methods will be evaluated.

Details

A virtual table acts like a regular database table except that SQL statements invoke callback methods defined by the virtual table module. In the current implementation there are 13 methods that can be written. Not all of these methods are required as some have sensible defaults. Each section below starts by showing the formal arguments each method is expected to have, followed by a description of the method. To be clear, these methods are typically not invoked by the user. Rather, they are invoked automatically when an SQL statement is run against a virtual table. The SQLite core will communicate the arguments to the different methods.

Value

A function that can be used to register the virtual table module with a database connection.

The .create method

function(db, table, ...)

The .create function is called whenever a new virtual table is created by a CREATE VIRTUAL TABLE statement. The function should return a new environment for the virtual table, which can be used to store data that needs to be communicated among the different callbacks. At a minimum, the returned environment should contain a 'schema' variable, which is a scalar character vector specifying the SQL schema of the virtual table.

Argument table is the name of the virtual table being created. The ... contain additional arguments specified as part of the CREATE VIRTUAL TABLE statement. For example, the statement

    CREATE VIRTUAL TABLE tablename USING modulename(arg1, arg2, arg3)

will call the .create function with argument table set equal to 'tablename' and with module arguments arg1, arg2, arg3 passed through to ...

Note that virtual tables can be associated with regular tables that provide a persistent data store. In this case, regular tables tables should follow a strict naming convention like so

    <virtual table name>_<persistent table name>

Furthermore, it is imperative that the .create and .connect function call db.register_shadowname with <persistent table name> as the name argument. Failure to do so will allow ordinary SQL to alter the contents of the data store. Calling db.register_shadowname ensures that SQL can alter the data store contents only if it originates from within one of the virtual table's methods.

The .connect method

function(db, table, ...)

The .connect function is called whenever a database connects to or reparses a virtual table schema. It receives the same arguments as the .create function. The only time the .create and .connect functions are different is when the .create function needs to do some initialization work to set up a persistent data store in the form of real database tables.

The .disconnect method

function(db, table, env)

The .disconnect function is called whenever a database connection that uses a virtual table is closed. The default method does nothing.

The .destroy method

function(db, table, env)

The .destroy function is called whenever a virtual table is deleted by a DROP TABLE statement. The default method does nothing.

The .open method

function(db, table, env)

The .open function is called whenever a cursor is needed to access rows of a virtual table. The default method creates a variable named "rowid" in env and assigns it the value of 1.

The .close method

function(db, table, env)

The .close function is called whenever a previously opened cursor is no longer needed. The default method does nothing.

The .rowid method

function(db, table, env)

The .rowid function returns the rowid of the row of data the cursor is currently pointing to. The default method returns the current value bound to "rowid" in env.

The .next method

function(db, table, env)

The .next function advances the cursor to the next row of data. The default method increments the current value bound to "rowid" in env by 1.

The .eof method

function(db, table, env)

The .eof function determines if the cursor has advanced passed the last row of data. The default method returns TRUE.

The .column method

function(db, table, env, j)

The .column function returns the value in the j-th column for the current row of data. The default method does nothing.

The .insert method

function(db, table, env, rid, ...)

The .insert function is called in response to an INSERT statement on a virtual table. If it returns NA the table is marked as read only and the operation is prohibited. The default method returns NA.

Values in ... contain the data being inserted. The values will be in the same length and order as the table's declared columns. Any columns not supplied in the insert statement will be represented as NA in ... The rowid for the new row is given by rid. If rid is NA then the .insert function is responsible for choosing a new rowid.

The function should return the value of the newly inserted rowid.

The .update method

function(db, table, env, rid, oid, ...)

The .update function is called in response to an UPDATE statement on a virtual table. If it returns NA the table is marked as read only and the operation is prohibited. The default method returns NA.

Values in ... contain the data being modified. The values will be in the same length and order as the table's declared columns. Any columns not supplied in the update statement will be represented as NA in ...

oid indicates the rowid of the data being modified. If rid != oid then this indicates that the rowid of the updated row is being changed from oid to rid.

The .delete method

function(db, table, env, rid)

The .delete function is called in response to a DELETE statement on a virtual table. If it returns NA the table is marked as read only and the operation is prohibited. The default method returns NA.

rid indicates the rowid of data to be deleted.

The .bestindex method

function(db, table, env, constraints, orderbys)

This is a complicated method. For starters, it is best to just leave it as the default, which is to return NULL.

The .bestindex function determines the best index to use to efficiently perform a SQL query.

Argument constraints will be a list of usable constraints that comprise the query. Each element in the list is a vector of length 2. The first element is the column index that the constraint operates on; the second, the type of constraint operator. If the column index is 0, this indicates the constraint operates on the table's rowid.

Argument orderbys is a list of order by terms. Each element in the list is also a vector of length 2. The first element is the column index of the order by term; the second, an integer indicating if the order by term is decreasing (1) or increasing (0).

The job of .bestindex is to use this information to select an index to perform the query and communicate this back to SQLite. The return value should be a list with the following structure.

The first element in the list should be an integer vector of the same length as the input constraints list. If no index is available for a given constraint, then the corresponding value in this vector should be set to NA, indicating a full table scan is necessary to get the desired result. If an index is available, then the corresponding value in this vector should be set to the position the constraint argument will appear in the ... arguments passed to the .filter function (see below). For example, consider the following query

  SELECT *
  FROM books
  WHERE author='Pratchett'
  AND subject='Discworld'
  AND pages >= 250

Now, consider that author is column 3, subject is column 1, and pages is column 6, and that we are able to index on author and subject.

The constraints input to .bestindex will look like

[[1]]
[1] 3 SQLITE_INDEX_CONSTRAINT_EQ
[[2]]
[1] 1 SQLITE_INDEX_CONSTRAINT_EQ
[[3]]
[1] 6 SQLITE_INDEX_CONSTRAINT_GE

Suppose we want the subject to appear before the author in the .filter argument list. Then the first item in the .bestindex list output should look like

[1] 2 1 NA

Implying that the ... arguments to filter will be

[[1]]
[1] 'Discworld'
[[2]]
[1] 'Pratchett'

If instead we had returned

[1] 1 2 NA

then the ... arguments to filter would have been

[[1]]
[1] 'Pratchett'
[[2]]
[1] 'Discworld'

The second item in the .bestindex list output should be an integer number used to identify the chosen index.

The third item in the .bestindex list output should be an character string used to identify the chosen index.

The fourth item in the .bestindex list output should be TRUE if the resulting output is already sorted according to the order by terms. Otherwise it should be FALSE, indicating that SQLite needs to do the sorting.

The fifth item in the .bestindex list output should be the estimated cost of using the returned index to perform the query. SQLite will call .bestindex multiple times with different inputs and will choose the index that has the lowest estimated cost.

The .filter method

function(db, table, env, idxnum, idxname, ...)

The .filter function sets the rowid to the rowid of the first row of data in the result set. For starters, just leave this function set to its default, which resets the "rowid" variable in env to 1.

The arguments are communicated to .filter by SQLite. Arguments idxnum and idxname are set by the .bestindex function, and the .filter function is free to interpret them however it likes. If .bestindex function always returns NULL (the default), idxnum will be 0 and idxname will be NULL. The remaining arguments in ... will be the constraint values in the SQL query that .bestindex selected in the order that .bestindex selected them. If .bestindex always returns NULL then the ... list will be empty.


blueraleigh/db documentation built on Feb. 25, 2024, 9:13 a.m.