knitr::opts_chunk$set(comment = "") options(width = 120, max.print = 100)
build_with_security = (Sys.getenv("SCIDB_TEST_WITH_SECURITY", "") == "true") db <- if (!build_with_security) { scidb::scidbconnect(host = Sys.getenv("SCIDB_TEST_HOST","localhost"), port = Sys.getenv("SCIDB_TEST_PORT", 8080)) } else { scidb::scidbconnect(host = Sys.getenv("SCIDB_TEST_HOST","localhost"), port = Sys.getenv("SCIDB_TEST_PORT",8083), username = Sys.getenv("SCIDB_USER"), password = Sys.getenv("SCIDB_TEST_PASSWORD"), protocol = "https") }
A SciDB database connection object returned by scidb::scidbconnect()
contains
a list of available AFL operators and macros presented as R functions.
The functions return scidb objects, and those objects may be used as
function arguments to compose complex AFL query expressions. RStudio users can
use code completion to find available operations (via
tab completion on db$
) and inside the function body to see suggested AFL operator
argument completions. (via tab completion on, e.g., db$build()
).
x <- db$build("<v:double>[i=1:2,2,0, j=1:3,1,0]", i * j) y <- db$apply(x, a, 2 * v, b, "'a character string'") print(y)
Note that in the above expression i
and j
are treated verbatim as SciDB dimension
values, not R values. The function argument evaluation procedure in SciDB AFL methods
is, in order of precedence:
R()
function (see next section).Certain SciDB schema literal values represent one important exception to the above
process. SciDB schemas contain characters like <
, :
, [
that correspond
to R infix operators. It's difficult to always overload their evaluation as
R operators. Thus, SciDB schemas must simply always be supplied as R character strings
instead of verbatim text as shown in the following example (also used above).
x <- db$build("<v:double>[i=1:2,2,0, j=1:3,1,0]", i * j)
Use the special R()
function syntax within any portion of an AFL statement to
replace the enclosed R expression with its scalar value. The R expression is
evaluated in the calling environment (the parent frame of the AFL function).
The evaluated R expression must be a scalar character, numeric, or logical
value. The next example uses two R values, one for the upper limit of the
j
dimension and another in the build statement.
jlim <- 3 z <- pi x <- db$build("<v:double>[i=1:2,2,0, j=1:R(jlim),1,0]", i * j + R(z))
Use this syntax to mix scalar R values into SciDB AFL expressions.
Use the TAB key for argument lists within an AFL function (RStudio only).
The standard R help()
function shows SciDB doxygen help for AFL function
arguments, for instance
scidb::help(db$aggregate)
Alternatively, supply a character string argument naming a SciDB operator
to the aflhelp()
function to show SciDB help for the operator.
scidb::aflhelp(db, "aggregate")
Use the special %as%
R infix operator in place of the usual AFL
word "as" in SciDB expressions using aliasing, either to rename a
SciDB array, expression, or certain AFL expression arguments often
in aggregations.
library(datasets) data(iris) x <- scidb::as.scidb(db, iris) # upload the iris data frame to SciDB scidb::as.R(db$grouped_aggregate(x, Species, avg(Petal_Length) %as% avg))
Creating filters for SciDB queries typically involves writing boolean
expressions, whose AFL syntax can differ from R syntax in many ways.
To ease this transition, the experimental scidb::expression_parser()
,
scidb::filter()
, and scidb::filter_to_afl()
methods offer some
support for mapping R syntax to AFL syntax, as well as enabling other
convenient methods for creating filter expressions.
The scidb::filter
utility allows
users to create symbolic boolean expressions referring to attributes present in SciDB arrays as
well as local variables in the R environment. These can be evaluated in the SciDB database and used to filter the
data before returning them to the user, for improved performance.
The output of the function scidb::filter
is a list of
objects in an R6 class hierarchy inheriting from scidb::scidb.expression
, and some helpful methods are
available as described in its documentation. Alternatively, the method scidb::filter_to_afl
returns a
scalar AFL strings, equivalent to the boolean conjunction of all inputs expressions.
The scidb::filter
and scidb::filter_to_afl
functions support a syntax similar to regular R syntax, and
allow the use of various numeric and string operators, as outlined in its documentation page.
There are three basic formats for a filter expression:
* an expression made up of symbols corresponding to properties in SciDB tables, literal and interpolated
values, and various operators
* a symbol name -- predicate pair in the form symbol = predicate
, where the predicate is
an anonymous unary predicate or composition of such objects
* a symbol name -- value pair in the form symbol = value
, where the value is a scalar or vector quantity
and is interpreted as requiring that the value of the corresponding property be equal to one of the listed
options.
Regular logical operators are supported.
A number of allowed binary comparison operators acting on SciDB types are supported as well.
Constant values may be used inserted into a filter expression either as R literals or user variables prefixed
with !!
; see the section on value substitution for detail on the latter. Finally,
some arithmetic and string functions can be used to build more complicated expressions.
Symbols present in the expression, besides those reserved for various operators, are assumed to correspond to attributes or dimensions of the table being queried, but are not validated until the query is executed by the database.
A scidb::filter
expression may contain a comma-delimited list of expressions, which is interpreted as the logical
conjunction of the separate expressions -- i.e., requiring all the expressions to be evaluated as true. This
can also be done via the logical AND operator &&
, demonstrated in the below, equivalent queries:
scidb::filter_to_afl(A, B, C) scidb::filter_to_afl(A && B && C)
The logical OR operator ||
can be used to require at least one of multiple expressions to be true:
scidb::filter_to_afl(A || B || C)
Finally, the logical negation operator !
is supported, as in these equivalent expressions, which
evaluate to the same AFL due to logic pushing down negation via De Morgan's laws:
scidb::filter_to_afl(!(A || B || C)) scidb::filter_to_afl(!A && !B && !C)
The logical precedence of these operators follows the standard convention that !
takes the highest
precedence, followed by &&
, and finally ||
with the lowest precedence.
The ==
and !=
are supported for both string fields and numeric fields:
scidb::filter_to_afl(x != 'unknown')
When the two sides of these operators are different types, normal casting rules are applied if possible; e.g. different arithemtic types may be compared, but a string value may not be compared to an arithmetic value. When the field takes on the null value, the result of these expressions is always null.
The supported numeric comparison operators are >
, >=
, <
, and <=
, as shown in these examples:
scidb::filter_to_afl(b > 1000) scidb::filter_to_afl(b > 2000 || c >= 60)
When the two sides of these comparison operators are different types, normal casting rules are applied. When the field takes on the null value, the result of these expressions is always null.
The supported comparison operators for string fields are the binary operators %like%
, %contains%
,
%starts_with%
, and %ends_with%
.
The %like%
operator takes a regular expression as the right-hand operand. Inside the regular
expression pattern, special characters such as *
and .
take special meanings; please refer to the R
regular expression syntax for
more details. In this example, the pattern [*][0-9]{2,}
matches a literal *
asterisk followed by at
least 2 digits, and the initial and terminal .*
placeholders mean that this pattern can be present
anywhere in the annotation string:
scidb::filter_to_afl(x %like% '.*[*][0-9]{2,}.*')
The operators %contains%
, %starts_with%
, and %ends_with%
are convenience methods for substring matching, with
the former allowing a floating position for the substring specified as the right-hand operand and
the latter fixing the substring's allowed position. For example, to require a string to end in a literal *
,
one could create the filter
scidb::filter_to_afl(x %ends_with% '[*]')
All four of these methods require escaping special regex characters to match them literally, with the
simplest method being to create single element character classes, e.g. [*]
as shown above.
The operators %in%
and %not_in%
are set operators, testing the membership of left-hand operand in
the vector in the right-hand operand. Those vectors can be included as literals in the expression
or as interpolated R variables:
scidb::filter_to_afl(a %in% c(3,5,7))
A filter expression can directly use literal R characters and numeric values, as shown here where the literal value 1000 is used directly:
scidb::filter_to_afl(b > 1000)
Alternatively, a user might prefer to store the value in an R variable. In this case, the
variable must be prefixed with !!
:
aValues <- c(3,5,7) scidb::filter_to_afl(a %in% !!aValues) myThreshold <- 1000 scidb::filter_to_afl(b > !!myThreshold)
If the !!
is omitted, the variable name will be assumed to be a field in the SciDB array, resulting
in either an incorrect result when the name happens to be a field in the array or, more likely, an
error when SciDB cannot find a field of the given name:
scidb::filter_to_afl(b > myThreshold) # error -- should add !! before myThreshold
These errors cannot be detected when the filter expression is created, and will not be apparent until the query is executed on the SciDB server.
If the name of the SciDB field is itself stored in a variable, it can be injected into an expression
via the use of the rlang::sym
function:
key <- "x" scidb::filter_to_afl(!!rlang::sym(key) == "value")
The various examples above show how a single field can be compared to a reference literal or symbolic value. The values of fields can also be modified using appropriate functions acting on one or more string or numeric type, to build more complicated boolean expressions.
The binary operator +
can be used to as a string concatenation operator, to join strings from different fields
and/or literals. For example, the filter below first constructs a synthetic field a + ':' + b
,
then requires that the synthetic field must be one of the strings specified with a %in%
function.
scidb::filter_to_afl(a + ':' + b == "A:B")
If preferred, it is also possible to use the regular R paste0
method, though it is currently limited to operating
as a binary expression as shown in the filter here, completely equivalent to the one above:
scidb::filter_to_afl(paste0(paste0(a,':'),b) == "A:B")
Additionally, the unary nchar
operator (or, alternatively, strlen
) returns the number of characters in a
string field, a value that can then be used in numeric comparisons:
scidb::filter_to_afl(nchar(a) <= 12)
The binary arithmetic operators +
, -
, *
, and /
can all be used to combine numeric fields, and the results can be
filtered with any of the regular numeric comparison operators.
scidb::filter_to_afl( a * b >= 1000)
Typical rules apply when comparison values of different numeric types or when attempting to divide by zero.
Expressions that are a function of a field's value must be
wrapped in parentheses if they are meant to be the left-hand operand of the %in%
or %not_in%
operators, due to R's normal precendence rules (save when the expression is just nchar(...)
).
A safe practice would be to always wrap these expressions in parenthesis, to prevent any ambiguity
in case a ==
or !=
were ever replaced with %in%
or %not_in%
.
The following unary predicate expressions are all shorthand methods for creating a scidb::filter
object for an
arbitrary (anonymous) attribute. In this syntax, the argument to scidb::filter
is a named key-value pair whose
key is the name of the attribute or dimension to which the value, a predicate, is to be applied.
which takes a list of fields (as names) and their corresponding allowed values or predicates, applied in conjunction.
For example, to restrict a field a
to be one of 3, 5, or 7, and simultaneously restrict b
to be equal to 6,
one can write
scidb::filter_to_afl( a = IN(c(3,5,7)), b = EQUALS(6) )
For convenience, the list values can be set to a primitive value or vector of values, as a shortcut for the common cases of equality or membership in a set:
scidb::filter_to_afl( a = c(3,5,7), b = 6 )
Additionally, predicates can be contained logically using the !
, &
, and |
operations, for example
to allow a field to be in one of two ranges with
scidb::filter_to_afl(a = IN_RANGE(5,10) | IN_RANGE(15,20))
When defining unary predicates, one can take advantage of the rlang :=
syntax to set the predicate
for a field whose name is stored in an R variable:
key <- "x" scidb::filter_to_afl(!!key := IN_RANGE(5,10) | IN_RANGE(15,20))
EQUALS()
NEQ()
IS_NULL()
NOT_NULL()
IN()
NOT_IN()
GT()
GEQ()
LT()
LEQ()
IN_RANGE()
and IN_RANGE_INCL()
IN_RANGE_EXCL()
LIKE()
CONTAINS()
STARTS_WITH()
ENDS_WITH()
... & ...
... | ...
!(...)
,
, &
, &&
, |
, ||
, !
==
, !=
, >
, >=
, <
, <=
==
, !=
, %contains%
, %starts_with%
, %ends_with%
, %like%
%in%
, %not_in%
!!
to flag symbols in the local R environment+
, -
, *
, /
, nchar
EQUALS
, NEQ
, IN
, NOT_IN
, IS_NULL
, NOT_NULL
GEQ
, GT
, LEQ
, LT
, IN_RANGE
, IN_RANGE_INCL
, IN_RANGE_EXCL
LIKE
, CONTAINS
, STARTS_WITH
, ENDS_WITH
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.