ScidbConnection: A Scidb connection

ScidbConnection  R Documentation

A Scidb connection

Description

A connection object that talks to SciDB

The connection object creates ArrayOp instances, executes AFL queries of arrayOps, and downloads query results as R data frames.

Methods

Public methods


Method new()

Create a new ScidbConnection instance

This function is for package-internal use only. Please call arrayop::db_connect to get a ScidbConnection object.

Usage
ScidbConnection$new()
Arguments
array_like

An array_op, schema string or existing scidb array name

Returns

A ScidbConnection instance


Method has_connected()

Whether this connection object is configured with connection arguments

Usage
ScidbConnection$has_connected()
Returns

A boolean. TRUE if db_connect has been called, FALSE otherwise.


Method connect()

Connect to scidb with a list of connection arguments

Calling this function updates the current connection object's internal state. If the connection has timed out, calling this function without args re-establishes the connection with the previously configured connection args.

Usage
ScidbConnection$connect(connection_args = NULL, save_token = FALSE, db = NULL)
Arguments
connection_args

NULL or a list of connection args. The default NULL means the previous connection args are used. A list of connection args follows the same names as the params of db_connect.

Returns

The same connection object with updated internal state.


Method scidb_version()

Return an R data frame of the currently connected SciDB server version

Usage
ScidbConnection$scidb_version()
Returns

A data frame. One row and three numeric columns: "major", "minor", "patch"
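A minimal sketch of how the returned data frame might be used to gate version-specific code. The data frame below is a hand-built stand-in with the documented columns rather than a value fetched from a server, and the helper name version_at_least is hypothetical, not part of the package.

```r
# Stand-in for the value of conn$scidb_version():
# one row and three numeric columns.
ver <- data.frame(major = 19, minor = 11, patch = 5)

# Hypothetical helper: TRUE if the server version is at least `minimum`.
version_at_least <- function(ver, minimum) {
  current <- numeric_version(paste(ver$major, ver$minor, ver$patch, sep = "."))
  current >= numeric_version(minimum)
}

version_at_least(ver, "19.3")
version_at_least(ver, "21.8")
```

Comparing through numeric_version avoids the pitfall of comparing "19.11" and "19.3" as plain numbers or strings.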


Method conn_args()

Return a list of connection args used to establish the scidb connection

Connection args follow the same names as db_connect args.

Usage
ScidbConnection$conn_args()
Returns

A named list of connection args


Method query_all()

Download a data frame of the query result with all dimensions and attributes

Usage
ScidbConnection$query_all(afl_str, arrow = FALSE)
Arguments
afl_str

A string of AFL expression

Returns

R data frame


Method query()

Download a data frame of the query result with all attributes

Usage
ScidbConnection$query(afl_str, arrow = FALSE)
Arguments
afl_str

A string of AFL expression

Returns

R data frame


Method execute()

Execute AFL expression without result

Use this for pure side effects; no result is downloaded. E.g. create arrays, remove arrays, remove array versions, update arrays.

Usage
ScidbConnection$execute(afl_str)
Arguments
afl_str

A string of AFL expression


Method create_array()

Create a new scidb array and return the arrayOp instance for it

Usage
ScidbConnection$create_array(name, schema_template, .temp = FALSE)
Arguments
name

Scidb array name. E.g. 'myNamespace.myArray'. If no namespace in array name, it will be created in the 'public' namespace.

schema_template

A scidb schema string or an arrayOp instance, used as the schema template for the newly created array.

.temp

Boolean. Whether to create the array as a scidb temporary array

Returns

An arrayOp instance of the newly created array
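The sketch below combines create_array and execute to give a scratch array a bounded lifetime. It assumes only the two method signatures documented on this page plus the AFL remove operator; the helper name with_scratch_array and its body argument are hypothetical.

```r
# Hypothetical helper: create a temporary array, run `body` on its arrayOp,
# and remove the array afterwards even if `body` fails.
with_scratch_array <- function(conn, name, schema, body) {
  arr <- conn$create_array(name, schema, .temp = TRUE)
  # remove() is the AFL operator that drops an array; execute() is used
  # because no result needs to be downloaded.
  on.exit(conn$execute(sprintf("remove(%s)", name)), add = TRUE)
  body(arr)
}
```

With a live connection conn, a call could look like with_scratch_array(conn, "scratch_demo", "<v:double> [i=0:9]", function(arr) arr$to_afl()).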


Method execute_mquery()

Execute multiple AFL statements wrapped in one mquery with a transaction guarantee

mquery currently only supports top operators insert and delete

Usage
ScidbConnection$execute_mquery(..., .dry_run = FALSE)
Arguments
...

AFL strings and/or arrayOp instances. arrayOp$to_afl() will be called to generate AFL strings.

.dry_run

A single boolean value, default FALSE. If TRUE, only print out the mquery; otherwise execute it in scidb without returns.

Examples
## Not run: 
conn$execute_mquery(
  target$filter(conc < 50)$update(target),
  target$delete_cells(Plant %contains% "3", uptake > 10)
)
## End(Not run)
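To make the wrapping concrete, here is a rough sketch of how a list of statements could be combined into one mquery string, similar in spirit to what a dry run prints. The helper compose_mquery is hypothetical, and the exact AFL text the package generates is an assumption; only the to_afl() call and the insert/delete restriction come from the description above.

```r
# Hypothetical helper: turn AFL strings and/or objects exposing to_afl()
# into a single mquery statement string.
compose_mquery <- function(...) {
  statements <- vapply(list(...), function(x) {
    if (is.character(x)) x else x$to_afl()
  }, character(1))
  # Only insert and delete are accepted as top operators.
  top_ops <- sub("\\(.*$", "", trimws(statements))
  stopifnot(all(top_ops %in% c("insert", "delete")))
  sprintf("mquery(%s)", paste(statements, collapse = ", "))
}

# A plain list with a to_afl() function stands in for an arrayOp here.
op <- list(to_afl = function() "delete(target, uptake > 10)")
compose_mquery("insert(project(src, conc), target)", op)
```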

Method array()

Get an ArrayOp instance of an existing scidb array

The scidb array denoted by array_name must exist in scidb.

Usage
ScidbConnection$array(array_name)
Arguments
array_name

Scidb array name. E.g. 'myNamespace.myArray'. If no namespace in array name, it will be searched in the 'public' namespace.

Returns

An arrayOp instance
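The namespace rule for array names can be illustrated with a small helper. This is a sketch of the documented convention only; split_array_name is a hypothetical name and does not reflect how the package parses names internally.

```r
# Hypothetical helper: split an array name into namespace and bare name,
# falling back to 'public' when no namespace prefix is given.
split_array_name <- function(array_name) {
  parts <- strsplit(array_name, ".", fixed = TRUE)[[1]]
  if (length(parts) == 1L) {
    list(namespace = "public", name = parts)
  } else {
    list(namespace = parts[1], name = paste(parts[-1], collapse = "."))
  }
}

split_array_name("myNamespace.myArray")
split_array_name("myArray")
```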


Method array_from_schema()

Create an ArrayOp instance from array schema

Useful for creating an arrayOp as a template for other arrayOp operations. If an array name is provided in schema_str, an array of that name does not have to exist in scidb. You cannot download data from a non-existent array, but it can still be used as a template for other operations.

Usage
ScidbConnection$array_from_schema(schema_str)
Arguments
schema_str

A scidb-format array schema. The array name is optional. E.g. <fa:int32, fb:string COMPRESSION 'zlib'> [i;j] creates an arrayOp with the specified attributes and dimensions, and an empty afl string. E.g. myArray <fa:int32, fb:string COMPRESSION 'zlib'> [i;j] creates an arrayOp with the specified attributes and dimensions, and encapsulates an afl string of "myArray". The array named "myArray" does not need to exist in scidb since this is done only in the local R env without checking in scidb.

Returns

An arrayOp instance


Method afl_expr()

Create an arrayOp instance from an AFL expression

Implemented with scidb 'show' operator.

Usage
ScidbConnection$afl_expr(afl_str)
Arguments
afl_str

An AFL expression string. Can be any array operation in AFL except for a scidb array name.

Returns

An arrayOp instance with schema from scidb


Method array_from_df()

Get an arrayOp instance from an R data frame

Implemented by scidb 'build' operator or SciDBR scidb::as.scidb function which uploads a data frame to scidb. If the number of cells (nrow * ncolumns) of the data frame is smaller than the 'build_or_upload_threshold', use 'build' operator to create a build literal array; otherwise, a persistent scidb array is created by uploading the R data frame into scidb.

Usage
ScidbConnection$array_from_df(
  df,
  template = NULL,
  build_or_upload_threshold = 8000L,
  build_dim_spec = .random_field_name(),
  as_scidb_data_frame = FALSE,
  skip_scidb_schema_check = FALSE,
  force_template_schema = FALSE,
  ...
)
Arguments
df

an R data frame

template

The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.

build_or_upload_threshold

An integer, below which the scidb 'build' operator is used to create a build literal; otherwise, upload the data frame into scidb with SciDBR's scidb::as.scidb function.

build_dim_spec

The build dimension spec if 'build' operator is chosen. Can be either a simple field name or a full dimension spec. E.g. "z", or "z=0:*:0:100"

as_scidb_data_frame

Boolean, default FALSE. If TRUE, create a scidb data frame (no explicit dimensions); otherwise, create a regular scidb array. Applicable for 'build' literal only.

skip_scidb_schema_check

Boolean. If FALSE (default), check with scidb to determine the exact schema of the result arrayOp; otherwise, infer the schema locally (which is not accurate, but saves a round trip to the scidb server and works in most cases if the result is not used as a template).

force_template_schema

Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).

...

other params used in upload_df function.

Returns

An arrayOp instance that encapsulates a build literal or uploaded R data frame(s)
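The choice between a build literal and an upload follows directly from the cell count described for this method. A minimal sketch of that rule, where choose_df_strategy is a hypothetical helper and not a package function:

```r
# Hypothetical helper mirroring the documented rule: count cells as
# rows x columns and compare against the threshold.
choose_df_strategy <- function(df, build_or_upload_threshold = 8000L) {
  n_cells <- nrow(df) * ncol(df)
  if (n_cells < build_or_upload_threshold) "build" else "upload"
}

small <- data.frame(a = 1:10, b = letters[1:10])   # 20 cells
large <- data.frame(a = 1:5000, b = rnorm(5000))   # 10000 cells

choose_df_strategy(small)
choose_df_strategy(large)
```

Note that a wide data frame reaches the threshold with far fewer rows than a narrow one, since columns count toward the cell total.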


Method upload_df()

Get an arrayOp instance from uploaded R data frame

Implemented by SciDBR scidb::as.scidb function which uploads a data frame or vector(s) to scidb.

By default, the uploaded R data frame is saved to scidb as a persistent array. If upload_by_vector = TRUE, multiple scidb arrays are created, one per R data frame column, by uploading individual vectors, which is faster but suffers from bugs in SciDBR.

Usage
ScidbConnection$upload_df(
  df,
  template = NULL,
  name = dbutils$random_array_name(),
  force_template_schema = FALSE,
  upload_by_vector = FALSE,
  .use_aio_input = FALSE,
  .temp = FALSE,
  .gc = TRUE
)
Arguments
df

an R data frame

template

The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.

name

A string as the uploaded scidb array name, only applicable when upload_by_vector = FALSE in which case a single scidb array is created.

force_template_schema

Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).

upload_by_vector

Boolean. If TRUE, upload the R data frame by its vectors, which is faster than uploading the R data frame as a whole but suffers from unresolved SciDBR bugs. If FALSE (default), upload the R data frame as a whole as a single scidb array.

.use_aio_input

Boolean, default FALSE. Whether to use 'aio_input' to import the uploaded data frame on scidb server side. If TRUE, 'aio_input' is faster than the default 'input' operator, but suffers from some bugs in the 'aio_input' scidb plugin.

.temp

Boolean, default FALSE. Whether to save the uploaded data frame as a temporary scidb array.

.gc

Boolean, default TRUE. Whether to remove the uploaded scidb array once the encapsulating arrayOp goes out of scope in R.

Returns

An arrayOp instance that encapsulates a build literal or uploaded R data frame(s)


Method compile_df()

Get an arrayOp instance by compiling an R data frame into a scidb build literal

Implemented by scidb 'build' operator. No persistent scidb array will be created.

Usage
ScidbConnection$compile_df(
  df,
  template = NULL,
  build_dim_spec = dbutils$random_field_name(),
  force_template_schema = FALSE,
  as_scidb_data_frame = FALSE,
  skip_scidb_schema_check = FALSE
)
Arguments
df

an R data frame

template

The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.

build_dim_spec

The build dimension spec if 'build' operator is chosen. Can be either a simple field name or a full dimension spec. E.g. "z", or "z=0:*:0:100"

force_template_schema

Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).

as_scidb_data_frame

Boolean, default FALSE. If TRUE, create a scidb data frame (no explicit dimensions); otherwise, create a regular scidb array. Applicable for 'build' literal only.

skip_scidb_schema_check

Boolean. If FALSE (default), check with scidb to determine the exact schema of the result arrayOp; otherwise, infer the schema locally (which is not accurate, but saves a round trip to the scidb server and works in most cases if the result is not used as a template).

Returns

An arrayOp instance that encapsulates a build literal or uploaded R data frame(s)


Method fread()

Get an arrayOp instance that encapsulates an array operation using 'aio_input' operator to read content from a file

Parameter conventions are similar to the data.table::fread function. Column names and types can be inferred by peeking into the file: set header = TRUE to read names from the first row, and nrow (default 10) to control how many rows are read for inference.

Unique to scidb, we can control how the file columns are converted and whether to use multiple scidb instances to read multiple files in parallel.
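The peeking step can be pictured with a few lines of base R. This sketch uses read.table as a stand-in for data.table::fread so that it has no package dependencies; peek_columns is a hypothetical helper, and the mapping from R classes to scidb types is left out.

```r
# Write a small tab-separated file to peek into.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname\tscore", "1\tann\t0.5", "2\tbob\t1.25"), path)

# Hypothetical helper: read at most `nrow` rows and report column classes.
peek_columns <- function(path, header = TRUE, sep = "\t", nrow = 10L) {
  peeked <- read.table(path, header = header, sep = sep, nrows = nrow,
                       stringsAsFactors = FALSE)
  vapply(peeked, function(col) class(col)[1], character(1))
}

peek_columns(path)                  # names come from the first file row
peek_columns(path, header = FALSE)  # names default to V1, V2, V3
```

With header = FALSE the header row is read as data, so every column is inferred as character; this is why a template or explicit column names matter for files without a reliable header.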

Usage
ScidbConnection$fread(
  file_path,
  template = NULL,
  header = TRUE,
  sep = "\t",
  col.names = NULL,
  mutate_fields = NULL,
  auto_dcast = FALSE,
  nrow = 10L,
  instances = NULL,
  .aio_settings = NULL
)
Arguments
file_path

A single string or a string vector, for a local file path or a list of paths. If multiple paths provided, the instances param must be set to the same number as file_path.

template

The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types by peeking into the file and reading a small data frame of nrow rows with data.table::fread. Sensible data type conversion between R and scidb will be performed. If arrayOp, use the actual scidb field types for matching file columns. If schema string, infer field types the same way as an arrayOp instance.

header

Boolean, default TRUE. Whether to use the first file row to infer file column names and data types.

sep

A single character string as the field delimiter, default "\t" for TSV files. Set to "," for CSV files.

col.names

NULL (default) or a string vector. If col.names = NULL, header = T, file column names are inferred from the first file row. If col.names = NULL, header = F, template = NULL, file column names follow the data.table::fread convention and are named V1, V2, etc. If col.names = NULL, header = F, template = anTemplate, file columns are assumed to be in the same order as the template's dimensions + attributes. If set to a string vector, its length must match the number of actual file columns, and the actual file column names are replaced with the provided col.names, but data types are still inferred from the actual file columns.

mutate_fields

NULL or a list of R expressions. When auto_dcast = T, this setting prevails. Similar to ArrayOpBase$mutate. E.g. a = b + 2, name = first + "-" + last, chrom = if(chrom == 'x') 23 else if(chrom == 'y') 24 else chrom

auto_dcast

Boolean, default FALSE. If TRUE, all non-string fields are dcast'ed with dcast(ax, int64(null)), where ax is the 0-indexed mapping attribute name (e.g. a0, a1, etc.), and int64 is the template field type. If FALSE, file columns are force-coerced into scidb types for all non-string fields, e.g. double(a0), int32(a1). An error will be thrown if incompatible field content is read when the arrayOp returned by fread is executed, not by fread itself, since fread doesn't actually execute any operation.

Even if auto_dcast = T, which is useful in many cases when the file is not strictly formatted, we can still override the dcast rule by setting a mutate_fields expression list, as seen in ArrayOpBase$mutate.

nrow

An integer, default 10. How many rows to peek into the file to infer column names and data types using data.table::fread.

instances

NULL (default) or an integer vector. For single file path, set to NULL. For multiple file paths, set the same number of instances as the file paths, each reading from a file path in parallel.

.aio_settings

NULL (default) or a list of extra aio_input settings. Basic aio_input settings including path, num_attributes, and header are generated automatically and should not be manually provided. See scidb doc on extra aio_input settings.

Returns

An arrayOp instance that encapsulates an 'aio_input' operation with auto-generated field mapping and data type conversion.
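The two conversion modes controlled by auto_dcast can be sketched as string generation over the 0-indexed mapping attributes. field_conversions is a hypothetical helper; in particular, leaving string fields as the bare attribute name is an assumption of this sketch, not documented behaviour.

```r
# Hypothetical helper: build one conversion expression per file column.
# `types` are scidb field types given in file-column order.
field_conversions <- function(types, auto_dcast = FALSE) {
  src <- paste0("a", seq_along(types) - 1L)  # a0, a1, ... (0-indexed)
  converted <- if (auto_dcast) {
    sprintf("dcast(%s, %s(null))", src, types)  # lenient: bad values become null
  } else {
    sprintf("%s(%s)", types, src)               # strict: bad values raise an error
  }
  # Assumption: string fields pass through without conversion.
  ifelse(types == "string", src, converted)
}

field_conversions(c("int64", "string", "double"))
field_conversions(c("int64", "string", "double"), auto_dcast = TRUE)
```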


Paradigm4/ArrayOpR documentation built on Dec. 11, 2023, 5:59 a.m.