exa.createScript: Deploys an R function as an UDF in the EXASolution database.

View source: R/exa.createScript.R

exa.createScriptR Documentation

Deploys an R function as an UDF in the EXASolution database.

Description

This function takes an R function and creates a R UDF script on the EXASolution database. A CREATE SCRIPT call will be used behind the scenes. The return value is a function that, when executed, will execute the script on the database server.

Usage

exa.createScript(
  channel,
  name,
  func = NA,
  env = list(),
  initCode = NA,
  cleanCode = NA,
  inType = SET,
  inArgs = list(),
  outType = EMITS,
  outArgs = list(),
  outputAddress = NA,
  replaceIfExists = TRUE,
  mockOnly = FALSE
)

Arguments

channel

The RODBC connection channel, typically created via odbcConnect.

name

The script will be created in the database with this name.

func

The R function to be created as a UDF R script in the database.

env

A list of values which will be available in the UDF function under the same name.

For example if you pass list(a=2) for this argument, you can access the value a in the function via env$a.

initCode

This code block will be executed once on each parallel instance of the database running the script, before the first call of the function.

cleanCode

This code block will be executed once on each parallel instance of the database running the script, after the function was called the last time.

inType

The input type of the UDF script, either SET or SCALAR.

SET will call the function once for each group, SCALAR will call the function once for each record. Please read the EXASolution manual about UDF scripts for details.

inArgs

Vector of strings specifying the names and types of the input arguments for the UDF script. Example: inArgs = c("k INT", "v VARCHAR(10)")

outType

The output type of the UDF script, either EMITS or RETURNS. For EMITS, the function emits any number of values. For RETURNS, the function emits just a single value.

outArgs

Vector of strings specifying the names and types of the output arguments of the UDF script. Example: inArgs = c("outputstring VARCHAR(10)")

outputAddress

This parameters specifies the address and port of the optional python output service is listening on. For example: c("192.168.1.10", 3000).Please read the README.txt of this R package for details.

replaceIfExists

Boolean whether the script shall be replaced if it already exists. Either TRUE or FALSE.

mockOnly

Boolean, default FALSE. This parameter is useful for unit-testing if the ODBC connection is not available. Setting mockOnly=TRUE will not install the UDF function to the EXASOL database.

Details

We recommend to read the EXASolution manual about UDF scripts for a better understanding.

Value

This function returns a function that, when called, will execute the script on the server. With the call you have to specify to which data it shall be applied. The returned function generates and executes a SELECT SQL statement behind the scenes. It has the following signature:

function(..., table = NA, where = NA, groupBy = NA, restQuery = "", returnSQL = FALSE, reader = NA, server = NA)

...

The first string parameters define the SQL expressions that will be used as the input for the UDF script. Typically this is one or more column names as you see in the example below.

table

A string with the table name to which the function shall be applied to. You can specify quoted names the following: table='myschema."MyQuotedTable"'

where

A string with the where clause (SQL) to filter the records.

groupBy

A string with the group-by clause (SQL) that will be used to group the. This is especially important for SET UDF scripts that will be called once for each group.

returnSQL

Boolean value. For TRUE, the autogenerated SQL statement will be returned, but NOT executed.

restQuery

A string with additional SQL code that will be appended at the end of the autogenerated query, e.g. ORDER BY or HAVING.

reader

For internal usage only.

server

For internal usage only.

RETURN VALUE

The return value of the function is the result of the SELECT query. The query will be executed internally with the exa.readData function.

Author(s)

EXASOL AG <opensource@exasol.com>

See Also

Other None-DBI-API: dbCurrentSchema(), dbEnd(), exa.readData(), exa.writeData()

Examples

## Not run: 

# This example creates a simple SET-EMITS script and executes
# it the table footable.
require(RODBC)
require(exasol)

# Connect via RODBC with configured DSN
C <- odbcConnect("exasolution")

# Generate example data frame with two groups
# of random values with different means.
valsMean0  <- rnorm(10, 0)
valsMean50 <- rnorm(10, 50)
twogroups <- data.frame(group = rep(1:2, each = 10),
                        value = c(valsMean0, valsMean50))

# Write example data to a table
odbcQuery(C, "CREATE SCHEMA test")
odbcQuery(C, "CREATE TABLE test.twogroups (groupid INT, val DOUBLE)")
exa.writeData(C, twogroups, tableName = "test.twogroups")

# Create the R function as an UDF R script in the database
# In our case it computes the mean for each group.
testscript <- exa.createScript(
  C,
  "test.mymean",
  function(data) {
    data$next_row(NA); # read all values from this group into a single vector
    data$emit(data$groupid[[1]], mean(data$val))
  },
  inArgs = c( "groupid INT", "val DOUBLE" ),
  outArgs = c( "groupid INT", "mean DOUBLE" ) )

# Run the function, grouping by the groupid column
# and aggregating on the "val" column. This returns
# two values which are close to the means of the two groups.
testscript("groupid", "val", table = "test.twogroups" , groupBy = "groupid")

## End(Not run)

EXASOL/r-exasol documentation built on Aug. 28, 2023, 2:32 a.m.