Methods for Function aggregate in Package scidb

Description

Aggregate a SciDB array object grouped by a subset of its dimensions and/or attributes.

Usage

## S4 method for signature 'scidb'
aggregate(x, by, FUN, eval, window, variable_window, unpack)
## S4 method for signature 'scidbdf'
aggregate(x, by, FUN, eval, window, variable_window, unpack)

Arguments

x

A scidb or scidbdf object.

by

(Optional) Either a single character string or a list of array dimension and/or attribute names to group by, or a SciDB array reference object to group by. Not required for windowed and grand aggregates; see Details.

FUN

A character string representing a SciDB aggregation expression or a reduction function.

eval

(Optional) If TRUE, execute the query and store the result array. Otherwise defer evaluation.

window

(Optional) If specified, perform a moving window aggregate along the specified coordinate windows; see Details.

variable_window

(Optional) If specified, perform a moving window aggregate over successive data values along the coordinate dimension axis specified by by; see Details.

unpack

(Optional) If TRUE, return an unpacked SciDB result as a scidbdf dataframe-like object. It's sometimes useful to set this to FALSE if the aggregated result needs to be joined with another array. Default=FALSE.

Details

Group the scidb or scidbdf array object x by dimensions and/or attributes in the array, applying the valid SciDB aggregation function FUN, expressed as a character string, to the groups. Set eval to TRUE to execute the aggregation and return a scidb object; set eval to FALSE to return an unevaluated SciDB array promise, which is essentially a character string describing the query that can be composed with other SciDB package functions.

If an R reduction function is specified for FUN, it will be transliterated to a SciDB aggregate.

The by argument must be a single character string or a list of dimension names and/or attribute names in the array x to group by, or a SciDB array reference object. If neither by nor one of the window options is specified, a grand aggregate is performed over all values in the array.

When by names attributes of the array x, attributes that are not of type int64 are 'factorized' first and replaced by enumerated int64 values that indicate each unique level (this requires SciDB 13.6 or higher).
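
The idea of factorizing a non-integer grouping attribute can be sketched in base R without a SciDB connection. This is only an illustration: the names lev and codes are hypothetical, and the ordering and base of the values SciDB actually assigns are not stated on this page.

```r
# Illustration only: enumerate the unique levels of a character attribute
# and replace each value by an integer code, as a stand-in for the
# 'factorize' step described above. Ordering of codes is an assumption.
species <- as.character(iris$Species)
lev <- sort(unique(species))     # one entry per unique level
codes <- match(species, lev)     # integer code for every row
table(codes)                     # group sizes per code
```

Each code can be mapped back to its level with lev[codes], which is what makes the integer values usable as a grouping key.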

When by is a SciDB array, it must share one or more dimensions with x. The two arrays will be joined (using SciDB cross_join(x,by)) and the resulting array will be grouped by the attributes in the by array. This is similar to the usual R data.frame aggregate method.
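
For comparison, a base R analogue of grouping by an auxiliary vector might look like the following. It runs on the local iris data frame rather than a SciDB array; the grouping vector g and the result names apw and msl are chosen here for illustration.

```r
# Base R analogue (no SciDB needed): group iris rows by an auxiliary
# vector g and compute a mean and a minimum per group.
set.seed(1)
g <- sample(1:4, nrow(iris), replace = TRUE)   # hypothetical grouping vector

apw <- tapply(iris$Petal.Width, g, mean)       # average petal width per group
msl <- tapply(iris$Sepal.Length, g, min)       # minimum sepal length per group

res <- data.frame(group = as.integer(names(apw)),
                  apw = as.vector(apw),
                  msl = as.vector(msl))
res
```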

Perform moving window aggregates by specifying the optional window or variable_window arguments. Use window to compute the aggregate expression along a moving window specified along each coordinate axis as window=c(dimension_1_low, dimension_1_high, dimension_2_low, dimension_2_high, ...). Moving window aggregates along coordinates may be applied in multiple dimensions.
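
To make the window layout concrete, here is a minimal base R sketch of a two-dimensional moving window sum on an ordinary matrix. The helper window_sum is hypothetical and not part of the scidb package; it assumes each low/high pair counts the cells before and after the current cell, as the Examples section describes for window=c(0,1,0,0).

```r
# Hypothetical helper: moving-window sum over a local matrix.
# window = c(row_low, row_high, col_low, col_high), counted as cells
# before/after the current cell; windows are clipped at the array edges.
window_sum <- function(m, window) {
  stopifnot(is.matrix(m), length(window) == 4)
  out <- matrix(0, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - window[1]):min(nrow(m), i + window[2])
      cols <- max(1, j - window[3]):min(ncol(m), j + window[4])
      out[i, j] <- sum(m[rows, cols])
    }
  }
  out
}

set.seed(1)
M <- matrix(rnorm(20), nrow = 5)
W <- window_sum(M, c(0, 1, 0, 0))   # current row plus the following row
# Agrees with the column-wise shift-and-add form:
all.equal(W, apply(M, 2, function(x) x + c(x[-1], 0)))
```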

Use variable_window to perform moving window aggregates over data values in a single dimension specified by the by argument. See below for examples. Moving window aggregates along data values are restricted to a single array dimension.
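
The difference from a coordinate window can be sketched in base R on a single vector, using NA to stand in for an empty cell. The helper variable_window_sum is hypothetical and reflects one reading of the description above: the window counts non-empty values rather than coordinates.

```r
# Hypothetical helper: moving-window sum over successive data values.
# NA marks an empty cell; the window spans 'before' and 'after'
# non-empty neighbours, skipping over empty cells.
variable_window_sum <- function(v, before, after) {
  idx <- which(!is.na(v))           # positions of non-empty cells
  out <- rep(NA_real_, length(v))   # empty cells stay empty in the result
  for (k in seq_along(idx)) {
    lo <- max(1, k - before)
    hi <- min(length(idx), k + after)
    out[idx[k]] <- sum(v[idx[lo:hi]])
  }
  out
}

v <- c(1, NA, 2, NA, NA, 3)
variable_window_sum(v, 0, 1)
```

Here the value at position 1 is added to the next non-empty value at position 3, whereas a coordinate window of the same size would only look at position 2, which is empty.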

Value

A scidbdf reference object.

Author(s)

B. W. Lewis <blewis@paradigm4.com>

Examples

## Not run: 
# Create a copy of the iris data frame in a 1-d SciDB array named "iris."
# Note that SciDB attribute names will be changed to conform to SciDB
# naming convention.
x <- as.scidb(iris,name="iris")

# Compute averages of each variable grouped by Species
a <- aggregate(x, by="Species", FUN=mean)

# Aggregation by an auxiliary vector (which in this example comes from
# an R data frame). Also note that any valid SciDB aggregation expression
# may be used:
y <- as.scidb(data.frame(sample(1:4,150,replace=TRUE)))
a <- aggregate(x, by=y, FUN="avg(Petal_Width) as apw, min(Sepal_Length) as msl")

# Use the window argument to perform moving window aggregates along array
# coordinates. You need to supply a window across all the array dimensions.
set.seed(1)
A <- as.scidb(matrix(rnorm(20),nrow=5))
# Compute a moving window aggregate only along the rows summing two rows at
# a time (returning result to R). The notation (0,1,0,0) means apply the
# aggregate over the current row (0) and (1) following row, and just over
# the current column (that is, a window size of one).
aggregate(A,FUN="sum(val)",window=c(0,1,0,0))[]
# The above aggregate is equivalent to, for example:
apply(A[],2,function(x) x+c(x[-1],0))

# Moving windows using the window= argument run along array coordinates.
# Moving windows using the variable_window= argument run along data values,
# skipping over empty array cells. The next example illustrates the
# difference.

# First, create an array with empty values:
B <- A>0
# Here is what B looks like:
B[]
# Now, run a moving window aggregate along the rows with window just like
# the above example:
aggregate(B,FUN="sum(val)",window=c(0,1,0,0))[]
# And now, a moving window along only the data values down the rows, note
# that we need to specify the dimension with by=:
aggregate(B,by="i",FUN="sum(val)",variable_window=c(0,1))[]

## End(Not run)
