# Methods for Function aggregate in Package scidb

### Description

Aggregate a SciDB array object grouped by a subset of its dimensions and/or attributes.

### Usage


### Arguments

| Argument | Description |
|---|---|
| `x` | A `scidb` or `scidbdf` array object. |
| `by` | (Optional) Either a single character string or a list of array dimension and/or attribute names to group by; or a SciDB array reference object to group by. Not required for `window` aggregates or grand aggregates. |
| `FUN` | A character string representing a SciDB aggregation expression, or an R reduction function. |
| `eval` | (Optional) If TRUE, execute the query and store the result array. Otherwise defer evaluation. |
| `window` | (Optional) If specified, perform a moving window aggregate along the specified coordinate windows; see Details below. |
| `variable_window` | (Optional) If specified, perform a moving window aggregate over successive data values along the coordinate dimension axis specified by `by`. |
| `unpack` | (Optional) If TRUE, return an unpacked SciDB result as a `scidbdf` dataframe-like object. It is sometimes useful to set this to FALSE if the aggregated result needs to be joined with another array. Default=FALSE. |

### Details

Group the `scidb` or `scidbdf` array object `x` by dimensions and/or attributes
in the array, applying the valid SciDB aggregation function `FUN`, expressed as
a character string, to the groups. Set `eval` to TRUE to execute the
aggregation and return a scidb object; set `eval` to FALSE to return an
unevaluated SciDB array promise, which is essentially a character string
describing the query that can be composed with other SciDB package functions.

If an R reduction function is specified for `FUN`, it will be
transliterated to a SciDB aggregate.
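
To make the idea of transliteration concrete, the following base R sketch maps a few R reduction function names to SciDB aggregate names and builds an aggregation expression string. The lookup table `r_to_scidb` and the helper `to_scidb_aggregate` are hypothetical names introduced for illustration; they are not the package's internal implementation, and the set of functions the package actually recognizes may differ.

```r
# Hypothetical lookup from R reduction function names to SciDB aggregate names.
# Illustration only; this is not the table used inside the scidb package.
r_to_scidb <- c(mean = "avg", sd = "stdev", sum = "sum",
                min = "min", max = "max", length = "count")

# Hypothetical helper: build one SciDB aggregation term per attribute.
to_scidb_aggregate <- function(fun_name, attributes) {
  agg <- r_to_scidb[[fun_name]]   # errors if the function name is not in the table
  paste(sprintf("%s(%s) as %s_%s", agg, attributes, attributes, agg),
        collapse = ", ")
}

expr <- to_scidb_aggregate("mean", c("Petal_Width", "Sepal_Length"))
print(expr)
```

The resulting string has the same shape as the expressions that can be passed directly as `FUN`, for example `"avg(Petal_Width) as apw"`.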

The `by` argument must be a list of dimension names and/or attribute names
in the array `x` to group by, or a SciDB array reference object. If neither
`by` nor one of the `window` options is specified, then a grand aggregate is
performed over all values in the array.

The argument `by` may be a list of dimension names and/or attributes of the
array `x`. Attributes that are not of type int64 will be 'factorized' first
and replaced by enumerated int64 values that indicate each unique level (this
requires SciDB 13.6 or higher).
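
The effect of factorizing can be pictured with base R's `factor`, as in the sketch below. This is only an analogy run entirely in R: the package performs the equivalent step inside SciDB, and R integer codes are 32-bit rather than int64.

```r
# A non-integer attribute, standing in for a SciDB string attribute.
species <- c("setosa", "virginica", "setosa", "versicolor")

# Enumerate the unique levels; each value is replaced by its integer level index.
f <- factor(species)
codes <- as.integer(f)

print(levels(f))  # the unique levels, in sorted order
print(codes)      # integer codes usable as a grouping coordinate
```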

When `by` is a SciDB array it must contain one or more common dimensions
with `x`. The two arrays will be joined (using SciDB `cross_join(x,by)`)
and the resulting array will be grouped by the attributes in the `by` array.
This is similar to the usual R data.frame aggregate method.
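
For comparison, here is the base R data.frame behavior that this mode resembles: rows are grouped by the values of a separate vector that lines up with the data row by row. The grouping vector `groups` is made up for illustration, and nothing in this sketch touches SciDB.

```r
# Base R analogue: group the rows of a data frame by an auxiliary vector.
groups <- rep(1:4, length.out = nrow(iris))   # illustrative grouping values

res <- aggregate(iris[, c("Petal.Width", "Sepal.Length")],
                 by = list(group = groups), FUN = mean)
print(res)   # one row per distinct value of the grouping vector
```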

Perform moving window aggregates by specifying the optional `window` or
`variable_window` arguments. Use `window` to compute the aggregate
expression along a moving window specified along each coordinate axis as
`window=c(dim_1_low, dim_1_high, dim_2_low, dim_2_high, ...)`.
Moving window aggregates along coordinates may be applied in multiple
dimensions.
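
The window specification can be read as "how many cells before and after the current cell to include along each dimension". The base R sketch below applies that reading to an ordinary two-dimensional matrix, assuming the window is simply clipped at the array edges. The function `window_sum` is a hypothetical stand-in written for this page, not part of the scidb package.

```r
# Moving window sum over a matrix, clipping the window at the array edges.
# window = c(row_low, row_high, col_low, col_high).
window_sum <- function(M, window) {
  out <- matrix(0, nrow(M), ncol(M))
  for (i in seq_len(nrow(M))) {
    for (j in seq_len(ncol(M))) {
      rows <- max(1, i - window[1]):min(nrow(M), i + window[2])
      cols <- max(1, j - window[3]):min(ncol(M), j + window[4])
      out[i, j] <- sum(M[rows, cols])
    }
  }
  out
}

set.seed(1)
M <- matrix(rnorm(20), nrow = 5)

# Current row plus one following row, current column only.
W <- window_sum(M, c(0, 1, 0, 0))
print(W)
```

With `c(0, 1, 0, 0)` each cell is summed with the cell directly below it, which matches `apply(M, 2, function(x) x + c(x[-1], 0))`.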

Use `variable_window` to perform moving window aggregates over data
values in a single dimension specified by the `by` argument. See the
Examples section. Moving window aggregates along data values are restricted
to a single array dimension.
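
The difference from a coordinate window is that empty cells are skipped, so the window counts neighboring data values rather than neighboring coordinates. The base R sketch below illustrates this on a single vector, with `NA` standing in for an empty cell. The function `variable_window_sum` is hypothetical, and the assumption that results appear only at non-empty positions is this sketch's, not a statement about the package's output format.

```r
# Moving window sum over the non-empty values of a vector.
# `before` and `after` count non-empty neighbors, not coordinates.
variable_window_sum <- function(v, before, after) {
  idx <- which(!is.na(v))            # positions of non-empty cells
  out <- rep(NA_real_, length(v))
  for (k in seq_along(idx)) {
    lo <- max(1, k - before)
    hi <- min(length(idx), k + after)
    out[idx[k]] <- sum(v[idx[lo:hi]])
  }
  out
}

v <- c(1, NA, 2, NA, NA, 3)
res_vw <- variable_window_sum(v, 0, 1)
print(res_vw)   # 3 NA 5 NA NA 3
```

A coordinate window of the same size would pair the value at position 1 with the empty cell at position 2, while the variable window pairs it with the next data value at position 3.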

### Value

A `scidbdf` reference object.

### Author(s)

B. W. Lewis <blewis@paradigm4.com>

### Examples

```
## Not run:
# Create a copy of the iris data frame in a 1-d SciDB array named "iris".
# Note that SciDB attribute names will be changed to conform to SciDB
# naming convention.
x <- as.scidb(iris,name="iris")
# Compute averages of each variable grouped by Species
a <- aggregate(x, by="Species", FUN=mean)
# Aggregation by an auxiliary vector (which in this example comes from
# an R data frame). Also note that any valid SciDB aggregation expression
# may be used:
y <- as.scidb(data.frame(sample(1:4,150,replace=TRUE)))
a <- aggregate(x, by=y, FUN="avg(Petal_Width) as apw, min(Sepal_Length) as msl")
# Use the window argument to perform moving window aggregates along coordinate
# systems. You need to supply a window across all the array dimensions.
set.seed(1)
A <- as.scidb(matrix(rnorm(20),nrow=5))
# Compute a moving window aggregate only along the rows summing two rows at
# a time (returning result to R). The notation (0,1,0,0) means apply the
# aggregate over the current row (0) and one following row (1), and just over
# the current column (that is, a window size of one).
aggregate(A,FUN="sum(val)",window=c(0,1,0,0))[]
# The above aggregate is equivalent to, for example:
apply(A[],2,function(x) x+c(x[-1],0))
# Moving windows using the window= argument run along array coordinates.
# Moving windows using the variable_window= argument run along data values,
# skipping over empty array cells. The next example illustrates the
# difference.
# First, create an array with empty values:
B <- A>0
# Here is what B looks like:
B[]
# Now, run a moving window aggregate along the rows with window just like
# the above example:
aggregate(B,FUN="sum(val)",window=c(0,1,0,0))[]
# And now, a moving window along only the data values down the rows. Note
# that we need to specify the dimension with by=:
aggregate(B,by="i",FUN="sum(val)",variable_window=c(0,1))[]
## End(Not run)
```