aggregate in Package scidb
Aggregate a SciDB array object grouped by a subset of its dimensions and/or attributes.
(Optional) Either a single character string or a list of array dimension and/or attribute names to group by; or a SciDB array reference object to group by. Not required for
A character string representing a SciDB aggregation expression or a reduction function.
(Optional) If TRUE, execute the query and store the result array. Otherwise defer evaluation.
(Optional) If specified, perform a moving window aggregate along the specified coordinate windows; see details below.
(Optional) If specified, perform a moving window aggregate over successive data values along the coordinate dimension axis specified by
(Optional) If TRUE, return an unpacked SciDB result as a scidbdf dataframe-like object. It's sometimes useful to set this to FALSE if the aggregated result needs to be joined with another array. Default=FALSE.
Group the scidbdf array object x by dimensions and/or attributes in the array, applying the valid SciDB aggregation function FUN, expressed as a character string, to the groups. Set eval to TRUE to execute the aggregation and return a scidb object; set eval to FALSE to return an unevaluated SciDB array promise, which is essentially a character string describing the query that can be composed with other SciDB package functions.
If an R reduction function is specified for FUN, it will be transliterated to a SciDB aggregate.
The by argument must be a list of dimension names and/or attribute names in the array x to group by, or a SciDB array reference object. If by is not specified and neither of the window options is specified, then a grand aggregate is performed over all values in the array.
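The difference between a grouped aggregate and a grand aggregate can be sketched with base R alone. This is only an analogy using the local iris data frame and the stats aggregate function; it does not connect to SciDB and does not show how the scidb package evaluates the query.

```r
# Base-R analogy only; no SciDB connection is used here.
data(iris)

# Grouped aggregate: the mean of each numeric column within each Species
# level, analogous to supplying by="Species".
grouped <- aggregate(. ~ Species, data = iris, FUN = mean)

# Grand aggregate: one mean per column over all rows, analogous to omitting
# by and both window options.
grand <- colMeans(iris[, 1:4])

print(grouped)
print(grand)
```

The grouped result has one row per group, while the grand aggregate collapses every value into a single result per attribute.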
by may be a list of dimension names and/or attributes of the array x. Attributes that are not of type int64 will be 'factorized' first and replaced by enumerated int64 values that indicate each unique level (this requires SciDB 13.6 or higher).
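The idea of factorizing a non-integer attribute can be illustrated in base R, where factor() plays a comparable role. This is a sketch of the concept only: the actual enumeration that SciDB produces (for example its starting index or level ordering) is not stated here and may differ from R's.

```r
# Base-R illustration of replacing a string attribute by integer codes.
data(iris)
species <- as.character(iris$Species)

f     <- factor(species)   # one level per unique value
codes <- as.integer(f)     # integer code for each row (R starts at 1)
lvls  <- levels(f)         # lookup table from code back to the label

# Group by the integer codes, then map the codes back to labels.
means <- tapply(iris$Petal.Width, codes, mean)
names(means) <- lvls[as.integer(names(means))]
print(means)
```

Grouping on the integer codes gives the same groups as grouping on the original labels, which is why the substitution is safe for aggregation.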
If by is a SciDB array, it must contain one or more common dimensions with x. The two arrays will be joined (using SciDB cross_join(x,by)), and the resulting array will be grouped by the attributes in the
by array. This is similar to the usual R data.frame
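Grouping by an auxiliary array can be pictured in base R as a join on the shared dimension followed by a group-wise aggregate. The sketch below is an analogy under stated assumptions: the column name row stands in for the common dimension, and aux, group, apw and msl are illustrative names; merge() stands in for the join and is not the SciDB operator.

```r
# Base-R analogy: join on a shared index, then group by the joined column.
data(iris)
set.seed(1)

x <- data.frame(row = seq_len(nrow(iris)),
                Petal_Width  = iris$Petal.Width,
                Sepal_Length = iris$Sepal.Length)

# Auxiliary grouping values, one per row of x (hypothetical data).
aux <- data.frame(row = seq_len(nrow(iris)),
                  group = sample(1:4, nrow(iris), replace = TRUE))

# Join on the common "dimension", then aggregate within each group.
joined <- merge(x, aux, by = "row")
apw <- tapply(joined$Petal_Width,  joined$group, mean)  # like avg(Petal_Width)
msl <- tapply(joined$Sepal_Length, joined$group, min)   # like min(Sepal_Length)

result <- data.frame(group = as.integer(names(apw)),
                     apw = as.vector(apw),
                     msl = as.vector(msl))
print(result)
```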
Perform moving window aggregates by specifying the optional window or variable_window arguments. Use window to compute the aggregate expression along a moving window specified along each coordinate axis as window=c(dim_1_low, dim_1_high, dim_2_low, dim_2_high, ...). Moving window aggregates along coordinates may be applied in multiple dimensions. Use variable_window to perform moving window aggregates over data values in a single dimension specified by the by argument. See below for examples. Moving window aggregates along data values are restricted to a single array dimension.
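To make the window specification concrete, here is a minimal base-R sketch of a two-dimensional moving window sum. It assumes each low/high pair counts the cells before and after the current cell along that axis, and that the window is truncated at the array edges. window_sum is a hypothetical helper written for illustration, not a function of the scidb package.

```r
# Hypothetical helper: moving window sum over a dense R matrix.
# window = c(row_low, row_high, col_low, col_high), each a non-negative
# count of cells before/after the current cell along that axis.
window_sum <- function(m, window) {
  stopifnot(is.matrix(m), length(window) == 4, all(window >= 0))
  out <- matrix(0, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - window[1]):min(nrow(m), i + window[2])
      cols <- max(1, j - window[3]):min(ncol(m), j + window[4])
      out[i, j] <- sum(m[rows, cols])
    }
  }
  out
}

set.seed(1)
M <- matrix(rnorm(20), nrow = 5)

# Current row plus the following row, current column only.
w <- window_sum(M, c(0, 1, 0, 0))

# The same result computed column by column.
expected <- apply(M, 2, function(x) x + c(x[-1], 0))
print(isTRUE(all.equal(w, expected)))
```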
scidbdf reference object.
B. W. Lewis <firstname.lastname@example.org>
## Not run:
# Create a copy of the iris data frame in a 1-d SciDB array named "iris."
# Note that SciDB attribute names will be changed to conform to SciDB
# naming convention.
x <- as.scidb(iris,name="iris")

# Compute averages of each variable grouped by Species
a <- aggregate(x, by="Species", FUN=mean)

# Aggregation by an auxiliary vector (which in this example comes from
# an R data frame)--also note any valid SciDB aggregation expression may
# be used:
y <- as.scidb(data.frame(sample(1:4,150,replace=TRUE)))
a <- aggregate(x, by=y, FUN="avg(Petal_Width) as apw, min(Sepal_Length) as msl")

# Use the window argument to perform moving window aggregates along coordinate
# systems. You need to supply a window across all the array dimensions.
set.seed(1)
A <- as.scidb(matrix(rnorm(20),nrow=5))

# Compute a moving window aggregate only along the rows summing two rows at
# a time (returning result to R). The notation (0,1,0,0) means apply the
# aggregate over the current row (0) and (1) following row, and just over
# the current column (that is, a window size of one).
aggregate(A,FUN="sum(val)",window=c(0,1,0,0))

# The above aggregate is equivalent to, for example (A[] returns the array
# data to R as a matrix):
apply(A[],2,function(x) x+c(x[-1],0))

# Moving windows using the window= argument run along array coordinates.
# Moving windows using the variable_window= argument run along data values,
# skipping over empty array cells. The next example illustrates the
# difference.

# First, create an array with empty values:
B <- A>0

# Here is what B looks like:
B

# Now, run a moving window aggregate along the rows with window just like
# the above example:
aggregate(B,FUN="sum(val)",window=c(0,1,0,0))

# And now, a moving window along only the data values down the rows, note
# that we need to specify the dimension with by=:
aggregate(B,by="i",FUN="sum(val)",variable_window=c(0,1))
## End(Not run)
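The contrast between a coordinate window and a variable window can also be sketched without a SciDB connection. The example below is an assumption-laden illustration: empty cells are represented by NA in a plain R vector, results are produced only for non-empty cells, and both helper functions are hypothetical names, not part of the scidb package.

```r
# Hypothetical helpers; NA stands in for an empty array cell.

# Window over coordinates: neighbours are adjacent positions, and empty
# neighbours simply contribute nothing to the sum.
coordinate_window_sum <- function(v, before, after) {
  out <- rep(NA_real_, length(v))
  for (i in which(!is.na(v))) {
    lo <- max(1, i - before)
    hi <- min(length(v), i + after)
    out[i] <- sum(v[lo:hi], na.rm = TRUE)
  }
  out
}

# Window over data values: neighbours are the nearest non-empty cells,
# so empty positions are skipped.
variable_window_sum <- function(v, before, after) {
  idx <- which(!is.na(v))
  out <- rep(NA_real_, length(v))
  for (k in seq_along(idx)) {
    lo <- max(1, k - before)
    hi <- min(length(idx), k + after)
    out[idx[k]] <- sum(v[idx[lo:hi]])
  }
  out
}

v  <- c(1, NA, 2, 3, NA, NA, 4)
cw <- coordinate_window_sum(v, 0, 1)  # 1 NA 5 3 NA NA 4
vw <- variable_window_sum(v, 0, 1)    # 3 NA 5 7 NA NA 4
print(cw)
print(vw)
```

In the first position the coordinate window sees only the value 1, because its neighbour is empty, while the variable window reaches past the empty cell and adds the next data value, 2.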