aggregate-methods: Functions to perform a calculation on multiple values and...

Aggregate functionsR Documentation

Functions to perform a calculation on multiple values and return a single value

Description

An aggregate function is a function where the values of multiple rows are grouped together as input to calculate a single value of more significant meaning or measurement. The aggregate functions included are mean, sum, count, max, min, standard deviation, and variance. Also included is a function to compute the mean value of each column and a function to compute the sum of each column.

Usage

## S4 method for signature 'db.obj'
mean(x, ...)

## S4 method for signature 'db.obj'
sum(x, ..., na.rm = FALSE)

## S4 method for signature 'db.obj'
count(x)

## S4 method for signature 'db.obj'
max(x, ..., na.rm = FALSE)

## S4 method for signature 'db.obj'
min(x, ..., na.rm = FALSE)

## S4 method for signature 'db.obj'
sd(x)

## S4 method for signature 'db.obj'
var(x)

## S4 method for signature 'db.obj'
colMeans(x, na.rm = FALSE, dims = 1, ...)

## S4 method for signature 'db.obj'
colSums(x, na.rm = FALSE, dims = 1, ...)

colAgg(x)

db.array(x, ...)

Arguments

x

A db.obj object. The signature of the method.

For db.array, x can also be a normal R object like double value.

...

further arguments passed to or from other methods This is currently not implemented.

na.rm

logical. Should missing values (including 'NaN') be removed? This is currently not implemented.

dims

integer: Which dimensions are regarded as 'rows' or 'columns' to sum over. This is currently not implemented and the default behavior is to sum over columns

Details

For aggregate functions: mean, sum, count, max, min, sd, and var, the signature x must be a reference to a single column in a table.

For aggregate functions: colMeans, colSums, and colAgg the signature x can be a db.obj referencing to a single column or a single table, or can be a db.Rquery referencing to multiple columns in a table.

Value

For mean, a db.Rquery which is a SQL query to extract the average of a column of a table. Actually, it can work on multiple columns, so it is the same as colMeans.

For sum, a db.Rquery which is a SQL query to extract the sum of a column of a table. Actually, it can work on multiple columns, so it is the same as colSums.

For count, a db.Rquery which is a SQL query to extract the count of a column of a table.

For max, a db.Rquery which is a SQL query to extract the max of a column of a table.

For min, a db.Rquery which is a SQL query to extract the min of a column of a table.

For sd, a db.Rquery which is a SQL query to extract the standard deviation of a column of a table.

For var, a db.Rquery which is a SQL query to extract the variance of a column of a table.

For colMeans, a db.Rquery which is a SQL query to extract the mean of multiple columns of a table.

For colSums, a db.Rquery which is a SQL query to extract the sum of multiple columns of a table.

For colAgg, a db.Rquery which is a SQL query to retreive the column values as an array aggregate.

For db.array, a db.Rquery which is a SQL query which combine all columns into an array.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, fmcquillan@pivotal.io

See Also

by,db.obj-method is usually used together with aggregate functions.

Examples

## Not run: 
## get the help for a method
## help("mean,db.obj-method")




## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## ----------------------------------------------------------------------

## create a table from the example data.frame "abalone"
delete("abalone", conn.id = cid)
x <- as.db.data.frame(abalone, "abalone", conn.id = cid, verbose = FALSE)

## get the mean of a column
mean(x$diameter)

## get the sum of a column
sum(x$height)

## get the number of entries in a column
count(x$id)

## get the max value of a column
max(x$diameter)

## get the min value of a column
min(x$diameter)

## get the standard deviation of the values in column
sd(x$diameter)

## get the variance of the values in column
var(x$diameter)

## get the mean of all columns in the table
colMeans(x)

## get the sum of all columns in the table
colSums(x)

## get the array aggregate of a specific column in the table
colAgg(x$diameter)

## get the array aggregate of all columns in the table
colAgg(x)

## put everything into an array plus a constant 1 as the first element
db.array(1, x[,3:5], x[,6:7], x[,8:10])

## ----------------------------------------------------------------------

db.disconnect(cid, verbose = FALSE)

## End(Not run)

greenplum-db/GreenplumR documentation built on Sept. 2, 2023, 8:09 a.m.