computePercentiles: Compute percentiles of column values.

Description Usage Arguments Value Examples

Description

Compute percentiles including boxplot quartiles across values of column columnName. Multiple sets of percentiles achieved with the parameter by. Vector by may contain arbitrary number of column names: the percentiles are computed for each combination of values from these columns. Remember that when using computed quartiles with function createBoxplot it can utilize up to 3 columns by displaying them along the x-axis and inside facets.

Usage

1
2
3
4
5
computePercentiles(channel, tableName, columnName = NULL,
  columns = columnName, temporal = FALSE, percentiles = c(ifelse(temporal,
  5, 0), 5, 10, 25, 50, 75, 90, 95, 100), by = NULL, where = NULL,
  nameInDataFrame = "column", stringsAsFactors = FALSE, test = FALSE,
  parallel = FALSE)

Arguments

channel

connection object as returned by odbcConnect

tableName

Aster table name

columnName

deprecated. Use vector columns instead.

columns

names of the columns to compute percentiles on

temporal

logical: TRUE indicates all columns are temporal, otherwsie numerical. Temporal percentiles have 2 values: character value representing temporal percentile (date, time, timestamp or datetime) and integer epoch value of the number of seconds since 1970-01-01 00:00:00-00 (can be negative) or for interval values includeing time, the total number of seconds in the interval.

percentiles

integer vector with percentiles to compute. Values 0, 25, 50, 75, 100 will always be added if omitted for numerical types, and 25, 50, 75, 100 for temporal. Percentile 0 (minimum) has to be included explicitly for temporals as its computation affects performance more than others.

by

for optional grouping by one or more values for faceting or alike. Used with createBoxplot in combination with column name for x-axis and wrap or grid faceting.

where

specifies criteria to satisfy by the table rows before applying computation. The creteria are expressed in the form of SQL predicates (inside WHERE clause).

nameInDataFrame

name of the column in returned data frame to store table column name(s) defined by parameter columns. NULL indicates omit this column from the data frame (not recommended when computing percentiles for multiple columns).

stringsAsFactors

logical: should columns returned as character and not excluded by as.is and not converted to anything else be converted to factors?

test

logical: if TRUE show what would be done, only (similar to parameter test in RODBC functions like sqlQuery and sqlSave).

parallel

logical: enable parallel calls to Aster database. This option requires parallel backend enabled and registered (see in examples). Parallel execution requires ODBC channel obtained without explicit password: either with odbcConnect(dsn) or odbcDriverConnect calls, but not with odbcConnect(dsn, user, password).

Value

For numeric data function returns a data frame with percentile values organized into following columns:

For temporal data function returns a data frame with percentile values organized into following columns:

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# ipouts percentiles for pitching ipouts for AL in 2000s
ipop = computePercentiles(conn, "pitching", "ipouts",
                          where = "lgid = 'AL' and yearid >= 2000")

# ipouts percentiles by league
ipopLg = computePercentiles(conn, "pitching", "ipouts", by="lgid")

# percentiles on temporal columns
playerAllDates = computePercentiles(conn, "master_enh", 
                    columns=c('debut','finalgame','birthdate','deathdate'),
                    temporal=TRUE, percentiles=c(0))
createBoxplot(playerAllDates, x='column', value='epoch', useIQR=TRUE, 
              title="Boxplots for Date columns (epoch values)", 
              legendPosition="none")

}

teradata-aster-field/toaster documentation built on May 31, 2019, 8:36 a.m.