computeHistogram: Compute histogram distribution of the column.


View source: R/computeHistogram.R

Description

Compute a histogram of a table column in Aster by mapping its values to bins based on the specified parameters. When the column is of a numeric or temporal data type, it uses the map-reduce histogram function over continuous values. When the column is categorical (character data types), it defers to computeBarchart, which uses the SQL aggregate COUNT(*) with GROUP BY <column>. The result is a data frame to visualize as a bar chart (see createHistogram for creating visualizations).
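The mapping of continuous values to equal-width bins can be illustrated locally in base R. This is only a minimal in-memory sketch, not the Aster map-reduce implementation: the helper bin_values() and the output column names are hypothetical, and it assumes that (endvalue - startvalue) is a multiple of binsize.

```r
# Local analogue of equal-width binning (illustrative only; not toaster code).
# bin_values() and its output column names are hypothetical.
bin_values <- function(x, startvalue, endvalue, binsize) {
  # Keep only non-missing values inside [startvalue, endvalue)
  x <- x[!is.na(x) & x >= startvalue & x < endvalue]
  # Bin edges: startvalue, startvalue + binsize, ..., endvalue
  breaks <- seq(startvalue, endvalue, by = binsize)
  # right = FALSE makes every bin a half-open interval [start, end)
  bins <- cut(x, breaks = breaks, right = FALSE)
  data.frame(bin_start = breaks[-length(breaks)],
             bin_end   = breaks[-1],
             bin_count = as.vector(table(bins)))
}

era <- c(2.1, 2.9, 3.4, 3.6, 3.7, 4.2, 4.8, 5.5)
bin_values(era, startvalue = 2, endvalue = 6, binsize = 1)
# bin_count is 2, 3, 2, 1 for the bins [2,3), [3,4), [4,5), [5,6)
```

In computeHistogram the counting happens inside the database, so only the per-bin summary is returned to R rather than the raw column values.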

Usage

computeHistogram(channel, tableName, columnName, tableInfo = NULL,
  columnFrequency = FALSE, binMethod = "manual", binsize = NULL,
  startvalue = NULL, endvalue = NULL, numbins = NULL, useIQR = TRUE,
  datepart = NULL, where = NULL, by = NULL, test = FALSE,
  oldStyle = FALSE)

Arguments

channel

connection object as returned by odbcConnect

tableName

Aster table name

columnName

table column name to compute histogram

tableInfo

pre-built summary of data to use (required when test=TRUE). See getTableSummary.

columnFrequency

logical: indicates whether to build a histogram of the frequencies of the column

binMethod

one of several methods to determine the number and size of bins: 'manual' uses the parameters below; 'Sturges' and 'Scott' use the corresponding methods of computing the number of bins and their width (see http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width).

binsize

size (width) of discrete intervals defining histogram (all bins are equal)

startvalue

lower end (bound) of values to include in histogram

endvalue

upper end (bound) of values to include in histogram

numbins

number of bins to use in histogram

useIQR

logical: indicates whether to use the IQR interval to compute the lower and upper cutoff bounds for values to be included in the histogram: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], IQR = Q3 - Q1

datepart

field to extract from timestamp/date/time column to build histogram on

where

specifies criteria that table rows must satisfy before the computation is applied. The criteria are expressed as SQL predicates (as inside a WHERE clause).

by

optional grouping by one or more values, for faceting or similar uses

test

logical: if TRUE, only show what would be done (similar to the test parameter in RODBC functions such as sqlQuery and sqlSave).

oldStyle

logical: indicates whether old-style histogram parameters are in use (before Aster AF 5.11)
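The useIQR cutoff bounds and the 'Sturges' and 'Scott' bin methods described in the arguments above can be sketched in base R as follows. This is a minimal local illustration under stated assumptions: the helper names are hypothetical, the formulas are the textbook ones from the linked Wikipedia article, and the quartiles come from R's default quantile() method, so results may differ slightly from what the function computes in Aster.

```r
# Hypothetical helpers for illustration; none of these are part of toaster.

# Cutoff bounds [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], with IQR = Q3 - Q1
iqr_bounds <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
}

# Sturges' formula: number of bins for n observations
sturges_numbins <- function(n) ceiling(log2(n)) + 1

# Scott's normal reference rule: bin width from the standard deviation
scott_binsize <- function(x) {
  x <- x[!is.na(x)]
  3.49 * sd(x) / length(x)^(1/3)
}

x <- 1:9
iqr_bounds(x)          # lower = -3, upper = 13 (Q1 = 3, Q3 = 7)
sturges_numbins(100)   # 8
scott_binsize(x)
```

With binMethod = "manual", none of these rules apply and the bins are defined by binsize, startvalue, endvalue and numbins.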

See Also

computeBarchart and createHistogram

Examples

if(interactive()){
library(toaster)  # also loads RODBC, which provides odbcDriverConnect

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps the connection string free of embedded newlines)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# Histogram of team ERA distribution: Rangers vs. Yankees in 2000s
# bins of width 0.2 over the ERA range 0 to 10, computed separately per team
h2000s = computeHistogram(channel=conn, tableName='pitching_enh', columnName='era',
                          binsize=0.2, startvalue=0, endvalue=10, by='teamid',
                          where="yearID between 2000 and 2012 and teamid in ('NYA','TEX')")

# plot the resulting data frame, one facet per team
createHistogram(h2000s, fill='teamid', facet='teamid',
                title='TEX vs. NYY 2000-2012', xlab='ERA', ylab='count',
                legendPosition='none')
}

Example output

Loading required package: RODBC

toaster documentation built on May 30, 2017, 3:51 a.m.