showData: Plot table level statistics, histograms, correlations and...


View source: R/showData.R

Description

showData is the basic plotting function in the toaster package, designed to produce a set of standard visualizations (see parameter format) in a single call. Depending on the format, it is either a wrapper around other functions or a simple plotting function. It combines the database round-trip (if necessary) and the plotting in one call.

Usage

showData(channel = NULL, tableName = NULL, tableInfo = NULL,
  include = NULL, except = NULL, type = "numeric", format = "histogram",
  measures = NULL, title = paste("Table", toupper(tableName), format, "of",
  type, "columns"), numBins = 30, useIQR = FALSE, extraPoints = NULL,
  extraPointShape = 15, sampleFraction = NULL, sampleSize = NULL,
  pointColour = NULL, facetName = NULL, regressionLine = FALSE,
  corrLabel = "none", digits = 2, shape = 21, shapeSizeRange = c(1, 10),
  facet = ifelse(format == "overview", TRUE, FALSE), scales = ifelse(facet &
  format == "boxplot", "free", ifelse(facet & format == "overview", "free_y",
  "fixed")), ncol = 4, coordFlip = FALSE, paletteName = "Set1",
  baseSize = 12, baseFamily = "sans", legendPosition = "none",
  defaultTheme = theme_tufte(base_size = baseSize, base_family = baseFamily),
  themeExtra = NULL, where = NULL, test = FALSE)
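The defaults of facet and scales in the usage above depend on format. The following sketch restates those two default expressions as a standalone function so the outcomes are easier to read. The helper name resolveDefaults is hypothetical and not part of toaster.

```r
# Hypothetical helper (not part of toaster): mirrors the default
# expressions for `facet` and `scales` shown in the usage above.
resolveDefaults <- function(format,
                            facet = ifelse(format == "overview", TRUE, FALSE)) {
  scales <- ifelse(facet & format == "boxplot", "free",
                   ifelse(facet & format == "overview", "free_y", "fixed"))
  list(facet = facet, scales = scales)
}

resolveDefaults("overview")$scales               # "free_y"
resolveDefaults("boxplot")$scales                # "fixed" (facet defaults to FALSE)
resolveDefaults("boxplot", facet = TRUE)$scales  # "free"
```

Passing scales explicitly to showData bypasses this logic entirely.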

Arguments

channel

connection object as returned by odbcConnect

tableName

Aster table name

tableInfo

pre-built summary of data to use (parameters channel, tableName, where may not apply depending on format). See getTableSummary.

include

a vector of column names to include. Output never contains attributes other than those in the list.

except

a vector of column names to exclude. Output never contains attributes from the list.

type

what type of data to visualize: numerical ("numeric"), character ("character") or date/time ("temporal")

format

type of plot to use: 'overview', 'histogram', 'boxplot', 'corr' for correlation matrix or 'scatterplot'

measures

applies to format 'overview' only. Use one or more of the following with 'numeric' type: maximum, minimum, average, deviation, 0 [...]. Use one or more of the following with 'character' type: distinct_count, not_null_count. By default all measures above are used per respective type.

title

plot title

numBins

number of bins to use in histogram(s)

useIQR

logical: if TRUE, use the IQR interval to compute lower and upper cutoff bounds for values to be included in the boxplot or histogram: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], where IQR = Q3 - Q1. If FALSE, the minimum and maximum are the bounds (all values are included).

extraPoints

vector containing names of extra points to add to boxplot lines.

extraPointShape

extra point shape (see 'Shape examples' in aes_linetype_size_shape).

sampleFraction

sample fraction to use in the sampling of data for 'scatterplot'

sampleSize

if sampleFraction is not specified then size of sample must be specified for 'scatterplot'.

pointColour

name of column with values to colour points in 'scatterplot'.

facetName

name(s) of the column(s) to use for faceting when format is 'scatterplot'. With a single name, facet wrap faceting is used. With two names, facet grid faceting is used. It overrides the facet value in the case of 'scatterplot'. Must be part of the column list (e.g. include).

regressionLine

logical: if TRUE, add a regression line to the scatterplot.

corrLabel

column name to use to label correlation table: 'value', 'pair', or 'none' (default)

digits

number of digits to use in correlation table text (when displaying correlation coefficient value)

shape

shape of correlation figure (default is 21)

shapeSizeRange

correlation figure size range

facet

logical: if TRUE, divide the plot into facets, one for each column (default is FALSE, no facets, except for format 'overview', where it is TRUE). When set to TRUE and format is 'boxplot', the scales default changes from 'fixed' to 'free'. Has no effect when format is 'corr'.

scales

Are scales shared across all facets: "fixed" - all are the same, "free_x" - vary across rows (x axis), "free_y" - vary across columns (y axis), "free" - both rows and columns (see parameter scales in facet_wrap). The default depends on facet and format; see the usage and parameter facet for details.

ncol

Number of columns in facet wrap.

coordFlip

logical: if TRUE, flip Cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal (see coord_flip).

paletteName

palette name to use (run display.brewer.all to see available palettes).

baseSize

base font size.

baseFamily

base font family.

legendPosition

legend position.

defaultTheme

plot theme to use; the default shown in the usage is theme_tufte(base_size = baseSize, base_family = baseFamily).

themeExtra

any additional ggplot2 theme attributes to add.

where

SQL WHERE clause limiting data from the table (use SQL as if in WHERE clause but omit keyword WHERE).

test

logical: when applicable, if TRUE only show what would be done (similar to parameter test in RODBC functions like sqlQuery and sqlSave). Doesn't apply when no SQL is expected to run, e.g. when format is 'boxplot'.

Details

All formats support parameters include and except to include and exclude table columns respectively. The include list guarantees that no columns outside of the list will be included in the results. The except list guarantees that its columns will not be included in the results.
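The two guarantees can be illustrated on a plain vector of column names. This is a minimal sketch, not toaster's implementation: the helper selectColumns is hypothetical, and the rule that except wins when a column appears in both lists is an assumption.

```r
# Hypothetical helper (not toaster's implementation): applies the
# include/except rules described above to a vector of column names.
# Assumption: when a column is in both lists, `except` takes precedence.
selectColumns <- function(columns, include = NULL, except = NULL) {
  if (!is.null(include)) columns <- intersect(columns, include)
  if (!is.null(except)) columns <- setdiff(columns, except)
  columns
}

cols <- c("yearid", "era", "whip", "so", "bfp")

selectColumns(cols, include = c("era", "whip", "so"))         # only listed columns
selectColumns(cols, except = c("yearid", "bfp"))              # everything but these
selectColumns(cols, include = c("era", "so"), except = "so")  # "era"
```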

Format overview: produce a set of histograms - one for each statistical measure - across table columns. This makes it possible to compare averages, IQR, etc. across all or selected columns.

Format boxplot: produce boxplots for table columns. Boxplots can share a single plot or each be placed in its own facet (see logical parameter facet).
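For boxplots and histograms, parameter useIQR restricts values to [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. The sketch below computes those bounds in base R on made-up values. It only illustrates the formula: it uses R's default quantile method, which may differ from how the quartiles in a table summary are computed.

```r
# Illustration only: bounds used when useIQR = TRUE, on made-up values.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)   # one obvious outlier

q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside [lower, upper] are left out; here 100 is dropped.
kept <- x[x >= lower & x <= upper]
```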

Format histogram: produce histograms - one for each column - in a single plot or in facets (see logical parameter facet).

Format corr: produce correlation matrix of numeric columns.
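As a rough picture of the data behind this format, the sketch below computes a correlation matrix in base R, using the built-in mtcars data as a stand-in for an Aster table. The rounding mirrors parameter digits; the "a:b" pair label format is an assumption made for illustration and may not match what corrLabel = 'pair' displays.

```r
# Stand-in data: three numeric columns from the built-in mtcars data set.
num <- mtcars[, c("mpg", "hp", "wt")]

# Pairwise Pearson correlations, rounded as with digits = 2.
corrMatrix <- round(cor(num), digits = 2)

# One row per distinct column pair (upper triangle of the matrix).
idx <- which(upper.tri(corrMatrix), arr.ind = TRUE)
pairs <- data.frame(
  pair  = paste(rownames(corrMatrix)[idx[, "row"]],
                colnames(corrMatrix)[idx[, "col"]], sep = ":"),  # hypothetical label format
  value = corrMatrix[idx]
)
```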

Format scatterplot: produce scatterplots of sampled data.
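Scatterplot data is sampled either by fraction (sampleFraction) or by absolute size (sampleSize). The sketch below shows the difference between the two on the built-in mtcars data. It is a local stand-in for illustration and says nothing about how the sample is drawn on the database side.

```r
set.seed(1)          # make the illustration reproducible
n <- nrow(mtcars)    # 32 rows

# Sampling by fraction: keep about 25% of the rows.
byFraction <- mtcars[sample(n, size = round(n * 0.25)), ]

# Sampling by size: keep exactly 10 rows.
bySize <- mtcars[sample(n, size = 10), ]
```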

Value

a ggplot object

Examples

if(interactive()){
library(RODBC)
library(toaster)

# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# get summaries to save time
pitchingInfo = getTableSummary(conn, 'pitching_enh')
battingInfo = getTableSummary(conn, 'batting_enh')

# Boxplots
# all numerical attributes
showData(conn, tableInfo=pitchingInfo, format='boxplot', 
         title='Boxplots of numeric columns')
# select certain attributes only
showData(conn, tableInfo=pitchingInfo, format='boxplot', 
         include=c('wp','whip', 'w', 'sv', 'sho', 'l', 'ktobb', 'ibb', 'hbp', 'fip', 
                   'era', 'cg', 'bk', 'baopp'), 
         useIQR=TRUE, title='Boxplots of Pitching Stats')
# exclude certain attributes
showData(conn, tableInfo=pitchingInfo, format='boxplot', 
         except=c('item_id','ingredient_item_id','facility_id','rownum','decadeid','yearid',
                  'bfp','ipouts'),
         useIQR=TRUE, title='Boxplots of Pitching Stats')
# flip coordinates
showData(conn, tableInfo=pitchingInfo, format='boxplot', 
         except=c('item_id','ingredient_item_id','facility_id','rownum','decadeid','yearid',
                  'bfp','ipouts'),
         useIQR=TRUE, coordFlip=TRUE, title='Boxplots of Pitching Stats')

# boxplot with facet (facet_wrap)
showData(conn, tableInfo=pitchingInfo, format='boxplot',
         include=c('bfp','er','h','ipouts','r','so'), facet=TRUE, scales='free',
         useIQR=TRUE, title='Boxplots Pitching Stats: bfp, er, h, ipouts, r, so')

# Correlation matrix
# on all numerical attributes
showData(conn, tableName='pitching_enh', tableInfo=pitchingInfo, 
         format='corr')

# correlation matrix on selected attributes
# with labeling by attribute pair name and
# controlling size of correlation bubbles
showData(conn, tableName='pitching', tableInfo=pitchingInfo, 
         include=c('era','h','hr','gs','g','sv'), 
         format='corr', corrLabel='pair', shapeSizeRange=c(5,25))

# Histogram on a selected numeric attribute
showData(conn, tableName='pitching', tableInfo=pitchingInfo, include=c('hr'), 
         format='histogram')

# Overview is a histogram of statistical measures across attributes
showData(conn, tableName='pitching', tableInfo=pitchingInfo, 
         format='overview', type='numeric', scales="free_y")

# Scatterplots
# Scatterplot on pair of numerical attributes
# sample by size with 1d facet (see facet_wrap)
showData(conn, 'pitching_enh', format='scatterplot', 
         include=c('so', 'er'), facetName="lgid", pointColour="lgid", 
         sampleSize=10000, regressionLine=TRUE,
         title="SO vs ER by League 1980-2000",
         where='yearid between 1980 and 2000')

# sample by fraction with 2d facet (see facet_grid)
showData(conn, 'pitching_enh', format='scatterplot', 
         include=c('so','er'), facetName=c('lgid','decadeid'), pointColour="lgid",
         sampleFraction=0.1, regressionLine=TRUE,
         title="SO vs ER by League by Decade 1980 - 2012",
         where='yearid between 1980 and 2012')
}

toaster documentation built on May 30, 2017, 3:51 a.m.