createPopPyramid: Create Population Pyramid type of histogram plot.


View source: R/plotting.R

Description

Creates a population-pyramid style histogram plot: two back-to-back bar graphs that share the same category class (e.g. age) on the Y-axis, with the distribution (population) on the X-axis. The two bar graphs correspond to two distinct groups, e.g. sex (male and female), baseball leagues (AL and NL), or customer types (new and established customers).
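To make the layout concrete, the sketch below builds a toy input and mirrors one group's counts to negative values, which is a common way to draw back-to-back bars. It illustrates the idea only and is not the package's actual implementation. The column names `bin_start` and `bin_count` follow the function's defaults; the `sex` column, the `signed_count` column, and all numbers are made up for illustration, and the plotting step assumes ggplot2 is installed.

```r
# Toy histogram data: one row per (bin, group). Values are invented.
hist_df <- data.frame(
  bin_start = rep(c(0, 10, 20, 30), times = 2),
  bin_count = c(12, 30, 25, 8, 10, 28, 27, 11),
  sex       = rep(c("female", "male"), each = 4),
  stringsAsFactors = FALSE
)

# Mirror the first group to the left of zero so the two bar sets sit back to back.
hist_df$signed_count <- ifelse(hist_df$sex == "female",
                               -hist_df$bin_count, hist_df$bin_count)

# Optional plot (assumption: ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(hist_df,
                       ggplot2::aes(x = factor(bin_start), y = signed_count, fill = sex)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::coord_flip() +                      # bins on the Y axis, counts on the X axis
    ggplot2::scale_y_continuous(labels = abs) +  # show magnitudes instead of signed values
    ggplot2::labs(x = "Age bin start", y = "Population")
}
```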

Usage

createPopPyramid(data, bin = "bin_start", count = "bin_count", divideBy,
  values = NULL, fillColours = c("blue", "red"), mainColour = "black",
  facet = NULL, ncol = 1, facetScales = "fixed", baseSize = 12,
  baseFamily = "sans", title = paste("Population Pyramid Histogram by",
  divideBy), subtitle = NULL, xlab = bin, ylab = count,
  legendPosition = "right", fillGuide = "legend",
  defaultTheme = theme_tufte(base_size = baseSize, base_family = baseFamily),
  themeExtra = NULL)

Arguments

data

data frame containing two histograms over the same bins. The rows are split into the two histograms by the column named in parameter divideBy.

bin

name of a column containing bin labels or interval values

count

name of a column containing bin values or counts (bin size)

divideBy

name of the column to divide data into two histograms

values

two-element vector of values found in the divideBy column (optional). If missing, the first two values of column divideBy (sorted in default order) are used.

fillColours

2-value vector with colours for left and right histograms.

mainColour

histogram bar colour.

facet

vector of one or two column names used to split the data and plot the subsets as facets. With a single name, the subset plots are placed next to each other, wrapping into ncol columns (uses facet_wrap). With two names, the subset plots vary in both the horizontal and vertical directions (grid) based on the column values (uses facet_grid).

ncol

number of facet columns (applies only when a single facet column is supplied; see parameter facet).

facetScales

whether scales are shared across all subset plots (facets): "fixed" - all are the same (default), "free_x" - vary across rows (x axis), "free_y" - vary across columns (y axis), "free" - vary across both rows and columns (see parameter scales in facet_wrap).

baseSize

theme base font size

baseFamily

theme base font family

title

plot title.

subtitle

plot subtitle.

xlab

a label for the x axis; defaults to the value of bin.

ylab

a label for the y axis; defaults to the value of count.

legendPosition

the position of the legend: "left", "right", "bottom", "top", or a two-element numeric vector. "none" hides the legend.

fillGuide

name of a guide object, or the object itself, used for divideBy. Typically the name "legend" or a guide_legend object.

defaultTheme

plot theme settings; defaults to theme_tufte. More themes are available in ggplot2 (see ggtheme) and in the ggthemes package.

themeExtra

any additional theme settings that override default theme.
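The default for values described above (the first two values of the divideBy column in default sort order) can be sketched in base R as follows. The helper name `pick_default_values` is hypothetical and the logic is an assumption based on the argument description, not the package's source.

```r
# Hypothetical helper: mimic the documented default for `values`.
pick_default_values <- function(data, divideBy) {
  # Distinct group labels in default sort order, keeping the first two.
  sort(unique(data[[divideBy]]))[1:2]
}

# Made-up data with three distinct labels in the divideBy column.
leagues <- data.frame(lgid = c("NL", "AL", "NL", "XX"),
                      stringsAsFactors = FALSE)
pick_default_values(leagues, "lgid")
# "AL" "NL"
```

When the divideBy column holds more than two distinct values, or when a particular order of the two groups matters, passing values explicitly (as the examples below do with c('NL','AL')) avoids relying on sort order.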

Value

ggplot object
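Because the return value is a ggplot object, it can be customized further after the call. The sketch below is a minimal, unverified illustration that combines facet, ncol, legendPosition and themeExtra on a small made-up data frame; it assumes the toaster and ggplot2 packages are installed and that a plain data frame with these columns is accepted in place of computeHistogram output.

```r
# Made-up counts for two leagues across two seasons (illustrative only).
toy <- expand.grid(bin_start = c(0, 1e6, 2e6),
                   lgid = c("AL", "NL"),
                   yearid = c(2012, 2013),
                   stringsAsFactors = FALSE)
toy$bin_count <- c(40, 25, 10, 38, 27, 12, 42, 24, 11, 36, 29, 13)

# Assumption: toaster and ggplot2 are available; otherwise nothing is plotted.
if (requireNamespace("toaster", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- toaster::createPopPyramid(data = toy, divideBy = "lgid",
                                 values = c("NL", "AL"),
                                 facet = "yearid", ncol = 2,
                                 legendPosition = "bottom",
                                 themeExtra = ggplot2::theme(
                                   legend.title = ggplot2::element_blank()))
  # The result is a ggplot object, so more components can be added to it.
  p <- p + ggplot2::labs(caption = "Toy data")
}
```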

Examples

if(interactive()){
library(RODBC)    # provides odbcDriverConnect
library(toaster)
# initialize connection to Lahman baseball database in Aster
asterConn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

pitchingInfo = getTableSummary(asterConn, tableName='pitching', 
                               where='yearid between 2000 and 2013')
battingInfo = getTableSummary(asterConn, tableName='batting', 
                              where='yearid between 2000 and 2013')

salaryHistAll = computeHistogram(asterConn, tableName='public.salaries', columnName='salary',
                                 binsize=200000, startvalue=0, 
                                 by='lgid', where='yearID between 2000 and 2013')
createPopPyramid(data=salaryHistAll, bin='bin_start', count='bin_count', divideBy='lgid', 
                 values=c('NL','AL'),
                 title="Salary Pyramid by MLB Leagues", 
                 xlab='Salary', ylab='Player Count')

salaryHist5Mil = computeHistogram(asterConn, tableName='salaries', columnName='salary', 
                                  binsize=100000, startvalue=0, endvalue=5000000,
                                  by='lgid', where='yearID between 2000 and 2013')
createPopPyramid(data=salaryHist5Mil, divideBy='lgid', values=c('NL','AL'),
                 title="Salary Pyramid by MLB Leagues (under 5M only)", 
                 xlab='Salary', ylab='Player Count')

eraHist = computeHistogram(asterConn, tableName='pitching', columnName='era', 
                           binsize=.1, startvalue=0, endvalue=10,
                           by='lgid', where='yearid between 2000 and 2013')
createPopPyramid(data=eraHist, divideBy='lgid', values=c('NL','AL'),
                 title="ERA Pyramid by MLB Leagues", xlab='ERA', ylab='Player Count')

# Log ERA
eraLogHist = computeHistogram(asterConn, tableName='pitching', columnName='era_log', 
                              binsize=.02, startvalue=-0.42021640338318984325, 
                              endvalue=2.2764618041732441,
                              by='lgid', where='yearid between 2000 and 2013 and era > 0')
createPopPyramid(data=eraLogHist, divideBy='lgid', values=c('NL','AL'),
                 title="log(ERA) Pyramid by MLB Leagues", 
                 xlab='log(ERA)', ylab='Player Count')

# Batting (BA)
battingHist = computeHistogram(asterConn, tableName='batting_enh', columnName='ba', 
                               binsize=.01, startvalue=0.01, endvalue=0.51,
                               by='lgid', where='yearid between 2000 and 2013')
createPopPyramid(data=battingHist, divideBy='lgid', values=c('NL','AL'),
                 title="Batting BA Pyramid by MLB Leagues", xlab='BA', ylab='Player Count')
}

toaster documentation built on May 30, 2017, 3:51 a.m.