rmb: Multiple Barchart for relative frequencies and generalized...

Description

The rmb function produces a Multiple Barchart of the relative frequencies of the target categories within each combination of the explanatory variables. The weights of those combinations (that is, their absolute frequencies) are represented by the total width of the corresponding barchart. The resulting graphic allows the conditional target distributions to be read off exactly, without losing the information about the importance (in the sense of the number of observations) of the different combinations.

Additionally, the rmb function can draw spineplots instead of barcharts within each explanatory combination. In that respect it can be seen as a generalization of spineplots.
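
The two quantities encoded by the plot can be computed directly in base R. The sketch below is not rmb's internal code; it uses the housing data from MASS (as in the examples), and the names tab, combo_weights and cond are purely illustrative.

```r
library(MASS)  # housing data: Sat, Infl, Type, Cont, Freq

# Four-way table with the target variable (Sat) in the last position.
tab <- xtabs(Freq ~ Type + Infl + Cont + Sat, data = housing)

# Weight of each explanatory combination: its absolute frequency.
combo_weights <- margin.table(tab, margin = 1:3)

# Conditional distribution of Sat within each explanatory combination.
cond <- prop.table(tab, margin = 1:3)

# Weight and Sat distribution for Tower / low influence / low contact.
combo_weights["Tower", "Low", "Low"]
round(cond["Tower", "Low", "Low", ], 3)
```

In the plot, combo_weights would correspond to the bar widths and each slice of cond to one small barchart.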

Usage

## S3 method for class 'formula'
rmb(formula, data, col.vars = NULL, spine = FALSE,
    circular = FALSE, eqwidth = FALSE, cat.ord = NULL, cut = NULL,
    innerval = 1, freq.trans = NULL, num.mode = FALSE, max.scale = 1,
    use.na = FALSE, expected = NULL, residuals = NULL, model.opt = list(),
    gap.prop = 0.2, gap.mult = 1.5, col = "hcl", col.opt = list(), label = TRUE,
    label.opt = list(), vp = NULL, ...)
## S3 method for class 'ftable'
rmb(x, col.vars = NULL, spine = FALSE, circular = FALSE,
    eqwidth = FALSE, cat.ord = NULL, freq.trans = NULL, max.scale = 1,
    use.na = FALSE, expected = NULL, residuals = NULL, model.opt = list(),
    gap.prop = 0.2, gap.mult = 1.5, col = "hcl", col.opt = list(), label = TRUE,
    label.opt = list(), vp = NULL, ...)

Arguments

x

Either a table or a model of class "glm" with family "poisson" or "binomial". A table must be of class table or ftable. The latter also implicitly defines the order in which the variables will be added to the plot. In this case the arguments formula and data are not used. Please note that the model-based version is still beta and will be improved in a future release.

formula

The formula specifying the variables in their given order with the last variable being the target variable. The left hand side defines a weighting variable. If the weights are frequencies in a variable called "Freq" this is detected automatically if no other variable is defined.

data

The dataset as a data.frame or ftable.

col.vars

Logical vector with split directions where TRUE stands for horizontal splitting. The last (target) variable is always arranged on the x-axis.

spine

If TRUE a spineplot will be drawn instead of each barchart. This is recommended for binary target variables.

circular

If TRUE a piechart will be drawn instead of each barchart. spine is set to FALSE.

eqwidth

If TRUE the bar length of the multiple barchart in the background no longer restricts the width of the barcharts/spineplots for the relative frequencies of the target variable.

cat.ord

A vector specifying the categories of the target variable which will be visualized in the specified order. The default is to use all categories.

cut

Numeric variables will be cut into this number of intervals. May also be a vector with specifications for each variable.
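
As a rough illustration of what cutting a numeric variable into a fixed number of intervals means, the sketch below uses base R's cut with equal-width breaks. Whether rmb uses equal-width or some other kind of breaks is not stated here, so this is an assumption; x and x_binned are illustrative names.

```r
set.seed(1)
x <- rnorm(200)

# Three equal-width intervals over the observed range of x.
x_binned <- cut(x, breaks = 3)

# The binned variable is a factor and can be tabulated like any other.
table(x_binned)
```

In the examples, a 0 in the cut vector is used for a variable that is already a factor.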

innerval

The function innerval is used to reduce numeric variables to an interval which is symmetric around the median and contains the specified proportion of observations (or as close to this as possible).
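
One plausible reading of "symmetric around the median" is an interval with equal tail proportions on both sides. The helper inner_interval below is hypothetical and only sketches that reading with quantile; it is not the innerval function from the package.

```r
# Hypothetical helper: central interval around the median that holds
# (approximately) a proportion p of the observations.
inner_interval <- function(x, p = 0.95) {
  stats::quantile(x, probs = c((1 - p) / 2, (1 + p) / 2), names = FALSE)
}

set.seed(42)
x <- c(rnorm(1000), 50, -40)  # two extreme observations

bounds <- inner_interval(x, p = 0.95)
kept   <- x[x >= bounds[1] & x <= bounds[2]]

# Share of observations retained; the two extremes are dropped.
length(kept) / length(x)
```

Restricting the display to such an interval keeps a few extreme values from stretching the axes.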

freq.trans

This parameter allows the absolute frequencies used for the underlying multiple barchart to be transformed. Possible values are "log", "sqrt" or list("sqrt",k). The latter stands for the k-th root transformation.
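
The effect of these transformations on a set of counts can be seen with plain arithmetic. The sketch below is illustrative only; in particular, how rmb handles zero counts under "log" is not stated here, so the log line shows the untransformed definition.

```r
counts <- c(1, 8, 27, 1000)

# k-th root transformation, as in list("sqrt", k).
kth_root <- function(x, k) x^(1 / k)

sqrt(counts)          # corresponds to "sqrt"
kth_root(counts, 3)   # corresponds to list("sqrt", 3)
log(counts)           # plain logarithm; a count of 1 maps to 0

# Ratio of largest to smallest weight before and after the cube root.
max(counts) / min(counts)
max(kth_root(counts, 3)) / min(kth_root(counts, 3))
```

Stronger transformations give rare combinations more room in the plot at the cost of a less literal width scale.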

num.mode

In the numeric mode the gaps are removed and axes typical for numeric variables are drawn. Ignored for factor variables.

max.scale

The maximum value of the probability (y-axis) scale for each combination. Unsurprisingly the default is 1. The axis will be drawn if yaxis is TRUE.

use.na

If TRUE, missing values are changed to a level "N/A". Otherwise (the default), the function na.omit is called to reduce the dataset to complete cases only.
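
The two treatments of missing values can be mimicked in base R. The helper na_as_level is hypothetical and only sketches the idea of an explicit "N/A" level; it is not taken from the package.

```r
d <- data.frame(
  a = factor(c("x", "y", NA, "x")),
  b = factor(c("u", NA, "v", "v"))
)

# Default behaviour: keep complete cases only.
complete <- na.omit(d)

# Analogue of use.na = TRUE: turn NA into an explicit level "N/A".
na_as_level <- function(f) {
  f <- addNA(f)
  levels(f)[is.na(levels(f))] <- "N/A"
  f
}
d_na <- as.data.frame(lapply(d, na_as_level))

nrow(complete)   # rows without any missing value
levels(d_na$a)   # original levels plus "N/A"
```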

expected

There are three possibilities how to specify this parameter:

1. A list of integer vectors denoting the interaction terms in the poisson or proportional odds model, e.g. list( c(1,2,3), c(1,4) ) for all interactions between variables 1,2 and 3 as well as between 1 and 4.
2. A logical indicating whether or not to use a model (logit independence model).
3. A vector with expected values, e.g. from a model. If residuals remains undefined the response residuals will be plotted.

If undefined or set to FALSE only the observed values will be plotted.
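
To make the first form concrete: with the variables in the order Type, Infl, Cont, Sat, the list list(c(1,2,3), c(1,4)) corresponds to the Poisson model that also appears in the glm example further down. The sketch below builds that formula by hand and extracts expected values and Pearson residuals; it is an illustration in base R, not the code rmb runs internally, and the names vars, terms_chr and fit are assumptions.

```r
library(MASS)  # housing data

vars     <- c("Type", "Infl", "Cont", "Sat")  # order as in the rmb formula
expected <- list(c(1, 2, 3), c(1, 4))         # interaction terms by position

# Turn each index vector into a full interaction term, e.g. "Type*Infl*Cont".
terms_chr <- vapply(expected,
                    function(idx) paste(vars[idx], collapse = "*"),
                    character(1))
model_formula <- reformulate(terms_chr, response = "Freq")

fit <- glm(model_formula, data = housing, family = poisson)

expected_values <- fitted(fit)                       # expected frequencies
pearson_resid   <- residuals(fit, type = "pearson")  # one residual per cell
```

A vector such as expected_values is the kind of input the third form of the argument refers to.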

residuals

If expected is a vector with expected values it is also possible to specify residuals. This is used internally by rmb.glm.

model.opt

A list with optional parameters for model specifications. Possible parameters are:

use.expected.values  A logical specifying whether or not to use the frequencies predicted by the model instead of the observed frequencies.
mod.type  Either "poisson" or "polr". See glm and polr.
resid.type  One of "pearson", "deviance", "working", "partial" or "response". For polr models only the latter is available.
resid.display  One of "values", "color" or "both". "values" will result in bars or wedges for both expected and observed frequencies, so the raw residuals are shown in the graphic. "color" will set the col argument aside and use colors on a red-blue scale to represent (Pearson) residuals. "both" does both.
max.rat  If a model is specified and resid.display = "both", the x-scales will not be reduced to less than 1/max.rat. The x-scales are reduced whenever an observed frequency exceeds the maximal scale.

gap.prop

The maximum proportion of the total plot width which is used for the gaps.

gap.mult

The incremental multiplier for the gaps of different dimensions. The gaps corresponding to any one variable are gap.mult times larger than those corresponding to the next variable on the same axis.
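
The interplay of gap.prop and gap.mult can be sketched as simple arithmetic. The helper gap_widths below is hypothetical and rests on two assumptions not confirmed by this page: that the gaps use up the full gap.prop budget, and that the variables on one axis are given from outermost to innermost.

```r
# Hypothetical helper, not part of extracat.
# n_levels: number of categories per variable on one axis, outermost first.
gap_widths <- function(n_levels, gap.prop = 0.2, gap.mult = 1.5) {
  k <- length(n_levels)
  # Relative gap size: innermost variable = 1, each outer one gap.mult larger.
  rel_size <- gap.mult^((k - 1):0)
  # Number of gaps per variable: (levels - 1) within each outer combination.
  n_gaps <- (n_levels - 1) * cumprod(c(1, n_levels[-k]))
  # Rescale so that all gaps together take up gap.prop of the width.
  scale <- gap.prop / sum(n_gaps * rel_size)
  data.frame(n_gaps = n_gaps, width = rel_size * scale)
}

# Two variables on the x-axis with 4 and 2 categories.
g <- gap_widths(c(4, 2))
g
```

Under these assumptions a larger gap.mult separates the outer groups more clearly while the total space given to gaps stays fixed.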

col

Either a vector defining the colors of the bars or a name specifying a palette: "hsv" and "rgb" for hsv-based rainbow colors, "hcl" for hcl-based rainbow colors (default), "div" or "diverge" for hcl-based diverging colors and finally "seq" or "sequential" for hcl-based sequential colors. Additional arguments can be specified via the col.opt argument according to the underlying functions in the colorspace package, e.g. rainbow_hcl. For the hsv-based colors see rainbow. Specifying a color or palette has no effect if an expected model is defined.

col.opt

Further options for the color palettes. See e.g. rainbow_hcl or rainbow. Other parameters are:

col2  for the color of the background/weight bars,
line.col  for the color of all lines (bars, rectangles),
bg  for the background color of the whole graphic,
bgs  for the background color of each tile.

label

Either a logical specifying whether or not to draw labels or a numeric vector defining which variables shall be labelled.

label.opt

A list with optional parameters for label specifications. Possible parameters are:

yaxis  If TRUE a vertical axis will be drawn at both sides of the plot. This is recommended when changing the max.scale argument.
boxes  Should the labels be surrounded by boxes?
lab.tv  Should the target variable be included in the labeling?
varnames  Should the variable names be shown as labels?
abbrev  A single integer value or a vector of integer values specifying the number of characters to which the labels will automatically be abbreviated.
lab.cex  The font size multiplier.

vp

An optional viewport to plot in. vp = c(i, j) can be used as a shortcut to viewport(layout.pos.row = i, layout.pos.col = j)
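
A minimal sketch of the layout setup this shortcut refers to, using the grid package. The rmb calls are commented out because they require extracat; the reduced formula in them and the name right_cell are illustrative assumptions.

```r
library(grid)

# A null PDF device lets the sketch run without a display;
# drop this line (and dev.off() below) for interactive use.
pdf(NULL)
grid.newpage()

# A 1 x 2 layout: two plotting cells side by side.
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))

# Explicit viewport for row 1, column 2 of the layout.
right_cell <- viewport(layout.pos.row = 1, layout.pos.col = 2)

# With extracat loaded, these two calls would target the two cells:
# rmb(~Type+Infl+Sat, data = housing, vp = c(1, 1))
# rmb(~Type+Infl+Sat, data = housing, vp = right_cell)

# Placeholder drawing in the right-hand cell.
pushViewport(right_cell)
grid.rect(gp = gpar(col = "grey"))
popViewport(2)
invisible(dev.off())
```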

...

Further arguments. Usually not necessary.

Details

Another way to read the graphic is the following: a Multiple Barchart of the explanatory variables is drawn with horizontal bars. Then, within each of the resulting bars, a barchart of the conditional distribution of the target variable is drawn with vertical bars.

Value

invisible(TRUE)

Author(s)

Alexander Pilhoefer
Department for Computer Oriented Statistics and Data Analysis
University of Augsburg
Germany

References

Alexander Pilhoefer, Antony Unwin (2013). New Approaches in Visualization of Categorical Data: R Package extracat. Journal of Statistical Software, 53(7), 1-25. URL http://www.jstatsoft.org/v53/i07/

See Also

mosaicplot

Examples

    library(extracat)  # provides rmb and the olives data
    library(MASS)      # provides the housing data
    # simple example
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, gap.mult = 2,
        col.vars = c(FALSE,TRUE,TRUE,FALSE), label.opt = list(abbrev = 3))
    
    # with sqrt-transformation and horizontal splits only
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, gap.mult = 2,
        col.vars = c(TRUE,TRUE,TRUE,TRUE), freq.trans = "sqrt",
           label.opt = list(abbrev = 3) )
    
    # a generalized spineplot with the first category highlighted
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, spine = TRUE, 
        cat.ord = 1, gap.mult = 2, col.vars = c(1,3,4), 
        freq.trans = list("sqrt",3),  label.opt = list(abbrev = 2))
  ## Not run:   
    # a generalized spineplot with all categories highlighted 
    # in a changed order
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, spine = TRUE,
        cat.ord = c(3,1,2), gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE),
        freq.trans = "sqrt",  label.opt = list(abbrev = 3))
    
    # the barchart version only for categories 1 and 3
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, 
        cat.ord = c(1,3), gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE),
        freq.trans = "sqrt",  label.opt = list(abbrev = c(4,1,1,1)))
        
        
    # with equal widths
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, eqwidth = TRUE,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
         label.opt = list(abbrev = 2, lab.tv = TRUE))
    
    # ----- models and residuals ----- #
    # using the logistic model: Sat by Type only
    
    #   residual shadings and expected values
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), expected = list(c(1,2,3),c(1,4)),
        model.opt = list(use.expected.values = TRUE, resid.display = "color") )
       
    #   residual values without shadings
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), expected = list(c(1,2,3),c(1,4)),
        model.opt = list( resid.display = "values") )
    
    #   barcharts with residual shadings and values
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), expected = list(c(1,2,3),c(1,4)) )
        
    #   spineplots with residual shadings and values 
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, spine = TRUE,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), expected = list(c(1,2,3),c(1,4)) )
        
    #   piecharts with residual shadings and values
    rmb(formula = ~Type+Infl+Cont+Sat, data = housing, circular = TRUE,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), expected = list(c(1,2,3),c(1,4)) )
        
    # ----- using an ftable to create the plot ----- #
    tt <- xtabs(Freq~Type+Cont+Infl+Sat, data = housing)
    ft <- ftable(tt, col.vars = c(1,4))
    rmb(tt)
    rmb(ft)

    # ----- using a glm model ----- #
    mod1 <- glm(Freq ~ Type*Infl*Cont + Type*Sat, data = housing, family = poisson)
    rmb(mod1, circular = TRUE,
        gap.mult = 2, col.vars = c(TRUE,FALSE,TRUE,TRUE), 
        label.opt = list(abbrev = 3), model.opt = list(use.expected.values = TRUE) )
        
        
    # ----- the numeric mode and cuts ----- #
    data(olives)
    # only three cuts to show how it works
    rmb(~palmitoleic+stearic+Region, data = olives, cut = c(3,3,0))

    library(ggplot2)  # provides the diamonds data
    data(diamonds)
    diamonds$lprice <- log(diamonds$price)
    # a minority of extreme observations messes up the display:
    rmb(~depth+table+lprice, data = diamonds, eqwidth = TRUE, spine = TRUE,
        cut = c(36,36,5), col = "seq", num.mode = TRUE)

    # we can zoom in via innerval:
    rmb(~depth+table+lprice, data = diamonds, circular = TRUE,
        cut = c(36,36,5), col = "div", innerval = 0.95,
        num.mode = TRUE, freq.trans = "log")

    # price, carat and color
    diamonds$lprice <- log(diamonds$price)
    diamonds$lcarat <- log(diamonds$carat)
    rmb(~lcarat+lprice+color, data = diamonds,
        cut = c(24,24,0), col = "rgb", num.mode = TRUE,
        freq.trans = "sqrt", eqwidth = TRUE, max.scale = 0.5)
       
## End(Not run)

extracat documentation built on July 17, 2018, 5:05 p.m.
