tableplot: tableplot

View source: R/tableplot.R

tableplot    R Documentation

tableplot

Description

A tableplot is a visualisation of multivariate data sets. Each column represents a variable and each row bin is an aggregate of a certain number of records. For numeric variables, a value box is plotted showing the minimum, mean (black line) and maximum value of the bin. If a bin of a numeric variable contains any missing values, the box to the left of the value box is plotted in gray. For categorical variables, a stacked bar chart shows the proportions of the categories. Missing values are taken into account.
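To make the per-bin aggregation concrete, the following base R sketch computes, for one numeric variable, the three values shown in the value box (minimum, mean, maximum) and a flag for whether the bin contains missing values. The toy data, the equal-sized binning and the names n_bins, bin_id and bin_summary are assumptions made for illustration only; this is not the package's internal code.

```r
# Illustrative only: aggregate one numeric variable into row bins.
x <- c(3, 1, NA, 7, 5, 9, 2, 8)       # toy data with one missing value
n_bins <- 4                           # hypothetical number of row bins

# Sort the records (missing values last) and assign equal-sized bins.
xs <- x[order(x, na.last = TRUE)]
bin_id <- rep(seq_len(n_bins), each = length(xs) / n_bins)

# Per-bin summaries corresponding to the value box and the missing-value box.
groups <- split(xs, bin_id)
bin_summary <- data.frame(
  bin    = seq_len(n_bins),
  min    = vapply(groups, min,  numeric(1), na.rm = TRUE),
  mean   = vapply(groups, mean, numeric(1), na.rm = TRUE),
  max    = vapply(groups, max,  numeric(1), na.rm = TRUE),
  has_na = vapply(groups, anyNA, logical(1))
)
print(bin_summary)
```

Each row of bin_summary corresponds to one row bin of the plot; has_na marks the bins for which the gray box would be drawn.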

Usage

tableplot(
  x,
  select = NULL,
  subset = NULL,
  bin = NULL,
  yj = NA,
  IQR_bias = 5,
  colpal = grDevices::rainbow,
  color.NA_num = "gray75",
  color.NA = "grey75",
  color.num = "lightblue",
  color.box = "deepskyblue",
  color.line = "black",
  box.lower = NULL,
  box.upper = NULL,
  box.line = NULL,
  cex.main = 1,
  cex.legend = 1,
  width = 1,
  height = 0.15
)

Arguments

x

data frame

select

numeric/character: variables to show in the plot (default: NULL)

subset

numeric: index of observations to show

bin

integer: bin number to which each observation belongs (default: NULL = all)

yj

numeric: Yeo-Johnson coefficient (default: NA). If NA, it is set to 0 (= log) or 1 (= identity).

IQR_bias

numeric: parameter that determines whether a logarithmic scale is used when yj is NA (default: 5). IQR_bias is multiplied by the interquartile range in this test.

colpal

color palette to draw (default: rainbow)

color.NA_num

color for missing or infinite values for numeric variables (default: gray75)

color.NA

color for missing values for categorical variables (default: grey75)

color.num

color for lower box for numeric variables (default: lightblue)

color.box

color for upper box for numeric variables (default: deepskyblue)

color.line

color for line in upper box for numeric variables (default: black)

box.lower

function: determine lower border in upper box for numeric variables (default: NULL). If NULL then min(.,na.rm=TRUE) is used.

box.upper

function: determine upper border in upper box for numeric variables (default: NULL). If NULL then max(.,na.rm=TRUE) is used.

box.line

function: determine line position in upper box for numeric variables (default: NULL). If NULL then mean(.,na.rm=TRUE) is used.

cex.main

number: magnification to be used for the titles (default: 1)

cex.legend

number: magnification to be used for the legends (default: 1)

width

number: width of percentage axis (default: 1). If 1 then the width is as wide as a plot.

height

number: percentage of the height of the legends (default: 0.15)
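As a reading aid for the yj argument, here is a minimal sketch of the standard Yeo-Johnson transformation. It shows why a coefficient of 0 acts like a logarithm and a coefficient of 1 is the identity. The helper yeo_johnson() is hypothetical and not exported by the package; the package's own implementation may differ in detail.

```r
# Standard Yeo-Johnson transformation; illustrative helper only.
# Assumes y contains no missing values.
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values: log(y + 1) for lambda = 0, power form otherwise.
  if (abs(lambda) < 1e-12) {
    out[pos] <- log1p(y[pos])
  } else {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  }
  # Negative values: -log(1 - y) for lambda = 2, power form otherwise.
  if (abs(lambda - 2) < 1e-12) {
    out[!pos] <- -log1p(-y[!pos])
  } else {
    out[!pos] <- -(((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda))
  }
  out
}

y <- c(-2, 0, 1, 10)
yeo_johnson(y, 1)   # lambda = 1 leaves the values unchanged
yeo_johnson(y, 0)   # lambda = 0 gives log(y + 1) for non-negative values
```

With lambda = 0, strongly right-skewed variables are compressed much like on a logarithmic scale, while negative values remain defined.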

Details

The idea and some of the code for the tableplot are taken from the tabplot package by Martijn Tennekes and Edwin de Jonge. This function differs from their package in two ways:

  • multicolumn sorting is possible, and

  • there is no support for 'ff' (out of memory vectors).

Value

nothing

References

Tennekes, M., Jonge, E. de, Daas, P.J.H. (2013), Visualizing and Inspecting Large Datasets with Tableplots, Journal of Data Science 11 (1), 43-58.

Examples

# tableplot() and sortbin() come from the smvgraph package
library("smvgraph")
data("Boston", package="MASS")
tableplot(Boston, bin=sortbin(Boston))
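The box.lower, box.upper and box.line arguments can replace the default minimum, maximum and mean. The sketch below draws the value box from the first to the third quartile, with the median as the line. It assumes that each function is called with the numeric values of one bin and returns a single number, which is what the documented defaults min(., na.rm=TRUE), max(., na.rm=TRUE) and mean(., na.rm=TRUE) suggest. The helpers q25, q75 and med are hypothetical names.

```r
# Hypothetical helper functions: one numeric vector in, one number out.
q25 <- function(v) unname(stats::quantile(v, 0.25, na.rm = TRUE))
q75 <- function(v) unname(stats::quantile(v, 0.75, na.rm = TRUE))
med <- function(v) stats::median(v, na.rm = TRUE)

# Only draw the plot if both packages are installed.
if (requireNamespace("smvgraph", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  data("Boston", package = "MASS")
  smvgraph::tableplot(Boston, bin = smvgraph::sortbin(Boston),
                      box.lower = q25, box.upper = q75, box.line = med)
}
```

Quartiles and the median are less sensitive to outliers within a bin than the minimum, maximum and mean, so the boxes may be easier to read for heavily skewed variables.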

sigbertklinke/smvgraph documentation built on Dec. 10, 2022, 9:13 a.m.