A tableplot is a visualisation of (large) multivariate datasets. Each column represents a variable and each row bin is an aggregate of a certain number of records. For numeric variables, a bar chart of the mean values is depicted. For categorical variables, a stacked bar chart is depicted of the proportions of categories. Missing values are taken into account. Also supports large ffdf datasets from the ff package. For a quick intro, see vignette("tabplot-vignette").
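The row-bin aggregation described above can be sketched in base R. This is only an illustration of the idea, not tabplot's actual implementation; the toy data frame and the names dat, n_bins, bin_mean, bin_complete and bin_props are assumptions made for the example.

```r
# Illustrative sketch only: NOT tabplot's internals.
set.seed(1)
n <- 1000
dat <- data.frame(
  x = rnorm(n),
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
dat$x[sample(n, 50)] <- NA  # inject 50 missing values

n_bins <- 10  # hypothetical counterpart of nBins

# Sort decreasingly on x (missing values last), then assign equal-sized row bins
ord <- order(dat$x, decreasing = TRUE, na.last = TRUE)
sorted <- dat[ord, ]
bin <- ceiling(seq_len(n) * n_bins / n)

# Numeric variable: mean per bin, plus the share of non-missing values per bin
bin_mean <- tapply(sorted$x, bin, mean, na.rm = TRUE)
bin_complete <- tapply(!is.na(sorted$x), bin, mean)

# Categorical variable: proportions of categories per bin (each row sums to 1)
bin_props <- prop.table(table(bin, sorted$g), margin = 1)
```

Each element of bin_mean corresponds to one bar of a numeric column, and each row of bin_props to one stacked bar of a categorical column.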
tableplot(
dat,
select,
subset = NULL,
sortCol = 1,
decreasing = TRUE,
nBins = 100,
from = 0,
to = 100,
nCols = ncol(dat),
sample = FALSE,
sampleBinSize = 1000,
scales = "auto",
numMode = "mb-sdb-ml",
max_levels = 50,
pals = list("Set1", "Set2", "Set3", "Set4"),
change_palette_type_at = 20,
rev_legend = FALSE,
colorNA = "#FF1414",
colorNA_num = "gray75",
numPals = "OrBu",
limitsX = NULL,
bias_brokenX = 0.8,
IQR_bias = 5,
select_string = NULL,
subset_string = NULL,
colNames = NULL,
filter = NULL,
plot = TRUE,
...
)
dat |
a |
select |
expression indicating the columns of |
subset |
logical expression indicating which rows to select in |
sortCol |
column name on which the dataset is sorted. It can be an index, expression name, or a character string. PS: in case of ambiguity, the character string is used like in this example: |
decreasing |
boolean that determines whether the dataset is sorted decreasingly ( |
nBins |
number of row bins |
from |
percentage from which the sorted data is shown |
to |
percentage to which the sorted data is shown |
nCols |
the maximum number of columns per tableplot. If this number is smaller than the number of columns selected in |
sample |
boolean that determines whether to sample or use the whole data. Only useful when |
sampleBinSize |
the number of sampled objects per bin, if |
scales |
determines the horizontal axes of the numeric variables in |
numMode |
character value that determines how numeric values are plotted. The value consists of the following building blocks, which are concatenated with the "-" symbol. The default value is "mb-sdb-ml". Prior to version 1.2, "MB-ML" was the default value.
|
max_levels |
maximum number of levels for categorical variables. Categorical variables with more levels will be rebinned into |
pals |
list of color palettes. Each list item is one of the following:
If the list items are unnamed, they are applied to all selected categorical variables (recycled if necessary). The list items can be assigned to specific categorical variables, by naming them accordingly. |
change_palette_type_at |
number at which the type of categorical palettes is changed. For categorical variables with fewer than |
rev_legend |
logical value or vector that determines which legends are reversed. If a vector is provided, the names of the items should be the names of (a selection of) the categorical variables. |
colorNA |
color for missing values for categorical variables. |
colorNA_num |
color for missing values for numeric variables. It is used when all values in a bin are missing. If a part of the values are missing, a brighter color is used (see argument |
numPals |
vector of palette names that are used for numeric variables. These names are chosen from the diverging palette names in |
limitsX |
a list of vectors of length two, where each vector contains a lower and an upper limit value. Either the names of |
bias_brokenX |
parameter between 0 and 1 that determines when the x-axis of a numeric variable is broken. If the minimum value is at least |
IQR_bias |
parameter that determines when a logarithmic scale is used when |
select_string |
character equivalent of the |
subset_string |
character equivalent of the |
colNames |
deprecated; used in older versions of tabplot (prior to 0.12): use |
filter |
deprecated; used in older versions of tabplot (prior to 0.12): use |
plot |
boolean, to plot or not to plot a tableplot |
... |
layout arguments, such as |
For large datasets, we recommend using tablePrepare, which does all the preprocessing needed to make any tableplot of that particular dataset. The object returned by tablePrepare is then passed on to tableplot (argument dat). Tableplotting is then very fast, and even faster with sampling enabled (sample=TRUE).
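As a hedged sketch of this workflow, the snippet below prepares the diamonds data once and reuses the result for several tableplots. It assumes the tabplot and ggplot2 packages are installed (hence the requireNamespace guard); the choice of sortCol and the sampleBinSize value are illustrative only.

```r
# Sketch, assuming tabplot and ggplot2 are installed; skipped otherwise.
if (requireNamespace("tabplot", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(tabplot)
  data(diamonds, package = "ggplot2")

  # Preprocess once; the prepared object can be reused for any tableplot
  p <- tablePrepare(diamonds)

  # Pass the prepared object as 'dat'
  tableplot(p, sortCol = price)

  # Same plot, but built from a sample of records per bin
  tableplot(p, sortCol = price, sample = TRUE, sampleBinSize = 1000)
}
```

Preparing the data is a one-off cost, so it pays off mainly when several tableplots are drawn from the same dataset.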
A tabplot-object (silent output). If multiple tableplots are generated (either by setting subset to a categorical column name, or by restricting the number of columns with nCols), a list of tabplot-objects is silently returned.
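The two kinds of return value can be captured and plotted later, roughly as sketched below. The snippet assumes tabplot and ggplot2 are installed, and that the returned objects carry the S3 class "tabplot" (inferred from the plot.tabplot method mentioned in the examples); the variable names tab and tabs are chosen for illustration.

```r
# Sketch, assuming tabplot and ggplot2 are installed; skipped otherwise.
if (requireNamespace("tabplot", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(tabplot)
  data(diamonds, package = "ggplot2")

  # Single tableplot: one tabplot-object, built without drawing it
  tab <- tableplot(diamonds, plot = FALSE)

  # subset set to a categorical column: a list with one tabplot-object per category
  tabs <- tableplot(diamonds, subset = cut, plot = FALSE)
  length(tabs)

  # Draw the stored objects when needed
  plot(tab)
  plot(tabs[[1]])
}
```

Because the value is returned silently, it has to be assigned to a variable to be inspected or replotted.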
In early development versions of tabplot (prior to version 1.0) it was possible to sort datasets on multiple columns. To increase tableplot creation speed, this feature was dropped. For multiple sorting purposes, we recommend using the subset parameter instead.
# load diamonds dataset from ggplot2
require(ggplot2)
data(diamonds)
# default tableplot
tableplot(diamonds)
# prior to version 1.2, the mean values of numeric variables were displayed
# without standard deviation (see ?plot.tabplot):
tableplot(diamonds, numMode = "MB-ML")
# most expensive diamonds
tableplot(diamonds,
select=c(carat, cut, color, clarity, price),
sortCol=price,
from=0,
to=5)
# for large datasets, we recommend to preprocess the data with tablePrepare:
p <- tablePrepare(diamonds)
# specific subsetting
tableplot(p, subset=price < 5000 & cut=='Ideal')
# change palettes
tableplot(p,
pals=list(cut="Set4", color="Paired", clarity=grey(seq(0, 1,length.out=7))),
numPals=c(carat="PRGn", price="BrBG"))
# create a tableplot per cut category, and fix scale limits of carat, table, and price
tabs <- tableplot(p, subset=cut,
limitsX=list(carat=c(0,4), table=c(55, 65), price=c(0, 20000)), plot=FALSE)
plot(tabs[[3]], title="Very good cut diamonds")