dfSummary | R Documentation |
Summary of a data frame consisting of: variable names and types, labels if any, factor levels, frequencies and/or numerical summary statistics, barplots/histograms, and valid/missing observation counts and proportions.
dfSummary(
  x,
  round.digits = 1,
  varnumbers = st_options("dfSummary.varnumbers"),
  labels.col = st_options("dfSummary.labels.col"),
  valid.col = st_options("dfSummary.valid.col"),
  na.col = st_options("dfSummary.na.col"),
  graph.col = st_options("dfSummary.graph.col"),
  graph.magnif = st_options("dfSummary.graph.magnif"),
  style = st_options("dfSummary.style"),
  plain.ascii = st_options("plain.ascii"),
  justify = "l",
  col.widths = NA,
  headings = st_options("headings"),
  display.labels = st_options("display.labels"),
  max.distinct.values = 10,
  trim.strings = FALSE,
  max.string.width = 25,
  split.cells = 40,
  split.tables = Inf,
  tmp.img.dir = st_options("tmp.img.dir"),
  keep.grp.vars = FALSE,
  silent = st_options("dfSummary.silent"),
  ...
)
x |
A data frame. |
round.digits |
Number of significant digits to display. Defaults to 1. |
varnumbers |
Logical. Show variable numbers in the first column.
Defaults to |
labels.col |
Logical. If |
valid.col |
Logical. Include column indicating count and proportion of
valid (non-missing) values. |
na.col |
Logical. Include column indicating count and proportion of
missing ( |
graph.col |
Logical. Display barplots/histograms column. |
graph.magnif |
Numeric. Magnification factor for graphs column. Useful
if the graphs show up too large (then use a value such as .75) or too small
(use a value such as |
style |
Character. Argument used by |
plain.ascii |
Logical. |
justify |
String indicating alignment of columns; one of “l” (left), “c” (center), or “r” (right). Defaults to “l”. |
col.widths |
Numeric or character. Vector of column widths. If numeric,
values are assumed to be numbers of pixels. Otherwise, any CSS-supported
units can be used. |
headings |
Logical. Set to |
display.labels |
Logical. Should data frame label be displayed in the
title section? Default is |
max.distinct.values |
The maximum number of values to display frequencies for. If a variable has more distinct values than this number, the remaining frequencies will be reported as a whole, along with the number of additional distinct values. Defaults to 10. |
trim.strings |
Logical; for character variables, should leading and trailing white space be removed? Defaults to FALSE. |
max.string.width |
Limits the number of characters to display in the frequency tables. Defaults to 25. |
split.cells |
A numeric argument passed to |
split.tables |
pander argument which determines the maximum width
of a table. Keeping the default value ( |
tmp.img.dir |
Character. Directory used to store temporary images when rendering dfSummary() with method = "pander", plain.ascii = FALSE and style = "grid". See Details. |
keep.grp.vars |
Logical. When using |
silent |
Logical. Hide console messages. |
... |
Additional arguments passed to |
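The behaviour described for max.distinct.values can be illustrated with a small base R sketch. The helper summarise_freqs() and the "[ n others ]" label are hypothetical and are not part of summarytools; the sketch only mimics the idea of reporting the most frequent values and lumping the rest into a single count.

```r
# Hypothetical helper (not part of summarytools): keep the `max_values` most
# frequent values and report the remaining ones as a single lumped count.
summarise_freqs <- function(x, max_values = 10) {
  counts <- sort(table(x), decreasing = TRUE)   # NA values are dropped by table()
  freq <- setNames(as.integer(counts), names(counts))
  if (length(freq) <= max_values) {
    return(freq)
  }
  shown <- freq[seq_len(max_values)]
  n_other <- length(freq) - max_values          # number of additional distinct values
  other <- sum(freq) - sum(shown)               # their combined frequency
  out <- c(shown, other)
  names(out)[length(out)] <- sprintf("[ %d others ]", n_other)
  out
}

x <- c("a", "a", "a", "b", "b", "c", "d", "e")
res <- summarise_freqs(x, max_values = 2)
print(res)
```

With max_values = 2, the two most frequent values are listed individually and the three remaining distinct values are collapsed into one entry.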
The default value plain.ascii = TRUE is intended to facilitate interactive data exploration. When using the package for reporting with rmarkdown, make sure to set this option to FALSE.
When trim.strings is set to TRUE, trimming is done before calculating frequencies, so be aware that the frequencies will be affected accordingly.
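To see why trimming changes the frequencies, here is a minimal base R sketch. It uses trimws() as a stand-in for the trimming step; that dfSummary() trims in exactly this way internally is an assumption, and the vector x is made up for illustration.

```r
# Made-up character data with stray leading/trailing spaces
x <- c("yes", " yes", "yes ", "no")

raw_freqs <- table(x)              # untrimmed: "yes", " yes" and "yes " are distinct values
trimmed_freqs <- table(trimws(x))  # trimmed first: all three collapse into "yes"

print(raw_freqs)
print(trimmed_freqs)
```

Untrimmed, the data has four distinct values; after trimming it has two, with "yes" counted three times.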
Specifying tmp.img.dir allows producing results consistent with pandoc styling while also showing png graphs. In Pandoc, column widths are determined by the length of the cell contents, even if that content is merely a link to an image, so storing the images in the standard R temporary directory would make the columns exceedingly wide. A shorter path is needed. On Mac OS and Linux, “/tmp” is a sensible choice, since this directory is cleaned up automatically on a regular basis. On Windows, however, there is no such convenient directory, so the user has to choose a directory and clean up the temporary images manually after the document has been rendered. Providing a relative path such as “img”, omitting “./”, is recommended. The maximum length for this parameter is set to 5 characters. It can be set globally with st_options (e.g.: st_options(tmp.img.dir = ".")).
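The effect of the path length on cell width can be sketched in base R. The file name "ds0001.png" and the markdown image-link form are assumptions chosen for illustration; the names summarytools actually generates may differ.

```r
# An image path inside R's session temporary directory (typically long)
long_path <- tempfile(pattern = "ds", fileext = ".png")

# The same kind of file under a short directory; "ds0001.png" is a hypothetical name
short_path <- file.path("/tmp", "ds0001.png")

# Cell contents as markdown image links; their length drives the column width
long_cell <- sprintf("![](%s)", long_path)
short_cell <- sprintf("![](%s)", short_path)

print(nchar(long_cell))
print(nchar(short_cell))
```

The link built from the session temporary directory is longer than the one built from the short directory, which is why a short tmp.img.dir keeps the graph column narrow.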
It is possible to control which statistics are shown in the Stats / Values column. For this, see the Details and Examples sections of st_options.
A data frame with additional class summarytools containing as many rows as there are columns in x, with attributes to inform the print method. Columns in the output data frame are:
Number indicating the order in which the column appears in the data frame.
Name of the variable, along with its class(es).
Label of the variable (if applicable).
For factors, a list of their values, limited by the max.distinct.values parameter. For character variables, the most common values (in descending frequency order), also limited by max.distinct.values. For numerical variables, common univariate statistics (mean, std. deviation, min, med, max, IQR and CV).
For factors and character variables, the frequencies and proportions of the values listed in the previous column. For numerical vectors, number of distinct values, or frequency of distinct values if their number is not greater than max.distinct.values.
An ASCII histogram for numerical variables, and an ASCII barplot for factors and character variables.
An HTML-encoded graph, either barplot or histogram.
Number and proportion of valid values.
Number and proportion of missing (NA and NaN) values.
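As a quick check of the structure described above, the returned object can be inspected like any other data frame. This is a minimal sketch that assumes the summarytools package is installed; it uses the built-in iris data and turns the graph column off to keep the output small.

```r
library(summarytools)

# graph.col = FALSE drops the barplots/histograms column from the result
res <- dfSummary(iris, graph.col = FALSE)

print(class(res))                # expected to include "summarytools" and "data.frame"
print(nrow(res) == ncol(iris))   # one row per column of the input data frame
print(names(res))                # names of the columns in the summary table
```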
Several packages provide functions for defining variable labels, summarytools being one of them. Some packages (Hmisc in particular) employ special classes for labelled objects, but summarytools neither uses nor looks for any such classes.
Dominic Comtois, dominic.comtois@gmail.com
label, print.summarytools
data("tobacco")
saved_x11_option <- st_options("use.x11")
st_options(use.x11 = FALSE)

dfSummary(tobacco)

# Exclude some of the columns to reduce table width
dfSummary(tobacco, varnumbers = FALSE, valid.col = FALSE)

# Limit number of categories to be displayed for categorical data
dfSummary(tobacco, max.distinct.values = 5, style = "grid")

# Using stby()
stby(tobacco, tobacco$gender, dfSummary)

st_options(use.x11 = saved_x11_option)

## Not run:
# Show in Viewer or browser (no capital V in view(); stview() is also
# available in case of conflicts with other packages)
view(dfSummary(iris))

# Rmarkdown-ready
dfSummary(tobacco, style = "grid", plain.ascii = FALSE,
          varnumbers = FALSE, valid.col = FALSE, tmp.img.dir = "./img")

# Using group_by() (requires dplyr)
tobacco %>% group_by(gender) %>% dfSummary()

## End(Not run)