The vtable package is designed to help you quickly and efficiently look at and document your data.
There are three main functions in vtable:

vtable, or vt for short, shows you information about the variables in your data set, including variable labels, in a way that makes it easy to search through with “find in page”. It was designed to be similar to Stata’s “Variables” panel.

sumtable, or st for short, provides a table of summary statistics. It is very similar in spirit to the summary statistics function of stargazer::stargazer(), except that it accepts tibbles, handles factor variables, and makes by-group statistics and group tests easy.

labeltable provides a table of value labels, either for variables labelled with sjlabelled or haven or similar, or for when you want to see how the values of one column line up with the values of another.

All three of these functions are built with the intent of being fast. Not so much fast to run, but fast to use. The defaults are intended to be good defaults, and the output by default prints to the Viewer tab (in RStudio) or the browser (outside RStudio) so you can see it immediately, and continue to look at it as you work on your data.
You could almost certainly build your own highly customized version of vtable, but why do that when you can just do vt(df) and see the information you need to see? And there are eight million packages that make summary statistics tables to your exact specifications if you tweak them. But there’s a good chance that st(df) does what you want. If you want something really out there, that’s when you can break out the big guns.
All three main vtable functions can produce HTML, LaTeX, data.frame, CSV, or knitr::kable() output.
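Each function chooses its output format through its out argument. The minimal sketch below assumes vtable is installed. It uses out = 'return', which hands back a data.frame instead of opening the Viewer, and out = 'latex', which returns LaTeX code. Check ?vtable and ?sumtable for the full list of out values in your installed version.

```r
library(vtable)
data(iris)

# out = 'return' gives back a data.frame instead of printing to the Viewer
vt_df <- vt(iris, out = 'return')
st_df <- st(iris, out = 'return')

# out = 'latex' returns LaTeX code for the table instead of displaying it
st_tex <- st(iris, out = 'latex')

# Inspect what came back
print(vt_df)
print(class(st_df))
```

Returning a data.frame is handy when you want to post-process the table yourself or write it out with your own code.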
You can install vtable from CRAN. Note that the documentation on this site refers to the development version, so it may not match the CRAN version exactly, although the two are usually the same:
install.packages("vtable")
The development version can be installed from GitHub:
# install.packages("remotes")
remotes::install_github("NickCH-K/vtable")
I’ll just do a brief example here, using the iris data we all know and love. Output will be to kable since this is an RMarkdown document.
library(vtable)
data(iris)
# Basic vtable
vt(iris)
iris
Name          Class    Values
Sepal.Length  numeric  Num: 4.3 to 7.9
Sepal.Width   numeric  Num: 2 to 4.4
Petal.Length  numeric  Num: 1 to 6.9
Petal.Width   numeric  Num: 0.1 to 2.5
Species       factor   ‘setosa’ ‘versicolor’ ‘virginica’
There are plenty of options if we want to go nuts, but let’s keep it simple and just ask for a little more with lush:
vt(iris, lush = TRUE)
iris
Name          Class    Values                             Missing  Summary
Sepal.Length  numeric  Num: 4.3 to 7.9                    0        mean: 5.843, sd: 0.828, nuniq: 35
Sepal.Width   numeric  Num: 2 to 4.4                      0        mean: 3.057, sd: 0.436, nuniq: 23
Petal.Length  numeric  Num: 1 to 6.9                      0        mean: 3.758, sd: 1.765, nuniq: 43
Petal.Width   numeric  Num: 0.1 to 2.5                    0        mean: 1.199, sd: 0.762, nuniq: 22
Species       factor   ‘setosa’ ‘versicolor’ ‘virginica’  0        nuniq: 3
Let’s stick with iris!
# Basic summary stats
st(iris)
Summary Statistics
Variable       N    Mean   Std. Dev.  Min  Pctl. 25  Pctl. 75  Max
Sepal.Length   150  5.843  0.828      4.3  5.1       6.4       7.9
Sepal.Width    150  3.057  0.436      2    2.8       3.3       4.4
Petal.Length   150  3.758  1.765      1    1.6       5.1       6.9
Petal.Width    150  1.199  0.762      0.1  0.3       1.8       2.5
Species        150
… setosa       50   33.3%
… versicolor   50   33.3%
… virginica    50   33.3%
Note that sumtable allows for much more customization than vtable, since there’s a higher chance you want it for a paper or something. But I’ll leave that to the more detailed documentation. For now, just note that it does by-group statistics, either in “group.long” format (multiple sumtables stacked on top of each other) or, by default, in columns, with an option to add a group test.
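Here is a minimal sketch of the grouping options just described, assuming vtable is installed. It passes out = 'return' so that each call returns a data.frame rather than opening the Viewer. group, group.long, group.test and add.median are sumtable arguments, but the exact layout of the returned data frames may vary between package versions.

```r
library(vtable)
data(iris)

# Default grouped layout: one set of columns per Species, plus a group test
grouped_wide <- st(iris, group = 'Species', group.test = TRUE, out = 'return')

# "group.long" layout: one summary table per Species, stacked vertically
grouped_long <- st(iris, group = 'Species', group.long = TRUE, out = 'return')

# A small customization: add a median column to the default statistics
with_median <- st(iris, add.median = TRUE, out = 'return')

print(grouped_wide)
print(with_median)
```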
Grouped sumtables look nicer in formats that support multi-column cells, like HTML and LaTeX. Multi-column cells are not supported in the kable output, but they are supported by vtable’s dftoHTML and dftoLaTeX functions.
st(iris,
group = 'Species',
group.test = TRUE)
Summary Statistics
To demonstrate labeltable, we’ll need some labeled values.
data(efc, package = 'sjlabelled')
# Now shoot - how was gender coded?
labeltable(efc$e16sex)
e16sex  Label
1       male
2       female
labeltable can also be used to see, for each value of one variable, what values of other variables are present. This is intended for use when one variable is a recode, simplification, or lost-labels version of another, but hey, go nuts.
labeltable(efc$e15relat,efc$e16sex,efc$e42dep)
e15relat  e16sex    e42dep
1         2, 1      3, 4, 1, 2, NA
2         2, 1, NA  3, 4, 2, 1
3         1, 2      3, 2, 1, 4
4         2, 1      4, 3, 2, 1
5         2, 1      3, 2, 1, 4
6         2, 1      4, 3, 1, 2
7         2, 1      4, 3, 2, 1
8         2, 1      3, 4, 2, 1
NA        2, NA     3, NA
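To make the multi-variable labeltable idea concrete, here is a rough base-R sketch of the same kind of lookup. It is not vtable’s implementation. The values_by() helper and the toy data frame are hypothetical, made up for illustration. For each distinct value of one variable, the helper collects the distinct values of another variable that appear in the same rows, and it keeps NA as its own group.

```r
# Hypothetical toy data: 'code' is a simplified recode of 'detail'
toy <- data.frame(
  detail = c(1, 2, 3, 4, 5, 6),
  code   = c(1, 1, 2, 2, 3, NA)
)

# Hypothetical helper: for each distinct value of x, list the distinct
# values of y observed in the same rows. `%in%` matches NA to NA, so
# rows where x is missing get their own group.
values_by <- function(x, y) {
  keys <- unique(x)
  vals <- vapply(keys, function(k) {
    paste(unique(y[x %in% k]), collapse = ", ")
  }, character(1))
  data.frame(x = keys, y = vals, stringsAsFactors = FALSE)
}

lookup <- values_by(toy$code, toy$detail)
print(lookup)
```

Reading across each row shows which original values were folded into each recoded value. That is the same question the e15relat table above answers for the efc data.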