knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE)
The first step in data exploration usually consists of univariate, descriptive analysis of all variables of interest. Tidycomm offers four basic functions to quickly output relevant statistics:
describe()
for continuous variablestab_percentiles()
for continuous variablesdescribe_cat()
for categorical variablestab_frequencies()
for categorical variableslibrary(tidycomm)
For demonstration purposes, we will use sample data from the Worlds of Journalism 2012-16 study included in tidycomm.
WoJ
describe()
outputs several measures of central tendency and variability for all variables named in the function call:
WoJ %>% describe(autonomy_selection, autonomy_emphasis, work_experience)
If no variables are passed to describe()
, all numeric variables in the data are described:
WoJ %>% describe()
Data can be grouped before describing:
WoJ %>% dplyr::group_by(country) %>% describe(autonomy_emphasis, autonomy_selection)
The returning results from describe()
can also be visualized:
WoJ %>% describe() %>% visualize()
In addition, percentiles can easily be extracted from continuous variables:
WoJ %>% tab_percentiles()
Percentiles can also be visualized:
WoJ %>% tab_percentiles(trust_parties) %>% visualize()
describe_cat()
outputs a short summary of categorical variables (number of unique values, mode, N of mode) of all variables named in the function call:
WoJ %>% describe_cat(reach, employment, temp_contract)
If no variables are passed to describe_cat()
, all categorical variables (i.e., character
and factor
variables) in the data are described:
WoJ %>% describe_cat()
Data can be grouped before describing:
WoJ %>% dplyr::group_by(reach) %>% describe_cat(country, employment)
Again, also the results from describe_cat()
can be visualized like so:
WoJ %>% describe_cat() %>% visualize()
tab_frequencies()
outputs absolute and relative frequencies of all unique values of one or more categorical variables:
WoJ %>% tab_frequencies(employment)
Passing more than one variable will compute relative frequencies based on all combinations of unique values:
WoJ %>% tab_frequencies(employment, country)
You can also group your data before. This will lead to within-group relative frequencies:
WoJ %>% dplyr::group_by(country) %>% tab_frequencies(employment)
(Compare the columns percent
, cum_n
and cum_percent
with the output above.)
And of course, also tab_frequencies()
can easily be visualized:
WoJ %>% tab_frequencies(country) %>% visualize()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.