knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)

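The first step in data exploration usually consists of univariate, descriptive analysis of all variables of interest. Tidycomm offers four basic functions to quickly output relevant statistics: describe() and tab_percentiles() for continuous variables, and describe_cat() and tab_frequencies() for categorical variables.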

library(tidycomm)

For demonstration purposes, we will use sample data from the Worlds of Journalism 2012-16 study included in tidycomm.

WoJ

Describe continuous variables

describe() outputs several measures of central tendency and variability for all variables named in the function call:

WoJ %>%  
  describe(autonomy_selection, autonomy_emphasis, work_experience)
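
To make the kind of statistics involved more concrete, the following base R sketch computes a handful of central tendency and variability measures by hand. The vector `years`, the helper `summarise_numeric()`, and the selection of measures are all assumptions made for illustration; describe() may report a different or larger set, and this is not its implementation.

```r
# Hypothetical toy data (not taken from WoJ): years of work experience
years <- c(2, 5, 5, 8, 10, 12, 20, NA)

# Hypothetical helper: summarise one numeric vector, ignoring missing values
summarise_numeric <- function(x) {
  valid <- x[!is.na(x)]
  c(
    N       = length(valid),   # number of valid cases
    Missing = sum(is.na(x)),   # number of missing cases
    M       = mean(valid),     # arithmetic mean
    SD      = sd(valid),       # sample standard deviation
    Min     = min(valid),      # smallest value
    Mdn     = median(valid),   # median
    Max     = max(valid)       # largest value
  )
}

result <- summarise_numeric(years)
print(round(result, 2))
```

Keeping the count of missing values next to the count of valid cases makes it easy to see how many observations each statistic is based on.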

If no variables are passed to describe(), all numeric variables in the data are described:

WoJ %>% 
  describe()

Data can be grouped before describing:

WoJ %>%  
  dplyr::group_by(country) %>% 
  describe(autonomy_emphasis, autonomy_selection)

The results returned by describe() can also be visualized:

WoJ %>% 
  describe() %>% 
  visualize()

In addition, percentiles can easily be extracted from continuous variables:

WoJ %>% 
  tab_percentiles()
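
As a rough illustration of what extracting percentiles means, the sketch below uses base R's `quantile()` on a made-up rating variable. The vector `trust` and the choice of deciles as cut points are assumptions; the levels that tab_percentiles() actually reports may differ.

```r
# Hypothetical toy data (not taken from WoJ): a rating from 1 to 5
trust <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 5)

# Cut points from the 10th to the 100th percentile (an assumed choice)
probs <- seq(0.1, 1, by = 0.1)

# quantile() uses linear interpolation between order statistics by default
deciles <- quantile(trust, probs = probs, na.rm = TRUE)
print(deciles)
```

The 100th percentile is simply the maximum, and the 50th percentile is the median, so this table doubles as a compact view of the distribution's shape.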

Percentiles can also be visualized:

WoJ %>% 
  tab_percentiles(trust_parties) %>% 
  visualize()

Describe categorical variables

describe_cat() outputs a short summary (number of unique values, mode, N of mode) for all categorical variables named in the function call:

WoJ %>% 
  describe_cat(reach, employment, temp_contract)
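
The three summary values named above can be worked out by hand with base R, which may help when checking results. This is only a sketch: the vector `employment` holds made-up values, the object names are hypothetical, and ties for the mode are resolved by taking the first value in alphabetical order, which is an assumption rather than documented tidycomm behaviour.

```r
# Hypothetical toy data (not taken from WoJ)
employment <- c("Full-time", "Part-time", "Full-time",
                "Freelancer", "Full-time", "Part-time")

# Frequency of each unique value (names are sorted alphabetically)
counts <- table(employment)

n_unique   <- length(counts)                 # number of unique values
mode_value <- names(counts)[which.max(counts)]  # most frequent value
mode_n     <- as.integer(max(counts))        # N of the mode

summary_cat <- data.frame(
  n_unique = n_unique,
  mode     = mode_value,
  mode_n   = mode_n
)
print(summary_cat)
```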

If no variables are passed to describe_cat(), all categorical variables (i.e., character and factor variables) in the data are described:

WoJ %>% 
  describe_cat()

Data can be grouped before describing:

WoJ %>% 
  dplyr::group_by(reach) %>% 
  describe_cat(country, employment)

The results of describe_cat() can be visualized as well:

WoJ %>% 
  describe_cat() %>% 
  visualize()

Tabulate frequencies of categorical variables

tab_frequencies() outputs absolute and relative frequencies of all unique values of one or more categorical variables:

WoJ %>%  
  tab_frequencies(employment)
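
For readers who want to see how absolute, relative, and cumulative frequencies relate to one another, here is a minimal base R sketch on made-up data. The vector `employment` is hypothetical, and storing the relative frequency as a proportion between 0 and 1 is an assumption made for this example.

```r
# Hypothetical toy data (not taken from WoJ)
employment <- c("Full-time", "Part-time", "Full-time",
                "Freelancer", "Full-time", "Part-time")

# Absolute frequency of each unique value
counts <- table(employment)

freq <- data.frame(
  value   = names(counts),
  n       = as.integer(counts),               # absolute frequency
  percent = as.numeric(counts) / sum(counts)  # relative frequency (proportion)
)

# Running totals of the absolute and relative frequencies
freq$cum_n       <- cumsum(freq$n)
freq$cum_percent <- cumsum(freq$percent)

print(freq)
```

The last row of the cumulative columns should always equal the total number of cases and 1, respectively, which is a quick sanity check on any frequency table.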

Passing more than one variable will compute relative frequencies based on all combinations of unique values:

WoJ %>%  
  tab_frequencies(employment, country)

You can also group your data beforehand, which yields within-group relative frequencies:

WoJ %>% 
  dplyr::group_by(country) %>%  
  tab_frequencies(employment)
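
The difference between relative frequencies over all combinations and within-group relative frequencies comes down to the denominator. The base R sketch below shows both on a made-up data frame; `toy` and its values are hypothetical and only serve to make the arithmetic visible.

```r
# Hypothetical toy data (not taken from WoJ)
toy <- data.frame(
  country    = c("A", "A", "A", "B", "B"),
  employment = c("Full-time", "Full-time", "Part-time",
                 "Full-time", "Part-time")
)

# Cross-tabulation: rows are countries, columns are employment types
counts <- table(toy$country, toy$employment)

# Ungrouped: each cell is divided by the grand total
overall <- prop.table(counts)

# Grouped by country: each cell is divided by its row total
within_country <- prop.table(counts, margin = 1)

print(round(overall, 2))
print(round(within_country, 2))
```

In the ungrouped table all cells sum to 1, whereas in the grouped table each country's row sums to 1 on its own.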

(Compare the columns percent, cum_n and cum_percent with the output above.)

The output of tab_frequencies() can be visualized as well:

WoJ %>% 
  tab_frequencies(country) %>% 
  visualize()


joon-e/tidycomm documentation built on Feb. 24, 2024, 8:58 a.m.