Bivariate analysis of continuous and/or categorical variables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Tidycomm includes four functions for bivariate exploratory data analysis: crosstab(), t_test(), unianova(), and correlate().

library(tidycomm)

We will again use sample data from the Worlds of Journalism 2012-16 study for demonstration purposes:

WoJ

Compute contingency tables and Chi-square tests

crosstab() outputs a contingency table for one independent (column) variable and one or more dependent (row) variables:

WoJ %>% 
  crosstab(reach, employment)

Additional options include add_total (adds a row-wise Total column if set to TRUE) and percentages (outputs column-wise percentages instead of absolute values if set to TRUE):

WoJ %>% 
  crosstab(reach, employment, add_total = TRUE, percentages = TRUE)
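
To make "column-wise percentages" concrete, here is a minimal base R sketch. The toy data frame is hypothetical and not taken from WoJ, and the code only illustrates the idea; it does not claim to reproduce how crosstab() works internally.

```r
# Hypothetical toy data; not taken from the WoJ sample
toy <- data.frame(
  reach = c("Local", "Local", "National", "National", "National", "Local"),
  employment = c("Full-time", "Part-time", "Full-time",
                 "Full-time", "Part-time", "Full-time")
)

# Rows: employment (dependent), columns: reach (independent)
counts <- table(toy$employment, toy$reach)

# margin = 2 divides each cell by its column total
col_pct <- prop.table(counts, margin = 2)

# Each column of proportions sums to 1
colSums(col_pct)
```

Because the shares are computed within each column, they describe how the row variable is distributed at each level of the column variable.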

Setting chi_square = TRUE computes a $\chi^2$ test including Cramer's $V$ and outputs the results in a console message:

WoJ %>% 
  crosstab(reach, employment, chi_square = TRUE)
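
As a rough illustration of how the two statistics relate, the following base R sketch computes a $\chi^2$ test and derives Cramér's $V$ from it. The table of counts is hypothetical, and the sketch uses stats::chisq.test() on its own; it is not meant as a description of the internals of crosstab().

```r
# Hypothetical 2 x 3 table of counts; not taken from the WoJ sample
counts <- matrix(
  c(30, 10,
    20, 20,
    10, 30),
  nrow = 2,
  dimnames = list(
    employment = c("Full-time", "Freelancer"),
    reach = c("Local", "Regional", "National")
  )
)

# Pearson's chi-square test of independence
test <- chisq.test(counts)

# Cramer's V = sqrt(chi^2 / (n * (min(rows, cols) - 1)))
n <- sum(counts)
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(counts)) - 1)))

test$statistic
cramers_v
```

$V$ rescales $\chi^2$ by the sample size and the smaller table dimension, so it ranges from 0 to 1 and can be compared across tables of different sizes.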

Finally, passing multiple row variables will treat all unique value combinations as a single variable for percentage and Chi-square computations:

WoJ %>% 
  crosstab(reach, employment, country, percentages = TRUE)

Compute t-Tests

Use t_test() to quickly compute t-Tests for a group variable and one or more test variables. Output includes test statistics, descriptive statistics and Cohen's $d$ effect size estimates:

WoJ %>% 
  t_test(temp_contract, autonomy_selection, autonomy_emphasis)
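
For readers who want to see how the test statistic and Cohen's $d$ relate, here is a minimal base R sketch. The two score vectors are hypothetical, and the sketch assumes equal variances and a pooled standard deviation; these choices may differ from the defaults of t_test().

```r
# Hypothetical scores for two groups; not taken from the WoJ sample
group_a <- c(4, 5, 3, 4, 5, 4)
group_b <- c(3, 2, 4, 3, 3, 3)

# Student's t-test, assuming equal variances (an assumption of this sketch)
test <- t.test(group_a, group_b, var.equal = TRUE)

# Cohen's d: mean difference divided by the pooled standard deviation
n_a <- length(group_a)
n_b <- length(group_b)
pooled_sd <- sqrt(
  ((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) / (n_a + n_b - 2)
)
cohens_d <- (mean(group_a) - mean(group_b)) / pooled_sd

test$statistic
cohens_d
```

Under these assumptions, $t$ equals $d$ divided by $\sqrt{1/n_a + 1/n_b}$, which is why a large sample can yield a significant test even when the effect size is small.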

Passing no test variables will compute t-Tests for all numerical variables in the data:

WoJ %>% 
  t_test(temp_contract)

If passing a group variable with more than two unique levels, t_test() will produce a warning and default to the first two unique values. You can manually define the levels by setting the levels argument:

WoJ %>% 
  t_test(employment, autonomy_selection, autonomy_emphasis)

WoJ %>% 
  t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer"))

Additional options are described on the function's help page (?t_test).

Compute one-way ANOVAs

unianova() will compute one-way ANOVAs for one group variable and one or more test variables. Output includes test statistics and $\eta^2$ effect size estimates.

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis)
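
The $\eta^2$ effect size can be read as the share of total variance explained by the group variable. A minimal base R sketch with hypothetical data, using stats::aov() rather than unianova() itself, might look like this:

```r
# Hypothetical scores for three groups; not taken from the WoJ sample
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  score = c(1, 2, 3, 2,  3, 4, 5, 4,  5, 6, 7, 6)
)

# One-way ANOVA
fit <- aov(score ~ group, data = toy)

# Sums of squares: first entry is between groups, second is residual
ss <- summary(fit)[[1]][["Sum Sq"]]

# Eta squared = between-group sum of squares / total sum of squares
eta_sq <- ss[1] / sum(ss)
eta_sq
```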

Descriptives can be added by setting descriptives = TRUE. If no test variables are passed, all numerical variables in the data will be used:

WoJ %>% 
  unianova(employment, descriptives = TRUE)

You can also compute Tukey's HSD post-hoc tests by setting post_hoc = TRUE. Results will be added as a tibble in a list column post_hoc.

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE)

These can then be unnested with tidyr::unnest():

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE) %>% 
  dplyr::select(Var, post_hoc) %>% 
  tidyr::unnest(post_hoc)
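
To show what a Tukey's HSD result contains, the sketch below runs stats::TukeyHSD() on hypothetical data in base R. It is only an illustration of the test; the column names in the post_hoc tibble returned by unianova() may differ.

```r
# Hypothetical scores for three groups; not taken from the WoJ sample
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  score = c(1, 2, 3, 2,  3, 4, 5, 4,  5, 6, 7, 6)
)

fit <- aov(score ~ group, data = toy)

# Pairwise comparisons with family-wise error control
posthoc <- TukeyHSD(fit)

# One row per pair of groups: mean difference, confidence bounds, adjusted p-value
posthoc$group
```

With three groups there are three pairwise comparisons, so the result has three rows.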

Compute correlation tables and matrices

correlate() will compute correlations for all combinations of the passed variables:

WoJ %>% 
  correlate(work_experience, autonomy_selection, autonomy_emphasis)

If no variables are passed, correlations for all combinations of numerical variables will be computed:

WoJ %>% 
  correlate()

By default, Pearson's product-moment correlation coefficients ($r$) will be computed. Set method to "kendall" to obtain Kendall's $\tau$ or to "spearman" to obtain Spearman's $\rho$ instead.
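
The three coefficients can differ on the same data. A small base R sketch with hypothetical vectors, using stats::cor() rather than correlate(), illustrates this:

```r
# Hypothetical paired observations; not taken from the WoJ sample
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

pearson  <- cor(x, y, method = "pearson")   # linear association
kendall  <- cor(x, y, method = "kendall")   # based on concordant vs. discordant pairs
spearman <- cor(x, y, method = "spearman")  # Pearson's r computed on ranks

c(pearson = pearson, kendall = kendall, spearman = spearman)
```

Here the values already equal their ranks, so Pearson's $r$ and Spearman's $\rho$ coincide, while Kendall's $\tau$ is lower because it counts pairwise order agreements instead.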

To obtain a correlation matrix, pass the output of correlate() to to_correlation_matrix():

WoJ %>% 
  correlate(work_experience, autonomy_selection, autonomy_emphasis) %>% 
  to_correlation_matrix()


tidycomm documentation built on July 6, 2021, 5:07 p.m.