knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)

Tidycomm includes five functions for bivariate explorative data analysis: crosstab(), t_test(), unianova(), correlate(), and regress().

library(tidycomm)

We will again use sample data from the Worlds of Journalism 2012-16 study for demonstration purposes:

WoJ

Compute contingency tables and Chi-square tests

crosstab() outputs a contingency table for one independent (column) variable and one or more dependent (row) variables:

WoJ %>% 
  crosstab(reach, employment)

Additional options include add_total (adds a row-wise Total column if set to TRUE) and percentages (outputs column-wise percentages instead of absolute values if set to TRUE):

WoJ %>% 
  crosstab(reach, employment, add_total = TRUE, percentages = TRUE)

Setting chi_square = TRUE computes a $\chi^2$ test including Cramer's $V$ and outputs the results in a console message:

WoJ %>% 
  crosstab(reach, employment, chi_square = TRUE)

Finally, passing multiple row variables will treat all unique value combinations as a single variable for percentage and Chi-square computations:

WoJ %>% 
  crosstab(reach, employment, country, percentages = TRUE)
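Combining multiple row variables with chi_square = TRUE accordingly runs the test on those unique value combinations (an illustrative call; the arguments are the same ones shown above):

WoJ %>% 
  crosstab(reach, employment, country, chi_square = TRUE)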

You can also visualize the output from crosstab():

WoJ %>% 
  crosstab(reach, employment, percentages = TRUE) %>% 
  visualize()

Note that the percentages = TRUE argument determines whether the bars are scaled to sum to 100% and thus span the whole width, or instead reflect absolute counts:

WoJ %>% 
  crosstab(reach, employment) %>% 
  visualize()

Compute t-Tests

Use t_test() to quickly compute t-Tests for a group variable and one or more test variables. Output includes test statistics, descriptive statistics, and Cohen's $d$ effect size estimates:

WoJ %>% 
  t_test(temp_contract, autonomy_selection, autonomy_emphasis)

Passing no test variables will compute t-Tests for all numerical variables in the data:

WoJ %>% 
  t_test(temp_contract)

If passing a group variable with more than two unique levels, t_test() will produce a warning and default to the first two unique values. You can manually define the levels by setting the levels argument:

WoJ %>% 
  t_test(employment, autonomy_selection, autonomy_emphasis)

WoJ %>% 
  t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer"))

Note that the formerly available var.equal option is now deprecated: t_test() tests for equality of variances by default (using a Levene test) and decides automatically whether to use pooled variance or the Welch approximation to the degrees of freedom.
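This internal decision can be sketched manually with base R and the car package (an illustrative approximation, not tidycomm's actual code; car is assumed to be installed):

library(car)  # for leveneTest()

# Keep only the two employment levels being compared (illustrative subset)
dat <- subset(WoJ, employment %in% c("Full-time", "Part-time"))

# Levene test for equality of variances
lev <- leveneTest(autonomy_selection ~ factor(employment), data = dat)
equal_var <- lev[["Pr(>F)"]][1] > .05

# Pooled-variance t-Test if variances look equal, Welch approximation otherwise
t.test(autonomy_selection ~ employment, data = dat, var.equal = equal_var)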

t_test() also provides a one-sample t-Test if you provide a mu argument:

WoJ %>% 
  t_test(autonomy_emphasis, mu = 3.9)

Of course, t-Test results can also be visualized:

WoJ %>% 
  t_test(temp_contract, autonomy_selection, autonomy_emphasis) %>% 
  visualize()

Compute one-way ANOVAs

unianova() will compute one-way ANOVAs for one group variable and one or more test variables. Output includes test statistics, $\eta^2$ effect size estimates, and $\omega^2$ if Welch's approximation is used to account for unequal variances:

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis)

Descriptives can be added by setting descriptives = TRUE. If no test variables are passed, all numerical variables in the data will be used:

WoJ %>% 
  unianova(employment, descriptives = TRUE)

You can also compute Tukey's HSD post-hoc tests by setting post_hoc = TRUE. Results will be added as a tibble in a list column post_hoc.

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE)

These can then be unnested with tidyr::unnest():

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE) %>% 
  dplyr::select(Variable, post_hoc) %>% 
  tidyr::unnest(post_hoc)

Visualize one-way ANOVAs the way you visualize almost everything in tidycomm:

WoJ %>% 
  unianova(employment, autonomy_selection, autonomy_emphasis) %>% 
  visualize()

Compute correlation tables and matrices

correlate() will compute correlations for all combinations of the passed variables:

WoJ %>% 
  correlate(work_experience, autonomy_selection, autonomy_emphasis)

If no variables are passed, correlations for all combinations of numerical variables will be computed:

WoJ %>% 
  correlate()

Specify a focus variable using the with parameter to correlate all other variables with this focus variable:

WoJ %>% 
  correlate(autonomy_selection, autonomy_emphasis, with = work_experience)

Compute a partial correlation between two variables, controlling for a third, by specifying the control variable with the partial parameter:

WoJ %>% 
  correlate(autonomy_selection, autonomy_emphasis, partial = work_experience)

Visualize correlations by passing the results on to the visualize() function:

WoJ %>% 
  correlate(work_experience, autonomy_selection) %>% 
  visualize()

If you provide more than two variables, you automatically get a correlogram (the same visualization you would get after converting the correlations to a correlation matrix):

WoJ %>% 
  correlate(work_experience, autonomy_selection, autonomy_emphasis) %>% 
  visualize()

By default, Pearson's product-moment correlation coefficients ($r$) will be computed. Set method to "kendall" to obtain Kendall's $\tau$ or to "spearman" to obtain Spearman's $\rho$ instead.
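For example, to obtain Spearman's $\rho$ for the variables used above:

WoJ %>% 
  correlate(work_experience, autonomy_selection, method = "spearman")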

To obtain a correlation matrix, pass the output of correlate() to to_correlation_matrix():

WoJ %>% 
  correlate(work_experience, autonomy_selection, autonomy_emphasis) %>% 
  to_correlation_matrix()

Compute linear regressions

regress() will fit a linear regression of one dependent variable on a flexible number of independent variables. Independent variables can be continuous, dichotomous, or factorial (in which case each factor level is translated into a dichotomous dummy variable):

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government)

The function automatically adds standardized beta values to the expected linear-regression output. You can also opt in to calculate up to three precondition checks:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government,
          check_independenterrors = TRUE,
          check_multicollinearity = TRUE,
          check_homoscedasticity = TRUE)

For linear regressions, a number of visualizations are available. The default visualizes the result(s): the dependent variable is plotted against each independent variable separately, with the fitted linear model overlaid:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize()

Alternatively, you can visualize plots that assist with precondition checks, for example a correlogram among the independent variables:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "correlogram")

Next up, visualize a residuals-versus-fitted plot to inspect how residuals are distributed across fitted values:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "resfit")

Or use a (normal) probability-probability plot to check whether the residuals are normally distributed:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "pp")

The (normal) quantile-quantile plot also helps to check the normality of residuals but is more sensitive to outliers:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "qq")

Next up, the scale-location (sometimes also called spread-location) plot shows whether residuals are spread equally along the range of fitted values, helping to check for homoscedasticity:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "scaloc")

Finally, visualize the residuals-versus-leverage plot to check for influential outliers affecting the final model more than the rest of the data:

WoJ %>% 
  regress(autonomy_selection, work_experience, trust_government) %>% 
  visualize(which = "reslev")


joon-e/tidycomm documentation built on Feb. 24, 2024, 8:58 a.m.