knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE)
Tidycomm includes five functions for bivariate explorative data analysis:
crosstab()
for both categorical independent and dependent variablest_test()
for dichotomous categorical independent and continuous dependent variablesunianova()
for polytomous categorical independent and continuous dependent variablescorrelate()
for both continuous independent and dependent variablesregress()
for both continuous or factorial (translated into dummy dichotomous versions) independent and continuous dependent variableslibrary(tidycomm)
We will again use sample data from the Worlds of Journalism 2012-16 study for demonstration purposes:
WoJ
crosstab()
outputs a contingency table for one independent (column) variable and one or more dependent (row) variables:
WoJ %>% crosstab(reach, employment)
Additional options include add_total
(adds a row-wise Total
column if set to TRUE
) and percentages
(outputs column-wise percentages instead of absolute values if set to TRUE
):
WoJ %>% crosstab(reach, employment, add_total = TRUE, percentages = TRUE)
Setting chi_square = TRUE
computes a $\chi^2$ test including Cramer's $V$ and outputs the results in a console message:
WoJ %>% crosstab(reach, employment, chi_square = TRUE)
Finally, passing multiple row variables will treat all unique value combinations as a single variable for percentage and Chi-square computations:
WoJ %>% crosstab(reach, employment, country, percentages = TRUE)
You can also visualize the output from crosstab()
:
WoJ %>% crosstab(reach, employment, percentages = TRUE) %>% visualize()
Note that the percentages = TRUE
argument determines whether the bars add up to 100% and thus cover the whole width or whether they do not:
WoJ %>% crosstab(reach, employment) %>% visualize()
Use t_test()
to quickly compute t-Tests for a group variable and one or more test variables. Output includes test statistics, descriptive statistics and Cohen's $d$ effect size estimates:
WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis)
Passing no test variables will compute t-Tests for all numerical variables in the data:
WoJ %>% t_test(temp_contract)
If passing a group variable with more than two unique levels, t_test()
will produce a warning
and default to the first two unique values. You can manually define the levels by setting the levels
argument:
WoJ %>% t_test(employment, autonomy_selection, autonomy_emphasis) WoJ %>% t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer"))
Additional options include:
pooled_sd
: By default, the pooled variance will be used the compute Cohen's $d$ effect size estimates ($s = \sqrt\frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}$).
Set pooled_sd = FALSE
to use the simple variance estimation instead ($s = \sqrt\frac{(s^2_1 + s^2_2)}{2}$).paired
: Set paired = TRUE
to compute a paired t-Test instead. It is advisable to specify the case-identifying variable with case_var
when computing paired t-Tests, as this will make sure that data are properly sorted.Previously, the (now deprecated) option of var.equal
was also available. This has been overthrown, however, as t_test()
now by default tests for equal variance (using a Levene test) to decide whether to use pooled variance or to use the Welch approximation to the degrees of freedom.
t_test()
also provides a one-sample t-Test if you provide a mu
argument:
WoJ %>% t_test(autonomy_emphasis, mu = 3.9)
Of course, also the result from t-Tests can be visualized easily as such:
WoJ %>% t_test(temp_contract, autonomy_selection, autonomy_emphasis) %>% visualize()
unianova()
will compute one-way ANOVAs for one group variable and one or more test variables. Output includes test statistics, $\eta^2$ effect size estimates, and $\omega^2$, if Welch's approximation is used to account for unequal variances.
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis)
Descriptives can be added by setting descriptives = TRUE
. If no test variables are passed, all numerical variables in the data will be used:
WoJ %>% unianova(employment, descriptives = TRUE)
You can also compute Tukey's HSD post-hoc tests by setting post_hoc = TRUE
. Results will be added as a tibble
in a list column post_hoc
.
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE)
These can then be unnested with tidyr::unnest()
:
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE) %>% dplyr::select(Variable, post_hoc) %>% tidyr::unnest(post_hoc)
Visualize one-way ANOVAs the way you visualize almost everything in tidycomm
:
WoJ %>% unianova(employment, autonomy_selection, autonomy_emphasis) %>% visualize()
correlate()
will compute correlations for all combinations of the passed variables:
WoJ %>% correlate(work_experience, autonomy_selection, autonomy_emphasis)
If no variables passed, correlations for all combinations of numerical variables will be computed:
WoJ %>% correlate()
Specify a focus variable using the with
parameter to correlate all other variables with this focus variable.
WoJ %>% correlate(autonomy_selection, autonomy_emphasis, with = work_experience)
Run a partial correlation by designating three variables along with the partial
parameter.
WoJ %>% correlate(autonomy_selection, autonomy_emphasis, partial = work_experience)
Visualize correlations by passing the results on to the visualize()
function:
WoJ %>% correlate(work_experience, autonomy_selection) %>% visualize()
If you provide more than two variables, you automatically get a correlogram (the same you would get if you convert correlations to a correlation matrix):
WoJ %>% correlate(work_experience, autonomy_selection, autonomy_emphasis) %>% visualize()
By default, Pearson's product-moment correlations coefficients ($r$) will be computed. Set method
to "kendall"
to obtain Kendall's $\tau$ or to "spearman"
to obtain Spearman's $\rho$ instead.
To obtain a correlation matrix, pass the output of correlate()
to to_correlation_matrix()
:
WoJ %>% correlate(work_experience, autonomy_selection, autonomy_emphasis) %>% to_correlation_matrix()
regress()
will create a linear regression on one dependent variable with a flexible number of independent variables. Independent variables can thereby be continuous, dichotomous, and factorial (in which case each factor level will be translated into a dichotomous dummy variable version):
WoJ %>% regress(autonomy_selection, work_experience, trust_government)
The function automatically adds standardized beta values to the expected linear-regression output. You can also opt in to calculate up to three precondition checks:
WoJ %>% regress(autonomy_selection, work_experience, trust_government, check_independenterrors = TRUE, check_multicollinearity = TRUE, check_homoscedasticity = TRUE)
For linear regressions, a number of visualizations are possible. The default one is the visualization of the result(s), is that the dependent variable is correlated with each of the independent variables separately and a linear model is presented in these:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize()
Alternatively you can visualize precondition-check-assisting depictions. Correlograms among independent variables, for example:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "correlogram")
Next up, visualize a residuals-versus-fitted plot to determine distributions:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "resfit")
Or use a (normal) probability-probability plot to check for multicollinearity:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "pp")
The (normal) quantile-quantile plot also helps checking for multicollinearity but focuses more on outliers:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "qq")
Next up, the scale-location (sometimes also called spread-location) plot checks whether residuals are spread equally to help check for homoscedasticity:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "scaloc")
Finally, visualize the residuals-versus-leverage plot to check for influential outliers affecting the final model more than the rest of the data:
WoJ %>% regress(autonomy_selection, work_experience, trust_government) %>% visualize(which = "reslev")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.