In WdeNooy/UsingRTutorials: Provides learnr Tutorials for a Using R Course

# Ensure that libraries are loaded.
library(tidyverse)
library(learnr)
library(gradethis)
library(knitr)
library(kableExtra)
library(haven) #For importing SPSS data files.
library(car) #For ANOVA.  
library(papaja) #For APA formatted results tables.
library(texreg) #For pretty regression results.
library(effects) #For two-way interaction plots.
library(broom) #For cleaning up statistical results.

tutorial_options(exercise.timelimit = 20, exercise.checker = gradethis::grade_learnr)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

# Ensure that the data is loaded for the remainder of this tutorial.
glbwarm <- UsingRTutorials::glbwarm
glbwarm_spss <- UsingRTutorials::glbwarm_spss
# The estimated regression model with rstanarm. 
model_1aBayes <- UsingRTutorials::model_1aBayes

Overview

Course Content

Basic Statistical Models
Print-Quality Results Tables
Results Plots

Data Project

Finish Sprint 3
Plan the last Sprint
Updates of the SCRUM masters

Basic Statistical Models

Let us practice with some of the most common statistical analyses in R.

Consult Sections 3 and 4 in Help, My Collaborator Uses R! An Introduction to Reproducible Statistical Analyses in R and R help on the functions that we use.

Example data

Example data: glbwarm (accessible within this tutorial).

Source: Erik Nisbet; http://afhayes.com/

Inspect the variables in the Environment.

Main data types:

Number: govact, posemot, negemot, age.
Character: ideology, sex, partyid.

Inspect variable summaries.

summary(glbwarm)

t test: `t.test()`

You already know how to execute an independent-samples t test (Session 5).

There are different versions of the same function for different t tests.

Usage (in ?t.test):

t.test(x, ...)

### Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, 
       var.equal = FALSE,
       conf.level = 0.95, ...)

### S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)

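The function with just x is for a one-sample t test: specify the hypothesized population mean with the argument mu =.
The function with x and y takes two numeric vectors: add paired = TRUE for a paired-samples t test (without it, the vectors are treated as two independent samples).
The function with a formula (outcome ~ group) is for independent-samples t tests: the grouping variable to the right of the tilde must have exactly two categories.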
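
As a minimal sketch of these three call patterns, the code below uses R's built-in `sleep` data set as a stand-in (an assumption for illustration; it is not the tutorial's `glbwarm` data). The object names `one_sample`, `two_samples`, and `paired` are hypothetical.

```r
# Built-in data: 'extra' hours of sleep for two drug groups (10 persons each).
data(sleep)

# One-sample t test: is the population mean of 'extra' equal to mu = 0?
one_sample <- t.test(sleep$extra, mu = 0)

# Independent-samples t test: formula 'outcome ~ grouping variable'.
# The groups are treated as independent here purely for illustration.
two_samples <- t.test(extra ~ group, data = sleep)

# Paired-samples t test: two numeric vectors with paired = TRUE.
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
paired <- t.test(drug1, drug2, paired = TRUE)

# Each result has class 'htest'; the 'method' element names the test that ran.
one_sample$method
two_samples$method
paired$method
```

Checking the `method` element is a quick way to confirm that R ran the version of the test you intended.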

Use a _t_ test and the `glbwarm` data object for testing the following null hypotheses (in this order):
1. Average negative emotions about global warming (variable `negemot`) are equal for females and males (variable `sex`) in the population.
2. In the population, average negative emotions about global warming are 3.0.
3. On average, negative emotions about global warming are higher than positive emotions about global warming (`posemot`).
Send the results to the screen.

# Use the `t.test()` version that matches the kind of t test you need: on one
# mean, paired samples, or independent samples.

# Note that the 'data = ' argument only works if we use the formula form 'y ~ x'.
# Independent samples t test:
t.test(negemot ~ sex, data = glbwarm)
# For the other versions, the tibble name and the dollar sign must be used to
# fully define the variable.
# t test on one mean (complete it yourself):
t.test(glbwarm$negemot, ... )

# The code checker expects the three tests in the exact order as specified in
# the question.

t.test(negemot ~ sex, data = glbwarm)
t.test(glbwarm$negemot, mu = 3.0)
t.test(glbwarm$negemot, glbwarm$posemot, paired = TRUE)

gradethis::grade_code(
  correct = "", 
  incorrect = ""
  )

F test on Two Variances: `var.test()`

In contrast to SPSS, R only gives you what you ask for.

If you ask for a t test, you get a t test but not checks on assumptions.
You have to apply those checks yourself.

Which version of the independent-samples t test we must use depends on whether the population variances are equal for the two groups.

Use the function `var.test` to test if `govact` variances are equal for females and males in the population. Use the `glbwarm` data object and store the results as a new data object named `vartest`.

vartest <- ____

__Hint:__ Have a look at the help for function `var.test`. It is important that you get used to the way R presents help on statistical functions.

vartest <- var.test(govact ~ sex, data = glbwarm)

gradethis::grade_code(
  correct = "", 
  incorrect = ""
  )

__Remember__ - R formula: dependent variable/outcome ~ independent variable/predictor (and more).
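
A formula is an ordinary R object, so you can inspect it without any data. The sketch below uses variable names from this tutorial inside a formula; the object names `f` and `term_labels` are hypothetical.

```r
# Create a formula; no data are needed at this point.
f <- govact ~ age * negemot + partyid

# The variable to the left of the tilde is the outcome.
all.vars(f)[1]

# terms() expands the formula: 'age * negemot' stands for
# age + negemot + age:negemot (two main effects plus their interaction).
term_labels <- attr(terms(f), "term.labels")
term_labels
```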

Pull the p value from data object `vartest` that you have just created. Is the test on equal population variances statistically significant?

__Hint:__ Review Session 5 if you do not know how to do this. Remember: function `str()` is handy to see the contents (structure) of a list.

vartest$p.value

gradethis::grade_code(
  correct = "`e-08`  (scientific notation) means `* 10^-8`, that is, divided by 10 to the power 8 (100,000,000). Note that the results are stored as class htest, just like the results from `t.test()`.", 
  incorrect = "Perhaps you used double square brackets instead of the dollar sign to pull out the p value. That's OK."
  )
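
The sketch below shows how to explore a test result stored as class htest, using the built-in `sleep` data as a stand-in (an assumption; not the tutorial's data). The names `vt`, `p_dollar`, and `p_brackets` are hypothetical.

```r
# Stand-in data: the built-in 'sleep' data set.
vt <- var.test(extra ~ group, data = sleep)

# The result has class 'htest', which is a list under the hood.
class(vt)
names(vt)               # the elements that you can pull out
str(vt, max.level = 1)  # compact overview of the list

# Two equivalent ways to pull one element from the list.
p_dollar   <- vt$p.value
p_brackets <- vt[["p.value"]]
identical(p_dollar, p_brackets)
```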

In R, we can use a function within an argument of another function.

Example for an independent samples t test:

The var.equal argument is FALSE by default.
It must be TRUE if the p value of var.test() is larger than .05.

Integrate the F test on equal population variances in the _t_ test, such that the _t_ test automatically uses the correct version: with or without equal population variances assumed. Send the results to the screen (do not save it as a data object).

t.test(govact ~ sex, data = glbwarm, var.equal = _____ )

# You already executed the t test in this tutorial. Add the var.equal argument.
t.test(govact ~ sex, data = glbwarm)

# In the preceding exercise, you pulled the p value from the stored test result.
vartest$p.value
# Add it to the var.equal argument in such a way that a p value over .05 yields TRUE.

# Replace the stored test result by the test function itself.
t.test(govact ~ sex, data = glbwarm, var.equal = vartest$p.value > 0.05)

t.test(govact ~ sex, data = glbwarm, var.equal = var.test(govact ~ sex, data = glbwarm)$p.value > 0.05)

gradethis::grade_code(
  correct = "", 
  incorrect = ""
  )
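
To check which version of the t test was actually run, you can inspect the `method` element of the result. This is a minimal sketch on the built-in `sleep` data (an assumption for illustration); `equal_var` and `res` are hypothetical names.

```r
# The inner var.test() call is evaluated first; its p value is compared to .05
# and the resulting TRUE/FALSE is handed to the var.equal argument.
equal_var <- var.test(extra ~ group, data = sleep)$p.value > 0.05
res <- t.test(extra ~ group, data = sleep, var.equal = equal_var)

# The method name contains "Welch" only if equal variances were NOT assumed.
equal_var
res$method
```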

Linear Regression: `lm()`

Usage (in ?lm):

lm(formula, data, subset, weights, na.action,
   method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
   singular.ok = TRUE, contrasts = NULL, offset, ...)

See book p. 358-371 (Section 23.4) for using regression formulas:

including interactions and
transformations within a formula.

Use `lm()` and tibble `glbwarm` to predict support for governmental action (`govact`) from age, negative emotions and party identification. Store the results in data object `model_1`.

__Hint:__ `lm()` is not a tidyverse function, so you have to use the `data =` argument. You can supply the name of the tibble (`glbwarm`) or pipe this tibble into the `lm()` function using the dot (`.`).

model_1 <- lm(govact ~ age + negemot + partyid, data = glbwarm)

gradethis::grade_code(
  correct = "", 
  incorrect = "Perhaps, you used the independent variables in a different order within the formula. That is fine."
  )

For quick inspection, data objects for results of statistical analyses always have:

a summary() function;
a print() function.

Not for presentation of results!

Inspect the regression results (stored as `model_1`) with `summary()` and `print()`. What happened to the character variable?

Linear Regression: Two-Way Interaction

lm() takes care of:

creating dummies/indicator variables for a categorical predictor (character string or factor) - see preceding exercise;
creating interaction variables.

(This is easier than in SPSS.)
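
The sketch below illustrates both points with the built-in `iris` data, in which `Species` is turned into a character variable to mimic a predictor such as `partyid` (an assumption for illustration). The names `dat` and `fit` are hypothetical.

```r
# Stand-in data: 'iris' with Species stored as a character variable.
dat <- transform(iris, Species = as.character(Species))

# Numeric predictor, categorical predictor, and their interaction.
fit <- lm(Sepal.Length ~ Petal.Length * Species, data = dat)

# lm() creates one dummy per category except the first (reference) category,
# plus one interaction variable for each dummy.
names(coef(fit))
```

The first category in alphabetical order (here `setosa`) serves as the reference category, so it has no coefficient of its own.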

Add an interaction effect between negative emotions (numeric) and age (numeric, in decades) to the regression model. Save the results as data object `model_1a`. Show the results with `print()`. Can you interpret the interaction effect?

__Hint:__ An interaction term (`var1*var2`) in a regression formula yields the partial effects of the individual variables and their interaction effect(s).

model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)
print(model_1a)

gradethis::grade_code(
  correct = "", 
  incorrect = "Perhaps, you used the independent variables in a different order within the formula. That is fine."
  )

Now, predict support for governmental action (`govact`) from age and an interaction effect between negative emotions (numeric) and party identification (categorical). Send the results to the screen. Can you make sense of the coefficients?

__Hint:__ R creates all dummy variables and all interaction variables. That is convenient!

lm(govact ~ age + negemot*partyid, data = glbwarm)

gradethis::grade_code(
  correct = "", 
  incorrect = ""
  )

Analysis of Variance: `lm()` and `car::Anova()`

In R, analysis of variance consists of two steps.

Step 1: ANOVA is linear regression with special contrasts (contr.sum).

Contrast contr.sum gives deviations from the mean.
In analysis of variance, (main) effects are deviations from the (grand) mean.
The contrasts = argument requires:
- a list: contrasts = list();
- with contrast type for each categorical predictor:
- contrasts = list(sex = contr.sum, partyid = contr.sum)
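
To see what contr.sum does, the sketch below prints the contrast matrix for three categories and fits a model on the built-in `warpbreaks` data, a balanced design used here as a stand-in (an assumption; not the tutorial's data). The name `fit_sum` is hypothetical.

```r
# Contrast matrix for a factor with three categories: the last category is
# coded -1 on every column, so effects are deviations from the mean.
contr.sum(3)

# Stand-in data: factors wool (2 categories) and tension (3 categories).
fit_sum <- lm(breaks ~ wool * tension, data = warpbreaks,
              contrasts = list(wool = contr.sum, tension = contr.sum))

# With contr.sum in a balanced design, the intercept equals the grand mean.
coef(fit_sum)["(Intercept)"]
mean(warpbreaks$breaks)
```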

Estimate a regression model with support for governmental action (`govact`) predicted from respondent's sex and party identification, and the interaction between the two predictors. Use `contr.sum` contrasts and save the results as data object `model_2`.

model_2 <- lm(govact ~ sex * partyid, data = glbwarm, contrasts= ____ )

__Hint:__ The `contrasts` argument requires a list of variable name and contrast type pairs.

model_2 <- lm(govact ~ sex * partyid, data = glbwarm, contrasts=list(sex=contr.sum, partyid=contr.sum))

gradethis::grade_code(
  correct = "Have a look at the results: send model_2 to the screen.", 
  incorrect = ""
  )

Step 2: Calculate the sums of squares partition.

Functions:

stats::anova() for balanced designs.
car::Anova() for (balanced and) unbalanced designs (Type II or III sums of squares).

Use the `Anova()` function to show the sums of squares partition with associated F tests of `model_2` on the screen.

__Hint:__ The `car` package has been loaded by the tutorial, so you do not have to include it if you use the `Anova()` function.

Anova(model_2)

gradethis::grade_code(
  correct = "", 
  incorrect = "Perhaps you used the package name in the command, which is fine."
  )

The anova functions return a data frame, which you can use as any data frame.

For example, knit it to a pretty (HTML or PDF) table with knitr::kable().

We will do that later on in this tutorial.
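
As a quick check that the result really is a data frame, the sketch below uses stats::anova() on the built-in, balanced `warpbreaks` data (an assumption for illustration; the tutorial itself uses car::Anova() on `glbwarm`). The names `fit` and `aov_table` are hypothetical.

```r
# Stand-in model with contr.sum contrasts on a balanced design.
fit <- lm(breaks ~ wool * tension, data = warpbreaks,
          contrasts = list(wool = contr.sum, tension = contr.sum))
aov_table <- anova(fit)

# The result is a data frame with the extra class 'anova'.
class(aov_table)
names(aov_table)

# So the usual data frame tools work, for example pulling out the p values.
aov_table[["Pr(>F)"]]
```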

Missing Values

How functions in the stats package deal with missing values depends on the na.action = argument:

na.omit (default and preferred) or na.exclude: listwise deletion;
na.fail: stops with an error.

Check and, if necessary, set the `na.action` option in the console of RStudio.

# Get the current option for na.action.
getOption("na.action")
# Set the option (if necessary).
options(na.action = "na.omit")
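
The sketch below compares the na.action choices on made-up toy data (the data frame `dat` and all object names are hypothetical). It also shows one practical difference between na.omit and na.exclude: both drop incomplete cases for estimation, but na.exclude pads residuals with NA.

```r
# Toy data with one missing outcome value.
dat <- data.frame(x = 1:6, y = c(2.1, NA, 6.2, 7.9, 10.1, 12.2))

# na.omit drops the incomplete case: 5 residuals for 6 rows.
fit_omit <- lm(y ~ x, data = dat, na.action = na.omit)
length(resid(fit_omit))

# na.exclude estimates the same model, but pads residuals and fitted values
# with NA so that they line up with the rows of the data.
fit_excl <- lm(y ~ x, data = dat, na.action = na.exclude)
resid(fit_excl)

# na.fail refuses to estimate the model when values are missing.
failed <- inherits(
  try(lm(y ~ x, data = dat, na.action = na.fail), silent = TRUE),
  "try-error"
)
failed
```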

Print-Quality Results Tables

Off-The-Shelf Tables

There are several packages that help you to tabulate statistical results. The table below lists some of them with their characteristic features.

# Create a data frame for the contents of the table.
dt <- data.frame(
  Package = c("base, stats", "papaja", "stargazer", "texreg"),
  Models = c("all", "t test, regression, anova", "regression", "regression"),
  Format = c("plain text", "PDF, Word (HTML)", "PDF, HTML, plain", "PDF, HTML, plain"),
  Style = c("-", "APA", "various, not APA", "generic"),
  Comparison = c("-", "stacked", "side-by-side", "side-by-side"),
  Peculiarities = c("summary() and print(), only for quick inspection", "2 steps: apa_print() and apa_table()", "", "texreg(), htmlreg(), screenreg()"),
  stringsAsFactors = FALSE
  )
names(dt)[5] <- paste0(names(dt)[5], footnote_marker_symbol(1))
dt %>%
  knitr::kable(align = "llllll", escape = F) %>% #show with kable() from the knitr package
  kable_styling(full_width = T) %>%
  row_spec(0, font_size = 18) %>%
  footnote(symbol = "Results of two or more models in one table.")

papaja:: Write APA Style Papers in RMarkdown

One of our favorite packages for Open Science projects!

See Canvas for an example.
Reference manual: http://frederikaust.com/papaja_man/
Write fully reproducible papers in R Markdown and produce perfect, APA styled knitted documents in PDF or Word.
Include graphs and tables, designed according to journal guidelines.
papaja:: integrates smoothly with .bib files and reference managers like Zotero.

Package papaja is not in the CRAN repository (or any of its mirrors).

Install `papaja` from GitHub (you must have internet connection) in RStudio.

# Execute this line of code in the RStudio console. 
remotes::install_github("crsh/papaja")

Statistical results tables in APA format require two papaja commands:

apa_print(): formats the results from various statistical methods in accordance with APA guidelines.
apa_table(): displays results as an APA format table.

The code below produces the table of regression results.

# Attach the papaja package.
library(papaja)
# Estimate the regression model (as before).
model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)
# Format the results of the regression model.
model_1a_formatted <- apa_print(model_1a)
# Display the results as an APA formatted table.
apa_table(model_1a_formatted$table)

We will see more of papaja in Session 7.

Print-Quality Table With `texreg`

library(texreg)
model_1 <- lm(govact ~ age + negemot + partyid, data = glbwarm)
model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)
# Table for HTML output.
texreg::htmlreg(list(model_1,model_1a), #the regression model(s) shown
        single.row = T, #coefficient and standard error on the same row
        star.symbol = "\\*", 
        doctype = F, #better for Markdown document
        html.tag = F, #better for Markdown document 
        head.tag = F, #better for Markdown document
        body.tag = F, #better for Markdown document 
        caption = "", #no caption to save space on the slide
        custom.coef.names = c(NA, "Age", "Negative emotions", "Independent", "Republican", "Age*Neg. emotions"),
        vertical.align.px = 6)
# For PDF output, use the texreg() function, with slightly different arguments (options).
# Use Help to see more arguments.

The above table is generated from the code below. What happens if you run the code?

model_1 <- lm(govact ~ age + negemot + partyid, data = glbwarm)
model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)

# Attach the texreg package.
library(texreg)
# Table for HTML output.
texreg::htmlreg(list(model_1,model_1a), #the regression model(s) shown
        single.row = T, #coefficient and standard error on the same row
        star.symbol = "\\*", 
        doctype = F, #better for Markdown document
        html.tag = F, #better for Markdown document 
        head.tag = F, #better for Markdown document
        body.tag = F, #better for Markdown document 
        caption = "", #no caption to save space on the slide
        custom.coef.names = c(NA, "Age", "Negative emotions", "Independent", "Republican", "Age*Neg. emotions"),
        vertical.align.px = 6)
# For PDF output, use the texreg() function, with slightly different arguments (options).
# Use Help to see more arguments.

htmlreg() produces HTML code:

This code should not be treated as ordinary text when the RMarkdown document is knitted.
Instead, it must be used and formatted as HTML code.

The results='asis' code chunk option is needed to knit the html output of the code chunk as formatted text.

The full code chunk in the RMarkdown document (note the results='asis' argument):

knitr::include_graphics("images/asis.png")

And this is what the knitted text looks like:

# Create regression data objects.
model_1 <- lm(govact ~ age + negemot + partyid, data = glbwarm)
model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)
# Attach the texreg package.
library(texreg)
# Table for HTML output.
texreg::htmlreg(list(model_1,model_1a), #the regression model(s) shown
        single.row = T, #coefficient and standard error on the same row
        star.symbol = "\\*", 
        doctype = F, #better for Markdown document
        html.tag = F, #better for Markdown document 
        head.tag = F, #better for Markdown document
        body.tag = F, #better for Markdown document 
        caption = "", #no caption to save space on the slide
        custom.coef.names = c(NA, "Age", "Negative emotions", "Independent", "Republican", "Age*Neg. emotions"),
        vertical.align.px = 6)

Functions for lm objects

htmlreg() is one example of a function that operates on lm() objects.

Other useful functions:

confint(),
coef(),
resid().

Find out what these functions do. Apply them to `model_1` and check out the options of these functions.

__Hint:__ Read the help info to these functions.
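
A minimal sketch of the three functions on a stand-in model fitted to the built-in `mtcars` data (an assumption for illustration; `fit` is a hypothetical name):

```r
# Stand-in regression model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                   # named vector of estimated coefficients
confint(fit, level = 0.90)  # confidence intervals; 'level' defaults to 0.95
head(resid(fit))            # residuals: observed minus fitted values
```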

Custom Tables with `broom` and `knitr`

For full control of your table, create it with packages broom and knitr.

(broom is installed with the tidyverse package, but library(tidyverse) does not attach it: load it with library(broom).)

You need 3 steps:

Use function tidy() in the broom package to extract relevant statistics from a statistical results object into a tibble.
Select and adjust values to suit your needs.
Create a table with knitr::kable() and kableExtra with all formatting options you need.

Step 1: Use function `tidy()` in the `broom` package to extract relevant statistics

Use `tidy()` and data object `model_1a` to see the regression coefficients with their standard errors, t values, and p values as a tibble. Can you also get the 95% confidence intervals? Send the result to the screen.

__Hint:__ Check out help on `tidy.lm`. You are tidying the results of a linear model (`lm()`).

model_1a %>% tidy(conf.int = TRUE, conf.level = 0.95)

gradethis::grade_code(
  correct = "", 
  incorrect = ""
  )

Step 2: Select and adjust values to suit your needs.

broom produces a tibble (data frame), so you can wrangle it like any other.

Explain the code below. If you are unsure about a code element, change it and see what happens.

model_1a %>% 
  tidy(conf.int = TRUE, conf.level = 0.95) %>% 
  mutate(
    estimate = format(round(estimate, digits = 2), nsmall = 2), 
    p.value = format(round(p.value, digits = 3), nsmall = 3), 
    CI = paste0( "[", format( round(conf.low, digits = 2), nsmall = 2 ), ", ", format( round(conf.high, digits = 2), nsmall = 2 ), "]" )
    ) %>%
  select(term, estimate, p.value, CI)
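
The sketch below isolates the formatting steps on made-up numbers (the values and the names `est` and `ci` are hypothetical), to show why round() is wrapped in format().

```r
# round() returns a number, so trailing zeros are lost when it is printed.
round(0.5, digits = 2)

# format() with nsmall = 2 returns a character string with two decimals.
est <- format(round(0.5, digits = 2), nsmall = 2)
est

# paste0() glues the formatted limits into one confidence interval string.
ci <- paste0("[", format(round(-0.123, digits = 2), nsmall = 2), ", ",
             format(round(0.456, digits = 2), nsmall = 2), "]")
ci
```

Note that the formatted values are character strings, so they can no longer be used in calculations or numeric comparisons.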

If you want to use stars to mark the significance level of regression coefficients, you can add a new character variable showing the number of stars.

Find and explain the line of code that adds stars indicating the significance level.

model_1a %>% 
  tidy(conf.int = TRUE, conf.level = 0.95) %>% 
  mutate(
    estimate = format(round(estimate, digits = 2), nsmall = 2), 
    # Derive the stars while p.value is still numeric (before it is formatted as text).
    sig = case_when( p.value < .001 ~ "***", p.value < .01 ~ "**", p.value < .05 ~ "*", TRUE ~ "" ), 
    p.value = format(round(p.value, digits = 3), nsmall = 3),
    CI = paste0( "[", format( round(conf.low, digits = 2), nsmall = 2 ), ", ", format( round(conf.high, digits = 2), nsmall = 2 ), "]" )
    ) %>%
  select(term, estimate, p.value, sig, CI)

Step 3: Create a table with `knitr::kable()` and `kableExtra`

With knitr and kableExtra, we can create a table including footnotes.

Play around with the `kable` and `kableExtra` options to see what they do.

model_1a %>% 
  tidy(conf.int = TRUE, conf.level = 0.95) %>% 
  mutate(
    estimate = format(round(estimate, digits = 2), nsmall = 2), 
    # Derive the stars while p.value is still numeric (before it is formatted as text).
    sig = case_when( 
      p.value < .001 ~ "***", 
      p.value < .01 ~ "**", 
      p.value < .05 ~ "*", 
      TRUE ~ "" ), 
    p.value = format(round(p.value, digits = 3), nsmall = 3), 
    CI = paste0( "[", format( round(conf.low, digits = 2), nsmall = 2 ), 
      ", ", format( round(conf.high, digits = 2), nsmall = 2 ), "]" )
    ) %>% 
  select(term, estimate, sig, CI) %>% #p.value dropped
  kable(digits = c(0, 2, 0, 0),
    col.names = c("Parameter", "B", "", "95% CI"),
    align = "lrlc",
    caption = "Table 1. Predicting opinions about global warming.",
    booktabs = TRUE, #nicer layout in PDF
    escape = FALSE #pay attention to special characters
    ) %>%
  kable_styling(full_width = FALSE) %>%
  row_spec(0, font_size = 16) %>%
  column_spec(1, width = "5cm") %>%
  column_spec(2, width = "3cm") %>%
  column_spec(3, width = "0.5cm") %>%
  column_spec(4, width = "5cm") %>%
  footnote(
    general_title = "",
    general = "   * p < .05. ** p < .01. *** p < .001."
    )

Some final points about tabulating results:

Special characters such as stars (*) and percentage signs (%) can be troublesome in tables. You may have to escape them with one or more backslashes (\\).
PDF output has more formatting options than HTML (or Word).
kable() does not knit nicely to Word. Knit to HTML and import HTML in Word.

Results Plots

Standard `plot()` function

A data object with statistical results usually has a plot() function:

These plots are for quick inspection rather than final presentation.
They can be very useful for checking assumptions.

Apply the `plot()` function to the result of linear regression (`model_1a`).
- Which plots do you get?
- Are these all plots that you can get with this function?

model_1 <- lm(govact ~ age + negemot + partyid, data = glbwarm)
model_1a <- lm(govact ~ age*negemot + partyid, data = glbwarm)

__Hint:__ See `plot.lm()` for help.

Off-The-Shelf Plots

There are many packages offering ready-to-use plots, for example:

papaja: plots for analysis of variance.
coefplot: plots regression coefficients for one or more models (ggplot2 plots)
visreg: plots regression lines (ggplot2 plots).
effects: plots regression lines (not ggplot2 plots).

Note that ggplot2 plots created by such packages can be further customized: Save the plot (e.g., p) and then add layers, themes, ... (e.g., p + theme_bw()).

Interaction Plot with the `effects` Package

You can graph interaction effects with the effects package in two steps.

# Load effects package.
library(effects)
# Step 1: Create a data object containing all effects.
eff.model2 <- effects::allEffects(model_1a)
# Step 2: Plot interaction effects.
plot(eff.model2, 'age:negemot', x.var = 'age')

Note the rug on the horizontal axis, showing the age score of all cases within a negemot group.

Custom Plots with `ggplot()`

It is not so difficult to create this plot with ggplot().

Advantage: Full control. E.g., why does the plot from the effects package skip negative emotions around three?

Create a ggplot from `glbwarm` like the above effects plot with facets for negative emotions between 1 and 1.5 (labeled `1`), between 1.5 and 2.5 (labeled `2`), between 2.5 and 3.5 (labeled `3`), between 3.5 and 4.5 (labeled `4`), between 4.5 and 5.5 (labeled `5`), between 5.5 and 6 (labeled `6`). Name the new variable `negemot_bin`.

# Create the binned negative emotions variable.
glbwarm %>%
  mutate(negemot_bin = 
  case_when(
    negemot < 1.5 ~ 1,
    negemot < 2.5 ~ 2,
    negemot < 3.5 ~ 3,
    negemot < 4.5 ~ 4,
    negemot < 5.5 ~ 5,
    negemot >= 5.5 ~ 6
    )
  )

# Pipe the tibble into ggplot() and use geom_smooth().
glbwarm %>%
  mutate(negemot_bin = 
  case_when(
    negemot < 1.5 ~ 1,
    negemot < 2.5 ~ 2,
    negemot < 3.5 ~ 3,
    negemot < 4.5 ~ 4,
    negemot < 5.5 ~ 5,
    negemot >= 5.5 ~ 6
    )
  ) %>%
  ggplot( ) +
    geom_smooth( )

# Use geom_rug() to represent all observations on the horizontal axis.

# Use facet_wrap() on the binned negative emotions variable.

glbwarm %>%
  mutate(negemot_bin = case_when(
    negemot < 1.5 ~ 1,
    negemot < 2.5 ~ 2,
    negemot < 3.5 ~ 3,
    negemot < 4.5 ~ 4,
    negemot < 5.5 ~ 5,
    negemot >= 5.5 ~ 6
  )) %>%
  ggplot(aes(x = age)) +
    geom_smooth(aes(y = govact), method = lm) +
    geom_rug() +
    facet_wrap(vars(negemot_bin))
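
To check how scores on the bin boundaries are classified, the sketch below reproduces the same bins with base R's findInterval() instead of case_when() (a swapped-in technique, shown only as a cross-check). The scores and the name `cut_points` are made up for illustration.

```r
# Made-up negative emotion scores, including values on the bin boundaries.
negemot <- c(1, 1.49, 1.5, 2.49, 3.5, 5.49, 5.5, 6)

# findInterval() counts how many cut points are less than or equal to each
# score; adding 1 turns that count into the bin labels 1 to 6.
cut_points <- c(1.5, 2.5, 3.5, 4.5, 5.5)
negemot_bin <- findInterval(negemot, cut_points) + 1
negemot_bin
```

A score exactly on a cut point (for example 1.5) falls in the higher bin, just as it does with the `<` conditions in case_when().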

Do you notice differences between your plot and the plot created with the effects package?

Which plot do you trust more?

More ggplot practice

It is not that difficult to create a means plot showing the results of analysis of variance.

glbwarm %>% group_by(partyid, sex) %>% 
  summarise(avg_govact = mean(govact)) %>% 
  ggplot(aes(x = partyid, y = avg_govact, 
             color = sex)) + 
    geom_line(aes(group = sex)) + 
    geom_point() +
    labs(x = "Party identification",
    y = "Gov.intervention") +
    scale_y_continuous(
      limits = c(min(glbwarm$govact), max(glbwarm$govact)),
      breaks = 1:7
    ) +
    theme_bw() +
    theme(legend.position = c(0.8, 0.8),
      legend.background = element_blank())

Use your data wrangling skills and `ggplot()` to create the above means plot.

glbwarm %>% group_by(partyid, sex) %>% 
  summarise(avg_govact = mean(govact)) %>% 
  ggplot( ____ )

# First calculate the group means that must be shown.
glbwarm %>% group_by(partyid, sex) %>% 
  summarise(avg_govact = mean(govact))
# Important: You plot summaries now, not the original observations.

# Use geom_point() to show the dots.

# Use geom_line() to show the lines with the group argument.

# Use theme_bw() for the general appearance of the plot.
# More on this in Session 7.

# Use legend.position and legend.background within theme()
# for the fine details of the legend.

(Reference) Importing SPSS Data

SPSS data files have a complicated setup with variable labels and value labels.

R data frames or tibbles do not have such labels.

In case you later have SPSS data that you want to analyze in R, here are two options for importing SPSS data.

Option 1. Export from SPSS to .csv and import .csv in R.

Export data from SPSS to a CSV file with value labels for categorical variables:
- File > Export > CSV Data with Save value labels where defined instead of data values.
Use read_csv() (as you learned before) to import the CSV file.

Import the CSV file `data/glbwarm.csv` and have a look at it.

# The CSV is available in the data directory of this tutorial.
glbwarm <- read_csv("data/glbwarm.csv")
# Inspect the variables.
str(glbwarm)

Option 2. Import SPSS .sav file directly with tidyverse package `haven`.

The tidyverse package haven contains the function read_sav() (or read_spss()) for importing SPSS data files, as well as functions for importing data files from other statistical software packages.

Import the SPSS file `data/glbwarm.sav` and have a look at it.

# The SPSS system file is available in the data directory of this tutorial.
glbwarm_spss <- haven::read_sav("data/glbwarm.sav")
# Inspect the variables.
str(glbwarm_spss)

Imported categorical variables such as ideology:

Are numerical codes with labels.
So, R treats these variables as numerical.
If you don't want that, change them into factors.

Use the `haven::as_factor()` function to add a variable named `sex_fct` to tibble `glbwarm_spss`.

# Add sex as a factor to the tibble.
glbwarm_spss <- glbwarm_spss %>% 
  mutate(sex_fct = haven::as_factor(sex))
# Inspect the original and new variable.
glbwarm_spss %>% count(sex, sex_fct)

Passing the entire tibble glbwarm_spss to haven::as_factor() will change all labelled variables into factors.

I am not sure this is what you want here. Perhaps you would like to use the ideology variable as a seven-point scale.
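
The idea of turning labelled numeric codes into a factor can be sketched in base R with factor(). This is not how haven implements as_factor(); the codes and labels below are hypothetical.

```r
# Hypothetical numeric codes, as an SPSS file might store a categorical variable.
sex_code <- c(0, 1, 1, 0, 1)

# Attach a label to each code; the result is a factor.
sex_fct <- factor(sex_code, levels = c(0, 1), labels = c("female", "male"))

# Cross-tabulate codes and labels to check the conversion.
table(sex_code, sex_fct)
```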

Fancy Stuff

If you can execute regression models in R, you can also execute these using Bayesian statistics instead of traditional (frequentist) statistics.

The popularity of Bayesian statistics as an alternative to null hypothesis significance testing is growing. If you want to be among the first in your field going Bayesian, check out the short introduction provided in Help, My Collaborator Goes Bayesian! Why And How To Apply Bayesian Data Analysis. Section 3.2 of the document offers a short introduction to using the rstanarm package for Bayesian data analysis.

The code below estimates a regression model predicting support for governmental action (govact) from age, negative emotions and party identification, with an interaction effect of age with negative emotions.

# Load the rstanarm package.
library(rstanarm)
# Estimate the regression model with rstanarm.
model_1aBayes <- rstanarm::stan_glm(govact ~ age * negemot + partyid,
                                    data = glbwarm)
# Standard output to screen.
print(model_1aBayes)

# Shows only the output of print.
print(model_1aBayes)

Bayesian estimation yields a probability distribution for every parameter.

The printed summary gives the median of the probability distribution of a regression coefficient as its point estimate. In addition, it shows the median absolute deviation (MAD_SD) of the probability distribution: a robust counterpart of the standard deviation.

It is easy to get the posterior distributions as a data frame or tibble, so you can find any probability you like for a parameter.

# Extract the posteriors from the fitted model to a tibble.
posteriors <- as_tibble(model_1aBayes)
# Overview of variables in posteriors: each independent variable and the error term (sigma).
str(posteriors)
# Calculate the probability that the effect of negemot is larger than 0 in the population.
prob <- posteriors %>%
  summarize(b_negemot = mean(negemot > 0))
# Plot the probability distribution and display this information.
posteriors %>% 
  mutate(positive = negemot > 0) %>%
  ggplot(mapping = aes(x = negemot)) +
    geom_histogram(
      aes(fill = positive),
      boundary = 0,
      bins = 30,
      show.legend = FALSE
      ) +
    geom_text(aes(label = prob$b_negemot, x = 0.15, y = 100))
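
The calculations on posterior draws need nothing more than base R. The sketch below uses simulated draws as a stand-in for one column of `posteriors` (an assumption: 4000 draws from a normal distribution, not output of stan_glm()). The names `draws` and `prob_positive` are hypothetical.

```r
# Simulated stand-in for the posterior draws of one regression coefficient.
set.seed(123)
draws <- rnorm(4000, mean = 0.4, sd = 0.1)

# Point estimate and spread of the kind shown in the printed summary:
# the median and the (scaled) median absolute deviation.
median(draws)
mad(draws)

# Probability that the coefficient is positive: the share of draws above zero.
prob_positive <- mean(draws > 0)
prob_positive
```

Any other probability works the same way, for example `mean(draws > 0.3)` for the probability that the effect exceeds 0.3.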

The function launch_shinystan() in the shinystan:: package (automatically loaded by package rstanarm::) offers an interactive overview of estimation and model checks and results.

shinystan::launch_shinystan(model_1aBayes)
# Note: This function does not work from within a tutorial.

Data Project: To Do

Sprint 3: Retrospective & Review.
Sprint 4: Planning, Update Project Backlog.
Remaining time: Work on the Sprint 3 Backlog.

Note

Statistical analyses are not necessary for the Data Project. The Data Project focuses on visualizations.

You can, however, use statistical analysis to detect patterns in your data that you then try to visualize. If you do that, do not use off-the-shelf plots. Show that you can create a plot that hopefully is more attractive and more informative than off-the-shelf statistical plots.

Plenary updates Sprint 3 SCRUM masters

Last 15 minutes of the session.

WdeNooy/UsingRTutorials documentation built on Jan. 25, 2023, 2:39 a.m.
