In neuropsychology/psycho.R: Efficient and Publishing-Oriented Workflow for Psychological Science

library(knitr)

Let's take interest in the relationship between life satisfaction, sex and concealing (the tendency to suppress or hide one's emotions).

The Data

# import packages that we will use
library(tidyverse)
library(psycho)

# import data and select variables that we will use
df <- psycho::affective %>% 
  select(Sex, Life_Satisfaction, Concealing)

# get a summmary of the data
summary(df)

Primitve Way

# Splithalf method
df$Concealing_Binary <- ifelse(df$Concealing > median(df$Concealing), "High", "Low")

anova_results <- aov(Life_Satisfaction ~ Sex * Concealing_Binary, data=df)
analyze(anova_results)

df %>% 
  ggplot(aes(x=Concealing_Binary, y=Life_Satisfaction, colour=Sex, group=Sex)) +
  stat_summary(fun.data = "mean_se", geom="pointrange") +
  stat_summary(fun.y = "mean", geom="line")

anova_results <- aov(Life_Satisfaction ~ Sex * Concealing, data=df)
analyze(anova_results)

ANOVAs comes from a time where psychologists had to do their maths by hand. If we wanted to go deeper, we have to do something like post-hocs tests and stuff, which are messy and not the correct way to proceed. Instead, we will investigate the underlying linear regression model.

Regression

fit <- lm(Life_Satisfaction ~ Sex * Concealing, data=df)

First, note that running an anova on the linear model produces EXACTLY the same results:

analyze(anova(fit))

But we can to have a look at the model itself, which is richer than the ANOVA.

analyze(fit)

Again, three lines of "effect". One could think these a similar to the ones of ANOVA, with the main effect of Sex, the main effect of concealing and the interaction. But they are not exactly that! they refer to specific parameters (or "effects") of the model.

So an ANOVA reports each mean and a p-value that says at least two are significantly different. A regression reports only one mean (as an intercept), and the differences between that one and all other means, but the p-values evaluate those specific comparisons.

Visualize the data

The traditional way of visualizing the daa in this case, is do to a scatter plot with the dependent variable as Y, the linear predictor as X and the factor as colour. We can then draw the regression lines.

df %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction, colour=Sex)) +
  geom_point() +
  geom_smooth(method="lm")

As you can see, this plot is rather ugly. Many points are overlapping, so we lose the vision of density. We can introduce some jitter to our points as follows:

df %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction, colour=Sex)) +
  geom_jitter(width = 0.1, height = 0.1, alpha=0.5) +
  geom_smooth(method="lm")

It's a bit better, but the plot is still ugly, and the data points aren't really intuitive.

Visualize the model

Visualizing the data is alright for simple models, but for more complex kind of stuffs, it wouldn't work, as your model infered things that cannot be directly represented with the data on a 2D panel. One of the way of creating models visualization, one can directly plot the model's prediction.

Reference Data

refdata <- df %>% 
  select(Sex, Concealing) %>% 
  refdata()

predicted <- get_predicted(fit, newdata=refdata)

head(predicted)

Predictions

Plot

predicted %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction_Predicted)) +
  geom_line(aes(colour=Sex)) +
  geom_ribbon(aes(fill=Sex,
                  ymin=Life_Satisfaction_CI_2.5,
                  ymax=Life_Satisfaction_CI_97.5),
              alpha=0.25)

This is what our model predicted.

Effects

refdata <- df %>% 
  select(Sex, Concealing) %>% 
  refdata()

get_predicted(fit, newdata=refdata) %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction_Predicted)) +
  geom_line(aes(colour=Sex)) +
  geom_ribbon(aes(fill=Sex,
                  ymin=Life_Satisfaction_CI_2.5,
                  ymax=Life_Satisfaction_CI_97.5),
              alpha=0.25) + 
  geom_segment(aes(x = 0, y = 5.19, xend =0.2, yend = 5.19), size = 3, color="#f44336") +
  geom_segment(aes(x = 0, y = 5.1, xend =0, yend = 4.6), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#2196F3") +
  geom_segment(aes(x = 4, y = 4.8, xend =6, yend = 4.6), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#4CAF50") +
  geom_curve(aes(x = 6, y = 4.65, xend =6, yend = 5), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#FFC107")

These four parameters are sufficient to fully describe the model. And if they are "significantly" different from 0.

Changing Effects

Nevertheless, sometimes, these effects are not exactly what we look for. For example, I could be specifically interested in the value and significance of the blue slope (the relationship between concealing and life satisfaction in men). With this model, I only know that this slope is indeed different than the women's one. But there could be an absence of significance in men, and a strongly significant link in women. Or a significantly opposite link in men? How can we know this? How can be change the effects displayed?

Wait for the next post about changing reference levels and nested effects!

Contribute

Of course, these reporting standards should change, depending on new expert recommandations or official guidelines. The goal of this package is to flexibly adaptive to new changes and good practices evolution. Therefore, if you have any advices, opinions or such, we encourage you to either let us know by opening an issue, or even better, try to implement them yourself by contributing to the code.

Credits

This package helped you? Don't forget to cite the various packages you used :)

You can cite psycho as follows:

Makowski, (2018). The psycho Package: An Efficient and Publishing-Oriented Workflow for Psychological Science. Journal of Open Source Software, 3(22), 470. https://doi.org/10.21105/joss.00470

Previous blogposts

neuropsychology/psycho.R documentation built on Jan. 25, 2021, 7:59 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

neuropsychology/psycho.R
Efficient and Publishing-Oriented Workflow for Psychological Science

In neuropsychology/psycho.R: Efficient and Publishing-Oriented Workflow for Psychological Science

The Data

Primitve Way

Regression

Visualize the data

Visualize the model

Reference Data

Predictions

Plot

Effects

Changing Effects

Contribute

Credits

Previous blogposts

R Package Documentation

Browse R Packages

We want your feedback!

neuropsychology/psycho.R Efficient and Publishing-Oriented Workflow for Psychological Science

In neuropsychology/psycho.R: Efficient and Publishing-Oriented Workflow for Psychological Science

The Data

Primitve Way

Regression

Visualize the data

Visualize the model

Reference Data

Predictions

Plot

Effects

Changing Effects

Contribute

Credits

Previous blogposts

R Package Documentation

Browse R Packages

We want your feedback!

neuropsychology/psycho.R
Efficient and Publishing-Oriented Workflow for Psychological Science