library(knitr)

Let's take interest in the relationship between life satisfaction, sex and concealing (the tendency to suppress or hide one's emotions).

The Data

# import packages that we will use
library(tidyverse)
library(psycho)

# import data and select variables that we will use
df <- psycho::affective %>% 
  select(Sex, Life_Satisfaction, Concealing)

# get a summmary of the data
summary(df)

Primitve Way

# Splithalf method
df$Concealing_Binary <- ifelse(df$Concealing > median(df$Concealing), "High", "Low")

anova_results <- aov(Life_Satisfaction ~ Sex * Concealing_Binary, data=df)
analyze(anova_results)
df %>% 
  ggplot(aes(x=Concealing_Binary, y=Life_Satisfaction, colour=Sex, group=Sex)) +
  stat_summary(fun.data = "mean_se", geom="pointrange") +
  stat_summary(fun.y = "mean", geom="line")
anova_results <- aov(Life_Satisfaction ~ Sex * Concealing, data=df)
analyze(anova_results)

ANOVAs comes from a time where psychologists had to do their maths by hand. If we wanted to go deeper, we have to do something like post-hocs tests and stuff, which are messy and not the correct way to proceed. Instead, we will investigate the underlying linear regression model.

Regression

fit <- lm(Life_Satisfaction ~ Sex * Concealing, data=df)

First, note that running an anova on the linear model produces EXACTLY the same results:

analyze(anova(fit)) 

But we can to have a look at the model itself, which is richer than the ANOVA.

analyze(fit) 

Again, three lines of "effect". One could think these a similar to the ones of ANOVA, with the main effect of Sex, the main effect of concealing and the interaction. But they are not exactly that! they refer to specific parameters (or "effects") of the model.

So an ANOVA reports each mean and a p-value that says at least two are significantly different. A regression reports only one mean (as an intercept), and the differences between that one and all other means, but the p-values evaluate those specific comparisons.

Visualize the data

The traditional way of visualizing the daa in this case, is do to a scatter plot with the dependent variable as Y, the linear predictor as X and the factor as colour. We can then draw the regression lines.

df %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction, colour=Sex)) +
  geom_point() +
  geom_smooth(method="lm")

As you can see, this plot is rather ugly. Many points are overlapping, so we lose the vision of density. We can introduce some jitter to our points as follows:

df %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction, colour=Sex)) +
  geom_jitter(width = 0.1, height = 0.1, alpha=0.5) +
  geom_smooth(method="lm")

It's a bit better, but the plot is still ugly, and the data points aren't really intuitive.

Visualize the model

Visualizing the data is alright for simple models, but for more complex kind of stuffs, it wouldn't work, as your model infered things that cannot be directly represented with the data on a 2D panel. One of the way of creating models visualization, one can directly plot the model's prediction.

Reference Data

refdata <- df %>% 
  select(Sex, Concealing) %>% 
  refdata()

predicted <- get_predicted(fit, newdata=refdata)

head(predicted)

Predictions

Plot

predicted %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction_Predicted)) +
  geom_line(aes(colour=Sex)) +
  geom_ribbon(aes(fill=Sex,
                  ymin=Life_Satisfaction_CI_2.5,
                  ymax=Life_Satisfaction_CI_97.5),
              alpha=0.25)

This is what our model predicted.

Effects

refdata <- df %>% 
  select(Sex, Concealing) %>% 
  refdata()

get_predicted(fit, newdata=refdata) %>% 
  ggplot(aes(x=Concealing, y=Life_Satisfaction_Predicted)) +
  geom_line(aes(colour=Sex)) +
  geom_ribbon(aes(fill=Sex,
                  ymin=Life_Satisfaction_CI_2.5,
                  ymax=Life_Satisfaction_CI_97.5),
              alpha=0.25) + 
  geom_segment(aes(x = 0, y = 5.19, xend =0.2, yend = 5.19), size = 3, color="#f44336") +
  geom_segment(aes(x = 0, y = 5.1, xend =0, yend = 4.6), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#2196F3") +
  geom_segment(aes(x = 4, y = 4.8, xend =6, yend = 4.6), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#4CAF50") +
  geom_curve(aes(x = 6, y = 4.65, xend =6, yend = 5), size = 3, arrow = arrow(length = unit(0.3, "inches")), color="#FFC107")

These four parameters are sufficient to fully describe the model. And if they are "significantly" different from 0.

Changing Effects

Nevertheless, sometimes, these effects are not exactly what we look for. For example, I could be specifically interested in the value and significance of the blue slope (the relationship between concealing and life satisfaction in men). With this model, I only know that this slope is indeed different than the women's one. But there could be an absence of significance in men, and a strongly significant link in women. Or a significantly opposite link in men? How can we know this? How can be change the effects displayed?

Wait for the next post about changing reference levels and nested effects!

Contribute

Of course, these reporting standards should change, depending on new expert recommandations or official guidelines. The goal of this package is to flexibly adaptive to new changes and good practices evolution. Therefore, if you have any advices, opinions or such, we encourage you to either let us know by opening an issue, or even better, try to implement them yourself by contributing to the code.

Credits

This package helped you? Don't forget to cite the various packages you used :)

You can cite psycho as follows:

Previous blogposts



neuropsychology/psycho.R documentation built on Jan. 25, 2021, 7:59 a.m.