We've run several models, checked them, and identified confounders. Now let's tidy these models up, extract the relevant results, and interpret them.
tidy
# tidy() comes from the broom package; broom.mixed extends it to mixed models
library(lme4)
library(broom.mixed)

model <- glmer(
  got_cvd ~ body_mass_index_scaled + sex + (1 | subject_id),
  data = tidied_framingham,
  family = binomial
)

# General tidying
tidy(model)
Most statistical methods in R are developed by independent researchers, so there usually isn't an underlying consistency in how results are presented. They can be messy to deal with, and that can make learning something new frustrating. Thankfully, there is the tidy function from the broom package to help out! Tidy, which takes a model object, lets you clean up the results of many analyses, calculate confidence intervals for uncertainty, and, for logistic regression, compute odds ratios. Odds ratios are covered more in the Logistic Regression course, but briefly, an odds ratio is the odds of an outcome occurring when a predictor is present compared to the odds when the predictor is absent.
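To make the odds ratio idea concrete, here is a minimal sketch with made-up probabilities (not Framingham results), just to show the arithmetic:

# Hypothetical probabilities of disease, purely for illustration
p_with_predictor <- 0.20    # probability of disease when the predictor is present
p_without_predictor <- 0.10 # probability when the predictor is absent

# Odds = probability of the event / probability of no event
odds_with <- p_with_predictor / (1 - p_with_predictor)          # 0.25
odds_without <- p_without_predictor / (1 - p_without_predictor) # ~0.111

# The odds ratio compares the two odds
odds_with / odds_without # 2.25: odds are 2.25 times higher with the predictor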
# Confidence interval
tidy(model, conf.int = TRUE)
# A tibble: 4 x 9
  effect group term  estimate std.error statistic p.value conf.low conf.high
  <chr>  <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1 fixed  NA    (Int…  -12.4       0.321    -38.5   0        -13.0    -11.7
2 fixed  NA    body…    0.229     0.169      1.35  0.177     -0.103    0.560
3 fixed  NA    sexW…   -0.916     0.365     -2.51  0.0122    -1.63    -0.200
4 ran_p… subj… sd__…   56.1      NA         NA    NA         NA       NA
You've used summary on a model before, but it isn't the best way to access results, and calculating confidence intervals in base R requires extra work. To use tidy, you provide the model object and set conf dot int to true.
What you'll get is a nice dataframe of results and confidence intervals. Confidence intervals are, very simply, a range of uncertainty around the estimate.
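To see what "extra work" means, here is a rough sketch of the base R route for the same glmer model (Wald intervals are used here because the profile default can be slow on mixed models); you still have to match the intervals up with the estimates yourself:

# Base R: confidence intervals are computed separately from the estimates
ci <- confint(model, method = "Wald") # random-effect rows come back as NA

# ...and then manually combined with the fixed effect coefficients
fixed_rows <- rownames(ci) %in% names(fixef(model))
cbind(estimate = fixef(model), ci[fixed_rows, ])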
library(dplyr)

tidied_model <- model %>%
  tidy(exponentiate = TRUE, conf.int = TRUE) %>%
  select(effect, term, estimate, conf.low, conf.high)
tidied_model
# A tibble: 4 x 5
  effect   term                     estimate   conf.low  conf.high
  <chr>    <chr>                       <dbl>      <dbl>      <dbl>
1 fixed    (Intercept)            0.00000430 0.00000230 0.00000807
2 fixed    body_mass_index_scaled 1.26       0.902      1.75
3 fixed    sexWoman               0.400      0.196      0.819
4 ran_pars sd__(Intercept)        56.1       NA         NA
When you run analyses with a binary outcome, like disease status, you need to exponentiate the results. Exponentiating converts the original estimates from log-odds to odds ratios. In tidy, set the exponentiate argument to true and you'll now have odds ratios. Next, we'll keep only the most important model results: the effect, term, estimate, and confidence interval. We can now easily wrangle and plot from this tidy dataframe.
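As a quick check, you can reproduce the odds ratios yourself by exponentiating the log-odds estimates from the earlier, unexponentiated output:

# Log-odds estimates from the first tidy() output
exp(0.229)  # ~1.26, matching the odds ratio for body_mass_index_scaled
exp(-0.916) # ~0.400, matching the odds ratio for sexWoman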
So why are these particular variables important? We definitely need the effect and term columns to identify the fixed and random effects. And keeping the estimate and confidence intervals gives us the magnitude of an association and the uncertainty around it, both vital to getting tangible, concrete answers to our questions. This is especially crucial for health research, as this information can dictate future health policies.
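Because tidied_model is now a plain dataframe, plotting the associations takes only a few lines. A minimal sketch with ggplot2 follows; the filtering and styling choices here are illustrative assumptions, not from the course:

library(dplyr)
library(ggplot2)

tidied_model %>%
  filter(effect == "fixed", term != "(Intercept)") %>%
  ggplot(aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
  geom_pointrange() +
  geom_vline(xintercept = 1, linetype = "dashed") # odds ratio of 1 = no association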
# A tibble: 4 x 5
  effect   term                     estimate   conf.low  conf.high
  <chr>    <chr>                       <dbl>      <dbl>      <dbl>
1 fixed    (Intercept)            0.00000430 0.00000230 0.00000807
2 fixed    body_mass_index_scaled 1.26 <--   0.902 <--  1.75 <--
3 fixed    sexWoman               0.400      0.196      0.819
4 ran_pars sd__(Intercept)        56.1 <--   NA         NA

[Slide annotations: for the fixed effect rows, estimate is the "marginal" (population-level) effect; for each term, estimate is the odds of CVD, adjusting for the other terms, and it ranges from conf.low to conf.high; the estimate for the ran_pars effect indicates variation between subjects.]

Let's interpret these results. The estimates for the fixed effect rows are the values for the population-level averages.
The estimate value itself is the estimated change in odds when the predictor or term increases by one unit, after controlling for the other predictors, in this case sex. So, because BMI is scaled, we say that a one standard deviation increase in BMI is associated with one point twenty-six times higher odds of getting CVD.
We also need to consider the confidence interval. We interpret it by saying that, for BMI, the odds ratio ranges from zero point nine, a slightly lower odds, to one point seventy-five, a higher odds.
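If you want to pull out just the BMI association to report it, you can filter the tidied dataframe; a small sketch using dplyr:

library(dplyr)

# Keep only the row for the scaled BMI term
tidied_model %>%
  filter(term == "body_mass_index_scaled")
# estimate 1.26, conf.low 0.902, conf.high 1.75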
Lastly, the estimate for the random effect indicates the variation between subjects. In this case, there is a lot, at fifty-six! We expect this, though, since individuals vary a lot.
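If you only want the random effect results, tidy from broom.mixed can restrict its output with the effects argument; a small sketch:

# Keep only the random effect (between-subject) parameters
tidy(model, effects = "ran_pars")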
"... conclusions and ... decisions should not be based on [if] a p-value passes a threshold." ...
"p-value [is] not ... a good measure of evidence ... [and] does not measure the size of an effect or the importance"
Example: odds ratio of 0.8 (95% CI: 0.59, 1.01), p > 0.05 ("not significant"), but the uncertainty could reach 0.59 times lower odds of disease
DOI to statement: https://doi.org/10.1080/00031305.2016.1154108
You may have noticed that I didn't discuss the p-value. Why? Because it provides little to no clinical or public health utility. The American Statistical Association released a statement highlighting the problems with the p-value, stating that a p-value is not a good measure of evidence for a hypothesis, nor of the size or importance of a finding.
For example, say a drug is associated with zero point eight times the odds of a disease, but the p-value is above zero point zero five. Normally, studies would call this not significant. But the lower end of the uncertainty is an odds of zero point fifty-nine, a substantially lower odds of disease, so the drug could be clinically meaningful!
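To see this numerically, here is a tiny sketch using the slide's hypothetical numbers: the interval crossing one is all that "not significant" means, yet most of the interval sits below one:

or <- 0.8
ci <- c(0.59, 1.01)

# "Not significant" only because the interval barely crosses 1
ci[1] < 1 & ci[2] > 1 # TRUE

# But the lower bound implies up to a 41% reduction in the odds of disease
(1 - ci[1]) * 100 # 41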
Alrighty, let's tidy up some models!