library(learnr)
knitr::opts_chunk$set(echo = FALSE)
The idea when plotting conditional means is to show how the outcome, or variable of interest, varies as a function of predictors.
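As a minimal sketch of what a conditional mean is, the example below uses the built-in mtcars data rather than the tutorial's data, so it runs without any extra downloads. Treating cyl as the predictor and mpg as the outcome is an assumption made only for illustration.

```r
library(dplyr)

# mtcars ships with R; here cyl (number of cylinders) stands in for a
# predictor and mpg (miles per gallon) for the outcome of interest
cond_means <- mtcars %>%
  group_by(cyl) %>%                               # one group per predictor level
  summarize(mean_mpg = mean(mpg, na.rm = TRUE))   # mean outcome within each group

cond_means
```

Each row of the result is the mean of the outcome at one level of the predictor. The rest of the tutorial computes and plots the same kind of quantity.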
Today we will be working with the tidyverse. To handle colors, we'll also need the RColorBrewer package. We will use the HSLS data from the ds package. Run the code below to load these packages and the data.
library(ds)
library(tidyverse)
library(RColorBrewer)
data("hsls_clean")
Our primary outcome of interest will be student GPA. We can quickly summarize this variable using summarize and functions like mean and sd.
#Use the summarize function, with mean and sd options
hsls_clean%>%
  _____(mean_gpa=____(gpa,na.rm=TRUE),
        sd_gpa=___(gpa,na.rm=TRUE))

#Solution
hsls_clean%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE),
            sd_gpa=sd(gpa,na.rm=TRUE))
Univariate graphics, like a bar graph for an individual variable, help us depict characteristics and understand how they are distributed across the sample.
Use the geom_bar function to create a bar graph with the number of student athletes and non-athletes (the sports variable).
#First, use group_by to group the data by characteristics of the sports variable
#Then use the ggplot function to graph the groups of the sports variable

#First, use group_by to group the data by characteristics of the sports variable
#Then use the ggplot function to graph the groups of the sports variable
hsls_clean%>%
  _____(sports)%>%
  count()%>%
  _____(aes(x=sports,y=n,fill=as_factor(sports)))+
  geom_bar(stat="identity")

#Solution
hsls_clean%>%
  group_by(sports)%>%
  count()%>%
  ggplot(aes(x=sports,y=n,fill=as_factor(sports)))+
  geom_bar(stat="identity")
Bar graphs are great for factor variables, but do not make sense for continuous variables, like SES.
Generate a histogram for SES.
#Use ggplot and geom_histogram
#Use ggplot and geom_histogram
hsls_clean%>%
  _____(aes(x=ses))+
  _____()

#Solution
hsls_clean%>%
  ggplot(aes(x=ses))+
  geom_histogram()
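Called without arguments, geom_histogram() falls back to 30 bins and prints a message suggesting that a better value be chosen. A minimal sketch of setting the bin width by hand is below. It uses the built-in mtcars data, and the width of 2 is an arbitrary choice for illustration, not a recommendation for SES.

```r
library(ggplot2)

# mtcars ships with R; mpg plays the role of the continuous variable here
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)   # each bar covers 2 units of mpg

p
```

Smaller bin widths show more detail but make the plot noisier. It is usually worth trying a few values.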
Next, graph a density plot to show the continuous distribution of ses.
#This one is just like the histogram, but with geom_density specified instead of geom_histogram
#Use the geom_density function
hsls_clean%>%
  ggplot(aes(x=ses))+
  _____()

#Solution
hsls_clean%>%
  ggplot(aes(x=ses))+
  geom_density()
Before we depict conditional means of student GPA as a function of another variable, a quick review:
Calculate the average GPA by the number of hours students spend on extracurricular activities each week:
#Group the data by extracurricular hours, then use the summarize function to find the average GPA for each level
#Group the data by extracurricular hours, then use the summarize function to find the average GPA for each level
hsls_clean%>%
  _____(hours_extracurricular)%>%
  _____(mean_gpa_ap=mean(gpa,na.rm=TRUE))

#Solution
hsls_clean%>%
  group_by(hours_extracurricular)%>%
  summarize(mean_gpa_ap=mean(gpa,na.rm=TRUE))
Now plot the average GPA by extracurricular hours in a bar graph (don't forget to make the bars different colors!):
#Start with the conditional means code we just did, then use ggplot and geom_bar (just like the sports bar graph above)
#Add the fill option to ggplot to make the bars different colors

#Start with the conditional means code we just did (group_by and summarize), then use ggplot and geom_bar (just like the sports bar graph above)
#Add the fill option to ggplot to make the bars different colors
hsls_clean%>%
  _____(hours_extracurricular)%>%
  _____(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  _____(aes(x=hours_extracurricular,y=mean_gpa,____=hours_extracurricular))+
  _____(stat="identity",position="dodge")

#Solution
hsls_clean%>%
  group_by(hours_extracurricular)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=hours_extracurricular))+
  geom_bar(stat="identity",position="dodge")
How does GPA conditional on extracurricular hours look as a dot plot?
#This code follows the same pattern as the conditional bar graph
#Change geom_bar from the prior example to geom_point
#Points take their color from the color aesthetic, so use color instead of fill

#This code follows the same pattern as the conditional bar graph
#Change geom_bar from the prior example to geom_point
#Points take their color from the color aesthetic, so use color instead of fill
hsls_clean%>%
  group_by(hours_extracurricular)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,color=hours_extracurricular))+
  _____()

#Solution
hsls_clean%>%
  group_by(hours_extracurricular)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,color=hours_extracurricular))+
  geom_point()
Does the conditional mean of GPA by extracurricular hours differ by sex?
Plot a bar graph for GPA by both extracurricular hours and sex. The location of the bars should be the number of extracurricular hours and bar color should indicate sex.
#Start with the code above to make the GPA/extracurricular bar graph
#Group by both hours_extracurricular and sex
#Since the color should be different by sex, fill by sex

#Start with the code above to make the GPA/extracurricular bar graph
#Group by both hours_extracurricular and sex
#Since the color should be different by sex, fill by sex
hsls_clean%>%
  group_by(hours_extracurricular, _____)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=_____))+
  _____(stat="identity",position="dodge")

#Solution
hsls_clean%>%
  group_by(hours_extracurricular, sex)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=sex))+
  geom_bar(stat="identity",position="dodge")
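When summarize() follows a group_by() with more than one variable, recent versions of dplyr print a message noting that the result is still grouped by the first variable. One way to make the intent explicit, sketched here on the built-in mtcars data (an assumption for illustration, with cyl and am standing in for the two grouping variables), is the .groups argument:

```r
library(dplyr)

# mtcars ships with R; cyl and am stand in for two grouping variables
by_two <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE),
            .groups = "drop")   # return an ungrouped data frame

by_two
```

The result has one row per combination of the two grouping variables, and dropping the grouping keeps later steps from silently operating within groups.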
Now, I know what you are thinking: might there be a difference in the relationship between GPA, extracurricular participation, and sex for student athletes? Let's find out!
Use faceting (making multiple graphs side by side) to depict GPA by extracurricular hours, sports participation, and sex. Position the bars by extracurricular hours, color them by sports participation, and facet by sex.
#Start with the code above to make the GPA/extracurricular/sex bar graph
#Group by hours_extracurricular, sex, and sports
#Since the color should be different by sports this time, fill by sports
#Add the facet_wrap function by sex to make two graphs, one for male and one for female

#Start with the code above to make the GPA/extracurricular/sex bar graph
#Group by hours_extracurricular, sex, and sports
#Since the color should be different by sports this time, fill by sports
#Add the facet_wrap function by sex to make two graphs, one for male and one for female
hsls_clean%>%
  group_by(hours_extracurricular, sports, _____)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=_____))+
  geom_bar(stat="identity",position="dodge")+
  ______(~sex)

#Solution
hsls_clean%>%
  group_by(hours_extracurricular, sports, sex)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=sports))+
  geom_bar(stat="identity",position="dodge")+
  facet_wrap(~sex)
We can make this graph look better with some fine-tuning. For example, flipping the graph on its side makes it easier to read. You could also change the colors. Try some different options to make the graph look great:
#Solution
hsls_clean%>%
  group_by(hours_extracurricular, sports, sex)%>%
  summarize(mean_gpa=mean(gpa,na.rm=TRUE))%>%
  ggplot(aes(x=hours_extracurricular,y=mean_gpa,fill=sports))+
  geom_bar(stat="identity",position="dodge")+
  facet_wrap(~sex)+
  coord_flip()
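As one possible way to change the colors, the sketch below adds a ColorBrewer palette and readable labels to a flipped, faceted bar graph. It uses the built-in mtcars data so that it runs on its own. The variable roles, the "Set2" palette, and the label text are all choices made for illustration.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R: cyl sets bar position, am sets color, vs sets the facet
p <- mtcars %>%
  group_by(cyl, am, vs) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = factor(cyl), y = mean_mpg, fill = factor(am))) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~vs) +
  coord_flip() +
  # ColorBrewer palette for the fill colors, plus a legend title
  scale_fill_brewer(palette = "Set2", name = "Transmission (am)") +
  labs(x = "Cylinders", y = "Mean miles per gallon")

p
```

Other palette names can be browsed with RColorBrewer::display.brewer.all().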