In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

library(gradethis)
library(learnr)
library(qsslearnr)
tutorial_options(exercise.checker = gradethis::grade_learnr)
knitr::opts_chunk$set(echo = FALSE)
tut_reptitle <- "QSS Tutorial 3: Output Report"
data(STAR, package = "qss")
star <- STAR

Handling Missing Data in R

Small class size data

In this chapter, you'll analyze data from the STAR project, which is a four-year randomized trial on the effectiveness of small class sizes on education performance. The star data frame as been loaded into your space so that you can play around with it a bit.

Exercises

Use the head function on the star to see what the data looks like.

head(star)

grade_code()

Use the dim function on the star to see what the dimensions of the data look like.

dim(star)

grade_code()

Use the summary function on the star to get a sense for each variable.

summary(star)

grade_code()

Handling missing data

You probably noticed that there were some NA values in the data when you used the head() function. These are missing values, where the value for that unit on that variable is missing or unknown. These values pose problems when we are trying to calculate quantities of interest like means or medians because R doesn't know how to handle them.

The first tool in your toolkit for missing data is the is.na() function. When you pass a vector x to is.na(x), it will return a vector of the same length where each entry is TRUE if the value of x is NA and FALSE otherwise. Using logicals, you can easily get the opposite vector !is.na(x) which is TRUE when x is observed and FALSE when x is missing.

Exercises

Use the is.na and head functions to show whether or not the first 6 values of the g4math variable from the star data frame are missing.

head(is.na(star$g4math))

grade_result(
  pass_if(~ identical(.result, head(is.na(star$g4math))))
)

Use the is.na and sum functions to show how many values of the g4math variable are missing.

sum(is.na(star$g4math))

grade_code()

Use the is.na and mean functions to show what proportion of the g4math variable is missing.

mean(is.na(star$g4math))

grade_code()

Calculating means in the fact of missing data

Missing values makes it difficult to calculate numerical quantities of interest like the mean, median, standard deviation, or variance. Many of these function will simply return NA if there is a single missing value in the vector. We can instruct many function to ignore the missing values and do their calculation on just the observed data by using the na.rm = TRUE argument. For instance, suppose we have x <- c(NA, 1,2,3), then mean(x) will return NA, but mean(x, na.rm = TRUE) will return 2.

Exercises

Try to calculate the mean of the g4math variable in the star data frame without setting na.rm.

mean(star$g4math)

grade_code(
  correct = "This isn't that useful though!"
)

Try to calculate the mean of the g4math variable when setting na.rm = TRUE.

mean(star$g4math, na.rm = TRUE)

grade_code()

Visualizing Data

Barplots

The barplot is a useful way to visualize a categorical or factor variable. In this exercise, you are going to visualize the classtype variable from the star data frame, which can take on the following values:

1 = small class
2 = regular class
3 = regular class with aid

Exercises

Use the table function to create a vector of counts classcounts for each category of the classtype in the star data.
Pass this classcounts vector to the barplot function.

## creat a vector called classcounts that has
## the counts of each category of classtype


## pass classcounts to barplot

classcounts <- table(star$classtype)

## pass classcounts to barplot
barplot(classcounts)

grade_code("Awesome. The graph is looking a little unhelpful, though. Let's spruce it up.")

Making the barplot readable

The default barplot usually isn't all that readable.

Exercises

Create a vector of character labels called classnames that has the values "Small class" (for 1), "Regular class" (for 2), and "Regular class with aid" (for 3).
Call barplot with the heights as before, but now passing classnames to the names.arg argument and use "Number of students" as the ylab argument.
Make sure to separate the arguments in the barplot call with commas.

## create a vector giving the counts of each category of classtype
classcounts <- table(star$classtype)

## create a vector of labels called classnames


## pass classcounts to barplot and set the y-axis label

classcounts <- table(star$classtype)

## create a vector of labels
classnames <- c("Small class", "Regular class", "Regular class with aid")

## pass classcounts to barplot and set the y-axis label
barplot(classcounts, names.arg = classnames, ylab = "Number of students")

Histograms

For quantitative (numerical) variables, the barplot won't work because there are too many unique values. In this case, you will often use a histogram to visualize the a numerical variable.

Exercises

Use the hist function to create a histogram for the g4math variable in the star data frame.
Be sure to use the freq = FALSE argument to the hist function to get the histogram.
Make sure to separate the arguments in function calls with commas.
Remember that you can access a particular variable using the $.

## create a histogram of g4math

## create a histogram of g4math
hist(star$g4math, freq = FALSE)

grade_code("Great job, though the graph is a bit spartan. Let's make it more readable.")

Sprucing up the histogram

There are several arguments you can pass to hist that will improve its readability:

main: a character string that prints a main title for the plot.
xlab, ylab: character strings that set the labels for the x (horizontal) and y (vertical) axes
xlim, ylim: numeric vectors of length 2 to specify the interval for the x and y axes.

Exercises

Create a histogram (with freq = FALSE) where you (a) set the y-axis to be between 0 and 0.015 using the ylim argument, (b) include an informative x-axis label using the xlab argument, and (c) include a title for the plot using the main argument.
Make sure to separate the arguments in function calls with commas.

## create the histogram with the specifications given in the instructions

## create the histogram with the specifications given in the instructions
hist(star$g4math, freq = FALSE, xlab = "Score", main = "Distribution of fourth-grade math scores", ylim = c(0,0.015))

Adding lines and text to a plot

We'll often want to add more information to a plot to make it even more readable. You can do that with commands that add to the current plot, such as abline and text. abline(v = 1) will add a vertical line to the plot at the specified value (1 in this example).

Exercises

Use the abline function to add a vertical line at the mean of the g4math variable from the star data. Remember, there are missing values in that variable so be sure to use the na.rm = TRUE argument to drop them.

hist(star$g4math, freq = FALSE, xlab = "Score", main = "Distribution of fourth-grade math scores", ylim = c(0,0.015))

## add a vertical line at the mean of the variable

hist(star$g4math, freq = FALSE, xlab = "Score", main = "Distribution of fourth-grade math scores", ylim = c(0,0.015))

## add a vertical line at the mean of the variable
abline(v = mean(star$g4math, na.rm = TRUE))

grade_code()

Adding text to a plot

We'll sometimes want to add text to a plot to make it more informative. text(x,y,z) adds a character string z centered at point on the (x, y) on the plot. You can use the axis labels to see where you might want to add these parts of the plot.

Exercise

Use the text function to add the string Average Score to the plot at the point (750, 0.014).
Make sure to separate the arguments in function calls with commas.

hist(star$g4math, freq = FALSE, xlab = "Score", main = "Distribution of fourth-grade math scores", ylim = c(0,0.015))
abline(v = mean(star$g4math, na.rm = TRUE))

## add the text "Average Score" at the specified location

hist(star$g4math, freq = FALSE, xlab = "Score", main = "Distribution of fourth-grade math scores", ylim = c(0,0.015))

## add a vertical line at the mean of the variable
abline(v = mean(star$g4math, na.rm = TRUE))

## add the text "Average Score" at the specified location
text(x = 750, y = 0.014, "Average Score")

grade_code()

Submit

submission_ui

submission_server()

mattblackwell/qsslearnr documentation built on Sept. 17, 2022, 6:25 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mattblackwell/qsslearnr
Learnr tutorials based on Quantitative Social Science

In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

Handling Missing Data in R

Small class size data

Exercises

Handling missing data

Exercises

Calculating means in the fact of missing data

Exercises

Visualizing Data

Barplots

Exercises

Making the barplot readable

Exercises

Histograms

Exercises

Sprucing up the histogram

Exercises

Adding lines and text to a plot

Exercises

Adding text to a plot

Exercise

Submit

R Package Documentation

Browse R Packages

We want your feedback!

mattblackwell/qsslearnr Learnr tutorials based on Quantitative Social Science

In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

Handling Missing Data in R

Small class size data

Exercises

Handling missing data

Exercises

Calculating means in the fact of missing data

Exercises

Visualizing Data

Barplots

Exercises

Making the barplot readable

Exercises

Histograms

Exercises

Sprucing up the histogram

Exercises

Adding lines and text to a plot

Exercises

Adding text to a plot

Exercise

Submit

R Package Documentation

Browse R Packages

We want your feedback!

mattblackwell/qsslearnr
Learnr tutorials based on Quantitative Social Science