library(gradethis)
library(learnr)
library(qsslearnr)
tutorial_options(exercise.checker = gradethis::grade_learnr)
knitr::opts_chunk$set(echo = FALSE)
tut_reptitle <- "QSS Tutorial 1: Output Report"
data(resume, package = "qss")
quiz(
  caption = "",
  question(
    "Suppose a variable is binary, that is, it takes on values of either 0 or 1 (for example, female gender). Which of the following is the same as its sample mean?",
    answer("the sample median"),
    answer("the sample proportion of 1s", correct = TRUE),
    answer("neither of these")
  ),
  question(
    "What kind of value is `FALSE`?",
    answer("character"),
    answer("logical", correct = TRUE),
    answer("binary"),
    answer("numeric")
  ),
  question(
    "In order to calculate the mean of a variable we have used the `length()` function in the denominator. The `length()` of a vector is equivalent to:",
    answer("the number of elements", correct = TRUE),
    answer("the height"),
    answer("the maximum")
  ),
  question(
    "How are factor variables different from categorical variables?",
    answer("They are the same", correct = TRUE),
    answer("Factor variables contain numeric values"),
    answer("Categorical variables tend to have more levels or categories")
  )
)
In this tutorial, we are going to be working with the resume data from Section 2.1 of QSS. This data comes from an experiment where researchers sent fictitious resumes with different names that implied different race and gender combinations to see if potential employers were more likely to call back names associated with different racial groups and genders.
Let's first explore the data a bit. It's stored as `resume`.
Use the `head` function to show the first six lines of the `resume` data.
## print the first 6 lines of the data
head(resume)
grade_code()
Print the dimensions of the `resume` data.
dim(resume)
grade_code()
Use the `summary` function to show a summary of the data.
summary(resume)
grade_code()
To help you analyze this data, you can use a cross tabulation. A cross tabulation (or contingency table) is a table that quickly summarizes categorical data by counting how many observations fall into each combination of categories. For instance, in the `resume` data, we have a `sex` variable that tells us whether the fictitious resume had a male or a female name.
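To make the idea concrete, here is a minimal sketch of a cross tabulation on a small made-up data frame. The object `toy` and its values are hypothetical and are not taken from the `resume` data; only the base R `table` function is used.

```r
## toy data (hypothetical values, not the real resume data)
toy <- data.frame(
  sex  = c("female", "female", "male", "male", "female", "male"),
  call = c(1, 0, 0, 0, 1, 1)
)

## the first argument defines the rows (sex),
## the second argument defines the columns (call)
toy_tab <- table(toy$sex, toy$call)

## each cell counts the rows with that combination of values
toy_tab
```

Each cell reports how many rows share a given combination of the two variables, so the cells add up to the number of rows in the data.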
- Using the `table` function, create a cross tab of the `sex` and `call` variables in the `resume` data.
- Recall that you can access a variable in a data frame with `$`.
grade_result(
  pass_if(~ identical(.result, table(resume$sex, resume$call))),
  pass_if(~ identical(.result, table(resume$call, resume$sex)))
)
Pretty soon, you'll be doing more complicated subsetting in R. To do this, it's helpful to understand a special type of object in R: the logical. There are two values associated with this type of object: `TRUE` and `FALSE` (where the uppercase is very important).
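As a small sketch of how logical values behave, the example below builds a logical vector and summarizes it. The vector `voted` and its values are made up for illustration. In arithmetic, R treats `TRUE` as 1 and `FALSE` as 0, which is why `sum` and `mean` work on logicals.

```r
## a small logical vector (hypothetical values)
voted <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

## confirm the type of the object
class(voted)

## TRUE counts as 1 and FALSE as 0, so the sum is the number of TRUE values
n_true <- sum(voted)

## and the mean is the proportion of TRUE values
prop_true <- mean(voted)

n_true
prop_true
```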
Create a vector called `x` that contains two `TRUE` values and two `FALSE` values in that order.
## create a vector with two TRUE values and two FALSE values
x <-
## take the sum of this vector

x <- c()
sum(x)
grade_result_strict( pass_if(~ identical(x, c(TRUE, TRUE, FALSE, FALSE))), pass_if(~ identical(.result, 2L)) )
Create a vector called `z` that contains one `TRUE` value and three `FALSE` values in that order.
## create a vector with one TRUE value and three FALSE values
z <-
## take the mean of this vector

z <- c()
mean(z)
grade_result_strict( pass_if(~ identical(z, c(TRUE, FALSE, FALSE, FALSE))), pass_if(~ identical(.result, 0.25)) )
We often combine logical statements using AND (`&`) and OR (`|`) in R. For AND statements, both expressions have to be true for the whole expression to be true:
- `TRUE & FALSE`, `FALSE & TRUE`, and `FALSE & FALSE` are `FALSE`
- `TRUE & TRUE` is `TRUE`
For OR statements, either statement being true makes the whole expression true:
- `TRUE | FALSE`, `FALSE | TRUE`, and `TRUE | TRUE` are `TRUE`
- `FALSE | FALSE` is `FALSE`
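The rules above can be checked directly in R. The sketch below applies `&` and `|` element by element to two illustrative vectors, `a` and `b`, whose four positions cover every combination of `TRUE` and `FALSE`.

```r
## every combination of TRUE and FALSE, one per position
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

## AND is TRUE only where both a and b are TRUE
and_result <- a & b

## OR is TRUE where at least one of a and b is TRUE
or_result <- a | b

## show the inputs and results side by side as a truth table
data.frame(a, b, and = and_result, or = or_result)
```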
question("What does the expression `(TRUE | FALSE) & TRUE` evaluate to?", answer("`TRUE`", correct = TRUE), answer("`FALSE`"), answer("`NA`") )
There are several relational operators that allow us to compare objects in R. The most useful of these are the following:
- `>` greater than, `>=` greater than or equal to
- `<` less than, `<=` less than or equal to
- `==` equal to
- `!=` not equal to

When we use these to compare two objects in R, we end up with a logical object. You can also compare a vector to a particular number, in which case each element of the vector is compared to that number.
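As a brief sketch of comparing a vector to a single number, the example below uses a made-up vector `ages` (the name and values are hypothetical). Each comparison returns a logical vector of the same length as `ages`.

```r
## a numeric vector (hypothetical values)
ages <- c(18, 25, 30, 42)

## each element is compared to 25 in turn
over_25 <- ages > 25

## equality and inequality use == and !=
is_30 <- ages == 30
not_30 <- ages != 30

## the result is logical, so it can be counted with sum()
at_least_30 <- ages >= 30
sum(at_least_30)
```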
10 > 5
grade_code()
Test whether each value in the vector `x` is greater than or equal to 0.
## x vector
x <- c(-2, -1, 0, 1, 2)
## test which values of x are greater than or equal to 0
grade_result( fail_if(~ identical(.result, x > 0), "Did you forget the 'or equal to' part of the comparison?"), pass_if(~ identical(.result, x >= 0)) )
You can use the same logical statements you have been using to create subsets of a data frame. These can often be helpful because we'll want to calculate various quantities of interest for different subsets of the data. For this exercise, we will use the `resume` data frame made up of the variables `firstname`, `sex`, `race`, and `call`. As a reminder, here is what the data look like:
resume
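As a minimal sketch of subsetting with a logical statement, the example below uses a small made-up data frame with the same four column names. The object `toy`, the first names, and all values are hypothetical and are not taken from the `resume` data.

```r
## toy data with the same columns as resume (hypothetical values)
toy <- data.frame(
  firstname = c("Ann", "Ben", "Cara", "Dan", "Eve"),
  sex       = c("female", "male", "female", "male", "female"),
  race      = c("white", "white", "black", "black", "white"),
  call      = c(1, 0, 0, 1, 0)
)

## keep only the rows where both conditions are TRUE
toy.bm <- subset(toy, subset = (race == "black" & sex == "male"))

## the subset is itself a data frame with the same columns
toy.bm
nrow(toy.bm)
```

Inside `subset`, column names can be used directly without `$`, and the logical statement is evaluated once per row.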
- Use the `subset` function to create a subset of the `resume` data frame that contains only white-sounding female names. Save this subset as `resume.wf`.
- Use the `head` function to print out the first 6 lines of this subset.
- Calculate the mean of the `call` variable in this subset.
## create the subset for white female names and
## assign it to resume.wf
resume.wf <- ...
## print the first 6 lines of the subset

## calculate the mean of the callback variable (call)

resume.wf <- subset(resume, subset = (race == "white" & sex == "female"))
head(resume.wf)
mean(resume.wf$call)
grade_result( pass_if(~ identical(.result, mean(subset(resume, subset = (race == "white" & sex == "female"))$call))) )
You can use the same ideas as in the last step to create a different subset of the data corresponding to black-sounding female names. Then, you can compare the average callback rate for the white-sounding female names to the average callback rate for the black-sounding female names. This will give you a sense of how the employer callback rate varies by the racial group of the applicant among females.
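The comparison of two group means can also be sketched with logical indexing instead of `subset`. This is an alternative shown only for illustration, on a made-up data frame `toy` whose values are hypothetical. Because `call` is binary, each group mean is the proportion of resumes in that group that received a callback.

```r
## toy data (hypothetical values, not the real resume data)
toy <- data.frame(
  sex  = c("female", "female", "female", "female", "male", "male"),
  race = c("white", "white", "black", "black", "white", "black"),
  call = c(1, 0, 0, 0, 1, 0)
)

## logical vectors marking the rows that belong to each group
is.wf <- toy$race == "white" & toy$sex == "female"
is.bf <- toy$race == "black" & toy$sex == "female"

## mean of a binary variable = proportion of 1s in the group
rate.wf <- mean(toy$call[is.wf])
rate.bf <- mean(toy$call[is.bf])

## difference in callback rates between the two groups
rate.diff <- rate.wf - rate.bf
rate.diff
```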
- Create a subset of the `resume` data for black-sounding female names.
- Calculate the difference between the mean of the `call` variable in the white-sounding name subset and the black-sounding name subset.
## create the subset for white female names
resume.wf <- subset(resume, subset = (race == "white" & sex == "female"))
## create the subset for black female names
resume.bf <-
## calculate the difference in callback means

## create the subset for white female names
resume.wf <- subset(resume, subset = (race == "white" & sex == "female"))
## create the subset for black female names
resume.bf <- subset(resume, subset = (race == "black" & sex == "female"))
## compare the difference in means
mean(resume.wf$call) - mean(resume.bf$call)
grade_code("You just analyzed an experiment! Way to go!")
submission_ui
submission_server()