library(gradethis) library(learnr) library(qsslearnr) tutorial_options(exercise.checker = gradethis::grade_learnr) knitr::opts_chunk$set(echo = FALSE) tut_reptitle <- "QSS Tutorial 2: Output Report" data(resume, package = "qss")
question('Which of the following approaches to identifying causal relationships is considered the "gold standard" in many scientific disciplines?', answer("Randomized controlled trials"), answer("Randomized experiments"), answer("Either of these", correct = TRUE) )
For ethical and logistical reasons, social scientists are often unable to conduct randomized controlled trials (RCTs). Therefore, they must conduct observational studies in which data on naturally occurring events are collected and analyzed.
question("With observational studies, it is often hard to establish that changes in one variable caused changes in another variable. In other words, observational studies have less __________ compared to RCTS.", answer("Internal validity", correct = TRUE), answer("External validity"), answer("Generalizability"), answer("All of these"))
In this exercise, you have the ages of a sample of 15 people, stored in the ages
vector. We can use these relational operators to create a logical vector which indicates which ages fall within a specific range. In particular, we can find out which respondents are college-aged (18-22).
college.aged
, which indicates which observations in ages
are greater than or equal to 18 and less than or equal to 22. Be sure to use parentheses to separate out the two logical statements.college.aged
vector to determine how many 18-22 year olds there are in the sample.## check the value of the ages vector ages <- c(31, 20, 43, 45, 41, 46, 28, 49, 61, 19, 39) ## create a logical vector called college.aged ## that is TRUE for someone between 18-22, inclusive ## find the number of college.aged respondents
college.aged <- (ages >= 18) & (ages <= 22) sum(college.aged)
grade_result_strict( pass_if(~ identical(college.aged, (ages >= 18) & (ages <= 22))), pass_if(~ identical(.result, sum((ages >= 18) & (ages <= 22)))) )
In the last exercise, you used logical statements to create a vector that told us whether each entry in the ages
vector is in the 18-22 year-old range. We can now use that information to figure out what the actual ages of the respondents in that range are.
college.aged
logical vector that has already been created to subset ages
to the value only between 18 and 22, inclusive.mean
function to calculate the average age of the respondents in this subset. To do this, use the bracket subsetting and the college.aged
logical vector that has already been created.ages <- c(31, 20, 43, 45, 41, 46, 28, 49, 61, 19, 39)
## here's the college.aged logical vector college.aged <- (ages >= 18) & (ages <= 22) ## calculate the average age among the college-aged in the sample
mean(ages[college.aged])
grade_result( pass_if(~ identical(.result, mean(ages[college.aged]))) )
For this exercise, we'll use the resume data once again.
What if we wanted to create a new vector that depends on whether a statement is true or false? For example, suppose you wanted to create an indicator variable for whether or not a specific resume had the name "Carrie." From, the last few examples, you know that resume$firstname == "Carrie"
will give you a vector of TRUE
and FALSE
values based on whether or not the name for that unit is "Carrie." We can then use this to get create a new variable using the ifelse(X, Y, Z)
command. This command takes a logical vector as X
and returns a new vector of the same length as X
that has the value Y
if that value in X
is TRUE and Z
if that value in X
is FALSE.
ifelse
function to create a new variable called carrie
that is 1 if the resume name (firstname
) is "Carrie"
and 0 otherwise.resume
using the head
function to see the new variable.## create a new variable called carrie resume$carrie <- ## print the first 6 lines of the updated resume
## create a new variable called carrie resume$carrie <- ifelse(???, 1, 0) ## print the first 6 lines of the updated resume head(resume)
## create a new variable called carrie resume$carrie <- ifelse(resume$firstname == "Carrie", 1, 0) ## print the first 6 lines of the updated resume head(resume)
grade_result_strict( pass_if(~ identical(resume$carrie, ifelse(resume$firstname == "Carrie", 1, 0))) )
You have seen that creating subsets can be helpful for calculating different quantities or statistics for specific subgroups in the data. When there is more than 1 or 2 subgroups of interest, however, this can be a cumbersome process. For that reason, it's helpful to know about factor variables. Basically, a factor variable is a categorical variable that takes a finite number of distinct values.
Any variable can be turned into a factor by calling the as.factor()
function like so:
mydata$myvar <- as.factor(mydata$myvar)
This will take the variable myvar
and create a factor variable with levels that are observed in that variable. Most often, you will convert a character variable to a factor.
type
character variable. Fill in the last values of race
and sex
and add the label WhiteMale
to this last type.resume$type
variable to a factor variable using as.factor()
.resume$carrie <- ifelse(resume$firstname == "Carrie", 1, 0)
## fill in the last line of code to create a character vector for the type of ## application that was sent resume$type <- NA resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale" resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale" resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale" resume$type[resume$race == ??? & resume$sex == ???] <- ## turn the character vector into a factor resume$type <-
## fill in the last line of code to create a character vector for the type of ## application that was sent resume$type <- NA resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale" resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale" resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale" resume$type[resume$race == ??? & resume$sex == ???] <- "WhiteMale" ## turn the character vector into a factor resume$type <- as.factor(...)
## fill in the last line of code to create a character vector for the type of ## application that was sent resume$type <- NA resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale" resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale" resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale" resume$type[resume$race == "white" & resume$sex == "male"] <- "WhiteMale" ## turn the character vector into a factor resume$type <- as.factor(resume$type)
grade_code("Fantastic, you got that factor loaded up and ready to go. Now, let's see what you can do with it.")
Imagine that you wanted to calculate the average callback for each level of type
. You could create a subset for each level of type
and then use mean
on each one of those subsets. But that would take 8 lines of code!
A more efficient way to do this task would be to use the tapply(X, INDEX, FUN)
function, which allows you to compute a function (FUN
) on subsets of the data (X
) defined by a factor variable (INDEX
). For instance, suppose we had a grades
data frame that had student exam grades out of 100 in the exam
variable and a factor variable called section
that reported which section they were enrolled in. Then we could calculate the average exam score within sections as:
tapply(grades$exam, grades$section, mean)
table()
function on the type
variable to see how many fictitious applications were sent out with each type of name.resume$carrie <- ifelse(resume$firstname == "Carrie", 1, 0) resume$type <- NA resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale" resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale" resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale" resume$type[resume$race == "white" & resume$sex == "male"] <- "WhiteMale" ## turn the character vector into a factor resume$type <- as.factor(resume$type)
## get the number of observations for each level of the type variable
table(resume$type)
grade_code()
tapply()
function to calculate the mean
of the call
variable in each level of type
.## use the `tapply` function to calculate the mean in each level of type
## use the `tapply` function to calculate the mean in each level of type tapply(resume$call, resume$type, mean)
grade_code("Great work, you have the skills you need to analyze experiments and observational data!")
submission_ui
submission_server()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.