In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

library(gradethis)
library(learnr)
library(qsslearnr)
library(tidyverse)
tutorial_options(exercise.checker = gradethis::grade_learnr)
knitr::opts_chunk$set(echo = FALSE)
tut_reptitle <- "QSS Tidyverse Tutorial 2: Output Report"
data(resume, package = "qss")

Conceptual Questions

question('Which of the following approaches to identifying causal relationships is considered the "gold standard" in many scientific disciplines?',
              answer("Randomized controlled trials"),
              answer("Randomized experiments"),
              answer("Either of these", correct = TRUE)
         )

For ethical and logistical reasons, social scientists are often unable to conduct randomized controlled trials (RCTs). Therefore, they must conduct observational studies in which data on naturally occurring events are collected and analyzed.

question("With observational studies, it is often hard to establish that changes in one variable caused changes in another variable. In other words, observational studies have less __________ compared to RCTS.",
         answer("Internal validity", correct = TRUE),
         answer("External validity"),
         answer("Generalizability"),
         answer("All of these"))

More Logicals in R

Complex relationals

In this exercise, you have the ages of a sample of 15 people, stored in the ages vector. We can use these relational operators to create a logical vector which indicates which ages fall within a specific range. In particular, we can find out which respondents are college-aged (18-22).

Exercise

Create a logical vector, called college.aged, which indicates which observations in ages are greater than or equal to 18 and less than or equal to 22. Be sure to use parentheses to separate out the two logical statements.
Take the sum of the college.aged vector to determine how many people around 18-22 year old there are in the sample.

## check the value of the ages vector
ages <- c(31, 20, 43, 45, 41, 46, 28, 49, 61, 19, 39)

## check the value of the ages vector
ages <- c(31, 20, 43, 45, 41, 46, 28, 49, 61, 19, 39)
## create a logical vector called college.aged
## that is TRUE for someone between 18-22, inclusive
## find the number of college.aged respondents

college.aged <- (ages >= 18) & (ages <= 22)
sum(college.aged)

grade_result_strict(
  pass_if(~ identical(college.aged, (ages >= 18) & (ages <= 22))),
  pass_if(~ identical(.result, sum((ages >= 18) & (ages <= 22))))
)

Subsetting based on logicals

In the last exercise, you used logical statements to create a vector that told us whether each entry in the ages vector is in the 18-22 year-old range. We can now use that information to figure out what the actual ages of the respondents in that range are.

In base R, you can use the brackets for subsetting the data, and apply functions (such as mean to calculate the average) to this subset. For more details, please refer to the base R version of this tutorial.

However, tidyverse functions for subsetting the data, such as filter() or select() cannot be applied to vectors. In this sense, if you would like to subset vectors, base R works better than tidyverse.

Another useful tool for subsetting is the mutate function. mutate function allows you to create a new column in the original data frame. For example, if you want to add to a column to mydata, you should put the original data frame as the first argument, and the second argument would be the formula for creating a new variable.

Exercise

Using the UNpop data set from the first tutorial, let's create a new data frame withe a new column world.pop.mill by dividing the world.pop by 1000.

UNpop <- data.frame(
  year = seq(1950, 2010, by = 10),
  world.pop = c(2525779, 3026003, 3691173, 4449049, 5320817, 6127700, 6916183)
)

UNpop.mill <- UNpop %>%

UNpop.mill <- UNpop %>%
  mutate(world.pop.mill = world.pop / 1000)

grade_code()

Using simple conditional statements

For this exercise, we'll use the resume data once again.

What if we wanted to create a new vector that depends on whether a statement is true or false? For example, suppose you wanted to create an indicator variable for whether or not a specific resume had the name "Carrie." We can create a new variable using the if_else(X, Y, Z) command. This command takes a logical vector as X and returns a new vector of the same length as X that has the value Y if that value in X is TRUE and Z if that value in X is FALSE.

Exercise

Use the if_else and mutate functions to create a new variable called carrie that is 1 if the resume name (firstname) is "Carrie" and 0 otherwise.
Print the first six lines of resume using the head function to see the new variable.

## create a new variable called carrie
## print the first 6 lines of the updated resume
resume <- resume %>%
  mutate(carrie = )

## create a new variable called carrie
## print the first 6 lines of the updated resume
resume <- resume %>%
  mutate(carrie = if_else(???, ???, ???))
???()

## create a new variable called carrie
## print the first 6 lines of the updated resume
resume <- resume %>%
  mutate(carrie = if_else(firstname == "Carrie", 1, 0)) 

head(resume)

grade_this_code()

Factors variables in R

Factor variables

You have seen that creating subsets can be helpful for calculating different quantities or statistics for specific subgroups in the data. When there is more than 1 or 2 subgroups of interest, however, this can be a cumbersome process. For that reason, it's helpful to know about factor variables. Basically, a factor variable is a categorical variable that takes a finite number of distinct values.

Any variable can be turned into a factor by calling the as.factor() function like so:

mydata$myvar <- as.factor(mydata$myvar)

This will take the variable myvar and create a factor variable with levels that are observed in that variable. Most often, you will convert a character variable to a factor.

Exercise

Finish the code below that creates the type character variable. Fill in the last values of race and sex and add the label WhiteMale to this last type.
Convert the type variable to a factor variable using mutate() and as.factor().

resume <-
  resume %>%
  mutate(carrie = if_else(firstname == "Carrie", 1, 0))

## fill in the last line of code to create a character vector for the type of
## application that was sent
## turn the character vector into a factor
resume <- resume %>%
  mutate(type = if_else(race == "black" & sex == "female", "BlackFemale", ""),
         type = if_else(race == "black" & sex == "male", "BlackMale", type),
         type = if_else(race == "white" & sex == "female", "WhiteFemale", type),
         type = if_else(race == ??? & sex == ???, ???, ???)) %>%
  mutate()

## fill in the last line of code to create a character vector for the type of
## application that was sent
## turn the character vector into a factor
resume <- resume %>%
  mutate(type = if_else(race == "black" & sex == "female", "BlackFemale", ""),
         type = if_else(race == "black" & sex == "male", "BlackMale", type),
         type = if_else(race == "white" & sex == "female", "WhiteFemale", type),
         type = if_else(race == ??? & sex == ???, "WhiteMale", type)) %>%
  mutate(type = ???(type))

## fill in the last line of code to create a character vector for the type of
## application that was sent
## turn the character vector into a factor
resume <- resume %>%
  mutate(type = if_else(race == "black" & sex == "female", "BlackFemale", ""),
         type = if_else(race == "black" & sex == "male", "BlackMale", type),
         type = if_else(race == "white" & sex == "female", "WhiteFemale", type),
         type = if_else(race == "white" & sex == "male", "WhiteMale", type)) %>%
  mutate(type = as.factor(type))

grade_code("Fantastic, you got that factor loaded up and ready to go. Now, let's see what you can do with it.")

For creating factor variables, it is sometimes easier to use another useful command, case_when(). This function uses the tilde ~ operator, which assigns the value after the tilde to the new variable type. This is especially useful, when the conditions cannot cover every observation: in that case, you can add TRUE ~ "other" at the end so that the remaining observations will be assigned to the other value.

resume <- resume %>%
  mutate(

  )

resume <- resume %>%
  mutate(type = case_when(race == "black" & sex == "female" ~ "BlackFemale",
                          race == "white" & sex == "female" ~ "WhiteFemale",
                          race == "black" & sex == "male" ~ "BlackMale",
                          race == "white" & sex == "male" ~ "WhiteMale",
                          TRUE ~ "other"
  ))

grade_code()

Using factors

With the tidyverse, we do not often need to use factor variables. Instead, we can use group_by() and other tidyverse functions that we mentioned before, to compute a function on subsets of the data.

Exercise

Use the count() function on the type variable in resume data frame to see how many fictitious applications were sent out with each type of name.

resume <- resume %>%
  mutate(carrie = if_else(firstname == "Carrie", 1, 0)) %>%
  mutate(type = if_else(race == "black" & sex == "female", "BlackFemale", ""),
         type = if_else(race == "black" & sex == "male", "BlackMale", type),
         type = if_else(race == "white" & sex == "female", "WhiteFemale", type),
         type = if_else(race == "white" & sex == "male", "WhiteMale", type)) %>%
  mutate(type = as.factor(type))

## get the number of observations for each level of the type variable
resume %>%

resume %>%
  count(type)

grade_code()

Use the group_by(), select(), and summarize() functions to calculate the mean of the call variable in each level of race and sex in the resume data frame.

## use the `group_by()`, `select()`, and `summarize()` functions to calculate the mean in each level of race and sex
resume %>%

## use the `group_by()`, `select()`, and `summarize()` functions to calculate the mean in each level of race and sex
resume %>%
  group_by(sex, race) %>%
  select(???) %>%
  summarize(callback = ???)

## use the `group_by()`, `select()`, and `summarize()` functions to calculate the mean in each level of race and sex
resume %>%
  group_by(sex, race) %>%
  select(sex, race, call) %>%
  summarize(callback = mean(call))

grade_code("Great work, you have the skills you need to analyze experiments and observational data!")

Submit

submission_ui

submission_server()

mattblackwell/qsslearnr documentation built on Sept. 17, 2022, 6:25 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mattblackwell/qsslearnr
Learnr tutorials based on Quantitative Social Science

In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

Conceptual Questions

More Logicals in R

Complex relationals

Exercise

Subsetting based on logicals

Exercise

Using simple conditional statements

Exercise

Factors variables in R

Factor variables

Exercise

Using factors

Exercise

Submit

R Package Documentation

Browse R Packages

We want your feedback!

mattblackwell/qsslearnr Learnr tutorials based on Quantitative Social Science

In mattblackwell/qsslearnr: Learnr tutorials based on Quantitative Social Science

Conceptual Questions

More Logicals in R

Complex relationals

Exercise

Subsetting based on logicals

Exercise

Using simple conditional statements

Exercise

Factors variables in R

Factor variables

Exercise

Using factors

Exercise

Submit

R Package Documentation

Browse R Packages

We want your feedback!

mattblackwell/qsslearnr
Learnr tutorials based on Quantitative Social Science