stack_multi: Unnest multi-answer questions on the Stack Overflow Developer...


Description

Several questions on the Developer Survey allow multiple responses, which appear in the dataset as a single semicolon-delimited string. This function is a shortcut for unnesting those columns into a tidy format, with one row per respondent-answer pair. The result can then be joined to the stack_survey dataset or to other stack_multi results.
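The reshaping described above (one delimited string per respondent becoming one row per respondent-answer pair) can be illustrated with a small base-R sketch. The toy data frame, the `unnest_multi()` helper, and the exact delimiter format are hypothetical stand-ins chosen for illustration; this is not the package's own implementation.

```r
# Hypothetical toy data: one row per respondent, with the multi-answer
# field stored as one semicolon-delimited string (format assumed)
survey <- data.frame(
  respondent_id = 1:3,
  tech_do = c("R; Python", "SQL", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one delimited column into
# one row per respondent-answer pair
unnest_multi <- function(df, col, sep = ";") {
  pieces <- strsplit(df[[col]], sep, fixed = TRUE)
  out <- data.frame(
    respondent_id = rep(df$respondent_id, lengths(pieces)),
    answer = trimws(unlist(pieces)),  # strip spaces left after splitting
    stringsAsFactors = FALSE
  )
  # Drop respondents who skipped the question
  out[!is.na(out$answer) & out$answer != "", , drop = FALSE]
}

tech <- unnest_multi(survey, "tech_do")
tech
```

Respondent 1 contributes two rows, respondent 2 one row, and respondent 3 (no answer) none, which is the shape that makes later joins on `respondent_id` straightforward.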

Usage

stack_multi(columns = NULL)

Arguments

columns

Names of the columns to unnest. If NULL (the default), all multi-response columns are unnested.
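A sketch of how a `columns` argument of this kind can behave, using a hypothetical `unnest_columns()` helper and a made-up list of multi-answer columns: when `columns` is NULL it falls back to every multi-answer column, and each output row records which question it came from in a `column` field. The `answer` column name follows the examples on this page; everything else (data, helper, default column list) is an assumption, not the package's code.

```r
# Hypothetical survey with two multi-answer columns (names are assumptions)
survey <- data.frame(
  respondent_id = 1:2,
  occupation = c("Data scientist", "Web developer"),
  tech_do = c("R;Python", "JavaScript"),
  dev_environment = c("RStudio;Vim", "Sublime"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a NULL `columns` falls back to every multi-answer column
unnest_columns <- function(df, columns = NULL,
                           multi_cols = c("tech_do", "dev_environment"),
                           sep = ";") {
  if (is.null(columns)) columns <- multi_cols
  long <- lapply(columns, function(col) {
    pieces <- strsplit(df[[col]], sep, fixed = TRUE)
    data.frame(
      respondent_id = rep(df$respondent_id, lengths(pieces)),
      column = col,  # records which question each answer came from
      answer = trimws(unlist(pieces)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, long)  # stack the per-column results into one long table
}

all_multi <- unnest_columns(survey)             # every multi-answer column
only_tech <- unnest_columns(survey, "tech_do")  # a single column
```

Keeping every question in one long table with a `column` field means a single call can feed several downstream joins or counts.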

Value

A tbl_df with three columns:

respondent_id

An ID linking each row to the stack_survey dataset

column

If multiple columns are given,

Examples

library(dplyr)
library(stacksurveyr)

# one row per respondent-technology pair
tech_multi <- stack_multi("tech_do")
tech_multi

# how often each technology is reported
tech_counts <- tech_multi %>%
  count(tech = answer, sort = TRUE)
tech_counts

# look at the typical development environments of data scientists
stack_survey %>%
  filter(occupation == "Data scientist") %>%
  inner_join(stack_multi("dev_environment"), by = "respondent_id") %>%
  count(answer, sort = TRUE)

# find connected technologies and environments
tech_env_pairings <- tech_multi %>%
  select(respondent_id, tech = answer) %>%
  inner_join(stack_multi("dev_environment"), by = "respondent_id") %>%
  count(tech, environment = answer) %>%
  ungroup()

tech_env_pairings %>%
  arrange(desc(n))

# fractions of tech X that use environment Y
tech_env_pairings %>%
  rename(paired = n) %>%
  inner_join(tech_counts, by = "tech") %>%
  mutate(percent = paired / n) %>%
  arrange(desc(percent))
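The last example divides each technology-environment pair count by the number of respondents who use that technology. A base-R sketch on small, made-up data makes that arithmetic checkable by hand; the `tech` and `env` frames below are hypothetical stand-ins for stack_multi output, and base R's `merge()` and `aggregate()` replace the dplyr verbs only to keep the sketch dependency-free.

```r
# Hypothetical long-format data in the shape stack_multi returns
tech <- data.frame(respondent_id = c(1, 1, 2, 3),
                   tech = c("R", "Python", "R", "R"),
                   stringsAsFactors = FALSE)
env <- data.frame(respondent_id = c(1, 2, 3, 3),
                  environment = c("RStudio", "Vim", "RStudio", "Vim"),
                  stringsAsFactors = FALSE)

# Denominator: number of rows (respondents) per technology
tech_counts <- as.data.frame(table(tech = tech$tech), responseName = "n",
                             stringsAsFactors = FALSE)

# Numerator: respondent-level pairs, counted per tech-environment combination
pairs <- merge(tech, env, by = "respondent_id")
paired <- aggregate(respondent_id ~ tech + environment, data = pairs,
                    FUN = length)
names(paired)[names(paired) == "respondent_id"] <- "paired"

# Fraction of tech X users who also report environment Y
result <- merge(paired, tech_counts, by = "tech")
result$percent <- result$paired / result$n
result[order(-result$percent), ]
```

Here three respondents report R and two of them report RStudio, so that pair's fraction is 2/3; the single Python user reports RStudio, giving 1.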

dgrtwo/stacksurveyr documentation built on May 15, 2019, 8:20 a.m.