The materials required to reproduce our research are included in this repo.
# Quickstart
git clone https://github.com/lupyanlab/programming-questionnaire.git
cd programming-questionnaire/
cp qualtrics.yml.template qualtrics.yml
# edit qualtrics.yml to enter your Qualtrics API token
make # downloads all data and installs it in an R package called "programmingquestionnaire"
install.packages(c("tidyverse", "devtools", "qualtRics", "WikidataR", "RSQLite"))
The survey was created in Qualtrics. To obtain the survey data, you need a Qualtrics account with access to the survey. You can then download the survey data from within R by authenticating with your Qualtrics API key.
After obtaining your Qualtrics API key, place it in a file named "qualtrics.yml" in the project root directory (e.g., "programming-questionnaire/qualtrics.yml"). To create the qualtrics.yml file from a template, copy the file "qualtrics.yml.template" to the expected location. Then edit the file, replacing YOUR_API_TOKEN_HERE with your Qualtrics API key.
cp qualtrics.yml.template qualtrics.yml
# edit qualtrics.yml to replace YOUR_API_TOKEN_HERE with your Qualtrics API token.
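Before running the download functions, it can help to confirm that the placeholder was actually replaced. The following is a minimal Python sketch and is not part of this repo. The helper name `token_is_configured` is hypothetical. It only looks for the placeholder text rather than parsing the YAML, because the key names used in the template are not shown here, and it does not validate the token with Qualtrics.

```python
from pathlib import Path

# Placeholder text named in the instructions above.
PLACEHOLDER = "YOUR_API_TOKEN_HERE"


def token_is_configured(path="qualtrics.yml"):
    """Return True if the file exists and no longer contains the placeholder.

    Hypothetical helper: it does not check that the token is valid.
    """
    config = Path(path)
    if not config.is_file():
        return False
    return PLACEHOLDER not in config.read_text(encoding="utf-8")
```

Calling `token_is_configured()` from the project root returns `False` if "qualtrics.yml" is missing or still contains the placeholder.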
To download the survey data from Qualtrics, source the functions in "R/qualtrics.R". This gives you the two primary functions, get_qualtrics_responses and get_qualtrics_questions. The argument to these functions is the name of the Qualtrics survey being downloaded.
Note: The R package "qualtRics" is required for downloading the data.
source("R/qualtrics.R")
authenticate_qualtrics() # authenticates with qualtrics.yml
qualtrics <- get_qualtrics_responses("programming questionnaire")
questions <- get_qualtrics_questions("programming questionnaire")
Meta-data on programming languages was collected from the Wikidata service.
Note: The R package "WikidataR" is required for downloading language info.
source("R/wikidata.R")
languages <- c("python", "java", "go")
paradigms <- get_programming_paradigms(languages)
The relationships between languages and programming paradigms lend themselves to graph-based analysis. To load the languages and their paradigms collected from Wikidata into a graph database, follow the steps below, which are required to run the "bin/load-neo4j.R" script.
brew install neo4j # install the Neo4j graph database with homebrew
neo4j start # start the db, open a browser to localhost:7474, and set a password
export NEO4J_PASSWORD=mysecretpassword
bin/load-neo4j.R # load language data in the graph db
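To illustrate why a graph form suits this data, the language-to-paradigm pairs can be treated as edges and inverted to find languages that share a paradigm. The following is a minimal Python sketch, independent of Neo4j and of the scripts in this repo. The edge list is hand-written for illustration and is not output from Wikidata, and the function names are hypothetical.

```python
from collections import defaultdict

# Hand-written illustrative edges (language, paradigm); NOT data from Wikidata.
EXAMPLE_EDGES = [
    ("python", "object-oriented"),
    ("python", "functional"),
    ("java", "object-oriented"),
    ("go", "concurrent"),
]


def languages_by_paradigm(edges):
    """Group languages under each paradigm they are linked to."""
    grouped = defaultdict(set)
    for language, paradigm in edges:
        grouped[paradigm].add(language)
    return dict(grouped)


def shared_paradigms(edges, first, second):
    """Return the set of paradigms linked to both languages."""
    by_language = defaultdict(set)
    for language, paradigm in edges:
        by_language[language].add(paradigm)
    return by_language[first] & by_language[second]
```

With the example edges, `shared_paradigms(EXAMPLE_EDGES, "python", "java")` yields the single paradigm the two languages have in common. A graph database answers the same kind of question with a query over the stored edges.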
The results of the annual Stack Overflow Developer Survey can be downloaded from insights.stackoverflow.com/survey
After the results have been downloaded, move them into the expected directory:
mv ~/Downloads/developer_survey_2017.zip ./data-raw/stack-overflow-developer-survey-2017.zip
unzip ./data-raw/stack-overflow-developer-survey-2017.zip -d ./data-raw/stack-overflow-developer-survey-2017
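On systems without the `unzip` command, the same extraction step could be done with Python's standard library. This is a minimal sketch rather than a script from this repo, and the function name `unpack_survey` is hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_survey(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the archive member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

To mirror the shell commands above, call `unpack_survey("data-raw/stack-overflow-developer-survey-2017.zip", "data-raw/stack-overflow-developer-survey-2017")` from the project root.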
The data were processed to yield the following tables. To create all tables and store them in a SQLite DB, run the "make" command. See the Makefile for more targets of the "make" command.
Note: The R package "RSQLite" is required for storing the data in a SQLite DB.
make programming-questionnaire.sqlite # creates "programming-questionnaire.sqlite" with all tables
qualtrics : Raw responses in wide format as if downloaded directly from Qualtrics.
questions : Survey question data as obtained from the Qualtrics API.
responses : Response data in long format.
languages : Programming languages represented in the sample.
questionnaire : Responses to agreement and free response questions in wide format.
language_paradigms : Information about programming languages taken from Wikidata.
Examples of how to read tables from the SQLite DB in both R and Python are included below.
# in R
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), "programming-questionnaire.sqlite")
table_name <- "responses"
responses <- tbl(con, table_name) %>% collect()
# in python3
import sqlite3
import pandas
con = sqlite3.connect("programming-questionnaire.sqlite")
table_name = 'responses'
responses = pandas.read_sql_query(f'select * from {table_name}', con)
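Before reading a table, it can be useful to check which tables the database actually contains, since the table name is interpolated directly into the query. The following is a minimal sketch using only Python's standard `sqlite3` module. The helper names `list_tables` and `read_table` are hypothetical and are not part of this repo.

```python
import sqlite3


def list_tables(con):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_table(con, table_name):
    """Read every row of a table as a list of dicts keyed by column name.

    The name is checked against the existing tables before it is placed
    in the query, so an unknown name raises ValueError instead of running.
    """
    if table_name not in list_tables(con):
        raise ValueError(f"no such table: {table_name}")
    cursor = con.execute(f'SELECT * FROM "{table_name}"')
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

After `con = sqlite3.connect("programming-questionnaire.sqlite")`, calling `list_tables(con)` should show the tables described above if the database was built with the "make" command.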
SQLite wrapper functions are stored in "R/sqlite.R".
source("R/sqlite.R")
responses <- collect_table("responses") # expects "programming-questionnaire.sqlite" to exist
The commands for unpacking the SQLite database, compiling the *.rda files, and installing them as an R package are stored in R scripts in the "bin/" directory. First, make sure the required packages are installed.
install.packages(c("tidyverse", "devtools", "RSQLite"))
Then run the commands in these three R scripts.
Rscript bin/unpack-sqlite.R
Rscript bin/compile-rda.R
Rscript bin/install-r-package.R
Now you can load the programmingquestionnaire R package, and view and load specific datasets.
library("programmingquestionnaire")
help(package = "programmingquestionnaire") # view datasets
data("responses") # load "responses" data