README.md

Measuring the Cost and Value of Open Source Software (OSS)

Setup

  1. Get a GitHub personal access token from: https://github.com/settings/tokens
  2. Set up your .Renviron file (to make it easier, run usethis::edit_r_environ(), or edit ~/.Renviron directly)
  3. Add a line like this to the file: GH_TOSS_TOKEN='YOUR_TOKEN'
    • You can use GH_TOSS_TOKEN='1c06459fc9b515e2a5aa748b06913f3495068a45', but it may not work since it is not your own token.
  4. Add your database password: DB_PASSWORD='PASSWORD_IS_PROBABLY_YOUR_PID'
  5. Make sure the file ends in an empty new line
  6. Restart your R session so the environment variables are picked up
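Once the variables are in place, a quick way to confirm that R can see them is a small helper like the one below. The function name is ours, not part of the repo; it only checks that both variables are set to non-empty values.

```r
# Returns TRUE when every named environment variable is set and non-empty.
# Defaults to the two variables this README asks you to add to ~/.Renviron.
env_ready <- function(vars = c("GH_TOSS_TOKEN", "DB_PASSWORD")) {
  vals <- Sys.getenv(vars)   # unset variables come back as ""
  all(nzchar(vals))
}

env_ready()  # should be TRUE after restarting your R session
```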

Running the CRAN analysis

The CRAN pull is taken from the packages listed at: https://cran.r-project.org/web/packages/available_packages_by_name.html

Getting CRAN projects

Rscript ./src/01-data_collection/scrape/CRAN/01-cran_scrape.R
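A rough sketch of what a scrape of that index involves (this is not the actual 01-cran_scrape.R): pull the package names out of the "available packages by name" page. The real script reads the live page; here a saved HTML fragment stands in, and a base-R regex replaces whatever HTML parser the script actually uses.

```r
# Stand-in for the downloaded CRAN index page: each package is a link of the
# form <a href="../../web/packages/<name>/index.html"><name></a>.
html <- paste0(
  '<td><a href="../../web/packages/abc/index.html">abc</a></td>',
  '<td><a href="../../web/packages/dplyr/index.html">dplyr</a></td>'
)

# Find every "packages/<name>/index.html" link, then strip down to <name>.
links <- regmatches(html, gregexpr("packages/[^/]+/index\\.html", html))[[1]]
pkgs  <- sub("packages/([^/]+)/index\\.html", "\\1", links)
pkgs  # "abc" "dplyr"
```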

From CRAN to github

Rscript ./src/01-data_collection/scrape/CRAN/02_parse_cran.R
Rscript ./src/01-data_collection/scrape/CRAN/03_missingness.R
Rscript ./src/01-data_collection/scrape/CRAN/04_CI_Checks.R
Rscript ./src/01-data_collection/scrape/CRAN/05_CI_OSI_subsets.R

Script descriptions

Note: some code in this script breaks because we can no longer use the “sdalr” package to get a table from the old SDAL database (lines 9–12).

Also, there are two other scripts in the repo right now (chk.R and dependencies.R). I’m fairly certain that Bayoan wrote these, either to check my work or to do a little bit of work on CRAN. Either way, I don’t think I used or edited anything in those scripts.

#load our list of keys, and identify the set of packages that we have missed
keys <- readRDS('./data/oss/working/CRAN_2018/name_slug_keys.RDS') # from script 12
Analysis <- readRDS('./data/oss/working/CRAN_2018/Analysis.RDS') # from uploads (script 11)
missed <- setdiff(keys$slug, Analysis$slug) #this should be 220 packages
After identifying what we missed, we get the information for those packages and bind it back to our master analysis table.
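The setdiff-and-bind step above can be illustrated with toy data (the slugs and the stars column here are made up; the real keys and Analysis tables come from scripts 11 and 12):

```r
# Toy key table and partially filled analysis table.
keys     <- data.frame(slug = c("r-lib/rlang", "tidyverse/dplyr", "yihui/knitr"))
Analysis <- data.frame(slug  = c("r-lib/rlang", "tidyverse/dplyr"),
                       stars = c(500, 4000))

# Slugs present in the keys but missing from the analysis table.
missed <- setdiff(keys$slug, Analysis$slug)            # "yihui/knitr"

# Stand-in for the GitHub pull: fetch info for the missed slugs...
fetched <- data.frame(slug = missed, stars = NA_real_)

# ...and bind it back onto the master analysis table.
Analysis <- rbind(Analysis, fetched)
nrow(Analysis)  # 3
```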

Running the Python pip analysis

Some of the "original" data sets cannot be located, but they have been saved in the database.
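Since those data sets now live only in the database, recovering one looks roughly like the sketch below. The table name ("pip_packages"), database name, and DBI/RPostgres backend are all assumptions; the password is the DB_PASSWORD variable set during Setup.

```r
# Hypothetical helper for reading a saved pip data set back out of the
# project database. Connection details are assumptions, not repo code.
load_pip_table <- function(table = "pip_packages") {
  con <- DBI::dbConnect(
    RPostgres::Postgres(),
    dbname   = "oss",                      # assumed database name
    password = Sys.getenv("DB_PASSWORD")   # from ~/.Renviron (see Setup)
  )
  on.exit(DBI::dbDisconnect(con))          # always close the connection
  DBI::dbReadTable(con, table)
}
```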

Script descriptions

Running the Javascript CDN analysis

Scripts found under 01-data_collection/scrape/CDN/01-scrape_w_API.R
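The actual endpoint lives in 01-scrape_w_API.R; the generic helper below only shows the shape of an API-based scrape (the function name and parameters are placeholders): GET a paged endpoint and parse the JSON body.

```r
# Placeholder sketch of a paged API scrape; the real CDN endpoint is defined
# in 01-scrape_w_API.R, not here.
fetch_cdn_page <- function(endpoint, page = 1) {
  resp <- httr::GET(endpoint, query = list(page = page))
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}
```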

Script descriptions

Running the code.gov analysis

The data is collected from the API by the following script (you only want to run this once):

Rscript ./src/01-data_collection/scrape/code_gov/use_api/01-get_repos.R
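A guess at the shape of that one-time pull (the api.code.gov host, the /repos endpoint, and the api_key query parameter are assumptions; check 01-get_repos.R for the real call):

```r
# Hypothetical sketch of the code.gov repo pull done by 01-get_repos.R.
get_code_gov_repos <- function(api_key, size = 100) {
  resp <- httr::GET(
    "https://api.code.gov/repos",          # assumed endpoint
    query = list(api_key = api_key, size = size)
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed")$repos # assumed response field
}
```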

The data preparation

Rscript ./src/02-data_processing/code_gov/01-add_columns.R
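One of the columns that matters downstream is the hosting domain behind each repositoryURL. A minimal base-R version of that derivation might look like this (the helper name is ours; the real logic is in 01-add_columns.R):

```r
# Hypothetical helper: derive the hosting domain from repository URLs,
# e.g. "github.com" from "https://github.com/GSA/code-gov".
extract_domain <- function(urls) {
  sub("^[a-z]+://([^/]+)/?.*$", "\\1", urls)
}

extract_domain("https://github.com/GSA/code-gov")  # "github.com"
```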

Exploratory reports

Repository domain counts

Looks at the repositoryURL, domains, and licenses for code.gov.

Rscript -e "rmarkdown::render(here::here('./src/exploratory/code_gov/repository_domains.Rmd'), output_dir = here::here('./output/code_gov'))"
Rscript -e "bad_html <- './src/exploratory/code_gov/repository_domains.html'; if (file.exists(here::here(bad_html))) file.remove(here::here(bad_html))"

Getting github information



team-oss/scrape-cran documentation built on Dec. 23, 2021, 8:42 a.m.