Pkg.jl
is the standard package manager for Julia 1.0 and newer. It is a stdlib
of the Julia language. Packages in Julia can be installed from any source that has a valid repository. However, we only consider packages that have been registered. A registered package is one that is discoverable and installable from the official registry (presently METADATA.jl). The Julia ecosystem provides a few tools obtaining relevant information about packages and their status. A few examples include (1) continous integration through Travis C.I., code coverage through Codecov, documentation through Documenter.jl and hosted through Github Pages, and additional continous integration for releases through the PackageEvaluator.jl. In addition, since the majority of repositories are Github repositories additional information such as LICENSE
, contributors (and contributions), and other characteristics can be easily communicated through the platform / interface (e.g., using shields in the README
file). Dependencies in Julia are described in the REQUIRE
file which is a component of any Julia package. One can access this file to parse the dependencies.
A Julia package is a repository which usually lives in Github; only two packages out of 2,032 were hosted in a different platform (one in Gitlab and one in colberg.org). In some cases, repositories have been deleted making the package and metadata lost. An analysis of attrition found that these cases were only 5 out of 2,032.
pacman::p_load(docstring, sdalr, configr, dplyr, DBI, purrr, stringr, data.table, dtplyr, httr, jsonlite)
The registry METADATA.jl
contains all registered packages (Name and Repository). The repository has been cloned (you might need to clone it in your server) to ~/oss/data/oss/original/Julia/METADATA.jl
. It is better to work with a local copy since there are too many files for the Github API to handle and by using version control one can retrieve the exact version used in the language.
owner_name_repo = function(){ #' Obtains the Owner, Name, and Repository for all registered packages. #' #' @description Owner, name, and repositories for all registered packages found #' from `~/oss/data/oss/original/Julia/METADATA.jl/`. To obtain the latest, #' make sure the repository is synched. #' @usage owner_name_repo() #' @return Writes the data to 'id.julia_packages' packages = str_c('~/oss/data/oss/original/Julia/METADATA.jl/', list.files(path = '~/oss/data/oss/original/Julia/METADATA.jl'), '/url') # Collect the URL for all packages excluding the `Rproj` and `README.md` files. packages = packages[!str_detect(string = packages, pattern = '(.Rproj|.md)')] repositories = packages %>% map_chr(.f = function(file) { suppressWarnings(read.table(file = file, as.is = TRUE)) %>% getElement(name = 'V1') }) %>% str_replace(pattern = 'git://', replacement = 'https://') %>% str_replace(pattern = '(?<=.jl).git$', replacement = '') output = repositories %>% map_df(.f = function(repo) { output = repo %>% GET() output = data.table(repository = output$url, available = output$status_code != 404L) return(output) }) %>% mutate(owner = str_extract(string = repository, pattern = '(?<=.(com|org)/).*(?=/)'), name = str_extract(string = repository, pattern = str_extract(string = repository, pattern = str_c('(?<=', owner, '/).*(?=.jl)'))), remote_platform = str_extract(string = repository, pattern = '(?<=://)().*(.com|.org)')) %>% unique() %>% mutate(name = ifelse(test = is.na(x = name), yes = str_extract(string = repository, pattern = str_extract(string = repository, pattern = str_c('(?<=', owner, '/).*(?=$)'))), no = name)) %>% data.table() %>% setcolorder(neworder = c('owner', 'name', 'repository', 'available')) conn = con_db(pass = get_my_password()) dbWriteTable(con = conn, name = 'julia_packages', value = output, row.names = FALSE, overwrite = TRUE) on.exit(dbDisconnect(conn = conn)) } owner_name_repo() conn = con_db(pass = get_my_password()) julia_packages = dbReadTable(conn = conn, name = 'julia_packages') dbDisconnect(conn)
For each package we want to obtain certain variables. One variable of interest is the license, attribution, and year. In order to obtain the license information we used the Github API to obtain the license type. When Github has detected a license type we use the identified license type. When a repository has no license file, we assume the default Berne Convention applicable "All rights reserved" license. For the majority of the projects, there is a license file which has not been identified, in those cases we employed a license classification tool: the Ruby gem Licensee.
In order to assess the quality of our methodology, we downloaded all the license files for Github repositories and saved them locally to inspect and extract additional information. From the license text we extract the attribution (e.g., John Smith, Contributors to this repository) and year (e.g., 2015).
fetch_license_text = function(julia_pkgs) { #' Verifies the Github detected license for the repository and the text. #' #' @description Uses the Github API to get the detected license and source. #' @param `julia_pkgs` a data.frame with the owner and name (without .jl). #' @usage owner_name_repo(julia_pkgs) #' @return Writes the data to 'id.prj_licenses' # Helper helper = function(owner, name) { output = data.table(name = name, license = NA) license_file = str_c('https://api.github.com/repos/', owner, '/', name, '.jl/license') %>% # This is my Personal Token for this project. GET(add_headers(Authorization = 'token d77961efcd1dc0ae2b9ebb2fe6c9349e1a9c3da0')) %>% content(as = 'text', encoding = 'UTF-8') %>% fromJSON() if (all(names(license_file) %in% c('message', 'documentation_url'))) { if (license_file$message %in% 'Not Found') { # Berne Convention standard terms return(value = data.table(name = name, license_type = 'BC', license_text = 'All rights reserved')) } } license_text = license_file$download_url %>% GET() %>% content(as = 'text', encoding = 'UTF-8') output = data.table(name = name, license_type = ifelse(test = is.null(license_file$license$spdx_id), yes = NA, no = license_file$license$spdx_id), license_text = license_text) return(value = output) } output = pmap_dfr(.l = julia_pkgs %>% filter(available & (remote_platform %in% 'github.com')) %>% select(owner, name), .f = helper) conn = con_db(pass = get_my_password()) dbWriteTable(conn = conn, name = 'prj_licenses', value = output, row.names = FALSE, overwrite = TRUE) on.exit(dbDisconnect(conn = conn)) } fetch_license_text(julia_pkgs = julia_pkgs)
Additional information gathered for the repository is a description (from the README file), tags (from Github repository), package status (from shields, PackageEvaluator, and Travis C.I.).
basic_information = function() { #' Package name, description, License, and repository. #' @description Obtains the basic information for Julia packages. #' @usage basic_information() #' @return data.table with basic information. parse_license_owner = function(pkgvertest) { #' Parse License and owner. #' #' @description Parses the License and ownder if package availble for #' supported release. It returns a data.table with NA for the values when no #' version for current release. #' #' @param pkgvertest character. A pkgvertest text #' @usage parse_License_owner(pkgvertest) #' @return data.table with License and owner. obj = str_split(string = pkgvertest, pattern = '\n')[[1]] if (any(str_detect(string = obj, pattern = 'Julia v0.6'))) { license = str_sub(obj[1], end = -2) owner = str_extract(string = obj[2], pattern = '(?<=Owner: ).*') output = data.table(License = license, Owner = owner) } else { output = data.table(License = NA, Owner = NA) } return(output) } response = read_html('https://pkg.julialang.org/') name = response %>% html_nodes('.pkgnamedesc a') %>% html_text() description = response %>% html_nodes('h4') %>% html_text() repository = response %>% html_nodes('.pkgnamedesc a') %>% html_attr('href') %>% str_replace(pattern = 'http://github.com/', replacement = 'https://github.com/') license_owner = response %>% html_nodes('.pkgvertest') %>% html_text() %>% str_trim() output = response %>% html_nodes('.pkgvertest') %>% html_text() %>% str_trim() %>% map_df(.f = parse_license_owner) %>% cbind(name, description, repository) %>% data.table(key = 'name') %>% setnames(old = 'name', new = 'Name') %>% setnames(old = 'description', new = 'Description') %>% setnames(old = 'repository', new = 'Repository') %>% setcolorder(neworder = c('Name','License','Description','Owner','Repository')) return(output) } basic_information = basic_information()
Depreciated packages are those identified by any of the three conditions:
Has been migrated to JuliaArchive
It is explicitly stated in the definition
It redirects to a different package
basic_information = basic_information %>% mutate(Package_Status = 'Maintained') %>% mutate(Package_Status = ifelse(test = (Owner == 'JuliaArchive') | str_detect(string = Description, pattern = '(?i)(deprecated)'), yes = 'Deprecated', no = Package_Status))
In order to exclude pre-production packages, depreciated, or abandoned packages, I use the following criteria:
It passes all tests for the current released version (Julia 0.6) based on PackageEvaluator
At some point it has worked with Julia 0.6 (some release has work with some version of dependencies)
Out of the last 25 builds using Travis C.I., some job has been successful with Julia 0.6 (some branch works even if it has not been released)
If a package meets any of the three criterias it is assumed to be production ready and maintained.
pkg_eval = function(name) { #' Works with current version. #' #' @description Passed PackageEvaluator with flying colors in current release. #' @usage pkg_eval(name) #' @return logit indicator str_c('https://pkg.julialang.org/logs/', name, '_0.6.log') %>% GET() %>% content(as = 'text') %>% str_detect(pattern = str_c(name, ' tests passed')) } some_release = function(name) { #' Has the package ever worked with any currently supported versions? #' #' @description Verifies if any release passed its tests for Julia 0.6. #' @usage some_release(name) #' @return logit indicator str_c('https://pkg.julialang.org/detail/', name) %>% GET() %>% content(as = 'text') %>% str_remove_all(pattern = '\n') %>% str_detect(pattern = '(?<=<h4>Julia v0.6</h4>\\s{4}<pre>).*(?=</pre>\\s{8}<h4>)') } stable_last = function(repo) { #' Does the last working build job (within the last 25 jobs) include 0.6? #' #' @description Has it passed with Julia 0.6. #' @usage stable_last(name) #' @return logit indicator repo = str_extract(string = repo, pattern = '(?<=.com/).*.jl?') if (is.na(x = repo)) { return(FALSE) } builds = GET(url = str_c('https://api.travis-ci.org/repos/', repo, '/builds'), add_headers(c(Accept = 'application/json', Authorization = 'token DRt1TjjDmPG4wX8bq0YqVg'))) %>% content(as = 'text', encoding = 'UTF-8') %>% fromJSON() if (is_empty(x = builds)) { return(FALSE) } builds = builds %>% filter(result == 0L) if (is_empty(x = builds)) { return(FALSE) } builds = builds %>% head(1L) %>% getElement(name = 'id') output = '0.6' %in% (GET(url = str_c('https://api.travis-ci.org/builds/', builds), add_headers(c(Accept = 'application/json', Authorization = 'token DRt1TjjDmPG4wX8bq0YqVg'))) %>% content(as = 'text', encoding = 'UTF-8') %>% fromJSON() %>% getElement(name = 'config') %>% getElement(name = 'julia')) return(output) }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.