source(file.path(usethis::proj_get(), "vignettes",  "_common.R"))

Setup

  1. Install github.observatory from GitHub:
install.packages("remotes")
remotes::install_github("harell/github.observatory")
  2. Set your AWS credentials to access the curated data:
Sys.setenv(
    AWS_ACCESS_KEY_ID = "<your-access-key-id>",
    AWS_SECRET_ACCESS_KEY = "<your-secret>",
    AWS_REGION = "ap-southeast-2"
)
  3. github.observatory provides access to the project database through a Repository object named Ecosystem. To access the database, instantiate an Ecosystem object:
ecos <- Ecosystem$new()
  4. Instantiate a Recommendation Agent (see Agent Functions for details):
agent <- Agent$new(ecos)
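If a credential is missing, it is easier to find out before instantiating Ecosystem than from a failed download. The following is a minimal base R sketch, assuming the three variables set above are the only ones the package needs; the helper name check_aws_credentials() is hypothetical and is not part of github.observatory.

```r
# Hypothetical helper (not part of github.observatory): stop early with a
# clear message if any of the AWS variables used in the setup is unset or empty.
check_aws_credentials <- function(
    vars = c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")
) {
  values <- Sys.getenv(vars)          # unset variables come back as ""
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing AWS environment variable(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```

Call check_aws_credentials() right after Sys.setenv() and before Ecosystem$new().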

Terminology

Entities

message("Of the two options for identifying a **User**, `user_id` is preferable because it stays the same for the lifetime of the GitHub account, whereas `user_login` can change when the user renames their account. See the *Mapping Entities* section for how to find a `user_id` from a `user_login`.")

Roles

Ecosystem Overview

invisible(
    available_tables <- ls(ecos, sorted = FALSE)
    |> setdiff(c("finalize", "initialize"))
    |> purrr::keep(~stringr::str_detect(.x, "^read_"))
    |> stringr::str_remove("^read_")
)

print_tables <- function() glue::glue_collapse(
  paste0("`", available_tables, "`"), 
  sep = ", ", last = " and "
)

The Ecosystem gives access to the following tables: `r print_tables()`.

message("See the Appendix for the contents of each table.")
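The pipeline that builds available_tables lists the Ecosystem's members and keeps the ones named read_<TABLE>. The same idea in base R, applied to an illustrative vector of member names (only read_USER and read_REPO are known from this vignette; the other names are placeholders):

```r
# Illustrative member names; only read_USER and read_REPO appear in this vignette.
members <- c("initialize", "read_USER", "read_REPO", "finalize", "clone")

# Keep the read_* accessors and strip the prefix to obtain the table names.
table_names <- sub("^read_", "", grep("^read_", members, value = TRUE))
print(table_names)
```

Because the regular expression already keeps only names starting with read_, the setdiff() on initialize and finalize in the original pipeline is not strictly required, although it is harmless.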

Mapping Entities

user_login to user_id

Finding a user_id with user_login

user_login <- "harell"
(
    user_id <- ecos$read_USER()
    |> dplyr::filter(login %in% user_login)
    |> dplyr::pull(id)
)
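dplyr::pull() returns a vector of whatever length the filter produced, so a mistyped login silently yields integer(0). A minimal sketch of a stricter lookup, using a toy stand-in for the USER table (the ids, logins, and the helper name lookup_user_id() are hypothetical):

```r
# Toy stand-in for ecos$read_USER(); ids and logins are made up.
users <- data.frame(id = c(101L, 202L, 303L), login = c("alice", "bob", "carol"))

# Hypothetical helper: return exactly one id, or fail with an informative error.
lookup_user_id <- function(users, user_login) {
  ids <- users$id[users$login == user_login]
  if (length(ids) != 1) {
    stop(sprintf("Expected 1 user with login '%s', found %d",
                 user_login, length(ids)), call. = FALSE)
  }
  ids
}

lookup_user_id(users, "bob")
```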

package (CRAN package name) to repo_id

cran_package <- "dplyr"
(
    repo_id <- ecos$read_REPO()
    |> dplyr::filter(package %in% cran_package)
    |> dplyr::pull(id)
)
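Filtering with package %in% cran_package returns rows in table order, so when several packages are looked up at once the ids may not line up with the requested names. A base R sketch using match(), which preserves the input order and yields NA for unknown packages; the repo table below is a toy stand-in with made-up ids:

```r
# Toy stand-in for ecos$read_REPO(); ids are made up.
repos <- data.frame(id = c(1L, 2L, 3L), package = c("ggplot2", "dplyr", "tidyr"))

cran_packages <- c("tidyr", "dplyr", "notapackage")

# match() returns row positions in the order of cran_packages (NA if absent).
repo_ids <- repos$id[match(cran_packages, repos$package)]
names(repo_ids) <- cran_packages
print(repo_ids)
```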

Agent Functions

The Recommendation Agent has five functions:

  1. recommend_repos_to_user: given a user_id, suggests n repos the user might like;
  2. recommend_users_to_user: given a user_id, suggests n users the user might like;
  3. query_repos_graph: given a repo_id and a method, finds all linked packages within the requested number of degrees of separation (the degrees argument);
  4. query_users_graph: given a user_id and a method, finds all linked users within the requested number of degrees of separation (the degrees argument); and
  5. query_package_stats: given a CRAN package name and a statistic (a function of the data sample), returns the value of that statistic.

Recommend Repos to a User

  1. Recommend 5 Repos to a user (see Agent help file for supported methods)
suggested_repos <- agent$recommend_repos_to_user(user_id, n = 5, method = "random")
print(suggested_repos)
  2. Add details to the recommended Repos
(
    suggested_repos
    |> dplyr::left_join(ecos$read_REPO(), by = c("repo_id" = "id"))
)
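This vignette does not describe what method = "random" does internally. Purely as an illustration, a random baseline recommender can be sketched as sampling n repos uniformly from those the user has not interacted with yet. The interactions table, its columns, and recommend_random() are hypothetical; only the repo_id output column mirrors the join key used above.

```r
# Hypothetical interaction data: which user starred or watched which repo.
interactions <- data.frame(
  user_id = c(1L, 1L, 2L, 3L),
  repo_id = c(10L, 11L, 12L, 13L)
)
all_repo_ids <- 10:20

# Random baseline (illustrative only): sample up to n repos the user has not seen.
recommend_random <- function(user_id, n, all_repo_ids, interactions) {
  seen <- interactions$repo_id[interactions$user_id == user_id]
  candidates <- setdiff(all_repo_ids, seen)
  # sample.int() on indices avoids sample()'s surprise when only one candidate remains
  picked <- candidates[sample.int(length(candidates), min(n, length(candidates)))]
  data.frame(user_id = rep(user_id, length(picked)), repo_id = picked)
}

set.seed(2022)
recommend_random(1L, n = 5, all_repo_ids, interactions)
```

A random baseline like this is mainly useful as a reference point against which other recommendation methods can be compared.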

Recommend Users to a User

  1. Recommend 5 Users to a user (see Agent help file for supported methods)
suggested_users <- agent$recommend_users_to_user(user_id, n = 5, method = "random")
print(suggested_users)
  2. Add details to the recommended Users
(
    suggested_users
    |> dplyr::left_join(ecos$read_USER(), by = c("user_id" = "id"))
)
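The dplyr::left_join() call keeps every recommended user and appends the matching USER columns, matching user_id against the USER table's id. For readers without dplyr, the equivalent left join in base R is merge() with all.x = TRUE; the tables below are made-up toy data:

```r
# Toy stand-ins; ids and logins are made up.
suggested_users <- data.frame(user_id = c(202L, 404L))
users <- data.frame(id = c(101L, 202L, 303L), login = c("alice", "bob", "carol"))

# Left join: keep all rows of suggested_users; unmatched ids get NA columns.
details <- merge(suggested_users, users,
                 by.x = "user_id", by.y = "id", all.x = TRUE, sort = FALSE)
print(details)
```

Unlike left_join(), merge() does not promise to keep the original row order when sort = FALSE, so reorder explicitly if the recommendation ranking matters.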

Query Repos Graph

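To query the repo graph, you need to supply three input arguments: the repo_id to start from, the number of degrees of separation to traverse (degrees), and the type of link to follow (method, e.g. "depends" or "reverse").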

Query package dependencies

  1. Find the dependencies of a package
repo_dep <- agent$query_repos_graph(repo_id, degrees = 1, method = "depends")
print(repo_dep)
  2. Add details to the package dependencies
(
    repo_dep
    |> dplyr::left_join(ecos$read_REPO(), by = c("to" = "id"))
)

Query package reverse dependencies

  1. Find the reverse dependencies of a package
repo_rev_dep <- agent$query_repos_graph(repo_id, degrees = 1, method = "reverse")
print(repo_rev_dep)
  2. Add details to the package reverse dependencies
(
    repo_rev_dep
    |> dplyr::left_join(ecos$read_REPO(), by = c("from" = "id"))
)
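The two queries walk the same edges in opposite directions: with method = "depends" the related repos appear in the to column, and with method = "reverse" in the from column, which is why the joins above use different keys. The degrees argument suggests a breadth-first traversal. The following base R sketch of a k-degree search over a made-up edge list (where from depends on to) illustrates the idea; it is not the package's implementation, and whether the package counts hops the same way is an assumption.

```r
# Toy dependency edges: `from` depends on `to`. Ids are made up.
edges <- data.frame(
  from = c(1L, 1L, 2L, 3L, 4L),
  to   = c(2L, 3L, 4L, 4L, 5L)
)

# Breadth-first search up to `degrees` hops.
# direction = "depends" walks from -> to; "reverse" walks to -> from.
k_degree_neighbours <- function(edges, start, degrees,
                                direction = c("depends", "reverse")) {
  direction <- match.arg(direction)
  src <- if (direction == "depends") edges$from else edges$to
  dst <- if (direction == "depends") edges$to else edges$from
  visited <- start
  frontier <- start
  for (i in seq_len(degrees)) {
    # Nodes one hop away from the current frontier that were not seen before
    nxt <- setdiff(unique(dst[src %in% frontier]), visited)
    if (length(nxt) == 0) break
    visited <- c(visited, nxt)
    frontier <- nxt
  }
  setdiff(visited, start)
}

k_degree_neighbours(edges, start = 1L, degrees = 1)                         # direct dependencies
k_degree_neighbours(edges, start = 4L, degrees = 2, direction = "reverse")  # dependents within 2 hops
```

Tracking visited nodes keeps the search from revisiting a repo reachable by several paths, such as repo 4 above, which both 2 and 3 depend on.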

Query Users Graph

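To query the users graph, you need to supply three input arguments: the user_id to start from, the number of degrees of separation to traverse (degrees), and the type of link to follow (method, e.g. "followers" or "following").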

Query user followers

  1. Find who is following user_id
user_followers <- agent$query_users_graph(user_id, degrees = 1, method = "followers")
print(user_followers)
  2. Add details to user_id's followers
(
    user_followers
    |> dplyr::left_join(ecos$read_USER(), by = c("from" = "id"))
)

Query user following

  1. Find the users that user_id follows
user_following <- agent$query_users_graph(user_id, degrees = 1, method = "following")
print(user_following)
  2. Add details to the users that user_id follows
(
    user_following
    |> dplyr::left_join(ecos$read_USER(), by = c("to" = "id"))
)
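Judging by the join keys above, the followers result lists each follower in the from column and the following result lists each followed user in the to column. Under that assumption, mutual follows are the intersection of the two columns. A sketch with made-up toy tables in that shape:

```r
# Toy results shaped like user_followers and user_following; ids are made up.
user_followers <- data.frame(from = c(202L, 303L, 404L), to = 101L)
user_following <- data.frame(from = 101L, to = c(303L, 505L, 202L))

# Users who follow user 101 and are also followed by user 101.
mutual <- intersect(user_followers$from, user_following$to)
print(mutual)
```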

Query Package Stats

Monthly Downloads

  1. Find how many times dplyr was downloaded every month
package_downloads <- agent$query_package_stats("dplyr", statistic = "monthly downloads")
package_downloads |> tail(n = 12) |> print()
  2. Plot the monthly downloads
plot(
    package_downloads$date, package_downloads$downloads, 
    main = "Monthly downloads of dplyr from the RStudio CRAN mirror",
    type = "b", xlab = "Date", ylab = "Monthly Downloads"
)
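The plot uses the date and downloads columns of the result. The same two columns support simple summaries, such as the total over the last 12 months and the month-over-month change. A sketch on a made-up series (the numbers are not real dplyr downloads):

```r
# Made-up monthly series shaped like package_downloads (date, downloads).
package_downloads <- data.frame(
  date = seq(as.Date("2021-01-01"), by = "month", length.out = 14),
  downloads = c(100, 110, 120, 130, 125, 140, 150, 160, 155, 170, 180, 190, 200, 210)
)

# Sort chronologically before computing differences.
package_downloads <- package_downloads[order(package_downloads$date), ]

last_12 <- tail(package_downloads, n = 12)
total_last_12 <- sum(last_12$downloads)

# Month-over-month percentage change (NA for the first month).
mom_change <- c(NA, diff(package_downloads$downloads) /
                      head(package_downloads$downloads, -1) * 100)

print(total_last_12)
print(round(tail(mom_change, 3), 1))
```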

Appendix

Ecosystem Tables Overview

knitr::include_graphics("https://imgur.com/i6eN9td.png")
for(tbl_name in available_tables){
  cat("\n\n##", tbl_name,"\n")
  invisible(
    ecos[[paste0("read_", tbl_name)]]()
    |> dplyr::glimpse()
  )
}


harell/github.explorer documentation built on Aug. 21, 2022, 8:39 p.m.