knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
metricminer is an R package that helps you mine metrics from common places on the web through their APIs.
It also helps put the data into a format that is easy to use for a dashboard or other purposes.
It will have an associated dashboard template and tutorials to help you make full use of the data you retrieve with metricminer (these are still under development).
You can read the metricminer package documentation here.
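Currently, metricminer supports mining data from GitHub, Calendly, Google Analytics, Google Forms, Slido exports stored on Google Drive, and YouTube.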
metricminer attempts to retrieve API data for you and return it as a tidy data.frame.
This means metricminer has to be opinionated about which metrics it returns so that they fit in a useful, human-readable data frame.
If you find that the data returned is not what you need you have two options (these options can be pursued concurrently):
Set the dataformat argument to "raw" to see the original, unedited JSON-formatted data as it was returned from the API. Then you can look through it yourself and extract the data you want.
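As a rough sketch of what extracting data yourself can look like: raw API output usually arrives in R as a nested list parsed from JSON. The list below is a made-up stand-in (the field names `items`, `name`, and `stargazers_count` are assumptions for illustration, not the structure metricminer actually returns), and the code uses only base R to pull two fields into a data.frame.

```r
# Hypothetical stand-in for raw, JSON-derived output (a nested list).
# The field names are illustrative assumptions only.
raw_data <- list(
  items = list(
    list(name = "repo-a", stargazers_count = 12),
    list(name = "repo-b", stargazers_count = 3)
  )
)

# Pull one field out of every item; vapply() enforces the expected type.
repo_labels <- vapply(raw_data$items, function(item) item$name, character(1))
star_counts <- vapply(raw_data$items, function(item) item$stargazers_count, numeric(1))

# Assemble the extracted fields into a small tidy data.frame.
extracted <- data.frame(
  repo = repo_labels,
  stars = star_counts,
  stringsAsFactors = FALSE
)
extracted
```

With real raw output, `str(raw_data, max.level = 2)` is a convenient way to find where the fields you want actually live before writing the extraction.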
You can install metricminer from CRAN.
install.packages("metricminer")
If you want the development version (not advised), you can use the remotes package to install it from GitHub.
if (!("remotes" %in% installed.packages())) {
  install.packages("remotes")
}
remotes::install_github("fhdsl/metricminer")
library(metricminer)
To start, you need to authorize() the package to access your data. If you run authorize() with no arguments, you will be asked which app you'd like to authorize and whether you'd like to cache that auth information. If you already know which app you'd like to authorize, for example google, you can run authorize("google").
Then follow the instructions on the upcoming screens and select the scopes you feel comfortable sharing (you generally just need read permissions for metricminer to be able to collect data).
authorize()
If you want to clear out the authorizations and caches stored by metricminer, you can run:
delete_creds()
You can retrieve metrics from a repository on GitHub doing this:
authorize("github")
metrics <- get_github_repo_summary(repo = "fhdsl/metricminer")
authorize("github")
metrics <- get_github_repo_timecourse(repo = "fhdsl/metricminer")
You can retrieve Calendly event information using this type of workflow:
authorize("calendly")
user <- get_calendly_user()
events <- list_calendly_events(user = user$resource$uri)
You can retrieve Google Analytics data for websites like this.
First you have to retrieve your account information after you've authorized.
authorize("google")
accounts <- get_ga_user()
Then you need to retrieve the properties (usually the websites you are tracking) underneath that account.
properties_list <- get_ga_properties(account_id = accounts$id[1])
You just need to strip the properties/ prefix from this string.
property_id <- gsub("properties/", "", properties_list$properties$name[1])
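If an account has several properties, the same idea can be applied to the whole vector of names at once. This is a minimal base R sketch, assuming every name follows the `properties/<id>` pattern seen above; the two example values are illustrative rather than real output.

```r
# Illustrative property names in the "properties/<id>" form.
property_names <- c("properties/422671031", "properties/422558989")

# Anchor the pattern with ^ so only a leading "properties/" is removed.
property_ids <- sub("^properties/", "", property_names)
property_ids
```

Anchoring with `^` is a small safeguard: an unanchored pattern would also strip the text if it happened to appear later in the string.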
Now we can collect some stats.
In Google Analytics, metrics are your basic numbers (how many visits to your website, etc.).
metrics <- get_ga_stats(property_id, stats_type = "metrics")
Dimensions, by contrast, are more like a list of events that have happened. For example, here is a list of people who have logged on.
dimensions <- get_ga_stats(property_id, stats_type = "dimensions")
Lastly, a third option is to collect link_clicks, the links that visitors have clicked. Google Analytics also treats this as a dimension, but link click data often can't be downloaded at the same time as other dimension data, so metricminer collects it separately.
link_clicks <- get_ga_stats(property_id, stats_type = "link_clicks")
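The three calls above differ only in their stats_type value, so they can be gathered into one named list. The helper below is a sketch and not part of metricminer: `collect_ga_stats()` and its `fetch` argument are hypothetical names. By default `fetch` is `get_ga_stats()`, but a stand-in function can be passed instead, which is what lets the demonstration run offline without authorization.

```r
# Collect all three stats types for one property into a named list.
# `fetch` is a hypothetical hook: any function taking (property_id, stats_type).
collect_ga_stats <- function(property_id, fetch = get_ga_stats) {
  stats_types <- c("metrics", "dimensions", "link_clicks")
  results <- lapply(stats_types, function(type) {
    fetch(property_id, stats_type = type)
  })
  names(results) <- stats_types
  results
}

# Offline demonstration with a stand-in fetcher (no API call is made).
fake_fetch <- function(property_id, stats_type) {
  data.frame(
    property_id = property_id,
    stats_type = stats_type,
    stringsAsFactors = FALSE
  )
}

demo_stats <- collect_ga_stats("123456", fetch = fake_fetch)
names(demo_stats)
```

With metricminer loaded and Google authorized, calling `collect_ga_stats(property_id)` without the `fetch` argument would make the same three `get_ga_stats()` calls shown above.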
You can retrieve Google form information and responses like this:
authorize("google")
form_url <- "https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit"
form_info <- get_google_form(form_url)
If you have used Slido for interactive slide sessions and exported the collected data to your Google Drive, you can use metricminer to collect that data as well.
drive_id <- "https://drive.google.com/drive/folders/0AJb5Zemj0AAkUk9PVA"
slido_data <- get_slido_files(drive_id)
If you have a YouTube video whose URL is https://www.youtube.com/watch?v=oMVVeZjHJ48, you can extract stats for that video using the ID at the end of the URL (here, oMVVeZjHJ48). Playlist stats are retrieved the same way, using a playlist ID.
authorize("google")
youtube_video_stats <- get_youtube_video_stats("oMVVeZjHJ48")
youtube_playlist_stats <- get_youtube_playlist_stats("PL9bqxQvtZgAMblZJhg7e0_ThDD-pN4UqA")
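Since the call above takes the bare video ID rather than the full URL, it can help to pull the ID out programmatically. This is a small base R sketch that assumes URLs of the `watch?v=<id>` form; other URL styles (for example shortened links) are not handled.

```r
video_url <- "https://www.youtube.com/watch?v=oMVVeZjHJ48"

# Capture everything after "v=" up to the next "&" or the end of the URL.
video_id <- sub(".*[?&]v=([^&]+).*", "\\1", video_url)
video_id
```

The resulting string is what would be passed to `get_youtube_video_stats()`.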
Maybe you just want to retrieve it ALL. We have some wrapper functions that will attempt to do this for you. These functions are a bit more precarious in that there may be reasons why data from certain websites, repos, or events cannot be collected. Collecting repositories one by one will give you more insight into what is happening.
However, these bulk retrieval functions may help you if you want to grab ALL of your account's data in one fell swoop. Just make sure to carefully look over and curate the data after the collection attempt. You may find some retrievals are empty for potentially good reasons (for example, if a Google Form has no responses to collect, it will show up with "no responses" in the respective part of the list).
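One way to do that curation is to separate the entries that actually contain data from those that came back empty. The list below is a hypothetical stand-in for a bulk result: only the "no responses" placeholder comes from the description above, and the rest of the shape is an assumption.

```r
# Hypothetical stand-in for a bulk retrieval result: a named list where
# some entries hold data and others hold a placeholder or nothing.
bulk_results <- list(
  form_a = data.frame(answer = c("yes", "no")),
  form_b = "no responses",
  form_c = NULL,
  form_d = data.frame(answer = character(0))
)

# An entry counts as usable only if it is a data.frame with at least one row.
has_data <- function(x) is.data.frame(x) && nrow(x) > 0

usable <- Filter(has_data, bulk_results)
skipped <- setdiff(names(bulk_results), names(usable))

names(usable)
skipped
```

Keeping the `skipped` names around makes it easy to check each empty retrieval individually and decide whether it is expected.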
From GitHub you can attempt to collect repository metrics from all repositories from an account.
authorize("github")
all_repos_metrics <- get_multiple_repos_metrics(owner = "fhdsl")
If you only want data from specific repositories, you can provide a vector of those repository names like this:
repo_names <- c("fhdsl/metricminer", "jhudsl/OTTR_Template")
some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names)
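The names in the example above follow an `owner/repo` pattern. As a sketch, assuming that pattern is what is expected, a quick base R check can catch malformed entries before any API calls are made. `candidate_repos` and the third, deliberately malformed entry are illustrative.

```r
# Candidate names; the last one is deliberately missing its owner.
candidate_repos <- c("fhdsl/metricminer", "jhudsl/OTTR_Template", "metricminer")

# Valid means exactly one slash with non-empty text on both sides.
is_valid <- grepl("^[^/]+/[^/]+$", candidate_repos)

valid_repos <- candidate_repos[is_valid]
invalid_repos <- candidate_repos[!is_valid]

valid_repos
invalid_repos
```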
Similar to single website retrieval we need to authorize the package.
authorize("google")
accounts <- get_ga_user()
Then we can provide the account id to get_multiple_ga_metrics and it will attempt to grab all stats for all website properties underneath the provided account. Alternatively, you can pass specific property ids.
account_stats_list <- get_multiple_ga_metrics(account_id = 209776907)
stats_list <- get_multiple_ga_metrics(property_ids = c(422671031, 422558989))
As always, we need to authorize the app.
authorize("google")
We can retrieve a list of form ids using the googledrive R package.
form_list <- googledrive::drive_find(
  shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"),
  type = "form"
)
Now we can provide this vector of form ids to get_multiple_forms:
multiple_forms <- get_multiple_forms(form_ids = form_list$id)
If you'd like to authorize non-interactively (whether on GitHub Actions or locally), you can set your tokens using Sys.setenv().
You can go here to get a Calendly API key. You will likely have to log in first.
Then you can store this by putting your API key in this type of command:
Sys.setenv(METRICMINER_CALENDLY = "Put calendly token here")
Now in your script if you run the following, you will have authorization to Calendly.
auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))
Similar steps can be done for the GitHub personal access token.
First go here to get a GitHub PAT. You will likely have to log in first.
Then you can run this command, putting your GitHub PAT in place of the placeholder text.
Sys.setenv(METRICMINER_GITHUB_PAT = "Put GitHub PAT here")
Now in your script if you run the following, you will have authorization to GitHub.
# Authorize GitHub
auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))
For Google, you can authorize from secrets by first authorizing the normal interactive way with authorize("google"), but storing the result like this:
token <- authorize("google")
Then you can use this object to extract two secrets by printing them out like this:
token$credentials$access_token
token$credentials$refresh_token
Then you can set these in your environment doing the same steps as before:
Sys.setenv(METRICMINER_GOOGLE_ACCESS = "Google access token here")
Sys.setenv(METRICMINER_GOOGLE_REFRESH = "Google refresh token here")
Now in your script if you run the following you will have authorization to Google Apps.
# Authorize Google
auth_from_secret("google",
  refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
  access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
  cache = TRUE
)
In GitHub Actions, you can run metricminer with authorization if you use the above steps to retrieve the necessary keys and then store each of them as a GitHub secret.
Read here about how to store GitHub secrets.
You'll need to store each secret under the respective key name referenced above:
METRICMINER_CALENDLY
METRICMINER_GITHUB_PAT
METRICMINER_GOOGLE_REFRESH
METRICMINER_GOOGLE_ACCESS
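Before authorizing in a non-interactive run, it can be useful to confirm that all of these variables actually reached the R session. The helper below is a sketch in base R, and `find_missing_secrets()` is a hypothetical name, not a metricminer function. It relies on `Sys.getenv()` being vectorized and returning an empty string for unset variables.

```r
secret_names <- c(
  "METRICMINER_CALENDLY",
  "METRICMINER_GITHUB_PAT",
  "METRICMINER_GOOGLE_REFRESH",
  "METRICMINER_GOOGLE_ACCESS"
)

# Return the names of environment variables that are unset or empty.
find_missing_secrets <- function(var_names) {
  values <- Sys.getenv(var_names, unset = "")
  var_names[!nzchar(values)]
}

missing_secrets <- find_missing_secrets(secret_names)
if (length(missing_secrets) > 0) {
  message("Missing secrets: ", paste(missing_secrets, collapse = ", "))
}
```

In an automated job you might prefer `stop()` over `message()` so that the run fails early with a clear list of what is missing.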
Then in your GitHub Actions YAML you'll need a step like this to pass these secrets into the environment and authorize with them.
- name: Authorize metricminer
  env:
    METRICMINER_CALENDLY: ${{ secrets.METRICMINER_CALENDLY }}
    METRICMINER_GITHUB_PAT: ${{ secrets.METRICMINER_GITHUB_PAT }}
    METRICMINER_GOOGLE_ACCESS: ${{ secrets.METRICMINER_GOOGLE_ACCESS }}
    METRICMINER_GOOGLE_REFRESH: ${{ secrets.METRICMINER_GOOGLE_REFRESH }}
  run: |
    # Load the package so auth_from_secret() is available in this fresh R session
    library(metricminer)

    # Authorize Calendly
    auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY"))

    # Authorize GitHub
    auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT"))

    # Authorize Google
    auth_from_secret("google",
      refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"),
      access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"),
      cache = TRUE
    )

    ### Now run the R commands you want here or call an R script in a later step.
  shell: Rscript {0}
sessionInfo()