The goal of bqschol is to provide an interface to SUB Göttingen's big scholarly datasets stored on Google BigQuery.
This package is for internal use.
You can install the development version from GitHub with:
    # install.packages("remotes")
    remotes::install_github("njahn82/bqschol")
Connect to the dataset with Crossref snapshots:
    library(bqschol)
    my_con <- bqschol::bgschol_con(
      dataset = "cr_history",
      path = "~/hoad-private-key.json")
You need to have a service account token to make use of this package!
The package provides wrappers for the most common table operations:

- bgschol_list(): List tables
- bgschol_tbl(): Access tables with dplyr
- bgschol_query(): Perform a SQL query and retrieve results
- bgschol_execute(): Execute a SQL query on the database

Let's start by listing all Crossref snapshots in SUB Göttingen's BigQuery project:
bgschol_list(my_con)
We can determine the top publishers, ranked by their number of distinct DOIs, as of April 2018. Note that the snapshots only contain Crossref records published after 2007.
    library(dplyr)  # provides %>% and desc()

    cr_instant_df <- bgschol_tbl(my_con, table = "cr_apr18")
    cr_instant_df %>%
      # top publishers
      dplyr::group_by(publisher) %>%
      dplyr::summarise(n = dplyr::n_distinct(doi)) %>%
      dplyr::arrange(desc(n))
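To see what this pipeline computes without a BigQuery connection, here is a minimal sketch on a small in-memory table. The data and the name `toy_records` are made up for illustration. Only the dplyr verbs mirror the chunk above.

```r
library(dplyr)

# Hypothetical toy records standing in for a Crossref snapshot table.
# The DOI "10.1/a" appears twice on purpose.
toy_records <- tibble(
  publisher = c("A", "A", "A", "A", "B", "B", "C"),
  doi = c("10.1/a", "10.1/a", "10.1/b", "10.1/c",
          "10.2/a", "10.2/b", "10.3/a")
)

top_publishers <- toy_records %>%
  group_by(publisher) %>%
  # n_distinct() counts each DOI once per publisher
  summarise(n = n_distinct(doi)) %>%
  # largest publishers first
  arrange(desc(n))

print(top_publishers)
```

Because `n_distinct()` is used rather than `n()`, a DOI that occurs more than once within a publisher is counted only once.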
For more complex tasks, we use SQL.
    cc_query <- c("SELECT publisher, COUNT(DISTINCT(DOI)) AS n
                    FROM `api-project-764811344545.cr_history.cr_apr18`, UNNEST(license) AS license
                    WHERE REGEXP_CONTAINS(license.URL, 'creativecommons')
                    GROUP BY publisher
                    ORDER BY n DESC
                    LIMIT 10")
    bgschol_query(my_con, cc_query)
bgschol_execute() is used when tables are to be created or dropped in BigQuery.
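As a rough sketch of creating and dropping a table, the statements might look like the following. This assumes that bgschol_execute() takes the connection and a SQL string in the same way as bgschol_query(); that signature is not confirmed here. The table name `cr_apr18_publishers` is hypothetical. The calls only run if a connection object `my_con` exists in the session.

```r
# Fully qualified names; the target table is a made-up example
source_tbl <- "api-project-764811344545.cr_history.cr_apr18"
target_tbl <- "api-project-764811344545.cr_history.cr_apr18_publishers"

# BigQuery standard SQL: materialise publisher counts as a new table
create_sql <- sprintf(
  "CREATE TABLE `%s` AS
   SELECT publisher, COUNT(DISTINCT DOI) AS n
   FROM `%s`
   GROUP BY publisher",
  target_tbl, source_tbl
)

# Remove the table again once it is no longer needed
drop_sql <- sprintf("DROP TABLE `%s`", target_tbl)

if (exists("my_con")) {
  # Assumed call shape: bgschol_execute(connection, sql)
  bqschol::bgschol_execute(my_con, create_sql)
  bqschol::bgschol_execute(my_con, drop_sql)
}
```

Keeping the SQL in named strings makes it easy to inspect the statements before they change anything in the project.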