The goal of bqschol is to provide an interface to SUB Göttingen's big scholarly datasets stored on Google BigQuery.
This package is for internal use.
You can install the development version from GitHub with:
```r
# install.packages("remotes")
remotes::install_github("njahn82/bqschol")
```
Connect to the dataset with Crossref metadata snapshots:
```r
library(bqschol)
my_con <- bqschol::bgschol_con(dataset = "cr_history")
```
Ideally, you have a service account token stored in a JSON file to make use of this package. If no token is available, your Google account credentials will be requested via the web browser.
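If you do use a service account, here is a minimal sketch of supplying the token, assuming authentication is handled via the bigrquery package (an assumption; the key path is hypothetical):

```r
library(bigrquery)

# Hypothetical path to a downloaded service account key; adjust to your setup
bigrquery::bq_auth(path = "~/keys/bq-service-account.json")

# Calling bq_auth() without a path triggers the interactive browser login instead
```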
The package provides wrappers for the most common table operations:

- `bgschol_list()`: list tables
- `bgschol_tbl()`: access tables with [dplyr](https://dplyr.tidyverse.org/)
- `bgschol_query()`: perform a SQL query and retrieve the results
- `bgschol_execute()`: execute a SQL query on the database

Let's start by listing the yearly Crossref historic snapshots.
```r
bgschol_list(my_con)
```
We can determine the top publishers as of April 2018. Note that only Crossref records published after 2007 are stored.
```r
library(dplyr)
cr_instant_df <- bgschol_tbl(my_con, table = "cr_apr18")
cr_instant_df %>%
  # top publishers
  dplyr::group_by(publisher) %>%
  dplyr::summarise(n = dplyr::n_distinct(doi)) %>%
  dplyr::arrange(desc(n))
```
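Presumably, `bgschol_tbl()` returns a lazy reference to the remote table, as is usual for dplyr database backends. A hedged sketch of pulling the aggregated results into a local data frame:

```r
# Assumes cr_instant_df behaves like a lazy dplyr table backed by BigQuery
top_publishers <- cr_instant_df %>%
  dplyr::group_by(publisher) %>%
  dplyr::summarise(n = dplyr::n_distinct(doi)) %>%
  dplyr::arrange(desc(n)) %>%
  dplyr::collect()  # fetch the aggregated rows into a local tibble

head(top_publishers)
```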
For more complex tasks, we use SQL.
```r
cc_query <- c("SELECT publisher, COUNT(DISTINCT(DOI)) AS n
               FROM cr_apr18, UNNEST(license) AS license
               WHERE REGEXP_CONTAINS(license.URL, 'creativecommons')
               GROUP BY publisher
               ORDER BY n DESC
               LIMIT 10")
bgschol_query(my_con, cc_query)
```
`bgschol_execute()` is used when new tables shall be created or dropped in BigQuery.
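As an illustration only, a sketch of creating and dropping a derived table, assuming `bgschol_execute()` takes a connection and a SQL statement in the same way as `bgschol_query()` (the table name `cc_publishers` is hypothetical):

```r
# Hypothetical: materialise publisher counts as a new table in the dataset
create_sql <- "CREATE TABLE cc_publishers AS
               SELECT publisher, COUNT(DISTINCT(DOI)) AS n
               FROM cr_apr18
               GROUP BY publisher"
bgschol_execute(my_con, create_sql)

# Drop the table again once it is no longer needed
bgschol_execute(my_con, "DROP TABLE cc_publishers")
```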