This article describes how glitter works, and why. At this stage of glitter history, feedback and feature requests are most welcome!
The glitter package helps you write SPARQL queries by implementing an internal domain-specific language (DSL): with glitter, you write code that mostly looks like R code, and end up with a SPARQL query. For instance:
library("glitter")
query <- spq_init() %>%
  spq_add("?item wdt:P31 wd:Q13442814") %>%
  spq_add("?item rdfs:label ?itemTitle") %>%
  spq_filter(str_detect(str_to_lower(itemTitle), 'wikidata')) %>%
  spq_filter(lang(itemTitle) == "en") %>%
  spq_head(n = 5)
query
The R code should therefore be easier to write and to read. The function names and syntax are meant to be reminiscent of the tidyverse and of base R.
Code using glitter will feature:
- spq_init() to initialize the query;
- spq_add() to add SPARQL triple patterns to the query;
- spq_set() to define helper values like spq_set(species = c('wd:Q144','wd:Q146', 'wd:Q780'), mayorcode = "wd:Q30185");
- spq_filter(), spq_select() to filter results;
- spq_arrange(), spq_head(), spq_offset() to order and trim results;
- spq_perform() to send the query and return the results.

The query object is a list with elements such as the variables (vars), filters, etc. Later we might make it an actual class, maybe an R6 one? It is built by the different calls to spq_ functions.
The SPARQL query string is assembled by spq_assemble(). Later we might add some linting at that stage.
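To make the "list that gets assembled into a string" idea concrete, here is a minimal base R sketch. All the toy_ functions and the field names (triples, filters, limit) are hypothetical and much simpler than what glitter actually stores; the sketch only shows the general pattern of accumulating pieces in a list and pasting them together at the end.

```r
# Toy query builder (hypothetical; NOT glitter's internal structure).
toy_init <- function() {
  list(triples = character(), filters = character(), limit = NULL)
}

# Each step returns a modified copy of the list, so calls can be chained.
toy_add <- function(query, triple) {
  query$triples <- c(query$triples, triple)
  query
}

toy_filter <- function(query, filter) {
  query$filters <- c(query$filters, filter)
  query
}

toy_head <- function(query, n) {
  query$limit <- n
  query
}

# Assemble the accumulated pieces into one SPARQL string.
toy_assemble <- function(query) {
  body <- c(
    sprintf("%s .", query$triples),
    sprintf("FILTER(%s)", query$filters)
  )
  out <- c("SELECT * WHERE {", paste0("  ", body), "}")
  if (!is.null(query$limit)) {
    out <- c(out, paste("LIMIT", query$limit))
  }
  paste(out, collapse = "\n")
}

q <- toy_init()
q <- toy_add(q, "?item wdt:P31 wd:Q13442814")
q <- toy_filter(q, 'lang(?itemTitle) = "en"')
q <- toy_head(q, 5)
cat(toy_assemble(q), "\n")
```

Keeping the query as plain data until the last step is what makes a later linting pass plausible: the checks could run on the list before any string is produced.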
Under the hood, glitter turns the dots (...) arguments into quosures before handling them. Useful references were the Metaprogramming section of the Advanced R book by Hadley Wickham as well as the documentation of the rlang package. More details in the next sections.
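As a reminder of what a quosure is, here is a small sketch using rlang. The function capture_dots() is a hypothetical helper written for this illustration, not a glitter function; it only shows that the code passed through the dots is captured rather than evaluated.

```r
# Hypothetical helper: capture whatever is passed in ... without evaluating it.
capture_dots <- function(...) {
  rlang::enquos(...)
}

# Neither lang() nor itemTitle exists in R here, yet nothing fails:
# the expression is stored, not run.
quos <- capture_dots(lang(itemTitle) == "en", n = 5)

length(quos)

# The captured expression can be inspected as code or as text.
first_expr <- rlang::quo_get_expr(quos[[1]])
print(first_expr)
print(deparse(first_expr))
```

Because each quosure also records the environment it came from, later code can still try to evaluate it with rlang::eval_tidy() when that is useful.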
spq_add() works differently from the other spq_ functions because it looks closer to SPARQL. Clearly something like spq_add(query, "?item wdt:P31 wd:Q13442814") does not look like R code.
The motivation for this is:
Triple patterns are parsed by decompose_triples, which relies on string manipulation.
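The following base R sketch shows what string-based parsing of a simple triple pattern can look like. split_triple() is a hypothetical stand-in written for this article, not the actual decompose_triples code, and it assumes the three parts are separated by whitespace and contain no spaces themselves (so it would not handle a quoted literal containing spaces).

```r
# Hypothetical stand-in for a string-based triple parser.
split_triple <- function(triple) {
  # Split on runs of whitespace after trimming both ends.
  parts <- strsplit(trimws(triple), "\\s+")[[1]]
  if (length(parts) != 3) {
    stop("Expected exactly three parts: subject, verb, object.")
  }
  list(subject = parts[1], verb = parts[2], object = parts[3])
}

triple <- split_triple("?item wdt:P31 wd:Q13442814")
str(triple)

# Parts starting with "?" are SPARQL variables.
is_variable <- startsWith(unlist(triple), "?")
print(is_variable)
```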
Now, if one wants to go full DSL, it is possible, via spq_filter() and spq_mutate().
The triple pattern in spq_add(query, "?item wdt:P31 wd:Q13442814") means finding items that are an instance ("wdt:P31") of a scholarly article ("wd:Q13442814").
With glitter, you can also write it as:
spq_init() %>% spq_filter(item == wdt::P31(wd::Q13442814))
This looks more like a normal tidyverse pipeline. Note that the namespacing here is done the R way, i.e. wdt::P31 as opposed to "wdt:P31".
Similarly,
spq_init() %>% spq_add("wd:Q331676 wdt:P1843 ?statement")
adds a variable that is "wdt:P1843" of Sonchus oleraceus ("wd:Q331676"). It can be written:
spq_init() %>% spq_mutate(statement = wdt::P1843(wd::Q331676))
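The two rewrites above follow a regular pattern, which the base R sketch below reproduces on unevaluated expressions. The helpers to_prefixed(), filter_to_triple() and mutate_to_triple() are hypothetical and only cover the exact shapes shown in this section (a bare name as the variable, namespaced names for the rest); glitter's own parsing is not shown here.

```r
# Turn an R-namespaced name such as wdt::P31 into the SPARQL form "wdt:P31".
to_prefixed <- function(x) {
  sub("::", ":", paste(deparse(x), collapse = ""), fixed = TRUE)
}

# spq_filter() style: subject == verb(object)
filter_to_triple <- function(expr) {
  stopifnot(is.call(expr), identical(expr[[1]], as.name("==")))
  subject <- expr[[2]]    # bare name, becomes a ?variable
  verb_call <- expr[[3]]  # e.g. wdt::P31(wd::Q13442814)
  paste(
    paste0("?", deparse(subject)),
    to_prefixed(verb_call[[1]]),
    to_prefixed(verb_call[[2]])
  )
}

# spq_mutate() style: object = verb(subject)
mutate_to_triple <- function(name, expr) {
  stopifnot(is.call(expr))
  paste(
    to_prefixed(expr[[2]]),
    to_prefixed(expr[[1]]),
    paste0("?", name)
  )
}

t1 <- filter_to_triple(quote(item == wdt::P31(wd::Q13442814)))
t2 <- mutate_to_triple("statement", quote(wdt::P1843(wd::Q331676)))
print(t1)
print(t2)
```

Both results match the strings passed to spq_add() in the examples of this section.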
spq_ functions
The other spq_ functions spq_arrange(), spq_select(), spq_mutate(), spq_filter(), spq_summarize() are the core of the DSL.
They have ... as arguments, where three different things can be passed:
- R code, for instance spq_filter(query, lang(itemTitle) == "en");
- R code as a string, for instance spq_filter(query, 'lang(itemTitle) == "en"');
- SPARQL code wrapped in spq() (for copy-pasting from SPARQL examples), for instance spq_filter(query, spq('lang(?itemTitle)="en"')).

The names of their other arguments start with a dot to prevent name clashes.
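How do we differentiate these three things that users can pass?
- The ... arguments are captured as quosures via rlang::enquos(...).
- Each argument is then handled by something like spq_treat_argument(). "Like" as the more complex behavior of spq_filter() and spq_mutate(), which can accept R-looking snippets that will be translated to either triple patterns or not, warrants a bit more logic.

In spq_treat_argument() we try evaluating the argument via rlang::eval_tidy().
- If evaluation succeeds and the result is of class spq, then it means we can use the string as is: it was SPARQL.
- If evaluation succeeds and the result is a character string, we treat that string as R code (example: the user wrote 'lang(itemTitle) == "en"').
- If evaluation fails, the argument was R code, and we get its text via rlang::expr_text(arg) %>% stringr::str_replace("^~", "") (example: the user wrote lang(itemTitle) == "en").

The R code is then translated to SPARQL using a tibble called all_correspondences:
head(glitter::all_correspondences)
So all instances of n(blabla) become COUNT(blabla).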
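The decision logic described above can be sketched as follows. This is an illustration under assumptions, not glitter's source: spq_stub() stands in for spq() and simply tags a string with the class "spq", and treat_argument_sketch() and classify() are hypothetical names. Only rlang functions mentioned in this article, plus rlang::quo_get_expr(), are used.

```r
# Stand-in for spq(): mark a string as raw SPARQL (assumption for this sketch).
spq_stub <- function(x) {
  structure(x, class = c("spq", "character"))
}

# Decide what the user passed, given one quosure.
treat_argument_sketch <- function(arg) {
  # Evaluation fails for R-looking DSL code such as lang(itemTitle) == "en",
  # because lang() and itemTitle do not exist as R objects.
  value <- tryCatch(rlang::eval_tidy(arg), error = function(e) NULL)

  if (inherits(value, "spq")) {
    return(list(kind = "sparql", text = unclass(value)))
  }
  if (is.character(value)) {
    return(list(kind = "r_string", text = value))
  }
  # Fall back to the text of the captured expression.
  list(kind = "r_code", text = rlang::expr_text(rlang::quo_get_expr(arg)))
}

classify <- function(...) {
  lapply(rlang::enquos(...), treat_argument_sketch)
}

res <- classify(
  lang(itemTitle) == "en",
  'lang(itemTitle) == "en"',
  spq_stub('lang(?itemTitle)="en"')
)
kinds <- vapply(res, function(x) x$kind, character(1))
print(unname(kinds))
```

In this sketch the first two inputs end up with the same text, which is the point: whether the user typed bare R code or quoted it, the translation step afterwards sees one string of R-looking code.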
We also transform argument names. Look at the "SELECT" statement below: the str_c() function becomes GROUP_CONCAT() and its sep argument becomes SEPARATOR. Also note that in SPARQL the argument comes after a semicolon, not a comma like in R.
spq_init() %>% spq_summarise(authors = str_c(name, sep = ', '))
Later, we need to document these correspondences better, and we need to stress test the DSL with more cases using arguments.
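To illustrate the idea of a correspondence table, here is a toy base R version containing only the two function mappings mentioned in this section. The toy_correspondences data frame and translate_functions() are assumptions made for this sketch; the real all_correspondences tibble has its own columns and many more rows, and the real translation also handles variables and argument names, which this sketch leaves out.

```r
# Toy lookup table: R function name -> SPARQL function name.
toy_correspondences <- data.frame(
  R = c("n", "str_c"),
  SPARQL = c("COUNT", "GROUP_CONCAT"),
  stringsAsFactors = FALSE
)

# Replace each known R function call by its SPARQL counterpart.
translate_functions <- function(code, table = toy_correspondences) {
  for (i in seq_len(nrow(table))) {
    # \\b makes sure "n(" does not match inside e.g. "mean(".
    pattern <- paste0("\\b", table$R[i], "\\(")
    replacement <- paste0(table$SPARQL[i], "(")
    code <- gsub(pattern, replacement, code, perl = TRUE)
  }
  code
}

print(translate_functions("n(blabla)"))
print(translate_functions("str_c(name)"))
print(translate_functions("mean(x)"))
```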
spq_filter() and spq_mutate()
spq_filter() receives R-looking fragments that are translated into SPARQL snippets for FILTER... or triple patterns.
spq_mutate() receives R-looking fragments that are translated into SPARQL snippets for SELECT... or triple patterns.
At the moment, the detection of which is which is based on the presence of "::": if the R-looking fragment contains "::", we assume it will become a triple pattern.
Later, we need to make this more robust, as the function spq_set() makes it easier to create synonyms for any subject/verb/object via SPARQL VALUES.
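The heuristic itself is small enough to sketch in one line of base R. looks_like_triple() is a hypothetical name used for this illustration, and the sketch works on the text of the fragment, which is an assumption about where the check happens.

```r
# Hypothetical check: does the fragment contain R-style namespacing?
looks_like_triple <- function(fragment) {
  grepl("::", fragment, fixed = TRUE)
}

fragments <- c(
  'item == wdt::P31(wd::Q13442814)',
  'lang(itemTitle) == "en"',
  'stringr::str_detect(itemTitle, "wikidata")'
)
detected <- looks_like_triple(fragments)
print(data.frame(fragment = fragments, triple = detected))
```

The third fragment shows one way a purely textual check can go wrong: in this sketch, an ordinary package-qualified R call is also flagged as a triple pattern.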
When we assume spq_filter()/spq_mutate() has received an R-looking fragment meant to be translated to a triple pattern, it is parsed accordingly, keeping in mind that the order is not the same in the two cases:
- spq_mutate(object = verb(subject));
- spq_filter(subject == verb(object)).

The examples using something like
spq_init() %>% spq_filter(item == wdt::P31(wd::Q13442814))
got flagged as if wdt were a dependency to be stated. This is understandable. To bypass it, these examples are not formal examples: they are R chunks in a section called "Some examples". This means they aren't checked. Thankfully we have similar code in the real tests!
The issue tracker of glitter is quite representative of future work, as well as all sentences starting with "Later" in this article. As stated at the very beginning of this article, your ideas and comments are welcome.