Imagine you are tasked with exploring LINDAS, the Linked Data Service provided by the Swiss Federal Archives. If examples and documentation exist, start by reading them. In any case, you can draw inspiration from Chapter 11, "SPARQL cookbook", of the book "Learning SPARQL" by Bob DuCharme to explore the dataset. Let's go through an example.
Depending on the dataset (or triplestore, in our context) you're working with, some queries might ask too much of the service, so proceed with caution. When in doubt, add a spq_head() to your query pipeline to ask for less at a time, or use spq_count() to get a sense of how many results there are in total.
In the code below we'll ask for 10 triples. Note that we use the endpoint argument of spq_init() to indicate where to send the query, and that we set request_type = "body-form" through spq_control_request(), which is passed as the request_control argument. How can one know whether a service needs request_type = "body-form"?
library("glitter")

query_basis = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(
    request_type = "body-form"
  )
)

query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
This first query is helpful in that it shows you can do a query! Its results however can be... more or less helpful.
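Since some queries may be too heavy for a service, it can help to read the SPARQL that glitter generates before sending anything. The sketch below assumes that spq_assemble() returns the query text as a character string and that building a query object does not contact the endpoint; neither point is stated in this text, so treat it as an illustration rather than a reference.

```r
library("glitter")

# Build the query object only; nothing is sent at this stage (assumption).
query = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(request_type = "body-form")
) %>%
  spq_add("?s ?p ?o") %>%
  spq_head(n = 10)

# Assemble the SPARQL text and print it for inspection.
sparql = spq_assemble(query)
cat(sparql)
```

If the printed query looks reasonable, the same query object can then be piped into spq_perform().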
The classes occurring in the database will provide information as to the kind of data you will find there. This can be as varied (across triplestores, or even in a single triplestore) as people, places, buildings, trees, or even things that are more abstract like concepts, philosophical currents, historical periods, etc.
At this point you might think you need to use some prefixes in your query. If these prefixes are present in glitter::usual_prefixes, you don't need to do anything. If they're not, use glitter::spq_prefix().
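As a rough illustration of that check, the sketch below looks up a few prefixes in glitter::usual_prefixes and declares a missing one by hand. It assumes that usual_prefixes is a data frame whose prefix names sit in a column called name, and that spq_assemble() returns the query text; the gont prefix and its URL are the ones used later in this text.

```r
library("glitter")

# Which of these prefixes does glitter already know?
# (assumes a `name` column in usual_prefixes)
candidates = c("rdfs", "owl", "gont")
known = candidates %in% usual_prefixes$name
print(data.frame(prefix = candidates, known = known))

# Declare a prefix explicitly, then use it in a triple pattern.
query = spq_init(endpoint = "https://ld.admin.ch/query") %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/")) %>%
  spq_add("?s gont:longName ?o") %>%
  spq_head(n = 5)

# The declared prefix should show up in the generated SPARQL.
sparql = spq_assemble(query)
cat(sparql)
```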
query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
How many classes are defined in total? This query might be too big for the service.
nclasses = query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_count() %>%
  spq_perform()

nclasses
The n column of nclasses holds the number of classes declared in the triplestore. Not so many that we could not get them all in one query, but definitely too many to show them all here!
Let us examine a few of these classes:
query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
At this point we could still be largely in the dark as to what the service provides.
A class might be declared even though very few items, or none at all, fall under it. Getting the classes which do have instances corresponds to another triple pattern, "?item is an instance of ?class", a.k.a. "?item a ?class":
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(-instance) %>%
  spq_arrange(class) %>%
  spq_head(n = 10) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
The number of items falling into each class actually gives an even better overview of the contents of a triplestore:
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_count(class, sort = TRUE) %>% # count items falling under class
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
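To see what this counting pipeline asks of the service, one can print the SPARQL it generates instead of running it. This is a sketch assuming that spq_assemble() returns the query text as a character string; the exact layout of the generated query may differ across glitter versions, so no output is shown here.

```r
library("glitter")

# Same counting pipeline as above, stopped before spq_perform().
count_query = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(request_type = "body-form")
) %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_count(class, sort = TRUE) %>%
  spq_head(20)

# Print the generated SPARQL to check the grouping and counting clauses.
sparql = spq_assemble(count_query)
cat(sparql)
```

Reading the generated text is a cheap way to confirm that the count is computed per class before sending a potentially heavy aggregate query.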
In this case the class names are quite self-explanatory, but if they were not we could use spq_label() to retrieve their labels:
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_label(class) %>% # label class to get class_label
  spq_count(class, class_label, sort = TRUE) %>% # group by class and class_label to count
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
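Below we list the properties declared as owl:DatatypeProperty. Note that you could instead use spq_add("?property a rdfs:Property"), but in this case it returned nothing. (In the RDF vocabulary this class is named rdf:Property rather than rdfs:Property, which may explain the empty result.)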
query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
How many properties are defined in total? This query might be too big for the service.
query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_count() %>%
  spq_perform()
Similarly to counting instances for classes, we wish to get a sense of the properties that are actually used in the triplestore.
query_basis %>%
  spq_add("?s ?property ?o") %>%
  spq_select(-s, -o) %>%
  spq_select(property, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
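Once a property is known, we can also count the values it takes, for instance the most frequent values of schema:addressRegion:
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s schema:addressRegion ?value") %>%
  spq_count(value, sort = TRUE) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()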
One of the properties is https://gont.ch/longName. Which class uses it?
query_basis %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/")) %>%
  spq_add("?s gont:longName ?o") %>%
  spq_add("?s a ?class") %>%
  spq_select(-o, -s) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
The items falling into a given class are likely to be the subject (or object) of a common set of properties. One might wish to explore the properties actually associated with a class.
For instance, in LINDAS, which properties is the schema:Organization class associated with?
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
And what about the properties that the schema:PostalAddress class is associated with?
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:PostalAddress") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
Let us examine whether LINDAS holds some data related to water, by searching for the string "hydro" or "Hydro":
query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_filter(str_detect(o, "[Hh]ydro")) %>%
  spq_select(-s, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
To wrap it up, let us now use the LINDAS triplestore for an actual data query: we could for instance try and collect all organizations which have "swiss" in their name:
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s schema:name ?name") %>%
  spq_filter(str_detect(name, "swiss")) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
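A regular expression filter is case-sensitive unless told otherwise, so the query above would presumably miss names spelled with a capital "Swiss". A possible variant, reusing the character-class pattern from the "hydro" search, is sketched below. It assumes that spq_assemble() returns the query text and that str_detect() is translated into a SPARQL regular expression filter; the query is only printed here, not sent.

```r
library("glitter")

# Match both "swiss" and "Swiss" in organization names.
swiss_query = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(request_type = "body-form")
) %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s schema:name ?name") %>%
  spq_filter(str_detect(name, "[Ss]wiss")) %>%
  spq_head(10)

# Inspect the generated SPARQL before sending it.
sparql = spq_assemble(swiss_query)
cat(sparql)
```

Piping swiss_query into spq_perform() and knitr::kable(), as in the other examples, would send it and display the results.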