knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(glitter)

Imagine you are tasked with exploring the Linked Data Service LINDAS provided by the Swiss Federal Archives. You might be able to read examples and docs if present. But in any case, you could get inspired by the Chapter 11 "SPARQL cookbook" of the "Learning SPARQL" book by Bob DuCharme to explore the dataset. Let's go through an example.

A word of caution

Depending on the dataset (or triplestore, in our context) you're working with, some queries might just ask too much of the service so proceed with caution. When in doubt, add a spq_head() in your query pipeline, to ask less at a time, or use spq_count() to get a sense of how many results there are in total.

Asking for a subset of all triples

In the code below we'll ask for 10 triples. Note that we use the endpoint argument of spq_init() to indicate where to send the query, as well as the request_type argument.

How can one know whether a service needs request_type = "body-form"?

library("glitter") 
query_basis = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(
    request_type = "body-form"
  )
)
query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()

This first query is helpful in that it shows you can do a query! Its results however can be... more or less helpful.

Classes

Find which classes are declared

The classes occurring in the database will provide information as to the kind of data you will find there. This can be as varied (across triplestores, or even in a single triplestore) as people, places, buildings, trees, or even things that are more abstract like concepts, philosophical currents, historical periods, etc.

At this point you might think you need to use some prefixes in your query. If these prefixes are present in glitter::usual_prefixes, you don't need to do anything. If they're not, use glitter::spq_prefix().

query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()

How many classes are defined in total? This query might be too big for the service.

nclasses = query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_count() %>%
  spq_perform()

nclasses

There are r nclasses$n classes declared in the triplestore. Not so many that we could not get them all in one query, but definitely too many to show them all here! Let us examine a few of these classes:

query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()

Until now we could still be very in the dark as to what the service provides.

Which classes have instances?

A class might be declared although very few or even no items fall under it. Getting classes which do have instances actually corresponds to a another triple pattern, "?item is an instance of ?class", a.k.a. "?item a ?class":

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(- instance) %>%
  spq_arrange(class) %>%
  spq_head(n = 10) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()

Which classes have the most instances?

The number of items falling into each class actually gives an even better overview of the contents of a triplestore:

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct")  %>%
  spq_count(class, sort = TRUE) %>%                        # count items falling under class
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()

In this case the class names are quite self explanatory but if they were not we could use

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct")  %>%
  spq_label(class) %>%                                   # label class to get class_label
  spq_count(class, class_label, sort = TRUE) %>%         # group by class and class_label to count
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()

Properties

Find which properties are declared

Note that you could instead use spq_add("?property a rdfs:Property") but in this case it returned nothing.

query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()

How many properties are defined in total? This query might be too big for the service.

query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_count() %>%
  spq_perform()

What properties are used?

Similarly to counting instances for classes, we wish to get a sense of the properties that are actually used in the triplestore.

query_basis %>%
  spq_add("?s ?property ?o")  %>%
  spq_select(- s, - o) %>%
  spq_select(property, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()

What values does a given property have?

query_basis  %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/"))%>%
  spq_add("?s schema:addressRegion ?value") %>%
  spq_count(value, sort = TRUE) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()

Which class use a particular property?

One of the properties is https://gont.ch/longName. Which class uses it?

query_basis %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/")) %>%
  spq_add("?s gont:longName ?o") %>%
  spq_add("?s a ?class") %>%
  spq_select(-o, -s) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()

What data is stored about a class's instances?

The items falling into a given class are likely to be the subject (or object) of a common set of properties. One might wish to explore the properties actually associated to a class.

For instance, in LINDAS, what properties are the schema:Organization class associated to?

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()

And what about the properties that the schema:PostalAddress class are associated to?

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:PostalAddress") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()

Which data or property name includes a certain substring?

Let us examine whether there exists in LINDAS some data related to water, through the search of string "hydro" or "Hydro" :

query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_filter(str_detect(o, "[Hh]ydro")) %>%
  spq_select(-s, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()

An example query based on what we now know

To wrap it up, let us now use the LINDAS triplestore for an actual data query: we could for instance try and collect all organizations which have "swiss" in their name:

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s schema:name ?name") %>%
  spq_filter(str_detect(name, "swiss")) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()


lvaudor/glitter documentation built on Jan. 30, 2024, 1:34 a.m.