This vignette explains how to use the package RDF4R by means of examples. In order to fully utilize the package capabilities, one needs to have access to an RDF graph database. We have made available a public endpoint to allow the users of the package to experiment. Since write access is enabled, please be considerate and don't issue catastrophic commands.
library(rdf4r) graphdb = rdf4r::basic_triplestore_access( server_url = "http://graph.openbiodiv.net:7777", user = "dbuser", password = "public-access", repository = "obkms_i6" ) graphdb get_protocol_version(graphdb) list_repositories(graphdb)
The above has created an object that stores the information needed to access the database. You need to supply it to the access functions. All examples in this vignette use this access.
The purpose of this example is to convert a simple SPARQL lookup query to an R function. The publicly accessible endpoint happens to store some biological information but for the purposes of this example knowledge of biological taxonomy or ontology of the store is irrelevant. You only need to know that the database stores information about biological papers that contain references to biological names. We are looking for papers that mention the biological name Drosophila, which a genus of flies. The important thing to note here is the parametrization of the SPARQL query.
p_query = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX dwc: <http://rs.tdwg.org/dwc/terms/> PREFIX openbiodiv: <http://openbiodiv.net/> PREFIX dwciri: <http://rs.tdwg.org/dwc/iri/> PREFIX pkm: <http://proton.semanticweb.org/protonkm#> PREFIX fabio: <http://purl.org/spar/fabio/> PREFIX po: <http://www.essepuntato.it/2008/12/pattern#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?genus) AS ?genus) (SAMPLE(?title) AS ?title) WHERE { ?name rdfs:label %label ; rdf:type openbiodiv:LatinName ; dwc:genus ?genus. ?article rdf:type fabio:JournalArticle ; po:contains/pkm:mentions ?name ; dc:title ?title . } GROUP BY ?article "
Note that this almost valid SPARQL with one exception - the %label
string on the first line of the WHERE clause. You parameterize SPARQL queries by specifying a %
in front of the tokens that you would like to become the arguments of your R function. Let's now construct a function that looks up a genus (a biological rank) by a string that you supply.
genus_lookup = rdf4r::query_factory(p_query = p_query, access_options = graphdb)
query_factory
takes two arguments: the parameterized query and an object with access options for an endpoint and returns an R function whose arguments are the parameters of the parameterized SPARQL query and which executes the SPARQL query against the endpoint specified in access_options
and returns formatted results as a dataframe.
genus_lookup("\"Drosophila\"")
Note that we have enclosed the string "\"Drosophila\"" in escaped quotes as only that would make the replacement of the parameter in the parameterized SPARQL query a valid SPARQL. Had the parameter been a resource identiefier, we would not have needed the quotes. In a later example, we will show how we can get around this hassle by utilzing the built-in classes for literals and resource identifiers.
Excercise: Try experimenting with genus_lookup
by looking up information about some other genera (Eupolybothrus - a millepede, Myotis - a bat).
We want to model to model Table 3-1 from Semantic Web for the Working Ontologist. To make things more interesting we will use the prefix http://rdflib-rdf4r.net/ for all instances that we create and a dummy ontology (not actually defined) with the prefix http://art-ontology.net/ to reify the example classes and properties.
Literals
lking_lear = literal(text_value = "King Lear", lang = "en") las_you_like_it = literal(text_value = "As You Like It", lang = "en") lhamlet = literal(text_value = "Hamlet", lang = "en") lothello = literal(text_value = "Othello", lang = "en") lsonnet_78 = literal(text_value = "Sonnet 78", lang = "en") lastrophil = literal(text_value = "Astrophil and Stella", lang = "en") ledward2 = literal(text_value = "Edward II", lang = "en") lhero = literal(text_value = "Hero and Leander", lang = "en") lgreensleeves = literal(text_value = "Greensleeves", lang = "en") lshakespeare = literal(text_value = "Shakespeare") lsir_phillip_sidney = literal(text_value = "Sir Phillip Sidney") lchristopher_marlowe = literal(text_value = "Christopher Marlowe") lhenry_8_rex = literal(text_value = "Henry VII Rex") l1599 = literal(text_value = "1599", xsd_type = xsd_integer) l1603 = literal(text_value = "1603", xsd_type = xsd_integer) l1609 = literal(text_value = "1609", xsd_type = xsd_integer) l1590 = literal(text_value = "1590", xsd_type = xsd_integer) l1592 = literal(text_value = "1592", xsd_type = xsd_integer) l1593 = literal(text_value = "1593", xsd_type = xsd_integer) l1525 = literal(text_value = "1525", xsd_type = xsd_integer)
Here, we repeatedly called the literal
function, which is a constructor of objects of class literal
, with different arguments. literal
can construct phrases in English (or any other language) with the lang
argument. It can construct pure strings (by omitting the lang argument) of type xsd:string
. It can also construct literals of a number of defined semantic types by using the xsd_type
argument. In order to see which types are available execute ?semantic_elements
. Note that the XSD types are implemented as resource identifiers (class identifier
), which allows you to implement additional types that are not provided. Behind the scenes the URL of the resource identifier will be appended to the representation (?represent
) of the literal as text through pasting of "^^" and then the URL.
In other words, for the work titles, we use the argument lang="en"
telling the literal constructor that the literal value is in English, whereas for the names, we omit this argument. As per semantic web conventions, when the argument is omitted, and no type is explicitly specified, it is assumed that the literal is a string (xsd:string
). For the literals containing years, on the other hand, we explicitly specify an integer type; otherwise they would have parsed as strings as well. All of this can be seen by inspecting the individual lists (objects of class literal
are lists):
lhamlet str(lhamlet) represent(lhamlet) lshakespeare str(lshakespeare) represent(lshakespeare) l1603 str(l1603) represent(l1603)
Identifiers
We need resource identifiers for our resources, i.e. playwrights, works of art, as well for the classes of which those resources are instances of. To make things simpler, we use a fictional ontology with the prefix http://art-ontology.net/. Let's hardcode identifiers for the ontology classes:
prefixes = c( rdfs = "http://www.w3.org/2000/01/rdf-schema#", rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#", example = "http://rdflib-rdf4r.net/", art = "http://art-ontology.net/" ) eg = prefixes[3] art = prefixes[4] artist = identifier(id = "Artist", prefix = art) play = identifier(id = "Play", prefix = art) poem = identifier(id = "Poem", prefix = art) song = identifier(id = "Song", prefix = art) wrote = identifier(id = "wrote", prefix = art) has_year = identifier(id = "has_year", prefix = art)
Let's inspect one class and one property:
artist str(artist) represent(artist) wrote str(wrote) represent(wrote)
Note that each identifier
object is a list where the field $uri
gives the URI of the resource and the field qname
gives the shortened name (QNAME) with respect to the prefix stored in $prefix
. Also note that both literal
and identifier
are representable, i.e. we have defined a the generic represent
on both of the classes that outputs a proper string representation of the literal or resource identifier that can be used in a serialization.
We also need resource identifiers for our entitites such as Shakespaere, Christopher Malrowe, etc. Semantic Web best practices discourage the liberal minting of identifiers for resources for which somebody has already minted an identifier. Instead, we want to look them up in a database, and only mint if they are not found. For this, RDF4R offers factory functions to create lookup/mint functions.
p_query = "SELECT DISTINCT ?id WHERE { ?id rdfs:label %label }" simple_lookup = query_factory(p_query, access_options = graphdb) lookup_or_mint_id = identifier_factory(fun = list(simple_lookup), prefixes = prefixes, def_prefix = eg) idking_lear = lookup_or_mint_id(list(lking_lear)) idas_you_like_it = lookup_or_mint_id(list(las_you_like_it)) idhamlet = lookup_or_mint_id(list(lhamlet)) idothello = lookup_or_mint_id(list(lothello)) idsonnet78 = lookup_or_mint_id(list(lsonnet_78)) idastrophil = lookup_or_mint_id(list(lastrophil)) idedward2 = lookup_or_mint_id(list(ledward2)) idhero = lookup_or_mint_id(list(lhero)) idgreensleeves = lookup_or_mint_id(list(lgreensleeves)) idshakespeare = lookup_or_mint_id(list(lshakespeare)) idsir_phillip_sidney = lookup_or_mint_id(list(lsir_phillip_sidney)) idchristopher_marlowe = lookup_or_mint_id(list(lchristopher_marlowe)) idlhenry_8_rex = lookup_or_mint_id(list(lhenry_8_rex))
identifer_factory
's first argument, fun
is a list of (lookup) functions that will be tried. identifier_factory
returns an identifier constructor function, in our case we named it lookup_or_mind_id
. The lookup functions need to return a single column (labeled e.g. ?id
). They will be tried in order and if any of them returns a unique solution, it will be returned by the constructor function to create an identifier object. If none of the lookup functions returns a unique solution, a new identifier will be minted.
Let's inspect
idshakespeare
Creating RDF
Now to create the RDF representation:
classics_rdf = ResourceDescriptionFramework$new() classics_rdf$add_triple(subject = idshakespeare, predicate = wrote, object = idking_lear) classics_rdf$add_triple(subject = idking_lear, predicate = rdfs_label, object = lking_lear) classics_rdf$add_triple(subject = idshakespeare, predicate = wrote, object = idas_you_like_it) classics_rdf$add_triple(subject = idas_you_like_it, predicate = rdfs_label, object = las_you_like_it) classics_rdf$add_triple(subject = idas_you_like_it, predicate = has_year, object = l1599) classics_rdf$add_triple(subject = idas_you_like_it, predicate = rdf_type, object = play)
The easiest way to inspect the ResourceDescriptionFramework
object is to actually serialize it. Before we serialize, however, we need to specify the subgraph where the triples should be stored with $set_context(id)
. We will reuse the example for that.
classics_rdf$set_context(identifier(id = "classic_example", prefix = eg)) cat(classics_rdf$serialize())
Submitting RDF to the triple store
Now that we have created some RDF (classics_rdf
), we are ready to submit it to the endpoint (graphdb
). We can either submit it directly via add_data
, or we can use the add_data_factory
to create a submitter function.
# via add_data add_data(classics_rdf$serialize(), access_options = graphdb) simple_lookup(represent(lking_lear)) simple_lookup(represent(lking_lear)) p_query_describe = "PREFIX example: <http://rdflib-rdf4r.net/> SELECT ?p ?o WHERE { %resource ?p ?o . }" describe = query_factory(p_query = p_query_describe, access_options = graphdb) describe(represent(idshakespeare)) describe(represent(idas_you_like_it))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.