class: left, top, inverse
background-image: url(img/uglyduckling.jpg)

RDF: Ugly Duckling

options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(comment = NA)
library(dplyr)
library(tidyr)
library(rdflib)
library(jsonlite)
library(tibble)
options(max.print = 50)

# keep the car names as an ordinary column, so they become triples too
mtcars <- mtcars %>% rownames_to_column("Model")

source(system.file("examples/tidy_schema.R", package="rdflib"))
cat(readLines(system.file("extdata/ex2.xml", package="rdflib")), sep= "\n")

class: center, middle, inverse

The Semantic Web is the Future of the Internet...


class: center, middle, inverse

... and always will be.

-- Peter Norvig,
Director of Research,
Google Inc.


class: center, top, inverse
background-image: url(img/steampunk.jpg)

RDF as Steampunk?


class: center, top, inverse
background-image: url(img/tetris.jpg)

All Data are Tabular


class: center, top, inverse
background-image: url(img/tetris-lose.jpg)

All Data are Definitely Not Tabular


class: center, middle, inverse
background-image: url(img/factory-farm.jpg)

Factory Farm Data


class: center, middle, inverse
background-image: url(img/organic-farm.png)

Or Handcrafted, Organic Data?


class: center, middle, inverse

Heterogeneous Data is Hard


class: center, top, inverse
background-image: url(img/field-notes.jpg)

Heterogeneous Data in Ecology


class: center, top, inverse
background-image: url(img/neon.png)

Heterogeneous Data in Ecology


class: center, top, inverse
background-image: url(img/integration.png)

Ecological Metadata Language


class: center, middle
background-image: url(img/codemeta.png)

CodeMeta


class: center, top, inverse
background-image: url(img/no-data-lake.jpg)

The Data Lake


class: center, top, inverse
background-image: url(img/data-lake.jpg)

The Data Lake


class: center, middle, inverse

From Schema on Write

To Schema on Read


class: center, middle, inverse

All Data Really Are Tabular


class: center, middle, inverse

tidyr::gather() all the things!


class: left, top, inverse

tidyr::gather() all the things!

# one row per (id, property, value); gather() coerces every value to character
mtcars %>% 
  rowid_to_column("id") %>% 
  gather(property, value, -id)
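The long table above is already a list of triples. The row id plays the subject, the column name plays the predicate, and the cell plays the object. The sketch below runs the round trip on a small hand-made table (two rows copied from mtcars) rather than the full dataset. `spread()` rebuilds the wide table only when asked, which is schema on read in miniature. Because `gather()` turns every value into character, `convert = TRUE` is used to re-guess the column types on the way back.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Small hand-made table: two rows of mtcars-like data
cars <- tibble(Model = c("Mazda RX4", "Datsun 710"),
               mpg   = c(21.0, 22.8),
               cyl   = c(6, 4))

# Atomize: one row per (id, property, value); value becomes character
triples <- cars %>%
  rowid_to_column("id") %>%
  gather(property, value, -id)

# Reassemble on demand; convert = TRUE re-guesses each column's type
wide <- triples %>%
  spread(property, value, convert = TRUE) %>%
  arrange(id)

triples
wide
```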

class: center, middle, inverse

Atomizing Your Data


class: left, middle

Row, Column, Cell

knitr::kable(head(mtcars, 20), "html")

class: left, middle, inverse

Object, Property, Value

toJSON(mtcars, pretty = TRUE)
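In the JSON view, each row becomes an object and each column name becomes a key. Every key/value pair is therefore a property/value pair attached to one object. The sketch below uses jsonlite on two rows rather than all of mtcars to make the shape easy to see. `dataframe = "rows"` is jsonlite's default for data frames and is spelled out here only for clarity.

```r
library(jsonlite)

cars <- data.frame(Model = c("Mazda RX4", "Datsun 710"),
                   mpg   = c(21.0, 22.8),
                   stringsAsFactors = FALSE)

# One JSON object per row, one key (property) per column
json_rows <- toJSON(cars, dataframe = "rows")
print(json_rows)

# The same objects read back into a data frame
back <- fromJSON(json_rows)
```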

class: left, middle, inverse

Subject, Predicate, Object

rdf_ex <- as_rdf(mtcars, prefix = "mtcars:")
rdf_ex
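`as_rdf()` converts a whole table at once, but single triples can also be added by hand with rdflib's `rdf_add()`. The sketch below uses made-up `example.org` IRIs for the subject and predicates. These are an assumption for illustration, not the identifiers `as_rdf()` itself generates. Node types are set explicitly, and the number is stored as a typed literal.

```r
library(rdflib)

g   <- rdf()                          # empty in-memory triplestore
car <- "http://example.org/car/1"     # hypothetical subject IRI

# One call per statement: subject, predicate, object
rdf_add(g, subject = car,
        predicate = "http://example.org/mtcars/Model",
        object = "Mazda RX4",
        subjectType = "uri", objectType = "literal")

# A typed literal: the value is stored as text plus an XSD datatype IRI
rdf_add(g, subject = car,
        predicate = "http://example.org/mtcars/mpg",
        object = "21.0",
        subjectType = "uri", objectType = "literal",
        datatype_uri = "http://www.w3.org/2001/XMLSchema#decimal")

g   # printing the store lists its triples

# Everything the store knows about this one subject
about_car <- rdf_query(g,
  "SELECT ?p ?o WHERE { <http://example.org/car/1> ?p ?o }")
```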

class: left, top, inverse

Triples
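Each cell of a table becomes one triple. A table with no missing values therefore yields rows times columns triples. The quick check below runs on a two-row, two-column table. It assumes, as the earlier slides show, that `as_rdf()` turns every column into a predicate named by the prefix plus the column name, and every row into a subject.

```r
library(rdflib)
library(tibble)

cars <- tibble(Model = c("Mazda RX4", "Datsun 710"),
               mpg   = c(21.0, 22.8))

g <- as_rdf(cars, prefix = "cars:")

# ?s ?p ?o matches every triple in the store: one per cell
all_triples <- rdf_query(g, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
nrow(all_triples)
```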


class: center, middle, inverse
background-image: url(img/no-data-lake.jpg)

Into the Lake: Data Frames

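triplestore <- rdf()   # one empty store for every dataset

# each table gets its own prefix, so their predicates stay distinct
as_rdf(mtcars, triplestore, "mtcars:")
as_rdf(iris, triplestore, "iris:")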
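Because `as_rdf()` adds to the store it is given, unrelated tables can share one triplestore. The prefix then namespaces each table's columns, and a query pulls back only the shape it asks for. The sketch below uses two small hand-made tables standing in for mtcars and iris.

```r
library(rdflib)
library(tibble)

lake <- rdf()

cars    <- tibble(Model       = c("Mazda RX4", "Datsun 710"),
                  mpg         = c(21.0, 22.8))
flowers <- tibble(Species     = c("setosa", "virginica"),
                  Petal.Width = c(0.2, 2.5))

# Both tables go into the same store, each under its own prefix
invisible(as_rdf(cars, lake, "cars:"))
invisible(as_rdf(flowers, lake, "flowers:"))

# Only subjects carrying both car properties match this pattern
car_table <- rdf_query(lake,
  "SELECT ?Model ?mpg WHERE { ?s <cars:Model> ?Model ; <cars:mpg> ?mpg }")
car_table
```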

class: left, middle, inverse
background-image: url(img/no-data-lake.jpg)

Into the Lake: Lists

Example JSON data returned from the GitHub API

github.json <- system.file("extdata/github.json", package="rdflib")
cat(readLines(github.json, n = 20), sep="\n")

class: left, middle, inverse
background-image: url(img/no-data-lake.jpg)

Into the Lake: Lists

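# read the copy of the GitHub events feed bundled with rdflib (reproducible)
events <- read_json(github.json)
# for live data instead (needs network access; results change over time):
# events <- read_json("https://api.github.com/users/cboettig/events")
as_rdf(events, triplestore, "gh:")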
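Nested lists keep their structure in the triplestore. Each inner object becomes its own node, linked from its parent by the parent's property, so a query can follow the nesting. The sketch below uses a hypothetical, hand-written record shaped like one GitHub event. It assumes (our understanding, not verified here) that rdflib converts lists by way of JSON-LD, so the jsonld package must be installed.

```r
library(rdflib)

# Hypothetical record shaped like one entry of the GitHub events feed
event <- list(
  type  = "PushEvent",
  actor = list(login = "octocat"),
  repo  = list(name = "octocat/Hello-World")
)

g <- rdf()
invisible(as_rdf(event, g, "gh:"))

# Reach the nested actor and repo nodes through the parent's properties
flat <- rdf_query(g,
  'SELECT ?type ?user ?repo WHERE {
     ?s <gh:type>  ?type ;
        <gh:actor> ?a ;
        <gh:repo>  ?r .
     ?a <gh:login> ?user .
     ?r <gh:name>  ?repo
   }')
flat
```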

class: left, middle

Schema on read: SPARQL

rdf_query(triplestore,
'SELECT  ?Model ?mpg ?cyl ?disp  ?hp
WHERE {
 ?s <mtcars:Model>  ?Model ;
    <mtcars:mpg>  ?mpg ;
    <mtcars:cyl>  ?cyl ; 
    <mtcars:disp>  ?disp ;
    <mtcars:hp>  ?hp 
}')
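With schema on read, the query decides which columns come back and what they are called. Nothing about the table's shape is fixed when the data are stored. The sketch below loads a small hand-made table and asks for two of its three properties under new names. The variable names `?car` and `?economy` are arbitrary choices for illustration.

```r
library(rdflib)
library(tibble)

cars <- tibble(Model = c("Mazda RX4", "Datsun 710"),
               mpg   = c(21.0, 22.8),
               cyl   = c(6, 4))
g <- as_rdf(cars, prefix = "cars:")

# Pick two properties and rename them on the way out; cyl is ignored
res <- rdf_query(g,
  'SELECT ?car ?economy WHERE {
     ?s <cars:Model> ?car ;
        <cars:mpg>   ?economy
   }')
res
```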

class: left, middle

Data Rectangling

rdf_query(triplestore, 
'SELECT ?type ?user ?repo ?when
WHERE {
?s <gh:type> ?type ;
   <gh:created_at> ?when ;
   <gh:repo> ?repo_id ;
   <gh:actor> ?actor .
?actor <gh:login> ?user .
?repo_id <gh:name> ?repo
}')

class: left, middle

Data Rectangling

rdf_query(triplestore, 
'SELECT ?type ?user ?repo ?when
WHERE {
?s <gh:type> ?type ;
   <gh:created_at> ?when ;
   <gh:repo> ?r ;
   <gh:actor> ?actor .
?r <gh:name> ?repo .
?actor <gh:login> ?user .
}')
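In these queries, each triple pattern acts like a filter on one property. Each variable shared between patterns (`?actor`, `?repo_id` or `?r`) acts like a join key. That is how the nested actor and repo objects end up on one flat row, and why reordering the patterns, as the second version does, gives the same result. The analogy is sketched below with dplyr on a small hypothetical triple table. It illustrates the idea only; it is not how Redland evaluates SPARQL.

```r
library(dplyr)
library(tibble)

# Hypothetical triples: two events pointing at one shared actor node
triples <- tribble(
  ~s,       ~p,      ~o,
  "event1", "type",  "PushEvent",
  "event1", "actor", "actor1",
  "event2", "type",  "WatchEvent",
  "event2", "actor", "actor1",
  "actor1", "login", "octocat"
)

# One table per triple pattern, columns named after the SPARQL variables
pat_type  <- triples %>% filter(p == "type")  %>% select(s, type = o)
pat_actor <- triples %>% filter(p == "actor") %>% select(s, actor = o)
pat_login <- triples %>% filter(p == "login") %>% select(actor = s, user = o)

# Shared variables become join keys: first ?s, then ?actor
rect <- pat_type %>%
  inner_join(pat_actor, by = "s") %>%
  inner_join(pat_login, by = "actor") %>%
  select(type, user)
rect
```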

class: left, middle

Data Rectangling: Graph Queries

df <- rdf_query(triplestore, 
'SELECT DISTINCT ?property ?value
WHERE {
?s <gh:url> "https://api.github.com/repos/cboettig/noise-phenomena" .
?parent ?p ?s .
?parent ?property ?value
}')

class: left, middle

Data Rectangling: Graph Queries

df
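The graph query above starts at one node (the repo with a given `url`), steps up one edge to whatever points at it (`?parent ?p ?s`), and lists all of the parent's properties. The sketch below shows the same walk on a tiny store built with `rdf_add()`. It uses hypothetical `example.org` IRIs, and it starts from the repo's IRI rather than matching a `url` literal.

```r
library(rdflib)

g     <- rdf()
event <- "http://example.org/event/1"   # hypothetical IRIs
repo  <- "http://example.org/repo/1"

rdf_add(g, event, "gh:type", "PushEvent",
        subjectType = "uri", objectType = "literal")
rdf_add(g, event, "gh:repo", repo,
        subjectType = "uri", objectType = "uri")
rdf_add(g, repo, "gh:name", "octocat/Hello-World",
        subjectType = "uri", objectType = "literal")

# Start at the repo node, step up to its parent, list the parent's properties
parent_props <- rdf_query(g,
  'SELECT DISTINCT ?property ?value WHERE {
     ?parent ?p <http://example.org/repo/1> .
     ?parent ?property ?value
   }')
parent_props
```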

class: center, middle, inverse

Potential Issues


class: center, middle, inverse

U say URL

I say IRI


class: left, middle, inverse

Internationalized Resource Identifiers
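A URL names a location and a URN names a thing; both are URIs. IRIs (RFC 3987) extend URIs to allow Unicode characters. All of them begin with a scheme: a letter, then letters, digits, `+`, `-` or `.`, then a colon (RFC 3986, section 3.1). That rule is why prefixes such as `mtcars:` in the earlier slides yield absolute identifiers even though they resolve to nothing. The sketch below checks the scheme rule only, not the full IRI grammar, using a hypothetical helper `has_scheme()`.

```r
# Hypothetical helper: does the string start with a syntactically valid scheme?
# Only the scheme rule is checked, not the rest of the IRI grammar.
has_scheme <- function(x) grepl("^[A-Za-z][A-Za-z0-9+.-]*:", x)

ids <- c("mtcars:mpg",                        # prefix used as a scheme
         "http://example.org/mpg",            # URL: a location
         "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",  # URN: a name
         "http://example.org/caf\u00e9",      # IRI with a non-ASCII character
         "mpg")                               # bare name: relative, needs a base
has_scheme(ids)
```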


class: center, middle, inverse

Unique Variable/Column Names
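Predicates are built from column names. Two tables that share a column name under the same prefix therefore share a predicate, and a query on it returns values from both. Whether that merge is a bug or a feature depends on whether the columns really mean the same thing. The sketch below uses two hypothetical column sets and a hypothetical helper `predicates()` that mimics "prefix + column name".

```r
# Hypothetical helper: predicate IRIs as prefix + column name
predicates <- function(cols, prefix) paste0(prefix, cols)

cars_cols    <- c("name", "weight", "mpg")           # hypothetical columns
flowers_cols <- c("name", "weight", "petal_width")   # hypothetical columns

# Same prefix: "name" and "weight" collide, so x:weight would mix
# car weights and flower weights in one query result
shared <- intersect(predicates(cars_cols, "x:"),
                    predicates(flowers_cols, "x:"))

# One prefix per table keeps every predicate distinct
separate <- intersect(predicates(cars_cols, "cars:"),
                      predicates(flowers_cols, "flowers:"))
shared
separate
```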


class: center, middle, inverse

Data types
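A long triple table has a single object column, so R's per-column types are lost once everything is gathered. The mtcars example hits this, since `Model` is character. RDF keeps the type on each literal instead, as an XSD datatype IRI. The sketch below records each column's datatype before gathering. It uses a hypothetical helper `xsd_type()` whose class-to-XSD mapping is an assumption, not rdflib's documented behavior.

```r
library(tidyr)

xsd <- "http://www.w3.org/2001/XMLSchema#"

# Hypothetical helper: one reasonable mapping from R column classes to XSD
# datatypes (an assumption; rdflib's own mapping may differ)
xsd_type <- function(x) {
  switch(class(x)[[1]],
         character = paste0(xsd, "string"),
         numeric   = paste0(xsd, "double"),
         integer   = paste0(xsd, "integer"),
         logical   = paste0(xsd, "boolean"),
         Date      = paste0(xsd, "date"),
         NA_character_)
}

car <- data.frame(Model = "Mazda RX4", mpg = 21.0, cyl = 6L,
                  stringsAsFactors = FALSE)

# Record each column's type before gathering...
types <- vapply(car, xsd_type, character(1))

# ...because gather() collapses every value into one character column
long <- gather(car, property, value)
long$datatype <- unname(types[long$property])
long
```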


class: center, middle, inverse

Subject IRIs
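Subjects built from row numbers depend on row order. The same record gets a different IRI if the table is sorted differently, and rows from two tables can end up with the same number. Building the subject from a key column ties the IRI to the record instead. The sketch below uses a hypothetical base IRI and percent-encodes the key so spaces are legal in the identifier.

```r
cars <- data.frame(Model = c("Mazda RX4", "Datsun 710"),
                   mpg   = c(21.0, 22.8),
                   stringsAsFactors = FALSE)
base_iri <- "http://example.org/car/"   # hypothetical base IRI

# Subject from row position: changes whenever rows are reordered
row_iri <- function(df) paste0(base_iri, seq_len(nrow(df)))

# Subject from a key column, percent-encoded ("Mazda RX4" -> "Mazda%20RX4")
key_iri <- function(df) {
  paste0(base_iri, vapply(df$Model, utils::URLencode, character(1),
                          reserved = TRUE, USE.NAMES = FALSE))
}

reordered <- cars[c(2, 1), ]

row_iri(cars)[cars$Model == "Mazda RX4"]            # http://example.org/car/1
row_iri(reordered)[reordered$Model == "Mazda RX4"]  # http://example.org/car/2
key_iri(reordered)[reordered$Model == "Mazda RX4"]  # http://example.org/car/Mazda%20RX4
```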


class: center, middle, inverse

Object Types and Resource Nodes
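In RDF an object is either a literal (a string, number or date) or a resource node (an IRI or blank node). Only a resource node can itself be the subject of further triples. rdflib's `rdf_add()` takes an `objectType` argument (`"literal"`, `"uri"` or `"blank"`) to say which kind of object it is; setting it explicitly avoids relying on any guess from the string's shape. The sketch below uses made-up `example.org` IRIs.

```r
library(rdflib)

g     <- rdf()
car   <- "http://example.org/car/Mazda_RX4"   # hypothetical IRIs
maker <- "http://example.org/maker/Mazda"

# Object as a literal: a dead end, just a string
rdf_add(g, car, "http://example.org/makerName", "Mazda",
        subjectType = "uri", objectType = "literal")

# Object as a resource node: an IRI that can carry its own properties
rdf_add(g, car, "http://example.org/maker", maker,
        subjectType = "uri", objectType = "uri")
rdf_add(g, maker, "http://example.org/country", "Japan",
        subjectType = "uri", objectType = "literal")

# Only the resource-node link can be followed to the maker's properties
via_node <- rdf_query(g,
  'SELECT ?country WHERE {
     <http://example.org/car/Mazda_RX4> <http://example.org/maker> ?m .
     ?m <http://example.org/country> ?country
   }')

# isIRI() separates resource-node objects from literal ones
node_objects <- rdf_query(g,
  'SELECT ?o WHERE {
     <http://example.org/car/Mazda_RX4> ?p ?o .
     FILTER(isIRI(?o))
   }')
```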


class: center, middle, inverse

Practical Issues


class: center, middle, inverse

Explore & Contribute



ropensci/rdflib documentation built on Jan. 19, 2024, 4:57 a.m.