Create an ontology

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(knitr)
library(ontologics)
library(dplyr, warn.conflicts = FALSE)

Any work with an ontology starts either by reading it in from an existing database or by creating a new ontology from scratch.

Even though this package is still under development, it already provides functionality to read in an ontology from an *.rds file (a format that is optimized for use within R) and to write it to formats that are useful for triplestores or the semantic web. This vignette focuses on the basic building blocks for creating a new ontology. More on how to map new concepts from external ontologies, and on how to export an ontology so that it is interoperable with the semantic web, can be found in the other vignettes of this package.

An existing ontology

# read in example ontology
crops <- load_ontology(path = system.file("extdata", "crops.rds", package = "ontologics"))

crops   # ... has a pretty show-method

The onto class is an S4 class with the three slots @sources, @classes and @concepts, each of which is reflected by an entry in the show-method. The classes in an ontology often have a hierarchical order, but this is not obligatory. In any case, the show-method displays the first three levels of the hierarchical structure together with the number of concepts at each level and the description. It also shows the five most frequent concepts, together with a visual representation of the frequency distribution of all concepts at the first three levels.

Each of the three main slots has a function for adding new items to it (new_source, new_class and new_concept), and an additional function creates mappings between your focal ontology and any external ontology (new_mapping). More detailed information about the architecture of the onto class is available in the vignette Ontology database description.

New ontology

A new ontology is built by calling the function start_ontology(). This requires some metadata that are stored in the ontology and that serve to link this ontology properly to other linked open data.

lulc <- start_ontology(name = "land_surface_properties",
                       version = "0.0.1",
                       path = tempdir(), 
                       code = ".xx",
                       description = "showcase of the ontologics R-package", 
                       homepage = "https://github.com/luckinet/ontologics", 
                       license = "CC-BY-4.0")

lulc

This information is stored in the @sources slot, just like that of any other external data source. It is recommended to always set the code for building IDs with a leading symbol that can't be transformed into a numeric/integer. This avoids problems if the ontology is opened in a spreadsheet program, which may carry out this transformation automatically without asking or informing the author.
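The effect of automatic type guessing can be sketched in base R. The snippet below is only an illustration: the IDs and the "_" prefix are hypothetical and not taken from the package, and type.convert() merely stands in for the kind of guessing a spreadsheet import may perform.

```r
# Hypothetical IDs, made up for illustration (not produced by ontologics).
plain_ids    <- c("01", "02", "10")
prefixed_ids <- paste0("_", plain_ids)  # "_" is an arbitrary example prefix

# type.convert() guesses a type for character data, roughly like an
# import dialog that converts columns without asking.
converted_plain    <- type.convert(plain_ids, as.is = TRUE)
converted_prefixed <- type.convert(prefixed_ids, as.is = TRUE)

# The plain IDs become integers, so the leading zeros are lost ...
class(converted_plain)
converted_plain

# ... while the prefixed IDs cannot be parsed as numbers and stay text.
class(converted_prefixed)
converted_prefixed
```

Whether a given prefix survives depends on the program that opens the file, so it is worth checking the IDs after a round trip through the tool you actually use.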

kable(lulc@sources)

Next, classes and their hierarchy need to be defined. Each concept is always a combination of a code, a label and a class. The code must be unique for each unique concept, but the label or the class can have the same value for two concepts. For instance, the concept football can have the class game or the class object and then mean two different things, despite having the same label.
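The football example can be made concrete with a small table. This is a minimal sketch in base R; the codes and column names are hypothetical and do not reflect the internal layout of the @concepts slot.

```r
# Hypothetical mini-table of concepts; codes and column names are assumptions.
concepts <- data.frame(
  code  = c(".01", ".02"),
  label = c("football", "football"),
  class = c("game", "object")
)

# Codes must be unique, whereas labels may repeat.
codes_unique  <- !any(duplicated(concepts$code))
labels_repeat <- any(duplicated(concepts$label))

# The label alone is ambiguous, but the label-class pair identifies each concept.
pair_unique <- !any(duplicated(concepts[, c("label", "class")]))

codes_unique
labels_repeat
pair_unique
```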

# currently it is only possible to set one class at a time
lulc <- new_class(
  new = "landcover", 
  target = NA, 
  description = "A good definition of landcover",
  ontology = lulc)

lulc <- new_class(
  new = "land-use", 
  target = "landcover", 
  description = "A good definition of land use",
  ontology = lulc)

# the class IDs are derived from the code that was previously specified 
kable(lulc@classes$harmonised[, 1:6])

Then, new concepts that have these classes can be defined. In case classes are chosen that are not yet defined, you'll get a warning.

lc <- c(
  "Urban fabric", "Industrial, commercial and transport units",
  "Mine, dump and construction sites", "Artificial, non-agricultural vegetated areas",
  "Temporary cropland", "Permanent cropland", "Heterogeneous agricultural areas",
  "Forests", "Other Wooded Areas", "Shrubland", "Herbaceous associations",
  "Heterogeneous semi-natural areas", "Open spaces with little or no vegetation",
  "Inland wetlands", "Marine wetlands", "Inland waters", "Marine waters"
)

lulc <- new_concept(
  new = lc,
  class = "landcover",
  ontology = lulc
)

kable(lulc@concepts$harmonised[, 1:5])

An ontology differs from a vocabulary in that the concepts it contains are semantically related to one another. For example, concepts can be nested into other concepts. Hence, let's also create a second level of concepts that depend on the first level.

lu <- tibble(
  concept = c(
    "Fallow", "Herbaceous crops", "Temporary grazing",
    "Permanent grazing", "Shrub orchards", "Palm plantations",
    "Tree orchards", "Woody plantation", "Protective cover",
    "Agroforestry", "Mosaic of agricultural-uses",
    "Mosaic of agriculture and natural vegetation",
    "Undisturbed Forest", "Naturally Regenerating Forest",
    "Planted Forest", "Temporally Unstocked Forest"
  ),
  broader = c(
    rep(lc[5], 3), rep(lc[6], 6),
    rep(lc[7], 3), rep(lc[8], 4)
  )
)



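# look up the broader concepts in the ontology ...
lulc <- get_concept(label = lu$broader, ontology = lulc) %>% 
  # ... and align them row by row with the new concepts
  left_join(lu %>% select(label = broader), ., by = "label") %>% 
  new_concept(
    new = lu$concept,
    broader = .,
    class = "land-use",
    ontology = lulc
  )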

kable(lulc@concepts$harmonised[, 1:5])

Here, get_concept() was used to extract the broader concepts into which the new level is nested. This ensures that a valid concept is provided, i.e., one that has already been included in the ontology.
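The join in the pipeline above lines up one broader concept with each new concept. The following sketch isolates that step with plain tibbles. The table `parents` is a hypothetical stand-in for the result of get_concept(), and its IDs and column names are assumptions.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical lookup table standing in for the output of get_concept().
parents <- tibble(
  id    = c(".05", ".06"),
  label = c("Temporary cropland", "Permanent cropland")
)

# Broader label of each new concept, in the order the new concepts are defined.
wanted <- tibble(
  label = c("Temporary cropland", "Temporary cropland", "Permanent cropland")
)

# left_join() keeps every row of `wanted` in its original order and repeats
# the matching parent as often as needed.
aligned <- left_join(wanted, parents, by = "label")
aligned
```

The result has one row per new concept, so it can be passed on alongside a vector of new labels of the same length.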




ontologics documentation built on May 31, 2023, 6:53 p.m.