knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(dataset)
The defined() function in the dataset package allows you to create semantically enriched vectors that retain human-readable metadata, including labels, measurement units, definitions (e.g. URIs), and namespaces.
This vignette demonstrates how to create, manipulate, and interpret defined vectors, and how they integrate seamlessly into data frames and tidy workflows.
defined Class
The defined() constructor enriches a vector by attaching additional attributes that convey semantic meaning. It builds upon the foundation of labelled vectors and introduces three further metadata elements:
- A unit of measurement (e.g. "million dollars")
- A concept, which can be a textual reference or, ideally, a URI
- A namespace, which enables the construction of meaningful, resolvable identifiers for values or categories
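To make the idea of attaching metadata concrete, here is a minimal base R sketch that stores the same kinds of metadata as plain attributes on a numeric vector. It does not use the dataset package, and the attribute names (label, unit, concept) are assumptions chosen for illustration; they are not a statement of how defined() stores its metadata internally.

```r
# Illustration only: plain attributes on a base R vector.
# The attribute names below are assumed for this sketch.
gdp_plain <- c(3897, 7365)
attr(gdp_plain, "label")   <- "Gross Domestic Product"
attr(gdp_plain, "unit")    <- "million dollars"
attr(gdp_plain, "concept") <- "http://data.europa.eu/83i/aa/GDP"

# List the names of the attached metadata
names(attributes(gdp_plain))

# Retrieve a single piece of metadata
attr(gdp_plain, "unit")
```

Plain attributes like these are dropped by base operations such as c(), so a sketch of this kind only illustrates the data model, not the behaviour of a dedicated class.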
Let’s inspect the metadata attached to a defined vector representing GDP values:
gdp_1 <- defined(
  c(3897, 7365),
  label = "Gross Domestic Product",
  unit = "million dollars",
  concept = "http://data.europa.eu/83i/aa/GDP"
)
cat("The print method:\n")
print(gdp_1)
cat("And the summary:\n")
summary(gdp_1)
When summary() is called on a defined vector, its label and unit (if available) are displayed above the summary statistics.
The defined class extends the attributes of a labelled vector with a unit (of measure), a concept definition and a namespace.
attributes(gdp_1)
cat("Get the label only: ")
var_label(gdp_1)
cat("Get the unit only: ")
var_unit(gdp_1)
cat("Get the concept definition only: ")
var_concept(gdp_1)
Vectors whose metadata match can be combined with c():
a <- defined(1:3, label = "Length", unit = "metres")
b <- defined(4:6, label = "Length", unit = "metres")
c(a, b)
What happens if we try to concatenate a semantically under-specified new vector to the GDP vector?
gdp_2 <- defined(2034, label = "Gross Domestic Product")
You will get an intentional error reporting that some attributes are incompatible. This guards against, for example, concatenating figures in euros with figures in dollars.
Attempting to concatenate the under-specified gdp_2 vector with gdp_1 will trigger an error:
c(gdp_1, gdp_2)
Error in `vec_c()`:
! Can't combine `..1` <haven_labelled_defined> and `..2` <haven_labelled_defined>.
✖ Some attributes are incompatible.
This error is intentional — it ensures that values with mismatched or incomplete semantic context (e.g., a different currency unit or an undefined concept) do not silently contaminate the dataset.
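The kind of check behind such an error can be sketched in base R. The helper combine_checked() below is hypothetical and is not part of the dataset package; it assumes metadata is stored in plain attributes named label, unit, and concept, and it only approximates the idea of refusing to combine vectors whose metadata differ.

```r
# Hypothetical helper (not from the dataset package): combine two vectors
# only if the assumed metadata attributes are identical on both sides.
combine_checked <- function(x, y, fields = c("label", "unit", "concept")) {
  for (f in fields) {
    if (!identical(attr(x, f), attr(y, f))) {
      stop("Attribute '", f, "' is incompatible.", call. = FALSE)
    }
  }
  # as.vector() strips attributes; re-attach the shared metadata afterwards
  out <- c(as.vector(x), as.vector(y))
  for (f in fields) attr(out, f) <- attr(x, f)
  out
}

usd <- structure(c(3897, 7365), unit = "million dollars")
eur <- structure(2034, unit = "million euros")

# Mismatched units are rejected; capture the message instead of aborting
res <- tryCatch(combine_checked(usd, eur), error = function(e) conditionMessage(e))
res

# Matching units are combined and the unit is carried over
ok <- combine_checked(usd, structure(2034, unit = "million dollars"))
attr(ok, "unit")
```

Failing loudly is a deliberate design choice in this sketch: a silent fallback would let values in different units end up in the same column.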
We can resolve this by explicitly defining the missing unit and definition for gdp_2 so that it matches gdp_1:
var_unit(gdp_2) <- "million dollars"
var_concept(gdp_2) <- "http://data.europa.eu/83i/aa/GDP"
With matching metadata, concatenation now succeeds:
summary(c(gdp_1, gdp_2))
Namespaces allow defined values — such as country codes — to be expanded into resolvable URIs. This is especially powerful for linked data and machine-readable classification systems.
country <- defined(
  c("AD", "LI", "SM"),
  label = "Country name",
  concept = "http://data.europa.eu/bna/c_6c2bb82d",
  namespace = "https://www.geonames.org/countries/$1/"
)
Because each value can become a resolvable URI, a namespace can point to a definition, readable by both humans and machines, of the ID column or of any attribute column in a dataset. (Attributes in a statistical dataset are characteristics of the observations or of the measured variables.)
The namespace acts as a template: $1 is replaced by the actual value of each element, producing links like:
- https://www.geonames.org/countries/AD/ for Andorra,
- https://www.geonames.org/countries/LI/ for Liechtenstein, and
- https://www.geonames.org/countries/SM/ for San Marino.
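The template substitution described above can be sketched in base R. The function expand_namespace() is a hypothetical helper written for this illustration, not a function of the dataset package, and it makes no claim about how the package resolves namespaces internally.

```r
# Hypothetical helper: replace the literal "$1" placeholder in a namespace
# template with each value in turn.
expand_namespace <- function(values, namespace) {
  vapply(
    values,
    function(v) sub("$1", v, namespace, fixed = TRUE),  # fixed = TRUE: "$1" is not a regex
    character(1),
    USE.NAMES = FALSE
  )
}

uris <- expand_namespace(
  c("AD", "LI", "SM"),
  "https://www.geonames.org/countries/$1/"
)
uris
```

Using fixed = TRUE matters here, because in a regular expression the $ character would be interpreted as an end-of-string anchor rather than as part of the placeholder.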
In addition, the definition URI — http://publications.europa.eu/resource/authority/bna/c_6c2bb82d — resolves to a machine-readable classification of country names, helping to align datasets with official vocabularies and standards.
Working with character vectors:
countries <- defined(
  c("AD", "LI"),
  label = "Country code",
  namespace = "https://www.geonames.org/countries/$1/"
)
countries
as_character(countries)
Subsetting, comparison, and coercion with base R functions:
gdp_1[1:2]
gdp_1 > 5000
as.vector(gdp_1)
as.list(gdp_1)
Coerce the defined country vector back to a character vector:
as_character(country)
as_character(c(gdp_1, gdp_2))
And to numeric:
as_numeric(c(gdp_1, gdp_2))