knitr::opts_chunk$set(echo = TRUE) devtools::load_all() library(magrittr)
An R package to read, write, and edit Data Package data and metadata. Unlike other existing R packages dpmr and datapkg, dpkg can be used to build and document Data Packages entirely within R. Please note that this is a work in progress and function naming and functionality may drift based on feedback from the community.
This package is not on CRAN. To install in R, use devtools:
devtools::install_github("ezwelty/dpkg")
To build a data package, assemble the data and add metadata to the various elements:
data <- data.frame( id = 1L %>% set_field(title = "Identifier"), value = 1.1, added = Sys.Date() ) # Data Resource (list of Fields) dr <- data %>% set_resource( name = "data", path = "data/data.csv" ) # Data Package (list of Resources) dp <- list(dr) %>% set_package( name = "data-package" )
You can preview the package metadata:
get_package(dp) %>% str()
Write the package to file:
dir <- tempdir() write_package(dp, path = dir)
And read the package back in:
read_package(dir)
In dpkg, the contents of a data package is stored as a list of one or more data resources (each a list) of one or more fields (each typically an atomic vector). For example:
dp <- list( dr = data.frame( id = 1L, value = 1.1, added = Sys.Date() ) )
Package, resource, and field ("data objects") metadata can be set or updated using the set_* functions (set_package, set_resource, set_field), which come in a <- flavor:
set_field(dp$dr$id) <- field(title = "Unique identifier", constraints = constraints(unique = TRUE))
and a pipe-friendly flavor:
dp$dr$id %<>% set_field(title = "Identifier", constraints = NULL)
As seen above with the use of field and constraints, a suite of helper functions are available to assist in the building of metadata:
package, resource, fieldschema, foreignKey, constraints, license, source, contributorData object metadata is stored as attributes. Although in base R attributes are lost in many common operations, this package provides protection from this by making metadata resilient to [, [[, subset, and append.
To preview a package, metadata can be retrieved from data objects using the get_* functions (get_package, get_resource, get_field). Missing properties are filled with their default values:
name: The name of the object in a list (resource).type: The type corresponding to the object class.character -> "string"numeric -> "number"integer -> "integer"logical -> "boolean"Date -> "date"POSIXt -> "datetime""string"format: The default format for that type.date -> "%Y-%m-%d"datetime -> "%Y-%m-%dT%H-%M-%SZ"unit: Units set by units deparsed to product power form.name: The name of the object in a list (package).schema$fields: Field metadata from the elements of the object.resources: Resource metadata from the elements of the object.get_field(dp$dr$id) %>% str() get_resource(dp$dr) %>% str() get_package(dp) %>% str()
write_package writes package data and metadata to disk using the following rules for each resource:
format: If missing, checks path file extension and mediatype. Only "csv" ("text/csv") and "json" ("application/json") are supported.path: If not set, the data is saved in the metadata (datapackage.json) as either an inline JSON object (format: "json" or missing) or a CSV string (format: "csv"). For writing, path must be a single, local, relative path.tmpdir <- file.path(tempdir(), "example")
Resource as an inline JSON object:
set_resource(dp$dr) <- package(format = "json", path = NULL) get_resource(dp$dr)$data write_package(dp, path = tmpdir) list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)
Resource as an inline CSV string:
set_resource(dp$dr) <- package(format = "csv", path = NULL) get_resource(dp$dr)$data write_package(dp, path = tmpdir) list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)
Resource as a JSON file:
set_resource(dp$dr) <- package(format = "json", path = "data/data.json") get_resource(dp$dr)$data write_package(dp, path = tmpdir) list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)
Resource as a CSV file:
set_resource(dp$dr) <- package(format = "csv", path = "data/data.csv") get_resource(dp$dr)$data write_package(dp, path = tmpdir) list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)
read_package reads package data and metadata into the same structure described above, but unlike write_package, it supports both local and remote paths. The resources argument can be used to read a subset of the package's resources (or all if NULL, the default).
dp <- read_package( "https://raw.githubusercontent.com/columbia-glacier/optical-surveys-1985/master", resources = c("station", "velocity") ) get_package(dp) %>% str() dp$station head(dp$velocity)
read_package_github accepts a shorthand GitHub repository address.
dp <- read_package_github("columbia-glacier/optical-surveys-1985", "station")
Only types string, number, integer, boolean, date, and datetime are implemented (see table-schema/field-descriptors). Add support for the remaining types:
type = objecttype = arraytype = time (via package hms)type = year (already supported via type = date and format = "%Y")type = yearmonth (already supported via type = date and format = "%Y-%m")type = duration (already supported via type = numeric and unit)type = geopointtype = geojsonAdditionally:
constraints propertytype = string, validate values against format propertypath like "data/data.csv.gz" to/from compressed filespath to a JSON fileAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.