knitr::opts_chunk$set(echo = TRUE)
devtools::load_all()
library(magrittr)
An R package to read, write, and edit Data Package data and metadata. Unlike the existing R packages dpmr and datapkg, dpkg can be used to build and document Data Packages entirely within R. Please note that this is a work in progress: function names and functionality may change based on feedback from the community.
This package is not on CRAN. To install in R, use devtools:
devtools::install_github("ezwelty/dpkg")
To build a data package, assemble the data and add metadata to the various elements:
data <- data.frame(
  id = 1L %>% set_field(title = "Identifier"),
  value = 1.1,
  added = Sys.Date()
)
# Data Resource (list of Fields)
dr <- data %>%
  set_resource(
    name = "data",
    path = "data/data.csv"
  )
# Data Package (list of Resources)
dp <- list(dr) %>%
  set_package(
    name = "data-package"
  )
You can preview the package metadata:
get_package(dp) %>% str()
Write the package to file:
dir <- tempdir()
write_package(dp, path = dir)
And read the package back in:
read_package(dir)
In dpkg, the contents of a data package are stored as a list of one or more data resources (each a list) of one or more fields (each typically an atomic vector). For example:
dp <- list(
  dr = data.frame(
    id = 1L,
    value = 1.1,
    added = Sys.Date()
  )
)
Package, resource, and field ("data objects") metadata can be set or updated using the set_* functions (set_package, set_resource, set_field), which come in a <- flavor:
set_field(dp$dr$id) <- field(title = "Unique identifier", constraints = constraints(unique = TRUE))
and a pipe-friendly flavor:
dp$dr$id %<>% set_field(title = "Identifier", constraints = NULL)
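The two flavors correspond to a common base R pattern: an ordinary function that returns a modified copy, paired with a replacement function of the same name. The following is a minimal sketch of that pattern only, not dpkg's implementation; `set_title` and `set_title<-` are hypothetical names, and the metadata is assumed to live in a single `title` attribute.

```r
# Hypothetical pipe-friendly flavor: returns a modified copy of x.
set_title <- function(x, title) {
  attr(x, "title") <- title
  x
}

# Hypothetical replacement flavor: enables `set_title(x) <- value`.
`set_title<-` <- function(x, value) {
  set_title(x, title = value)
}

# Replacement flavor modifies the variable in place.
id <- 1:3
set_title(id) <- "Unique identifier"

# Functional flavor returns a new object and can be used in a pipeline.
id2 <- set_title(1:3, "Identifier")

attr(id, "title")
attr(id2, "title")
```

Because the replacement function delegates to the functional one, both flavors stay consistent by construction.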
As seen above with the use of field and constraints, a suite of helper functions is available to assist in building metadata:
- package, resource, field
- schema, foreignKey, constraints
- license, source, contributor
Data object metadata is stored as attributes. Although attributes are lost in many common base R operations, this package protects against this by making metadata resilient to [, [[, subset, and append.
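To illustrate the problem described above, the sketch below shows base R dropping a custom attribute on subsetting, and one way an S3 `[` method can restore it. This is an illustration under stated assumptions, not dpkg's actual mechanism: the class name `meta`, the constructor `as_meta`, and the single `title` attribute are all hypothetical.

```r
# Base R: a custom attribute does not survive `[`.
plain <- 1:3
attr(plain, "title") <- "Identifier"
attributes(plain[1:2])  # the title attribute is gone

# Hypothetical constructor that tags a vector with a class and a title.
as_meta <- function(x, title) {
  structure(x, title = title, class = c("meta", class(x)))
}

# Hypothetical `[` method: subset with the default method,
# then copy the metadata back onto the result.
`[.meta` <- function(x, ...) {
  out <- NextMethod()
  attr(out, "title") <- attr(x, "title")
  class(out) <- class(x)
  out
}

kept <- as_meta(1:3, "Identifier")
sub <- kept[1:2]
attr(sub, "title")  # the title attribute is preserved
```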
To preview a package, metadata can be retrieved from data objects using the get_* functions (get_package, get_resource, get_field). Missing properties are filled with their default values:
For fields:
- name: The name of the object in a list (resource).
- type: The type corresponding to the object class.
  - character -> "string"
  - numeric -> "number"
  - integer -> "integer"
  - logical -> "boolean"
  - Date -> "date"
  - POSIXt -> "datetime"
  - "string"
- format: The default format for that type.
  - date -> "%Y-%m-%d"
  - datetime -> "%Y-%m-%dT%H-%M-%SZ"
- unit: Units set by units deparsed to product power form.
For resources:
- name: The name of the object in a list (package).
- schema$fields: Field metadata from the elements of the object.
For packages:
- resources: Resource metadata from the elements of the object.
get_field(dp$dr$id) %>% str()
get_resource(dp$dr) %>% str()
get_package(dp) %>% str()
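The class-to-type defaults listed above can be sketched in base R as follows. This is only an illustration: `guess_type` is a hypothetical helper, not a dpkg function, and treating "string" as the fallback for any other class is an assumption.

```r
# Hypothetical helper mapping an R object's class to a field type.
# Date and POSIXt are checked first because they are stored as numbers.
guess_type <- function(x) {
  if (inherits(x, "Date")) return("date")
  if (inherits(x, "POSIXt")) return("datetime")
  if (is.logical(x)) return("boolean")
  if (is.integer(x)) return("integer")
  if (is.numeric(x)) return("number")
  "string"  # assumed fallback for all other classes
}

data <- data.frame(
  id = 1L,
  value = 1.1,
  added = Sys.Date()
)

# One type per column, named by column.
types <- vapply(data, guess_type, character(1))
types
```

Checking `is.integer` before `is.numeric` matters, since integer vectors also satisfy `is.numeric`.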
write_package writes package data and metadata to disk using the following rules for each resource:
- format: If missing, checks path file extension and mediatype. Only "csv" ("text/csv") and "json" ("application/json") are supported.
- path: If not set, the data is saved in the metadata (datapackage.json) as either an inline JSON object (format: "json" or missing) or a CSV string (format: "csv"). For writing, path must be a single, local, relative path.
tmpdir <- file.path(tempdir(), "example")
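The format rule above could be sketched in base R as shown below. `infer_format` is a hypothetical helper, not part of dpkg, and the order of precedence (explicit format, then file extension, then mediatype) is an assumption made for the sketch.

```r
# Hypothetical helper: resolve a resource's format.
# Returns "csv", "json", or NULL when nothing supported is found.
infer_format <- function(format = NULL, path = NULL, mediatype = NULL) {
  # An explicit format always wins.
  if (!is.null(format)) return(format)
  # Otherwise try the file extension of the path.
  if (!is.null(path)) {
    ext <- tolower(tools::file_ext(path))
    if (ext %in% c("csv", "json")) return(ext)
  }
  # Finally, fall back to the media type.
  if (!is.null(mediatype)) {
    known <- c("text/csv" = "csv", "application/json" = "json")
    if (mediatype %in% names(known)) return(unname(known[mediatype]))
  }
  NULL
}

infer_format(path = "data/data.csv")
infer_format(mediatype = "application/json")
infer_format(format = "json", path = "data/data.csv")
```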
Resource as an inline JSON object:
set_resource(dp$dr) <- resource(format = "json", path = NULL)
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)
Resource as an inline CSV string:
set_resource(dp$dr) <- resource(format = "csv", path = NULL)
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)
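For readers unfamiliar with inline CSV, the sketch below shows how a data frame can be serialized to a single CSV string and parsed back using base R only. It illustrates the idea of a CSV string, not how dpkg produces one; the exact quoting and formatting dpkg uses may differ.

```r
df <- data.frame(id = 1L, value = 1.1)

# Capture the CSV lines that write.csv would print, then join them
# into one string suitable for embedding in a metadata file.
csv_lines <- utils::capture.output(utils::write.csv(df, row.names = FALSE))
csv_string <- paste(csv_lines, collapse = "\n")

# Parse the string back into a data frame.
back <- utils::read.csv(text = csv_string)
back
```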
Resource as a JSON file:
set_resource(dp$dr) <- resource(format = "json", path = "data/data.json")
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)
Resource as a CSV file:
set_resource(dp$dr) <- resource(format = "csv", path = "data/data.csv")
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)
read_package reads package data and metadata into the same structure described above, but unlike write_package, it supports both local and remote paths. The resources argument can be used to read a subset of the package's resources (or all if NULL, the default).
dp <- read_package(
  "https://raw.githubusercontent.com/columbia-glacier/optical-surveys-1985/master",
  resources = c("station", "velocity")
)
get_package(dp) %>% str()
dp$station
head(dp$velocity)
read_package_github accepts a shorthand GitHub repository address.
dp <- read_package_github("columbia-glacier/optical-surveys-1985", "station")
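Comparing the shorthand with the full URL passed to read_package earlier suggests how such an address might expand. The sketch below is an assumption about that mapping, not dpkg's code: `github_raw_url` is a hypothetical helper, and the default branch name "master" is taken from the example URL above.

```r
# Hypothetical helper: expand "owner/repo" to a raw.githubusercontent.com base URL.
github_raw_url <- function(repo, branch = "master") {
  paste0("https://raw.githubusercontent.com/", repo, "/", branch)
}

url <- github_raw_url("columbia-glacier/optical-surveys-1985")
url
```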
Only types string, number, integer, boolean, date, and datetime are implemented (see table-schema/field-descriptors). Add support for the remaining types:
- type = object
- type = array
- type = time (via package hms)
- type = year (already supported via type = date and format = "%Y")
- type = yearmonth (already supported via type = date and format = "%Y-%m")
- type = duration (already supported via type = numeric and unit)
- type = geopoint
- type = geojson
Additionally:
- constraints property
- type = string, validate values against format property
- path like "data/data.csv.gz" to/from compressed files
- path to a JSON file