library(magrittr)

An R package to read, write, and edit Data Package data and metadata. Unlike the existing R packages dpmr and datapkg, dpkg can be used to build and document Data Packages entirely within R. Please note that this is a work in progress: function names and functionality may change based on feedback from the community.

This package is not on CRAN. To install in R, use devtools:

devtools::install_github("ezwelty/dpkg")

Quick introduction

To build a data package, assemble the data and add metadata to the various elements:

data <- data.frame(
  id = 1L %>% set_field(title = "Identifier"),
  value = 1.1,
  added = Sys.Date()
)
# Data Resource (list of Fields)
dr <- data %>%
  set_resource(
    name = "data",
    path = "data/data.csv"
  )
# Data Package (list of Resources)
dp <- list(dr) %>%
  set_package(
    name = "data-package"
  )

You can preview the package metadata:

get_package(dp) %>% str()

Write the package to file:

dir <- tempdir()
write_package(dp, path = dir)

And read the package back in:

read_package(dir)

Build a package

In dpkg, the contents of a data package are stored as a list of one or more data resources, each of which is a list of one or more fields (each typically an atomic vector). For example:

dp <- list(
  dr = data.frame(
    id = 1L,
    value = 1.1,
    added = Sys.Date()
  )
)
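
Because a data.frame is itself a list of equal-length vectors, the nesting described above can be checked with base R alone. The following is a minimal sketch that uses no dpkg functions; it rebuilds the same object and inspects each level:

```r
# Rebuild the example package: a list of resources (data frames).
dp <- list(
  dr = data.frame(
    id = 1L,
    value = 1.1,
    added = Sys.Date()
  )
)

# A package is a plain list whose elements are resources.
is.list(dp)

# A resource (data frame) is also a list, with one element per field.
is.list(dp$dr)
names(dp$dr)

# Each field is an atomic vector (Date values are stored as doubles).
vapply(dp$dr, is.atomic, logical(1))
```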

Package, resource, and field ("data objects") metadata can be set or updated using the set_* functions (set_package, set_resource, set_field), which come in a <- flavor:

set_field(dp$dr$id) <- field(title = "Unique identifier", constraints = constraints(unique = TRUE))

and a pipe-friendly flavor:

dp$dr$id %<>% set_field(title = "Identifier", constraints = NULL)
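
For readers unfamiliar with how one function name can support both flavors, the sketch below shows the general base R mechanism. The helper set_label is hypothetical and is not part of dpkg; it only illustrates how a regular function and its `<-` replacement counterpart can coexist, and makes no claim about how dpkg implements set_field.

```r
# Hypothetical helper (not part of dpkg): store a title as an attribute.
set_label <- function(x, title) {
  attr(x, "title") <- title
  x
}

# Replacement-function flavor: R evaluates `set_label(x) <- v`
# as `x <- "set_label<-"(x, value = v)`.
`set_label<-` <- function(x, value) {
  set_label(x, title = value)
}

id <- 1:3

# Assignment flavor updates `id` directly.
set_label(id) <- "Unique identifier"
attr(id, "title")

# Functional flavor returns a modified copy, so it can sit in a pipeline.
id2 <- set_label(id, title = "Identifier")
attr(id2, "title")
```

The functional form takes the object as its first argument, which is what lets it be used with %>% and %<>%.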

As seen above with the use of field and constraints, a suite of helper functions is available to assist in building metadata.

Data object metadata is stored as attributes. In base R, attributes are lost in many common operations; this package protects against that by making metadata resilient to [, [[, subset, and append.
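
To see why this protection matters, the following base R sketch (no dpkg functions involved) attaches a custom attribute to a plain vector and shows it being dropped by ordinary operations. The attribute name "title" is chosen only for illustration.

```r
x <- 1:3
attr(x, "title") <- "Identifier"

# The attribute is present on the original vector.
attr(x, "title")

# Subsetting with `[` or `[[` returns a vector without it.
attr(x[1:2], "title")   # NULL
attr(x[[1]], "title")   # NULL

# append() combines values with c(), which also drops it.
attr(append(x, 4L), "title")   # NULL
```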

Preview a package

To preview a package, metadata can be retrieved from data objects using the get_* functions (get_package, get_resource, get_field). Missing properties are filled with their default values:

get_field(dp$dr$id) %>% str()
get_resource(dp$dr) %>% str()
get_package(dp) %>% str()
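
The idea of filling missing properties with defaults can be sketched in base R with utils::modifyList. This illustrates the general pattern only, not dpkg's implementation, and the default values below are hypothetical:

```r
# Hypothetical defaults, for illustration only; dpkg's actual defaults may differ.
defaults <- list(type = "string", format = "default")

# Metadata explicitly set on a field.
explicit <- list(title = "Identifier", type = "integer")

# modifyList() keeps every default and overrides those present in `explicit`.
filled <- utils::modifyList(defaults, explicit)
str(filled)
```

Here type is overridden by the explicit value, format keeps its default, and title is carried over unchanged.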

Write a package

write_package writes package data and metadata to disk. How each resource is written depends on its format and path, as the four cases below show. The examples write to a temporary directory:

tmpdir <- file.path(tempdir(), "example")

Resource as an inline JSON object:

set_resource(dp$dr) <- package(format = "json", path = NULL)
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)

Resource as an inline CSV string:

set_resource(dp$dr) <- package(format = "csv", path = NULL)
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir)
unlink(tmpdir, recursive = TRUE)

Resource as a JSON file:

set_resource(dp$dr) <- package(format = "json", path = "data/data.json")
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)

Resource as a CSV file:

set_resource(dp$dr) <- package(format = "csv", path = "data/data.csv")
get_resource(dp$dr)$data
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
unlink(tmpdir, recursive = TRUE)
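
The distinction between inline data and file data can be illustrated without dpkg. The base R sketch below serializes a small table once as an in-memory CSV string (the kind of value that could be embedded in a package descriptor) and once as a file under a data/ subdirectory. The directory name and variable names are assumptions made for this example, and the sketch does not reproduce what write_package does internally.

```r
df <- data.frame(id = 1L, value = 1.1)

# Inline: serialize the table to a CSV string held in memory.
csv_string <- paste(
  utils::capture.output(utils::write.csv(df, row.names = FALSE, quote = FALSE)),
  collapse = "\n"
)
cat(csv_string, "\n")

# File: write the same table to a path relative to a package directory.
pkg_dir <- file.path(tempdir(), "inline-vs-file")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
utils::write.csv(df, file.path(pkg_dir, "data", "data.csv"), row.names = FALSE)
files <- list.files(pkg_dir, recursive = TRUE)
files

# Clean up the temporary directory.
unlink(pkg_dir, recursive = TRUE)
```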

Read a package

read_package reads package data and metadata into the same structure described above, but unlike write_package, it supports both local and remote paths. The resources argument can be used to read a subset of the package's resources (or all if NULL, the default).

dp <- read_package(
  "https://raw.githubusercontent.com/columbia-glacier/optical-surveys-1985/master",
  resources = c("station", "velocity")
)
get_package(dp) %>% str()
dp$station
head(dp$velocity)

read_package_github accepts a shorthand GitHub repository address.

dp <- read_package_github("columbia-glacier/optical-surveys-1985", "station")
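
The shorthand address corresponds to the raw-content URL used in the read_package example above. The sketch below shows one way such a mapping could be written in base R. The helper github_raw_url and its branch argument are hypothetical and are not part of dpkg; how read_package_github actually resolves the address is not documented here.

```r
# Hypothetical helper (not part of dpkg): expand "owner/repo" to a raw-content URL.
github_raw_url <- function(repo, branch = "master") {
  sprintf("https://raw.githubusercontent.com/%s/%s", repo, branch)
}

url <- github_raw_url("columbia-glacier/optical-surveys-1985")
url
```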

TODO

Fields

Only types string, number, integer, boolean, date, and datetime are implemented (see table-schema/field-descriptors). Add support for the remaining types:

Additionally:

Resources & Packages



ezwelty/dpkg documentation built on May 30, 2019, 7:19 a.m.