An R package to read, write, and edit Data Package data and metadata. Unlike other existing R packages dpmr and datapkg, dpkg can be used to build and document Data Packages entirely within R. Please note that this is a work in progress and function naming and functionality may drift based on feedback from the community.
This package is not on CRAN. To install in R, use devtools:
devtools::install_github("ezwelty/dpkg")
To build a data package, assemble the data and add metadata to the various elements:
library(dpkg)
library(magrittr)  # provides %>% and %<>%
data <- data.frame(
id = 1L %>% set_field(title = "Identifier"),
value = 1.1,
added = Sys.Date()
)
# Data Resource (list of Fields)
dr <- data %>%
set_resource(
name = "data",
path = "data/data.csv"
)
# Data Package (list of Resources)
dp <- list(dr) %>%
set_package(
name = "data-package"
)
You can preview the package metadata:
get_package(dp) %>% str()
## List of 3
## $ name : chr "data-package"
## $ profile : chr "data-package"
## $ resources:List of 1
## ..$ :List of 4
## .. ..$ name : chr "data"
## .. ..$ path : chr "data/data.csv"
## .. ..$ profile: chr "data-resource"
## .. ..$ schema :List of 1
## .. .. ..$ fields:List of 3
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "id"
## .. .. .. .. ..$ type : chr "integer"
## .. .. .. .. ..$ title: chr "Identifier"
## .. .. .. ..$ :List of 2
## .. .. .. .. ..$ name: chr "value"
## .. .. .. .. ..$ type: chr "number"
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "added"
## .. .. .. .. ..$ type : chr "date"
## .. .. .. .. ..$ format: chr "%Y-%m-%d"
Write the package to file:
dir <- tempdir()
write_package(dp, path = dir)
And read the package back in:
read_package(dir)
## $data
## id value added
## 1 1 1.1 2017-08-24
##
## attr(,"dpkg_package")
## attr(,"dpkg_package")$name
## [1] "data-package"
##
## attr(,"dpkg_package")$profile
## [1] "data-package"
##
## attr(,"class")
## [1] "dpkg" "list"
In dpkg, the contents of a data package are stored as a list of one or more data resources (each a list) of one or more fields (each typically an atomic vector). For example:
dp <- list(
dr = data.frame(
id = 1L,
value = 1.1,
added = Sys.Date()
)
)
Package, resource, and field ("data objects") metadata can be set or updated using the set_* functions (set_package, set_resource, set_field), which come in a <- flavor:
set_field(dp$dr$id) <- field(title = "Unique identifier", constraints = constraints(unique = TRUE))
and a pipe-friendly flavor:
dp$dr$id %<>% set_field(title = "Identifier", constraints = NULL)
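The two flavors can coexist because R treats set_field and `set_field<-` as different functions. A minimal base R sketch of the pattern, using a hypothetical set_title helper (not part of dpkg) that stores a single attribute:

```r
# Hypothetical helper, not part of dpkg: stores a title as an attribute.
# Regular (pipe-friendly) flavor: takes the object, returns a modified copy.
set_title <- function(x, title) {
  attr(x, "title") <- title
  x
}

# Replacement (`<-`) flavor: R rewrites `set_title(x) <- v` as
# x <- `set_title<-`(x, value = v), so the last argument must be named `value`.
`set_title<-` <- function(x, value) {
  set_title(x, value)
}

id <- 1L
set_title(id) <- "Unique identifier"  # replacement flavor
first_title <- attr(id, "title")

id <- set_title(id, "Identifier")     # regular flavor
second_title <- attr(id, "title")
```

Both calls leave id as an integer vector; only its title attribute changes.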
As seen above with the use of field and constraints, a suite of helper functions is available to assist in building metadata:
- package, resource, field
- schema, foreignKey, constraints
- license, source, contributor
Data object metadata is stored as attributes. Although base R drops attributes in many common operations, this package protects metadata by making it resilient to [, [[, subset, and append.
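To illustrate the problem being solved, the following base R sketch shows an attribute being dropped by [ and one way an S3 method can carry it over. The with_meta class and its title attribute are hypothetical and are not how dpkg is necessarily implemented:

```r
# Base R: subsetting a plain vector drops custom attributes.
x <- structure(1:3, title = "Identifier")
plain_subset <- x[1:2]
dropped <- is.null(attr(plain_subset, "title"))

# Hypothetical S3 class (an assumption, not dpkg's own) that keeps the attribute.
with_meta <- function(x, title) {
  structure(x, title = title, class = c("with_meta", class(x)))
}

# Subset the underlying vector, then re-attach the metadata and class.
`[.with_meta` <- function(x, i) {
  out <- unclass(x)[i]
  attr(out, "title") <- attr(x, "title")
  class(out) <- class(x)
  out
}

y <- with_meta(1:3, title = "Identifier")
kept_subset <- y[1:2]
kept_title <- attr(kept_subset, "title")
```

The same idea extends to [[, subset, and append by defining a method for each.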
To preview a package, metadata can be retrieved from data objects using the get_* functions (get_package, get_resource, get_field). Missing properties are filled with their default values:
Field:
- name: The name of the object in a list (resource).
- type: The type corresponding to the object class.
  - character -> "string"
  - numeric -> "number"
  - integer -> "integer"
  - logical -> "boolean"
  - Date -> "date"
  - POSIXt -> "datetime"
  - (any other class) -> "string"
- format: The default format for that type.
  - date -> "%Y-%m-%d"
  - datetime -> "%Y-%m-%dT%H-%M-%SZ"
- unit: Units set by the units package, deparsed to product power form.
Resource:
- name: The name of the object in a list (package).
- schema$fields: Field metadata from the elements of the object.
Package:
- resources: Resource metadata from the elements of the object.
get_field(dp$dr$id) %>% str()
## List of 2
## $ type : chr "integer"
## $ title: chr "Identifier"
get_resource(dp$dr) %>% str()
## List of 3
## $ profile: chr "data-resource"
## $ schema :List of 1
## ..$ fields:List of 3
## .. ..$ :List of 3
## .. .. ..$ name : chr "id"
## .. .. ..$ type : chr "integer"
## .. .. ..$ title: chr "Identifier"
## .. ..$ :List of 2
## .. .. ..$ name: chr "value"
## .. .. ..$ type: chr "number"
## .. ..$ :List of 3
## .. .. ..$ name : chr "added"
## .. .. ..$ type : chr "date"
## .. .. ..$ format: chr "%Y-%m-%d"
## $ data :'data.frame': 1 obs. of 3 variables:
## ..$ id : int 1
## ..$ value: num 1.1
## ..$ added: chr "2017-08-24"
get_package(dp) %>% str()
## List of 2
## $ profile : chr "data-package"
## $ resources:List of 1
## ..$ :List of 4
## .. ..$ name : chr "dr"
## .. ..$ profile: chr "data-resource"
## .. ..$ schema :List of 1
## .. .. ..$ fields:List of 3
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "id"
## .. .. .. .. ..$ type : chr "integer"
## .. .. .. .. ..$ title: chr "Identifier"
## .. .. .. ..$ :List of 2
## .. .. .. .. ..$ name: chr "value"
## .. .. .. .. ..$ type: chr "number"
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "added"
## .. .. .. .. ..$ type : chr "date"
## .. .. .. .. ..$ format: chr "%Y-%m-%d"
## .. ..$ data :'data.frame': 1 obs. of 3 variables:
## .. .. ..$ id : int 1
## .. .. ..$ value: num 1.1
## .. .. ..$ added: chr "2017-08-24"
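The default type values seen in the output above follow the class-to-type mapping listed earlier. A minimal base R sketch of such a lookup, assuming a hypothetical default_type function (dpkg's internal implementation may differ):

```r
# Hypothetical lookup, not dpkg's own code: map an object's class to a
# Table Schema field type, falling back to "string".
default_type <- function(x) {
  types <- c(
    character = "string",
    numeric   = "number",
    integer   = "integer",
    logical   = "boolean",
    Date      = "date",
    POSIXt    = "datetime"
  )
  hit <- names(types)[names(types) %in% class(x)]
  if (length(hit) > 0) types[[hit[1]]] else "string"
}

df <- data.frame(id = 1L, value = 1.1, added = as.Date("2017-08-24"))
field_types <- vapply(df, default_type, character(1))
```

Here field_types is a named character vector with one type per column of df.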
write_package writes package data and metadata to disk using the following rules for each resource:
- format: If missing, checks the path file extension and mediatype. Only "csv" ("text/csv") and "json" ("application/json") are supported.
- path: If not set, the data is saved in the metadata (datapackage.json) as either an inline JSON object (format: "json" or missing) or a CSV string (format: "csv"). For writing, path must be a single, local, relative path.
Resource as an inline JSON object:
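# tmpdir is not defined in the original; a fresh temporary directory is assumed
tmpdir <- tempfile("dpkg"); dir.create(tmpdir)
set_resource(dp$dr) <- resource(format = "json", path = NULL)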
get_resource(dp$dr)$data
## id value added
## 1 1 1.1 2017-08-24
write_package(dp, path = tmpdir)
list.files(tmpdir)
## [1] "datapackage.json"
Resource as an inline CSV string:
set_resource(dp$dr) <- resource(format = "csv", path = NULL)
get_resource(dp$dr)$data
## [1] "id,value,added\n1,1.1,2017-08-24"
write_package(dp, path = tmpdir)
list.files(tmpdir)
## [1] "datapackage.json"
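For reference, a CSV string of this shape can be produced in base R alone. This is only a sketch of the idea, not necessarily how dpkg serializes inline data:

```r
# Rebuild the example data with a fixed date so the result is reproducible.
df <- data.frame(id = 1L, value = 1.1, added = as.Date("2017-08-24"))

# write.csv() prints to the console when no file is given; capture the lines
# and join them with newlines to obtain a single CSV string.
csv_lines <- capture.output(write.csv(df, row.names = FALSE, quote = FALSE))
csv_string <- paste(csv_lines, collapse = "\n")
```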
Resource as a JSON file:
set_resource(dp$dr) <- resource(format = "json", path = "data/data.json")
get_resource(dp$dr)$data
## NULL
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
## [1] "data/data.json" "datapackage.json"
Resource as a CSV file:
set_resource(dp$dr) <- resource(format = "csv", path = "data/data.csv")
get_resource(dp$dr)$data
## NULL
write_package(dp, path = tmpdir)
list.files(tmpdir, recursive = TRUE)
## [1] "data/data.csv" "datapackage.json"
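The format rule used in the four cases above (explicit format first, then the path file extension, then mediatype) can be sketched in base R. The infer_format function below is hypothetical and only approximates the described behavior:

```r
# Hypothetical helper, not part of dpkg: resolve a resource's format.
infer_format <- function(format = NULL, path = NULL, mediatype = NULL) {
  # 1. Fall back to the file extension of path
  if (is.null(format) && !is.null(path)) {
    ext <- tolower(tools::file_ext(path))
    if (nzchar(ext)) format <- ext
  }
  # 2. Fall back to the media type (switch() yields NULL if nothing matches)
  if (is.null(format) && !is.null(mediatype)) {
    format <- switch(
      mediatype,
      "text/csv" = "csv",
      "application/json" = "json"
    )
  }
  # 3. Only csv and json are supported
  if (!is.null(format) && !format %in% c("csv", "json")) {
    stop("Unsupported format: ", format)
  }
  format
}

from_path <- infer_format(path = "data/data.csv")
from_mediatype <- infer_format(mediatype = "application/json")
unspecified <- infer_format()
```

A NULL result corresponds to the "format missing" case, in which inline data is stored as a JSON object.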
read_package reads package data and metadata into the same structure described above, but unlike write_package, it supports both local and remote paths. The resources argument can be used to read a subset of the package's resources (or all if NULL, the default).
dp <- read_package(
"https://raw.githubusercontent.com/columbia-glacier/optical-surveys-1985/master",
resources = c("station", "velocity")
)
get_package(dp) %>% str()
## List of 8
## $ name : chr "optical-surveys-1985"
## $ title : chr "Optical Surveys (1985)"
## $ description : chr "Velocity of three reflectors 1.3, 2.8, and 4.6 km from the terminus and meteorological observations from a station on nearby He"| __truncated__
## $ profile : chr "data-package"
## $ version : chr "0.1.0"
## $ sources :List of 1
## ..$ :List of 2
## .. ..$ title: chr "Original data, scripts, and documentation"
## .. ..$ path : chr "sources/"
## $ contributors:List of 1
## ..$ :List of 3
## .. ..$ title: chr "Ethan Welty"
## .. ..$ email: chr "ethan.welty@gmail.com"
## .. ..$ role : chr "author"
## $ resources :List of 2
## ..$ :List of 5
## .. ..$ name : chr "station"
## .. ..$ path : chr "data/station.csv"
## .. ..$ profile: chr "data-resource"
## .. ..$ title : chr "Station Metadata"
## .. ..$ schema :List of 1
## .. .. ..$ fields:List of 2
## .. .. .. ..$ :List of 4
## .. .. .. .. ..$ name : chr "lat"
## .. .. .. .. ..$ type : chr "number"
## .. .. .. .. ..$ description: chr "Latitude (WGS84, EPSG:4326)."
## .. .. .. .. ..$ unit : chr "°"
## .. .. .. ..$ :List of 4
## .. .. .. .. ..$ name : chr "lng"
## .. .. .. .. ..$ type : chr "number"
## .. .. .. .. ..$ description: chr "Longitude (WGS84, EPSG:4326)."
## .. .. .. .. ..$ unit : chr "°"
## ..$ :List of 5
## .. ..$ name : chr "velocity"
## .. ..$ path : chr "data/velocity.csv"
## .. ..$ profile: chr "data-resource"
## .. ..$ title : chr "Marker Velocity"
## .. ..$ schema :List of 1
## .. .. ..$ fields:List of 4
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "marker"
## .. .. .. .. ..$ type : chr "integer"
## .. .. .. .. ..$ description: chr "Marker identifier (1: 1.3 km, 2: 2.8km, and 3: 4.6 km from the terminus)."
## .. .. .. ..$ :List of 3
## .. .. .. .. ..$ name : chr "sequence"
## .. .. .. .. ..$ type : chr "integer"
## .. .. .. .. ..$ description: chr "Sequence number from figure tracing. Observations are 'continuous' between times of the same sequence."
## .. .. .. ..$ :List of 4
## .. .. .. .. ..$ name : chr "t"
## .. .. .. .. ..$ type : chr "datetime"
## .. .. .. .. ..$ format : chr "%Y-%m-%dT%H:%M:%SZ"
## .. .. .. .. ..$ description: chr "Date and time (UTC)."
## .. .. .. ..$ :List of 4
## .. .. .. .. ..$ name : chr "value"
## .. .. .. .. ..$ type : chr "number"
## .. .. .. .. ..$ description: chr "Velocity"
## .. .. .. .. ..$ unit : chr "m d-1"
dp$station
## lat lng
## 1 60.98891 ° -147.0357 °
head(dp$velocity)
## marker sequence t value
## 1 1 2 1985-08-06 14:43:20 9.289464 m/d
## 2 1 1 1985-08-06 14:45:22 9.218484 m/d
## 3 1 4 1985-08-06 14:51:20 9.511275 m/d
## 4 1 3 1985-08-06 15:07:43 9.440296 m/d
## 5 1 6 1985-08-06 15:27:08 9.635490 m/d
## 6 1 5 1985-08-06 15:28:58 9.571165 m/d
read_package_github accepts a shorthand GitHub repository address.
dp <- read_package_github("columbia-glacier/optical-surveys-1985", "station")
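Judging from the read_package call shown earlier, the shorthand presumably expands to a raw.githubusercontent.com URL. The sketch below builds such URLs in base R; the github_raw_url function and its "master" default branch are assumptions for illustration, not dpkg's documented behavior:

```r
# Hypothetical helper, not part of dpkg: expand "owner/repo" to a raw base URL.
# The default branch "master" is an assumption taken from the earlier example.
github_raw_url <- function(repo, branch = "master") {
  paste("https://raw.githubusercontent.com", repo, branch, sep = "/")
}

base_url <- github_raw_url("columbia-glacier/optical-surveys-1985")

# A Data Package descriptor is named datapackage.json, and resource paths
# are resolved relative to it.
descriptor_url <- paste(base_url, "datapackage.json", sep = "/")
station_url <- paste(base_url, "data/station.csv", sep = "/")
```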
Only types string, number, integer, boolean, date, and datetime are implemented (see table-schema/field-descriptors). Add support for the remaining types:
- type = object
- type = array
- type = time (via package hms)
- type = year (already supported via type = date and format = "%Y")
- type = yearmonth (already supported via type = date and format = "%Y-%m")
- type = duration (already supported via type = numeric and unit)
- type = geopoint
- type = geojson
Additionally:
- constraints property
- type = string, validate values against format property
- path like "data/data.csv.gz" to/from compressed files
- path to a JSON file