knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Data Resource is a simple format to describe a data resource such as an individual table or file, including its name, format, path, etc.
::: {.callout-info}
In this document we use the terms "package" for Data Package, "resource" for Data Resource, "dialect" for Table Dialect, and "schema" for Table Schema.
:::
Frictionless supports reading, manipulating and writing resources, but much of its functionality is limited to Tabular Data Resources.
resources()
lists all resources in a package:
library(frictionless)
package <- example_package()

# List the resources
resources(package)
read_resource()
reads data from a tabular resource to a data frame:
read_resource(package, "deployments")
Frictionless does not support reading data from non-tabular resources.
remove_resource()
removes a resource (of any type):
remove_resource(package, "deployments")
# This and many other functions return "package", which you can update with
# package <- remove_resource(package, "deployments")
add_resource()
adds or replaces a tabular resource. The provided data must be a data frame or a tabular data file (e.g. CSV):
# Add a resource with data from a data frame
add_resource(package, "iris", data = iris)

# Replace a resource with one where data is stored in a tabular file
path <- system.file("extdata", "v1", "deployments.csv", package = "frictionless")
add_resource(package, "deployments", data = path, replace = TRUE)
::: {.callout-info}
You can pipe most functions (see vignette("data-package")
).
:::
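As a minimal sketch of such a pipeline, the functions introduced above can be chained with the base R pipe (`|>`, which assumes R 4.1 or later; the magrittr `%>%` pipe should work the same way). The choice of resources to remove and add is purely illustrative.

```r
library(frictionless)

# Each function takes the package as first argument and returns the updated
# package, so the calls can be chained
package <-
  example_package() |>
  remove_resource("observations") |>
  add_resource("iris", data = iris)

# List the resources of the resulting package
resources(package)
```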
write_package()
writes a package to disk as a datapackage.json
file. This file includes the metadata of all the resources. write_package()
also writes resource data to CSV files, unless the data are referenced by URL or stored inline. See the function documentation for details.
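A short sketch of writing the example package to disk. The directory name is an arbitrary choice for illustration, and a temporary directory is used so nothing is left behind in the working directory.

```r
library(frictionless)

package <- example_package()

# Write the package to a temporary directory (the name is arbitrary)
dir <- file.path(tempdir(), "my_package")
write_package(package, directory = dir)

# The directory should now hold datapackage.json and any data files
# written alongside it
list.files(dir)
```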
name
is required. It is used to identify a resource in read_resource()
, add_resource()
and remove_resource()
(always as the second argument):
deployments <- read_resource(package, resource_name = "deployments")
add_resource()
sets name
to the provided resource_name
:
add_resource(package, resource_name = "iris", data = iris)
path
or data
(see further) is required. Providing both is not allowed.
path
is for data in files (e.g. a CSV file). It can be a local path or URL. Supported protocols are http
, https
, ftp
and sftp
. Absolute paths (/
) or relative parent paths (../
) are not allowed to avoid security vulnerabilities.
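To make the path rule concrete, here is a rough base R sketch of such a check. The helpers `is_url()` and `is_safe_path()` are hypothetical names written for this illustration and are not claimed to be the package's own implementation. The protocol list is taken from the text above.

```r
# Hypothetical helpers for illustration only (not part of the frictionless API)

# TRUE when the path starts with one of the protocols named in the text
is_url <- function(path) {
  grepl("^(http|https|ftp|sftp)://", path)
}

# TRUE when the path is a URL, or a local path that is neither absolute ("/")
# nor pointing to a parent directory ("../")
is_safe_path <- function(path) {
  is_absolute <- startsWith(path, "/")
  has_parent <- grepl("(^|/)\\.\\.(/|$)", path)
  is_url(path) | !(is_absolute | has_parent)
}

is_safe_path(c(
  "data/deployments.csv",
  "/etc/deployments.csv",
  "../deployments.csv",
  "https://example.org/deployments.csv"
))
```

The first and last paths pass the check. The absolute path and the parent path do not.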
When multiple paths are provided ("path": ["myfile1.csv", "myfile2.csv"]
), the files are expected to have the same structure. read_resource()
merges these into a single data frame in the order the paths are provided (using dplyr::bind_rows()
):
# The "observations" resource has multiple files in path
package$resources[[2]]$path

# These are combined into a single data frame when reading
read_resource(package, "observations")
add_resource()
sets path
to the path(s) provided in data
:
path <- system.file("extdata", "v1", "deployments.csv", package = "frictionless")
add_resource(package, "deployments", data = path, replace = TRUE)
::: {.callout-warning}
Support for inline data
is currently limited, e.g. JSON object and string are not supported and schema
, mediatype
and format
are ignored.
:::
data
is for inline data (included in the datapackage.json
). read_resource()
attempts to read data
if it is provided as a JSON array:
# The "media" resource has inline data
str(package$resources[[3]]$data)
read_resource(package, "media")
add_resource()
adds the provided data frame to data
:
df <- data.frame("col_1" = c(1, 2), "col_2" = c("a", "b"))
package <- add_resource(package, "df", df)
package$resources[[4]]$data
write_package()
writes that data frame to a CSV file, adds its path to path
and removes data
.
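The behaviour described above can be checked with a sketch like the following. It assumes the CSV file is named after the resource (`df.csv`) and that the new resource is the fourth one in the example package. The directory name is arbitrary.

```r
library(frictionless)

# Add a data frame as a new resource to the example package
df <- data.frame(col_1 = c(1, 2), col_2 = c("a", "b"))
package <- add_resource(example_package(), "df", data = df)

# Write the package to a temporary directory
dir <- file.path(tempdir(), "package_with_df")
write_package(package, directory = dir)

# The data frame is expected to be written as a CSV file
file.exists(file.path(dir, "df.csv"))

# Read the written descriptor back and inspect the "df" resource:
# path should point to the CSV file and data should be absent
written <- read_package(file.path(dir, "datapackage.json"))
written$resources[[4]]$path
is.null(written$resources[[4]]$data)
```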
profile
is required to have the value "tabular-data-resource"
. add_resource()
sets profile
to that value.
schema
is required. It is used by read_resource()
to parse data types and missing values. It can either be a JSON object or a path or URL referencing a JSON object. See vignette("table-schema")
for details.
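As a brief illustration, a schema can be generated from a data frame with `create_schema()` and passed to `add_resource()` via its `schema` argument. Using `iris` here is only an example. If no schema is passed, `add_resource()` is understood to create one itself.

```r
library(frictionless)

package <- example_package()

# Create a schema (a list with a "fields" property) from a data frame
schema <- create_schema(iris)
str(schema$fields[[1]])

# Provide that schema explicitly when adding the resource
package <- add_resource(package, "iris", data = iris, schema = schema)
```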
dialect
is used by read_resource()
to parse a tabular data file. It can either be a JSON object or a path or URL referencing a JSON object. See vignette("table-dialect")
for details.
title
is ignored by read_resource()
and not set by add_resource()
, unless provided:
add_resource(
  package,
  "iris",
  iris,
  title = "Edgar Anderson's Iris Data",
  replace = TRUE
)
description
is ignored by read_resource()
and not set by add_resource()
unless provided (cf. title
).
format
is ignored by read_resource()
. add_resource()
sets format
when data are provided as a file, based on the provided delim
:
delim | format
--- | ---
"," (default) | "csv"
"\t" | "tsv"
any other value | "csv"
path <- system.file("extdata", "v1", "observations_1.tsv", package = "frictionless")
package <- add_resource(package, "observations", data = path, delim = "\t", replace = TRUE)
package$resources[[2]]$format
add_resource()
leaves format
undefined when data are provided as a data frame. write_package()
sets it to "csv"
when writing to disk.
mediatype
is ignored by read_resource()
. add_resource()
sets mediatype
when data are provided as a file, based on the provided delim
:
delim | mediatype
--- | ---
"," (default) | "text/csv"
"\t" | "text/tab-separated-values"
any other value | "text/csv"
path <- system.file("extdata", "v1", "observations_1.tsv", package = "frictionless")
package <- add_resource(package, "observations", data = path, delim = "\t", replace = TRUE)
package$resources[[2]]$mediatype
add_resource()
leaves mediatype
undefined when data are provided as a data frame. write_package()
sets it to "text/csv"
when writing to disk.
encoding
(e.g. "windows-1252"
) is used by read_resource()
to parse the file. It defaults to UTF-8 if no encoding
is provided or if it cannot be recognized. The returned data frame is always UTF-8.
add_resource()
guesses the encoding
(using readr::guess_encoding()
) when data are provided as a file. It leaves the encoding
undefined when data are provided as a data frame. write_package()
sets it to "utf-8"
when writing to disk.
path <- system.file("extdata", "v1", "deployments.csv", package = "frictionless")
package <- add_resource(package, "deployments", data = path, delim = ",", replace = TRUE)
package$resources[[2]]$encoding
bytes
is ignored by read_resource()
and not set by add_resource()
unless provided (cf. title
).
hash
is ignored by read_resource()
and not set by add_resource()
unless provided (cf. title
).
sources
is ignored by read_resource()
and not set by add_resource()
unless provided (cf. title
).
licenses
is ignored by read_resource()
and not set by add_resource()
unless provided (cf. title
).
compression
(a recipe) is ignored by read_resource()
and not set by add_resource()
.
Compression is derived from the provided path
instead. If the path
ends in .gz
, .bz2
, .xz
, or .zip
, the files are automatically decompressed by read_resource()
(using default readr::read_delim()
functionality). Only .gz
files can be read directly from URL path
s.
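The underlying readr behaviour can be sketched on its own, without frictionless. The example below writes a small gzip-compressed CSV file to a temporary location and reads it back. The file name and data are made up for illustration.

```r
library(readr)

# Write a small data frame to a gzip-compressed CSV file
# (readr compresses based on the .gz extension)
df <- data.frame(col_1 = c(1, 2), col_2 = c("a", "b"))
path <- file.path(tempdir(), "example.csv.gz")
write_csv(df, path)

# read_delim() detects the .gz extension and decompresses automatically
roundtrip <- read_delim(path, delim = ",")
roundtrip
```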