tableschema.r-package | R Documentation
Table class for working with data and schema
Table Schema is a simple language- and implementation-agnostic way to declare a schema for tabular data. It is well suited for handling and validating tabular data in text formats such as CSV, but its utility extends to many applications where a portable schema format is beneficial.
Tabular data consists of rows, where each row contains a consistent set of fields (columns). In CSV or spreadsheet formats, the first row is typically used as the header row. In other systems, such as SQL databases, the headers (field names) are declared explicitly.
The physical representation of tabular data refers to its format as stored on disk (e.g., CSV, JSON), where data may or may not carry type information. In contrast, the logical representation reflects how data is structured and typed after parsing, as defined by the schema specification.
For example, constraints should be applied to the logical representation, while missingValues are handled during the parsing of the physical format.
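The split between the two stages can be sketched in base R. This is an illustration only and does not call tableschema.r: the na.strings argument of read.csv() stands in for missingValues at parse time, and the age >= 0 rule is a hypothetical constraint checked afterwards on the typed values.

```r
# Physical representation: raw CSV text, where every value is just a string.
csv_text <- "id,age\n1,34\n2,-\n3,NaN\n4,27"

# Parsing step: "-" and "NaN" are treated as missing while the text is read
# (this mirrors the role of missingValues; read.csv is base R, not tableschema.r).
data <- read.csv(text = csv_text, na.strings = c("-", "NaN"))

# Logical representation: typed columns, with NA marking missing values.
str(data)

# Constraint check on the logical values (hypothetical rule: age >= 0).
# Missing values are skipped rather than counted as violations.
age_ok <- is.na(data$age) | data$age >= 0
all(age_ok)
```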
A Table Schema is defined using a descriptor, which MUST be a JSON object (see RFC 4627). The descriptor MUST include a fields property (an array of field descriptors) and MAY include additional optional properties.
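As a minimal sketch, a descriptor can be written in R as a named list (the JSON object) whose fields entry is an unnamed list (the JSON array) of field descriptors. The helper has_valid_fields() is hypothetical, not part of tableschema.r; it checks only the requirements stated above plus the name property that every field descriptor carries in the Table Schema specification.

```r
# A Table Schema descriptor as an R list: a named list plays the role of a
# JSON object, and an unnamed list plays the role of a JSON array.
descriptor <- list(
  fields = list(
    list(name = "id", type = "integer"),
    list(name = "name", type = "string")
  ),
  primaryKey = "id"   # an optional table-level property
)

# Hypothetical helper (not part of tableschema.r): the descriptor must be an
# object with a fields list, and every field descriptor needs a string name.
has_valid_fields <- function(desc) {
  is.list(desc) && !is.null(names(desc)) &&
    is.list(desc$fields) &&
    all(vapply(desc$fields,
               function(f) is.list(f) && is.character(f$name),
               logical(1)))
}

has_valid_fields(descriptor)                # TRUE
has_valid_fields(list(primaryKey = "id"))   # FALSE: no fields property
```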
See the Field, Types, and Constraints classes for details on field descriptors, field types, and field constraints, respectively.
Table-level properties include keys, missing value indicators, and metadata.
Missing values may be indicated using empty strings or placeholders such as "-" or "NaN". The missingValues property defines which string values are treated as nulls, and it applies during parsing.
missingValues MUST be a list of strings. For example:
missingValues = list("")
missingValues = list("-")
missingValues = list("NaN", "-")
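The effect of missingValues on raw strings can be sketched with a hypothetical helper, apply_missing_values(), which is not a tableschema.r function. It assumes a default of list(""), matching the first example above, and replaces matching values with NA before any type casting would take place.

```r
# Hypothetical helper (not part of tableschema.r): replace every value that
# matches one of the missingValues strings with NA, before type casting.
apply_missing_values <- function(values, missing_values = list("")) {
  markers <- unlist(missing_values)   # list of strings -> character vector
  values[values %in% markers] <- NA_character_
  values
}

raw <- c("12", "", "-", "NaN", "7")
apply_missing_values(raw)                     # only "" becomes NA
apply_missing_values(raw, list("NaN", "-"))   # "-" and "NaN" become NA; "" stays
```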
The primaryKey property MAY be:
A string (for a single field).
A list of strings (for a composite key).
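Because primaryKey can take either form, code that consumes it usually normalises both to a vector of field names. The helpers key_fields() and is_unique_key() below are hypothetical base R illustrations, not tableschema.r functions; they assume the data is already in a data.frame and check only that the key columns identify each row uniquely.

```r
# Hypothetical helper: a primaryKey may be a single string or a list of
# strings, so normalise both forms to a character vector of field names.
key_fields <- function(primary_key) {
  as.character(unlist(primary_key))
}

# Hypothetical helper: TRUE when no two rows share the same key values.
is_unique_key <- function(df, primary_key) {
  cols <- key_fields(primary_key)
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

obs <- data.frame(country = c("GR", "GR", "FR"),
                  year    = c(2019, 2020, 2019))

is_unique_key(obs, "country")                 # FALSE: "GR" repeats
is_unique_key(obs, list("country", "year"))   # TRUE: the composite key is unique
```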
The foreignKeys property, if present, MUST be a list of foreign key objects. Each foreignKey object MUST include:
fields: A string or list of strings specifying the source field(s).
reference: An object with:
  resource: The referenced resource name. Use "" for self-referencing keys.
  fields: The target field(s) on the referenced resource.
Comment: Foreign keys create links between Table Schemas. Typically, schemas are part of a larger Data Package, in which each resource is associated with its schema.
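A self-referencing foreign key can be written and checked as follows. This is a hypothetical base R sketch, not the tableschema.r implementation: the field names parent_id and id are illustrative, only single-field keys are handled, and missing source values are allowed.

```r
# A foreignKeys entry as an R list (field names are illustrative). The empty
# resource "" means the key refers back to the same table.
foreign_keys <- list(
  list(
    fields = "parent_id",
    reference = list(resource = "", fields = "id")
  )
)

# Hypothetical check for a single-field, self-referencing key: every
# non-missing source value must appear in the referenced field.
check_self_reference <- function(df, fk) {
  stopifnot(identical(fk$reference$resource, ""))
  src <- df[[fk$fields]]
  dst <- df[[fk$reference$fields]]
  all(is.na(src) | src %in% dst)
}

people <- data.frame(id = c(1, 2, 3), parent_id = c(NA, 1, 2))
check_self_reference(people, foreign_keys[[1]])   # TRUE

people$parent_id[3] <- 9
check_self_reference(people, foreign_keys[[1]])   # FALSE: 9 is not an id
```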
The jsonlite package is used to convert JSON to R list objects. Inputs can be JSON strings, lists, or files; outputs are R lists.
The future package allows Table and Schema objects to be created asynchronously. Use value() to retrieve the results. See the examples in the individual functions for using 'jsonlite' and 'future' with tableschema.r.
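The general future pattern looks like this. The sketch uses only the future package (plan(), future(), value()); the placeholder list stands in for the Table or Schema object, so see the individual tableschema.r functions for the actual constructors.

```r
library(future)
plan(sequential)   # evaluate futures in the current R session

# Start the work as a future; the placeholder descriptor stands in for the
# object that a tableschema.r constructor would produce.
f <- future({
  list(fields = list(list(name = "id", type = "integer")))
})

descriptor <- value(f)   # waits until the future is resolved, then returns it
descriptor$fields[[1]]$name
```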
The term "array" in JSON corresponds to R list objects.
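The mapping between JSON arrays and R lists can be seen with jsonlite directly. This sketch assumes jsonlite is installed; passing simplifyVector = FALSE to fromJSON() keeps arrays as lists rather than simplifying them to atomic vectors or data frames.

```r
library(jsonlite)

json <- '{"fields": [{"name": "id", "type": "integer"},
                     {"name": "name", "type": "string"}],
          "missingValues": ["", "-"]}'

# simplifyVector = FALSE: JSON objects become named lists and JSON arrays
# become unnamed lists.
descriptor <- fromJSON(json, simplifyVector = FALSE)

is.list(descriptor$fields)     # TRUE: the JSON array became a list
length(descriptor$fields)      # 2
descriptor$fields[[2]]$name    # "name"

# Converting back: auto_unbox = TRUE writes length-1 vectors as JSON scalars.
toJSON(descriptor, auto_unbox = TRUE)
```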
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL are interpreted as described in RFC 2119.
Maintainer: Kleanthis Koupidis koupidis@okfn.gr
Authors:
Lazaros Ioannidis larjohn@gmail.com
Charalampos Bratsas cbratsas@math.auth.gr
Other contributors:
Open Knowledge International info@okfn.org [copyright holder]