tableschema.r-package: Table Schema Package

tableschema.r-package    R Documentation

Table Schema Package

Description

Table class for working with data and schema

Introduction

Table Schema is a simple language- and implementation-agnostic way to declare a schema for tabular data. It is well suited for handling and validating tabular data in text formats such as CSV, but its utility extends to many applications where a portable schema format is beneficial.

Tabular data

Tabular data consists of rows, where each row contains a consistent set of fields (columns). In CSV or spreadsheet formats, the first row is typically used as the header row. In other data systems like SQL, headers (field names) are explicitly defined.

Physical and logical representation

The physical representation of tabular data refers to its format as stored on disk (e.g., CSV, JSON), where data may or may not carry type information. In contrast, the logical representation reflects how data is structured and typed after parsing, as defined by the schema specification.

For example, constraints SHOULD be tested against the logical representation (the typed values), whereas missingValues applies to the physical representation: matching strings are converted to nulls during parsing, before any type casting.
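The distinction can be sketched in base R. The column values, the "-" marker, and the minimum of 10 below are made up for illustration and do not come from any real schema; the sketch only shows the order of operations, not how tableschema.r implements it.

```r
# Physical representation: every cell is a string, as read from a CSV file.
physical <- c("10", "9", "-")

# Illustrative settings (assumptions for this sketch only).
missing_values <- c("-")
minimum <- 10

# Step 1 (physical level): replace missing-value markers with NA.
cleaned <- ifelse(physical %in% missing_values, NA_character_, physical)

# Step 2: cast to the logical type, an integer in this sketch.
typed <- as.integer(cleaned)

# Step 3 (logical level): test the constraint on the typed values.
meets_minimum <- typed >= minimum

# Comparing the raw strings instead would use text ordering, not numeric order.
string_comparison <- "10" < "9"
```

Testing the constraint on the strings would be misleading because "10" sorts before "9" as text, which is why the check runs only after casting.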

Descriptor

A Table Schema is defined using a descriptor, which MUST be a JSON object (see RFC 4627). The descriptor MUST include a fields property (array of field descriptors) and MAY include additional optional properties.
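As a minimal sketch, a descriptor can be written as a nested R list whose shape mirrors the JSON object. The field names "id" and "title" are invented for this example; name and type are standard field descriptor properties in the Table Schema specification.

```r
# A descriptor as a nested R list; fields is a list of field descriptors.
descriptor <- list(
  fields = list(
    list(name = "id", type = "integer"),
    list(name = "title", type = "string")
  ),
  # Optional table-level properties.
  missingValues = list(""),
  primaryKey = "id"
)

# Basic structural check: the required fields property is present and is a list.
has_fields <- !is.null(descriptor$fields) && is.list(descriptor$fields)

# Collect the declared field names.
field_names <- vapply(descriptor$fields, function(f) f$name, character(1))
```

This check covers only the presence of fields; it is not a substitute for validating the descriptor against the specification.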

Field Descriptors

See Field class.

Types and Formats

See Types class.

Constraints

See Constraints class.

Other Properties

In addition to fields, a descriptor MAY include table-level properties: missing value indicators (missingValues), keys (primaryKey and foreignKeys), and other metadata.

Missing Values

Missing values may be indicated using empty strings or placeholders such as "-" or "NaN". The missingValues property defines which string values are treated as nulls, and applies during parsing.

missingValues MUST be a list of strings. For example:

  • missingValues = list("")

  • missingValues = list("-")

  • missingValues = list("NaN", "-")
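To show how the choice of list changes the result, the sketch below applies each of the three settings to the same raw column. The helper apply_missing_values() is hypothetical and written only for this example; it is not a function of tableschema.r.

```r
# Hypothetical helper: strings listed in missingValues become NA.
apply_missing_values <- function(x, missingValues) {
  x[x %in% unlist(missingValues)] <- NA_character_
  x
}

# Made-up raw column, still in its physical (string) form.
raw <- c("1.5", "", "-", "NaN")

only_empty <- apply_missing_values(raw, list(""))          # "" becomes NA
only_dash  <- apply_missing_values(raw, list("-"))         # "-" becomes NA
nan_dash   <- apply_missing_values(raw, list("NaN", "-"))  # "" stays a string
```

Note that with the last setting the empty string is no longer treated as missing, because only the listed strings are matched.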

Primary Key

The primaryKey property MAY be:

  • A string (for a single field).

  • A list of strings (for a composite key).
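Both forms can be handled by one uniqueness check, since a primary key is expected to identify each row uniquely. This is a base R sketch with invented data; is_unique_key() is a hypothetical helper, not part of tableschema.r.

```r
# Made-up table: order_id alone repeats, but (order_id, line) does not.
orders <- data.frame(
  order_id = c(1L, 1L, 2L),
  line = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: accepts a string or a list of strings as primaryKey.
is_unique_key <- function(data, primaryKey) {
  key_fields <- unlist(primaryKey)
  anyDuplicated(data[key_fields]) == 0
}

single_ok <- is_unique_key(orders, "order_id")                    # FALSE
composite_ok <- is_unique_key(orders, list("order_id", "line"))   # TRUE
```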

Foreign Keys

The foreignKeys property, if present, MUST be a list of foreign key objects. Each foreignKey object MUST include:

fields

A string or list of strings specifying the source field(s).

reference

An object with:

resource

The referenced resource name. Use "" for self-referencing keys.

fields

The target field(s) on the referenced resource.

Comment: Foreign keys create links between Table Schemas. Typically, schemas are part of a larger Data Package, where resources and schemas are associated.
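The structure above can be sketched as a nested R list, followed by a simple check of a single-field key. The resource name "countries", the field names, and the data are all assumptions made for this example, and the lookup through a plain named list stands in for a Data Package.

```r
# A foreign key object: cities.country_code refers to countries.code.
foreign_key <- list(
  fields = "country_code",
  reference = list(resource = "countries", fields = "code")
)

# Made-up data for the source table and the referenced resource.
cities <- data.frame(
  name = c("Athens", "Lyon", "Nowhere"),
  country_code = c("GR", "FR", "XX"),
  stringsAsFactors = FALSE
)
resources <- list(
  countries = data.frame(code = c("GR", "FR"), stringsAsFactors = FALSE)
)

# Resolve the referenced resource, then find source values with no match.
target <- resources[[foreign_key$reference$resource]]
unmatched <- setdiff(
  cities[[foreign_key$fields]],
  target[[foreign_key$reference$fields]]
)
```

Here unmatched contains "XX", the one value with no counterpart in the referenced field. Composite keys would need a row-wise comparison across several fields, which this sketch does not cover.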

Details

  • jsonlite is used to convert JSON to R list objects. Inputs can be JSON strings, lists, or files; outputs are R lists.

  • future allows asynchronous creation of Table/Schema objects. Use value() to retrieve results.

See examples in individual functions for using 'jsonlite' and 'future' with tableschema.r.

Throughout this documentation, a JSON "array" corresponds to an R list.
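The following sketch combines the two packages without calling any tableschema.r function: a JSON descriptor is parsed inside a future and retrieved with value(). The descriptor string is invented for illustration, and passing simplifyVector = FALSE is a choice made here so that JSON arrays stay as R lists rather than being simplified to vectors.

```r
library(jsonlite)
library(future)

# Made-up descriptor as a JSON string.
descriptor_json <- '{
  "fields": [{"name": "id", "type": "integer"}],
  "missingValues": ["", "-"]
}'

# Create a future that parses the JSON; with the default sequential plan
# it is evaluated in the current R session.
pending <- future({
  fromJSON(descriptor_json, simplifyVector = FALSE)
})

# value() blocks until the future is resolved and returns its result.
descriptor <- value(pending)

first_field_name <- descriptor$fields[[1]]$name
```

Functions in tableschema.r that return futures are resolved with value() in the same way; see the examples of the individual functions for the exact calls.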

Language

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL are interpreted as described in RFC 2119.

Author(s)

Maintainer: Kleanthis Koupidis <koupidis@okfn.gr>

See Also

Table Schema Specifications


frictionlessdata/tableschema-r documentation built on April 13, 2025, 3:51 p.m.