This tutorial will show you how to install the R library for working with Data Packages and Table Schema, load a CSV file, infer its schema, and write a Tabular Data Package.
For this tutorial, we will need the Data Package R library (datapackage.r).
The devtools package is required to install datapackage.r from GitHub.
# Install the devtools package if it is not already installed
install.packages("devtools")
Then install the development version of datapackage.r from GitHub.
devtools::install_github("frictionlessdata/datapackage.r")
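If you rerun this setup often, you may prefer to skip packages that are already present. The following is a minimal sketch using base R's requireNamespace; the helper name needs_install is hypothetical and is not part of devtools or datapackage.r.

```r
# Hypothetical helper (not part of any package):
# returns TRUE when `pkg` cannot be found in the current library paths.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Report which of the packages used in this tutorial still need installing.
pkgs <- c("devtools", "datapackage.r")
missing_pkgs <- pkgs[vapply(pkgs, needs_install, logical(1))]
print(missing_pkgs)
```

Any name that is printed can then be installed with the install.packages or devtools::install_github calls shown above.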
You can start using the library by loading datapackage.r.
library(datapackage.r)
You can add useful metadata by adding keys to the package's descriptor list. Below, we add the required name key as well as a human-readable title key. For the full set of supported keys, please consult the Data Package spec. Note that we will create the required resources key further down.
dataPackage = Package.load()
dataPackage$descriptor['name'] = 'period-table'
dataPackage$descriptor['title'] = 'Periodic Table'
# commit the changes to Package class
dataPackage$commit()
## [1] TRUE
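The Data Package spec asks that name be lowercase and contain only alphanumeric characters plus ".", "_" or "-". As a rough sketch, a candidate name can be checked with a regular expression before it is assigned; the helper is_valid_name is hypothetical and is not part of datapackage.r, which performs its own validation.

```r
# Hypothetical helper: TRUE when `name` uses only lowercase ASCII letters,
# digits, ".", "_" or "-". perl = TRUE keeps the [a-z] range locale-independent.
is_valid_name <- function(name) {
  grepl("^[a-z0-9._-]+$", name, perl = TRUE)
}

print(is_valid_name("period-table"))    # the name used in this tutorial
print(is_valid_name("Periodic Table"))  # contains an uppercase letter and a space
```

The title key has no such restriction, which is why the human-readable form goes there.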
We will use periodic-table data from a remote path:

atomic.number  symbol  name       atomic.mass  metal.or.nonmetal.
1              H       Hydrogen   1.00794      nonmetal
2              He      Helium     4.002602     noble gas
3              Li      Lithium    6.941        alkali metal
4              Be      Beryllium  9.012182     alkaline earth metal
5              B       Boron      10.811       metalloid
6              C       Carbon     12.0107      nonmetal
7              N       Nitrogen   14.0067      nonmetal
8              O       Oxygen     15.9994      nonmetal
9              F       Fluorine   18.9984032   halogen
10             Ne      Neon       20.1797      noble gas

We can guess at our CSV's schema by using infer from the Table Schema library. We pass the remote link directly to the infer function, which returns an inferred schema. For example, if the processor detects only integers in a given column, it will assign integer as the column type.
filepath = 'https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv'
schema = tableschema.r::infer(filepath)
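To make the idea of type inference concrete, here is a deliberately simplified sketch in base R. It is not the algorithm tableschema.r uses; the helper guess_type is hypothetical, it only distinguishes integer, number and string, and the inline CSV is a three-row excerpt of the sample data with column names as they appear in the inferred schema.

```r
# Inline excerpt of the periodic-table sample (first three rows only).
csv_text <- "atomic number,symbol,name,atomic mass,metal or nonmetal?
1,H,Hydrogen,1.00794,nonmetal
2,He,Helium,4.002602,noble gas
3,Li,Lithium,6.941,alkali metal"

# Read every column as character so the type guessing is entirely ours;
# check.names = FALSE keeps the original header text.
rows <- read.csv(text = csv_text, colClasses = "character", check.names = FALSE)

# Hypothetical helper: map a character vector to a Table Schema type name.
guess_type <- function(values) {
  if (all(grepl("^[+-]?[0-9]+$", values))) return("integer")
  if (all(grepl("^[+-]?[0-9]*\\.?[0-9]+$", values))) return("number")
  "string"
}

# One guessed type per column, named by column.
guessed <- vapply(rows, guess_type, character(1))
print(guessed)
```

A real inference routine also has to handle dates, booleans, missing values and sampling of large files, which this sketch ignores.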
Once we have a schema, we are ready to add a resources key to the Data Package that points to the resource path and its newly created schema. Below we define resources in three ways: from JSON text, from an R list object built with the usual assignment operator, and directly with the addResource function of the Package class:
# define resources using json text
resources = helpers.from.json.to.list(
'[{
"name": "data",
"path": "filepath",
"schema": "schema"
}]'
)
# replace the placeholder strings with the actual schema object and file path
resources[[1]]$schema = schema
resources[[1]]$path = filepath
# or define resources using list object
resources = list(list(
name = "data",
path = filepath,
schema = schema
))
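Both definitions end up as the same kind of nested R list. The sketch below illustrates this with jsonlite::fromJSON standing in for helpers.from.json.to.list (an assumption made so the example runs without datapackage.r; the two parsers are not claimed to be identical), and with the schema entry left out to keep it self-contained.

```r
library(jsonlite)

filepath <- "https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv"

# Route 1: parse JSON text into a nested list, then overwrite the placeholder.
# simplifyVector = FALSE keeps JSON arrays and objects as plain R lists.
from_json <- fromJSON('[{"name": "data", "path": "placeholder"}]',
                      simplifyVector = FALSE)
from_json[[1]]$path <- filepath

# Route 2: build the same structure directly as an R list.
from_list <- list(list(name = "data", path = filepath))

# Compare the two structures field by field.
str(from_json)
str(from_list)
print(isTRUE(all.equal(from_json, from_list)))
```

The list form avoids the placeholder step, while the JSON form is convenient when the descriptor text already exists elsewhere.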
And now, add resources to the Data Package:
dataPackage$descriptor[['resources']] = resources
dataPackage$commit()
## [1] TRUE
Or you can add resources directly using the addResource function of the Package class:
resources = list(list(
name = "data",
path = filepath,
schema = schema
))
dataPackage$addResource(resources)
Now we are ready to write our datapackage.json file to the current working directory.
dataPackage$save('example_data')
The datapackage.json is inlined below. Note that atomic number has been correctly inferred as an integer and atomic mass as a number (float), while every other column is a string.
jsonlite::prettify(helpers.from.list.to.json(dataPackage$descriptor))
## {
## "profile": "data-package",
## "name": "period-table",
## "title": "Periodic Table",
## "resources": [
## {
## "name": "data",
## "path": "https://raw.githubusercontent.com/okgreece/datapackage-r/master/vignettes/exampledata/data.csv",
## "schema": {
## "fields": [
## {
## "name": "atomic number",
## "type": "integer",
## "format": "default"
## },
## {
## "name": "symbol",
## "type": "string",
## "format": "default"
## },
## {
## "name": "name",
## "type": "string",
## "format": "default"
## },
## {
## "name": "atomic mass",
## "type": "number",
## "format": "default"
## },
## {
## "name": "metal or nonmetal?",
## "type": "string",
## "format": "default"
## }
## ],
## "missingValues": [
## ""
## ]
## },
## "profile": "data-resource",
## "encoding": "utf-8"
## }
## ]
## }
##
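To check the inferred types without reading the whole descriptor by eye, the fields can be pulled out programmatically. The sketch below parses an abbreviated copy of the descriptor above with jsonlite; the variable names are illustrative, and in practice you would parse the saved datapackage.json file instead of an inline string.

```r
library(jsonlite)

# Abbreviated copy of the descriptor printed above (format keys omitted).
descriptor_json <- '{
  "name": "period-table",
  "resources": [{
    "name": "data",
    "schema": {
      "fields": [
        {"name": "atomic number", "type": "integer"},
        {"name": "symbol", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "atomic mass", "type": "number"},
        {"name": "metal or nonmetal?", "type": "string"}
      ]
    }
  }]
}'

# Keep the nested list structure rather than simplifying to data frames.
descriptor <- fromJSON(descriptor_json, simplifyVector = FALSE)
fields <- descriptor$resources[[1]]$schema$fields

# Build a named character vector: field name -> inferred type.
types <- vapply(fields, function(f) f$type, character(1))
names(types) <- vapply(fields, function(f) f$name, character(1))
print(types)
```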
Now that you have created your Data Package, you might want to publish your data online so that you can share it with others.