In jeremystan/tidyjson: Tidy Complex 'JSON'

tidyjson graphs

tidyjson provides tools for turning complex JSON into tidy data.

Installation

Get the released version from CRAN:

install.packages("tidyjson")

or the development version from GitHub:

devtools::install_github("colearendt/tidyjson")

Examples

The following example takes the character vector of 500 JSON documents in the worldbank dataset and spreads out all objects.
Every JSON object key gets its own column, with types inferred, as long as its value is not an array. When recursive = TRUE (the default), spread_all does this recursively for nested objects and builds column names using the sep parameter (e.g. {"a":{"b":1}} with sep = '.' generates a single column, a.b).

library(dplyr)
library(tidyjson)

worldbank %>% spread_all
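
To see the sep and recursive arguments in isolation, here is a minimal sketch on a one-document input. The JSON string and the variable names are made up for illustration and are not part of the package.

```r
library(dplyr)
library(tidyjson)

# Hypothetical document with one nested object and one top-level scalar.
json <- '{"a": {"b": 1}, "c": "x"}'

# Default behavior: recurse into nested objects, joining names with sep.
nested <- json %>% spread_all(sep = ".")

# recursive = FALSE: the nested object under "a" is not spread.
flat <- json %>% spread_all(recursive = FALSE)

names(nested)
names(flat)
```

Comparing the two sets of column names shows that the a.b column only appears when spread_all recurses.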

Some objects in worldbank are arrays, which are not handled by spread_all. This example shows how to quickly summarize the top-level structure of a JSON collection:

worldbank %>% gather_object %>% json_types %>% count(name, type)

In order to capture the data in the majorsector_percent array, we can use enter_object to enter into that object, gather_array to stack the array and spread_all to capture the object items under the array.

worldbank %>%
  enter_object(majorsector_percent) %>%
  gather_array %>%
  spread_all %>%
  select(-document.id, -array.index)

API

Spreading objects into columns

spread_all() for spreading all object values into new columns, with nested objects having concatenated names
spread_values() for specifying a subset of object values to spread into new columns using the jstring(), jinteger(), jdouble() and jlogical() functions. Multiple parameters can be given to extract data from nested objects (e.g. jstring('a','b')).
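
As a rough sketch of spread_values() with a nested path, assuming a made-up document (the keys name, address, city and age are hypothetical):

```r
library(dplyr)
library(tidyjson)

# Hypothetical document; only the keys named below become columns.
json <- '{"name": "bob", "address": {"city": "Oslo"}, "age": 32}'

people <- json %>%
  spread_values(
    name = jstring("name"),
    city = jstring("address", "city"),  # path into the nested object
    age  = jdouble("age")
  )

people
```

Because the columns are declared up front, the result keeps the same columns even if a document is missing one of the keys, which is the main difference from spread_all().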

Object navigation

enter_object() for entering into an object by name, discarding all other JSON (and rows without the corresponding object name) and allowing further operations on the object value
gather_object() for stacking all object name-value pairs by name, expanding the rows of the tbl_json object accordingly
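
The two object verbs can be compared on a small input. This is a sketch only, and the document and its keys are invented for illustration:

```r
library(dplyr)
library(tidyjson)

# Hypothetical document with a nested "stats" object.
json <- '{"name": "bob", "stats": {"wins": 3, "losses": 1}}'

# gather_object: one row per top-level name-value pair.
top <- json %>% gather_object %>% json_types

# enter_object first, then gather_object: one row per pair inside "stats".
inner <- json %>% enter_object("stats") %>% gather_object %>% json_types

top
inner
```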

Array navigation

gather_array() for stacking all array values by index, expanding the rows of the tbl_json object accordingly
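
A minimal sketch of gather_array() on two arrays of different lengths (the input is made up):

```r
library(dplyr)
library(tidyjson)

# Two hypothetical documents, each a JSON array, with different lengths.
json <- c('[1, 2, 3]', '[4]')

# One output row per array element; document.id and array.index
# record where each element came from.
items <- json %>% gather_array

items
```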

JSON inspection

json_types() for identifying JSON data types
json_lengths() for computing the length of JSON data (can be larger than 1 for objects and arrays)
json_complexity() for computing the length of the unnested JSON, i.e., how many terminal leaves there are in a complex JSON structure
is_json family of functions for testing the type of JSON data
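
The inspection functions add one column each, so they chain naturally. A small sketch on an invented mixed-type array:

```r
library(dplyr)
library(tidyjson)

# Hypothetical array mixing a number, a string, an array and an object.
json <- '[1, "a", [1, 2], {"k": {"x": 1, "y": 2}}]'

inspected <- json %>%
  gather_array %>%
  json_types %>%       # adds a "type" column
  json_complexity      # adds a "complexity" column (count of terminal leaves)

inspected
```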

JSON summarization

json_structure() for creating a single fixed column data.frame that recursively structures arbitrary JSON data
json_schema() for representing the schema of complex JSON, unioned across disparate JSON documents, and collapsing arrays to their most complex type representation
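
As an illustration, both summaries can be run on the same hypothetical pair of documents. The inputs are made up, and the exact output format may vary by package version:

```r
library(dplyr)
library(tidyjson)

# Two hypothetical documents with overlapping but different keys.
json <- c('{"a": 1, "b": [1, 2]}', '{"a": 2, "c": {"d": "x"}}')

# One row per node in the JSON tree, across all documents.
structure_tbl <- json %>% json_structure

# A single schema string, unioned across both documents.
schema <- json %>% json_schema

structure_tbl
schema
```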

Creating tbl_json objects

as.tbl_json() for converting a string or character vector into a tbl_json object, or for converting a data.frame with a JSON column using the json.column argument
tbl_json() for combining a data.frame and associated list derived from JSON data into a tbl_json object
read_json() for reading JSON data from a file
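
A sketch of two of these entry points, a data.frame with a JSON column and a file on disk. The data frame, its column names and the temporary file are all hypothetical:

```r
library(dplyr)
library(tidyjson)

# Hypothetical data.frame whose "payload" column holds JSON strings.
df <- data.frame(
  id = c("a", "b"),
  payload = c('{"x": 1}', '{"x": 2}'),
  stringsAsFactors = FALSE
)
from_df <- df %>% as.tbl_json(json.column = "payload") %>% spread_all

# Hypothetical file: write one JSON document to a temp file, then read it.
path <- tempfile(fileext = ".json")
writeLines('{"x": 1}', path)
from_file <- read_json(path) %>% spread_all
unlink(path)

from_df
from_file
```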

Converting tbl_json objects

as.character.tbl_json for converting the JSON attribute of a tbl_json object back into a JSON character string

Included JSON data

commits: commit data for the dplyr repo from the GitHub API
issues: issue data for the dplyr repo from the GitHub API
worldbank: World Bank funded projects from jsonstudio
companies: startup company data from jsonstudio

Philosophy

The goal is to turn complex JSON data, which is often represented as nested lists, into tidy data frames that can be more easily manipulated.

Work on a single JSON document, or on a collection of related documents
Create pipelines with %>%, producing code that can be read from left to right
Guarantee the structure of the data produced, even if the input JSON structure changes (with the exception of spread_all)
Work with arbitrarily nested arrays or objects
Handle 'ragged' arrays and / or objects (varying lengths by document)
Allow for extraction of data in values or object names
Ensure edge cases are handled correctly (especially empty data)
Integrate seamlessly with dplyr, allowing tbl_json objects to pipe in and out of dplyr verbs where reasonable

Related Work

tidyjson depends upon:

magrittr for the %>% pipe operator
jsonlite for converting JSON strings into nested lists
purrr for list operators
tidyr for unnesting and spreading

Other R packages can also help with understanding JSON data:

listviewer for viewing JSON data interactively

jeremystan/tidyjson documentation built on Feb. 4, 2023, 6:54 p.m.
