library(rmarkdown)

Overview

The vizlab package was created to facilitate the rapid assembly of web-ready visualizations. It provides a framework for the steps common to most visualizations: fetching, processing, and visualizing data, then publishing the result in a format such as HTML. Other features, such as Google Analytics, are integrated as well. The package consists of R functions and scripts that carry out two distinct tasks. First, they create a series of make files; these in turn execute other pieces of R code that carry out the fetch, process, visualize, and publish steps. Code for each step can be supplied by the user and integrated into the visualization through a central YAML control file (viz.yaml). Default methods are included for some common variants of these steps, such as retrieving data from USGS ScienceBase. Ultimately, the package aims to eliminate the technical tasks common to building most visualizations, leaving more time for content creation and other creative work.

This vignette outlines all the steps necessary to use the package, using an example visualization that can be found on GitHub at https://github.com/USGS-VIZLAB/example.

Installation

Vizlab can be installed from GitHub via devtools:

install.packages('devtools')
devtools::install_github('USGS-VIZLAB/vizlab')

The example visualization ('viz') is located at https://github.com/USGS-VIZLAB/example and should be cloned into a separate directory.

To build the example viz, your working directory should be set to the main example directory, and the vizlab package should be loaded.

Getting Started

Viz design: the viz.yaml

The viz.yaml file manages all the different components of the viz, and is used to generate the make files that build the final published product. viz.yaml contains sections for each of the fetch, process, visualize, and publish steps, as well as general information about the viz. For each individual step, information is read from the viz.yaml and passed to the appropriate internal or external function as a viz object. This is simply a list with elements for each field of that yaml section. For example, this yaml section:

  -
    id: cars_data
    location: cache/fetch/cars.csv
    fetcher: cars
    mimetype: text/csv
    scripts: scripts/fetch/cars.R

becomes this R list object:

as.viz('cars_data')
$id
[1] "cars_data"

$location
[1] "cache/fetch/cars.csv"

$fetcher
[1] "cars"

$mimetype
[1] "text/csv"

$scripts
[1] "scripts/fetch/cars.R"

$block
[1] "fetch"

$export
[1] FALSE

attr(,"class")
[1] "viz"

The viz object will inherit additional classes as it passes through different functions. Most importantly, defaults will be assigned for some items if you leave them unspecified (e.g., export above) - see viz.defaults.yaml for all defined defaults.
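
The way defaults fill in unspecified fields can be sketched in a few lines of base R. This is only an illustration and not vizlab's actual implementation: the helper as_viz_sketch and the hard-coded fetch_defaults list are hypothetical stand-ins for whatever as.viz and viz.defaults.yaml do internally.

```r
# Hypothetical stand-in for the defaults that vizlab reads from viz.defaults.yaml.
fetch_defaults <- list(export = FALSE)

# Hypothetical helper (not part of vizlab): merge user-supplied fields over the
# defaults, record which block the item came from, and tag it with class "viz".
as_viz_sketch <- function(fields, block, defaults = list()) {
  viz <- modifyList(defaults, fields)  # user-supplied values override defaults
  viz$block <- block
  class(viz) <- "viz"
  viz
}

cars_viz <- as_viz_sketch(
  fields = list(
    id = "cars_data",
    location = "cache/fetch/cars.csv",
    fetcher = "cars",
    mimetype = "text/csv",
    scripts = "scripts/fetch/cars.R"
  ),
  block = "fetch",
  defaults = fetch_defaults
)

cars_viz$export  # FALSE, filled in from the defaults
```

Because the user-supplied fields are merged over the defaults, adding export: true to the yaml section would override the default value.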

Here we will examine the viz.yaml included in the example viz:

Info

vizlab: "0.1.5"
info:
  id: example
  name: Simple but complete vizualization
  date: 2016-07-19
  publish-date: 2016-08-02
  path: /example
  analytics-id: UA-78530187-1
  description: >-
    This is meant to touch all features and act as an integration
    test of the vizlab platform.

analytics-id contains the Google Analytics (GA) ID to be used. The other fields here are self-explanatory and used to generate an index of all published visualizations to go in the footer of the finished product.

Each viz should have its own GA ID. Follow these instructions to create a new ID.

Fetch

fetch:
  -
    id: iris_data
    location: data/iris.csv
    mimetype: text/csv
    scripts:
  -
    id: car_data
    location: data/car_info
    mimetype: text/csv
    reader: folder
    scripts:
  -
    id: cuyahoga
    location: cache/fetch/cuyahoga.csv
    fetchargs:
      sourceloc: data/pretend_remote/cuyahoga.csv
    fetcher: cuyahoga
    scripts: scripts/fetch/cuyahoga.R
    mimetype: text/csv

Each individual piece in each section of the viz.yaml should have its own unique ID, so that it can be referenced by the succeeding sections. Later sections look in the location field to find the corresponding file. mimetype specifies the file type, and is a required field unless a custom reader is specified.

Two possibilities for fetchers are used here. iris_data and car_data are local files/folders already located in the data folder, and so do not need to be fetched. They use the default fetcher, file. Additional built-in fetchers, for ScienceBase and URLs, are defined in vizlab and can also be used. cuyahoga is a custom fetcher, located in scripts/fetch/cuyahoga.R. It retrieves the Cuyahoga data from the sourceloc given in fetchargs (here data/pretend_remote/cuyahoga.csv) and saves it to cache/fetch/cuyahoga.csv.

The reader specified will be used to read in the data for other steps. See methods('readData') for a list of the readers included in the vizlab package. If not specified, the appropriate reader is inferred from mimetype. Alternatively, a specific or custom reader can be specified. For example, the car_data step indicates a specific folder reader. If the reader is a custom reader written for this vizzy, the code should be stored in the scripts/process directory.
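
The idea of inferring a reader from mimetype, while letting an explicit reader field take precedence, can be sketched with S3 dispatch in base R. Everything here is hypothetical and simplified: read_data_sketch stands in for vizlab's readData generic, and the mimetype_readers lookup and set_reader helper are invented for illustration; vizlab's real dispatch may differ.

```r
# Hypothetical generic standing in for vizlab's readData.
read_data_sketch <- function(viz) UseMethod("read_data_sketch")

# Reader for comma-separated files.
read_data_sketch.csv <- function(viz) {
  read.csv(viz$location, stringsAsFactors = FALSE)
}

# Reader for tab-separated files.
read_data_sketch.tsv <- function(viz) {
  read.delim(viz$location, stringsAsFactors = FALSE)
}

# Hypothetical lookup from mimetype to reader name.
mimetype_readers <- c(
  "text/csv" = "csv",
  "text/tab-separated-values" = "tsv"
)

# An explicit `reader` field wins; otherwise the reader is inferred from mimetype.
set_reader <- function(viz) {
  reader <- viz$reader
  if (is.null(reader)) {
    if (is.null(viz$mimetype) || !viz$mimetype %in% names(mimetype_readers)) {
      stop("no reader or known mimetype for item: ", viz$id)
    }
    reader <- mimetype_readers[[viz$mimetype]]
  }
  class(viz) <- c(reader, class(viz))  # the reader name drives S3 dispatch
  viz
}

# Demo: a temporary CSV plays the role of data/iris.csv.
tmp <- tempfile(fileext = ".csv")
write.csv(head(iris, 3), tmp, row.names = FALSE)
iris_viz <- structure(
  list(id = "iris_data", location = tmp, mimetype = "text/csv"),
  class = "viz"
)
iris_head <- read_data_sketch(set_reader(iris_viz))
```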

Process

process:
  -
    id: cuyahoga_short
    location: cache/process/cuyahoga_short.tsv
    mimetype: text/tab-separated-values
    scripts: scripts/process/cuyahoga.R
    depends:
      - cuyahoga
    processor: cuyahoga

depends references the IDs in the fetch section. All processors must be supplied by the user. Just as the custom fetchers are stored in scripts/fetch, processors are stored in scripts/process. The output of each processor is stored at the path specified by the location field.
A processing step is not mandatory: the iris_data plot uses only the raw data, so it has no section here.
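
As a rough picture of what a processor does, the sketch below reads a fetched CSV, trims it, and writes a tab-separated file at the item's location. It is an assumption-laden illustration: the function name process_cuyahoga_sketch, its arguments, and the row-trimming itself are invented here, since this vignette does not describe what the example's cuyahoga processor actually computes or how vizlab passes dependencies to it.

```r
# Hypothetical processor (not the example repo's code): keep the first `n` rows
# of the fetched data and write them as TSV at the processed item's location.
process_cuyahoga_sketch <- function(viz, fetched_path, n = 10) {
  raw <- read.csv(fetched_path, stringsAsFactors = FALSE)
  short <- head(raw, n)
  dir.create(dirname(viz$location), recursive = TRUE, showWarnings = FALSE)
  write.table(short, viz$location, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(viz$location)
}

# Demo: the built-in `cars` dataset stands in for the fetched file, and a
# temporary directory stands in for cache/process.
fetched <- tempfile(fileext = ".csv")
write.csv(cars, fetched, row.names = FALSE)

short_viz <- structure(
  list(
    id = "cuyahoga_short",
    location = file.path(tempdir(), "cache", "process", "cuyahoga_short.tsv"),
    mimetype = "text/tab-separated-values"
  ),
  class = "viz"
)
process_cuyahoga_sketch(short_viz, fetched)
processed <- read.delim(short_viz$location)
```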

Visualize

visualize:
  -
    id: relative_abundance_fig
    title: Relative Abundance
    alttext: Relative abundance of mayflies, above vs below
    location: figures/relative_abundance_fig.svg
    depends:
        cuyahoga: cuyahoga_short
        mayfly: mayfly_nymph
        sizes: plot_info
    visualizer: relative_abundance
    mimetype: image/svg+xml
    export: true
    scripts: scripts/visualize/relative_abundance.R

In the visualize step, location specifies where the figure output by each step is written. Visualizers are R functions stored in scripts/visualize/. title and alttext become part of the finished viz, as the figure title and the alt text that appears when the figure is hovered over. Note that there can be multiple dependencies: relative_abundance_fig depends on the processing step cuyahoga_short, the raw data mayfly_nymph from the fetch step, and the plot parameters plot_info. If any of these dependencies change, the relative_abundance visualizer will be rerun and the figure updated.

Publish

publish:
  -
    id: mainCSS
    location: layout/css/main.css
    mimetype: text/css
    publisher: resource
  -
    id: normalizeCSS
    location: layout/css/normalize.css
    mimetype: text/css
    publisher: resource
  -
    id: index
    name: index
    template: fullPage
    depends:
      relative_abundance: "relative_abundance"
      mainCSS: "mainCSS"
      normalizeCSS: "normalizeCSS"
      footer-style: "footer-style"
      footer: "footer"
      font: "pagefonts"
    publisher: page
    context:
      title: testViz
      sections: [ "relative_abundance" ]
      resources: [ "font", "mainCSS", "normalizeCSS", "footer-style"]
      footer: [ "footer" ]
  -
    id: relative_abundance
    template: simplefigure
    context:
      id: relative_abundance
      figure: relative_abundance_fig
      caption: "Relative Abundance"
    depends:
      relative_abundance_fig: "relative_abundance_fig"
    publisher: section
  -
    id: facebook-thumb
    location: images/facebook-thumb.png
    publisher: thumbnail
    for: facebook
    mimetype: image/png
    export: TRUE
  -
    id: landing-thumb
    location: images/landing-thumb.png
    publisher: thumbnail
    for: landing
    mimetype: image/png
    export: TRUE
  -
    id: footer
    publisher: footer
    template: footer-template
    depends: footer-style
    blogsInFooter: TRUE
    vizzies:
      - name: Microplastics in the Great Lakes
        org: USGS-VIZLAB
        repo: great-lakes-microplastics
      - name: Climate Change and Freshwater Fish
        org: USGS-VIZLAB
        repo: climate-fish-habitat
    blogs:
      - name: Using the dataRetrieval Stats Service
        url: https://owi.usgs.gov/blog/stats-service-map/
        thumbLoc: https://owi.usgs.gov/blog/images/owi-mobile.png
  -
    id: footer-style
    location: layout/css/footer.css
    publisher: resource
    mimetype: text/css
  -
    id: pagefonts
    publisher: googlefont
    family: "Source Sans Pro"
    weight: [300, 400, 700]

The publish step creates the finished viz product. The index section applies to the viz as a whole. It depends on all the other publish sections, for each figure and for the text. Its name field becomes the name of the finished viz HTML file. Each section with a figure depends on the corresponding visualize section.

Publishers and templates are used in each section of publish, and correspond to the section's content. Default publisher R functions are stored in publish.R in the vizlab package, and default templates are stored in inst/templates. Templates are mustache files that define how the text or image content is displayed in the final HTML. For the example viz, the section publisher is used for each individual section, and the page publisher, with the fullPage template, publishes the full viz. Each section, including index, has a context section that defines the actual material to be used; for figures it includes the caption. Text-only sections can be included as well, using the printall template.

The resource sections (mainCSS, normalizeCSS, and footer-style) reference CSS included with the example viz. The footer section builds a footer for the web page, which may contain links to other visualizations or blog posts. The footer publisher gets viz information from their GitHub repos, while blog information is specified directly.

Setup

The following steps are required to start a viz from scratch. The first two have already been done for the example viz, so are not required here. These steps assume that the actual content creation phases are complete, i.e. there are scripts complete for data retrieval, processing, and figure creation.

  1. Create the skeleton: First the visualization directory structure should be created, using the vizSkeleton function. This is unnecessary for the example viz, since the directories are already set up in the repo. You can run this function in a dummy directory to see what happens.

  2. Fill in the viz.yaml: As described above, viz.yaml needs to be filled out in order to guide the creation of the various make files. An empty skeleton viz.yaml is created by the vizSkeleton function. The complete viz.yaml in the example viz repository can also be a useful reference.

  3. Authentication (if needed): The dssecrets package or a personal secret vault can be used for authentication. Check with a DS team member for details. For security, it is strongly recommended to use a ScienceBase account that does not rely on personal credentials.

Build

Finally, execute all the created make files by running vizmake() from the console:

vizmake()
> vizmake()
[  LOAD ] 
(       ) vizlab/remake/timestamps/cuyahoga.txt
(       ) vizlab/remake/timestamps/never_current.txt
Starting build at 2017-10-28 10:23:39
[  READ ]                                                |  # loading sources
<  MAKE > Viz
[ BUILD ] plot_info                                      |  plot_info <- parameter("plot_info")
[  READ ]                                                |  # loading packages
[ BUILD ] data/site_text.yaml                            |  fetch("site_text_data")
[ BUILD ] vizlab/remake/timestamps/cuyahoga.txt          |  fetchTimestamp("cuyahoga")
[ BUILD ] data/car_info                                  |  fetch("car_data")
[ BUILD ] vizlab/remake/timestamps/never_current.txt     |  fetchTimestamp("never_current")
[ BUILD ] layout/css/main.css                            |  publish("mainCSS")
[ BUILD ] layout/css/normalize.css                       |  publish("normalizeCSS")
[ BUILD ] images/facebook-thumb.png                      |  publish("facebook-thumb")
# etc

The finished HTML will be created in the target directory, along with its corresponding CSS and images for each figure.

Controlling the fetch section

Every fetch item has a fetcher, implemented either by vizlab or within your own code. Every fetcher must have methods for both fetch and fetchTimestamp. Consider setting your custom fetchTimestamp.xxx method to alwaysCurrent or neverCurrent if appropriate. These function names describe the currency of the local information relative to the remote information only, because dependencies on local information are taken care of elsewhere.
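
A custom fetch method for the cuyahoga item might look roughly like the sketch below. Several things are assumptions: the fetch generic is redefined locally only so the sketch runs without vizlab attached, the class vector on the viz object is a guess at how dispatch reaches the method, and the demo data are made up. The fetchTimestamp assignment is left as a comment because it needs vizlab's own alwaysCurrent.

```r
# Stand-in generic so the sketch runs on its own; with vizlab attached, its own
# fetch generic would be used instead.
fetch <- function(viz, ...) UseMethod("fetch")

# Custom fetcher: copy the "remote" source file named in fetchargs to the
# item's location.
fetch.cuyahoga <- function(viz, ...) {
  dir.create(dirname(viz$location), recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(viz$fetchargs$sourceloc, viz$location, overwrite = TRUE)
  if (!ok) stop("could not fetch item: ", viz$id)
  invisible(viz$location)
}

# With vizlab loaded, the matching timestamp method could be set as the text
# above suggests:
# fetchTimestamp.cuyahoga <- alwaysCurrent

# Demo: temporary files stand in for data/pretend_remote and cache/fetch,
# and the two-row data frame is invented for illustration.
remote <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("above", "below"), count = c(12, 30)),
          remote, row.names = FALSE)

cuyahoga_viz <- structure(
  list(
    id = "cuyahoga",
    location = file.path(tempdir(), "cache", "fetch", "cuyahoga.csv"),
    fetchargs = list(sourceloc = remote),
    fetcher = "cuyahoga",
    mimetype = "text/csv"
  ),
  class = c("cuyahoga", "viz")  # assumed: the fetcher name becomes an S3 class
)
fetch(cuyahoga_viz)
```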

Timestamps are potentially checked (using fetchTimestamp) every time you build a fetch item with vizmake. You can reduce the frequency of timestamp checks, or eliminate checks altogether, by adding a timetolive entry to a file called preferences.yaml in your main project directory. For example:

timetolive:
  iris_data: Inf days
  cuyahoga: 6 hours

means that the iris_data item will never have its timestamp checked, while cuyahoga will have its timestamp checked on any build occurring more than 6 hours after the last time the timestamp was checked. The default (for unmentioned fetch items) is "0 days". If the timestamp is checked and has changed, the item will be re-fetched.
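
The timetolive rule can be expressed as a small comparison in base R. This is a sketch of the logic as described here, not vizlab's code: parse_ttl and needs_timestamp_check are hypothetical helpers, and the parsing assumes values of the form "<number> <units>" with units that as.difftime accepts (secs, mins, hours, days, weeks).

```r
# Hypothetical helper: turn a timetolive string such as "6 hours" or "Inf days"
# into a difftime.
parse_ttl <- function(ttl) {
  parts <- strsplit(trimws(ttl), "\\s+")[[1]]
  as.difftime(as.numeric(parts[1]), units = parts[2])
}

# Hypothetical helper: should the timestamp be checked on this build?
# Using >= means the default "0 days" always triggers a check.
needs_timestamp_check <- function(last_checked, ttl, now = Sys.time()) {
  elapsed <- difftime(now, last_checked, units = "secs")
  as.numeric(elapsed) >= as.numeric(parse_ttl(ttl), units = "secs")
}

# Demo: the last check happened 7 hours before the build.
now <- as.POSIXct("2017-10-28 10:23:39", tz = "UTC")
last <- now - as.difftime(7, units = "hours")

needs_timestamp_check(last, "6 hours", now)   # TRUE
needs_timestamp_check(last, "Inf days", now)  # FALSE
needs_timestamp_check(last, "0 days", now)    # TRUE
```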

Edits to the viz.yaml chunk for the fetch item will also trigger re-fetches, regardless of the timestamp's value or whether it has exceeded its timetolive. Similarly, a fetch item that hasn't yet been built will always be built.

Dependencies

R package dependencies are stored in the required-packages section of the viz.yaml, with a repository (CRAN or GRAN generally) and version number. Don't include remake on the required-packages list.



USGS-VIZLAB/vizlab documentation built on July 10, 2019, 12:08 a.m.