library("vegawidget")
library("conflicted")
library("lubridate")
library("readr")
library("dplyr")

Dates and times can be tricky, even within the familiar confines of R. Vega and Vega-Lite are run in JavaScript, which has its own philosophy for dates and times. In R, we have access to the entire time-zone database; in JavaScript, using the Date object, there are only two time zones available: the local time zone used by the client browser and UTC.

If you are working with time-based data, this mismatch of capabilities introduces a lot of "opportunities" to create Vega(-Lite) visualizations that do not behave as intended. The purposes of this article are

The documentation for the Altair Python package includes an article on times and dates with very clear explanations; it is hoped that this article can be similarly clear.

Time zones

str_moon_landing <- "1969-07-21T02:56:15Z"
moon_landing <- parse_datetime(str_moon_landing)

Things happen at fixed instants in time; we use time zones to describe those fixed instants using a local context. Consider that Neil Armstrong's first step on the moon happened at r str_moon_landing, represented here as an ISO-8601 string.

All ISO-8601 strings refer unambiguously to UTC, the globally-recognized standard reference-frame; it is the standard way to communicate time-stamps. However, it lacks context for someone who may remember an instant from Chicago, or another who may remember the same instant from Paris.

In the context of Chicago, the first-step happened at r with_tz(moon_landing, "America/Chicago"); in the context of Paris, it happened at r with_tz(moon_landing, "Europe/Paris"). We use a system of time zones to provide this context. The standard representation of time zones is IANA (also known as Olson) time-zones. These are each associated with a given geographic region, such as "America/Chicago" or "Europe/Paris"; a time zone is used to determine the offset to UTC, given an instant expressed in UTC.

High level-programming languages, such as R, have access to the full IANA time-zone database. However, the JavaScript Date object has access to only two time zones: the timezone of the local browser, and UTC. JavaScript-based visualization systems, including d3, Plotly, and Vega, are constrained by this limitation because they depend on the JavaScript Date object.

Vega-Lite examples

Let's look at an example using one of the datasets included in this package: data_seattle_hourly, adapted from vega-datasets:

glimpse(data_seattle_hourly)

This dataset contains hourly observations of Seattle temperature (°F) for the year 2010. Let's inspect the time zone of the date variable:

tz(data_seattle_hourly$date)

We see that the date variable uses the "America/Los_Angeles" time zone, which is what we would expect for Seattle. This means that the values for date are displayed above as local time. To make the following demonstrations a little lighter, we will use an shortened version of this dataset, one that uses only 2010-01-01:

data_seattle_hourly_short <-
  data_seattle_hourly %>%
  dplyr::filter(
    floor_date(date, "day") == ymd("2010-01-01", tz = "America/Los_Angeles")
  ) %>%
  glimpse()
spec_local <- 
  list(
    `$schema` = vega_schema(),
    width = 600,
    height = 75,
    data = list(values = data_seattle_hourly_short),
    mark = "line",
    encoding = list(
      x = list(
        field = "date", 
        type = "temporal",
        axis = list(format = "%H:%M")
      ),
      y = list(
        field = "temp", 
        type = "quantitative", 
        scale = list(zero = FALSE)
      )
    )
  ) %>%
  as_vegaspec()

spec_local

Our goal is to show the times in the chart using the local time-zone; in this case, we have succeeded. Good news, but let's have a closer look at what's going on.

Here are the first couple of observations in the data frame, rendered as JSON (using the same settings for datetimes as we use when render the entire vegaspec to JSON):

data_seattle_hourly_short %>% head(2) %>% jsonlite::toJSON()

We see that date-times are not formatted using ISO-8601 formatting - the times are serialized to JSON using the local context. When Vega-Lite parses this data, it recognizes that this is a datetime and that the format is not ISO-8601. As such, these times will be parsed, interpreted, and displayed as local times:

1) Times are parsed as UTC time if the date strings are in ISO format. Note that in JavaScript date strings without time are interpreted as UTC but but date strings with time and without timezone as local. 2) If that is not the case, by default, times are assumed to be local.

This is what we intended to demonstrate, but you should be aware of some "gotchas" due to daylight-saving time:

This is why it is a best-practice in R (and elsewhere) to serialize datetimes using the ISO-8601 format, then to treat the local time-zone, in this case "America/Los_Angeles", as metadata. However, for the reasons outlined in at the beginning of this article, we cannot get this to work using Vega(-Lite). What follows are two different ways that Vega-Lite works with UTC times, neither of which is likely what you want.

This package has a function, vw_serialize_data(), to serialize dates and times in a data frame. It has an argument to specify a format for dates, iso_date, and an option to specify a format for datetimes, iso_dttm. We set the iso_dttm argument to TRUE to serialize our datetimes using ISO-8601 format:

data_seattle_hourly_short_iso <- 
  data_seattle_hourly_short %>%
  vw_serialize_data(iso_dttm = TRUE) %>%
  glimpse()

You can see that we have changed the date variable from a datetime to a character string. Its values show date:

Let's see what our chart looks like, using this data:

spec_iso_local <- spec_local

spec_iso_local$data$values <- data_seattle_hourly_short_iso

spec_iso_local

If the browser you are using to view this page is in the "America/Los_Angeles" time zone, this chart will appear identical to the first chart. However, if your browser is not "in" this time-zone, the axis labels for date are now not what we might expect. Vega-Lite parses and interprets the date values as UTC, but displays the date using the browser's local time-zone. I (Ian) am in the "America/Chicago" time zone, so the axis, for me, begins at 02:00. This is consistent with my time zone being two hours ahead of "America/Los_Angeles".

As noted above, the JavaScript Date object knows two time-zones: the browser's and UTC. Our other option is to direct Vega-Lite to display the time as UTC:

spec_iso_utc <- spec_iso_local

spec_iso_utc$encoding$x$scale <- list(type = "utc") 

spec_iso_utc

Here, the date axis begins at 08:00, which is the UTC time when it was midnight in Seattle.

Datetime compromise

Because of the time-zone limitations of the JavaScript Date object, we are forced to make a compromise.

If we serialize datetimes using the local (to the data) time-zone, using a non-ISO-8601 format:

If we serialize datetime data using the ISO-8601 format, the instants are parsed and interpreted correctly. However, we have the option to display using only the time zone of the browser or UTC. It is entirely likely that the context (time zone) of our data is different from the context of our browser - in which case we are unable to make an effective visualization.

Faced with this choice, it seems the first option -- serializing our data using the data's local time-zone -- is the least-bad option, as it gives us the opportunity to present the data using its own context. That being said, we need to be mindful of the daylight-saving pitfalls, and warn our users accordingly. Further, we have to recognize (and possibly note to our users) that the data inside the visualization is compromised.

In other situations, the best-practice for serializing datetimes is to use ISO-8601 formatting and to keep the time zone as metadata. This allows us to refer, unambiguously, to the actual instants-in-time. It would be great if this were "baked into" Vega(-Lite), but it seems the limitation is fundamentally in the JavaScript Date object, suggesting that it would be a heavy lift to implement a comprehensive solution.

Notes on dates

In R, we have different types for date (Date) vs. datetimes (POSIXct). This can be a useful distinction, as using dates makes it easier for us to compare a day in New York with a day in Brisbane.

In JavaScript, we have a single type, Date, to represent both dates and datetimes; the convention is to treat dates as the datetimes at the start of that day. For a given situation, it remains to determine which time zone to use: UTC or local time.

In my opinion, this choice would depend on the larger context of the data. If working with a dataset where a date is used in association with datetimes in the same time-zone context, e.g. all associated with a single factory, then it might make more sense to think of both the date and datetimes in the same time-zone context. However, if working with a dataset with dates associated with locations in different time-zones, e.g. New York and Brisbane, it might make more sense to parse these dates using a neutral context, like UTC.

In Vega, the parsing default for datetimes is that if the data is formatted using ISO-8601, then it is parsed as UTC. To be clear, what an R user might think of as a date string: "2001-01-01", is in ISO-8601 format; accordingly, by default, it is parsed as a JavaScript Date object in the UTC context.

Let's consider the dataset data_seattle_daily, where we have daily observations of Seattle weather over the course of four years:

glimpse(data_seattle_daily)

The default JSON-serializer for dates uses the ISO-8601 format for dates. If we wanted to use a non-ISO-8601 format, we could use vw_serialize_data() using iso_date = FALSE:

data_seattle_daily %>%
  vw_serialize_data(iso_date = FALSE) %>%
  glimpse()

For the purpose of making some examples of how to work with UTC dates, let's filter the original dataset, with ISO-8601 formatting, keeping the first month:

data_seattle_daily_short <- 
  data_seattle_daily %>%
  dplyr::filter(floor_date(date, "month") == as.Date("2012-01-01")) %>%
  glimpse()

Let's look at a line-plot of the daily maximum-temperatures:

spec_tempmax_local <- 
  list(
    `$schema` = vega_schema(),
    width = 600,
    height = 75,
    data = list(values = data_seattle_daily_short),
    mark = "line",
    encoding = list(
      x = list(
        field = "date",
        type = "temporal"
      ),
      y = list(
        field = "temp_max",
        type = "quantitative"
      ),
      tooltip = list(
        list(field = "date", type = "temporal", format = "%Y-%m-%d %H:%M:%S")
      )
    )
  ) %>%
  as_vegaspec()

spec_tempmax_local

If you are in a timezone that is different from UTC (sorry UK), you will see that the points that make up the line are not aligned with the grid. This is another effect of time zones - Vega has parsed and interpreted the date as UTC, but its default is to display it to us using the local time-zone of our browser. For me, near Chicago, the days are "starting" six hours early. You can verify this by using the tooltips, which show the date with the time attached.

In this situation, we wish to keep the UTC interpretation throughout the visualization. We can do this by using a UTC scale-specification for the x-axis (and the tooltip):

spec_tempmax_utc <- spec_tempmax_local

spec_tempmax_utc$encoding$x$scale <- list(type = "utc")
spec_tempmax_utc$encoding$tooltip[[1]]$scale <- list(type = "utc")

spec_tempmax_utc

When using UTC dates (and datetimes), be mindful that you may also have to set the scales to UTC, so that the axis and tooltips behave similarly, and properly, for everyone.

Summary

Parsing

When Vega(-Lite) parses values for dates and times, it checks whether the string is formatted using ISO-8601.

Here are some examples of ISO-8601 strings:

2001-01-01T19:34:05Z
2001-01-01T19:34:05+05:00
2001-01-01

Here are some examples of non-ISO-8601 strings:

2001-01-01 19:34:05
Jan-01-2001 19:34:05
2001/01/01 

By default, if the format is ISO-8601, Vega will parse the string as UTC. Similarly, by default, if the format is non-ISO-8601, Vega will parse the string as local time. You can override the default by providing a parsing directive in the specification. If you wish to change how dates and times are serialized, you can use the vw_serialize_data() function.

Datetimes

It is likely that you wish to show datetime values using the context of the data, e.g. you want to look at the temperatures in Seattle using in the context of Seattle. From R, assuming that your datetimes are using the data-local timezone, e.g. "America/Los_Angeles", the least-bad way to do this is to serialize your datetime values using data-local time, rather than using UTC; this will happen by default. Although this will let you display the data as you intend, there are a couple of important points to keep in mind:

If you wish for Vega to parse your datetimes using UTC, you can serialize your datetimes using the function vw_serialize_data() using iso_dttm = TRUE.

Dates

Date and datetimes are the same type in JavaScript; a date is just a datetime at the start of a day.

Depending on your situation, you may wish to interpret a date using UTC or using the local time-zone of the browser.

Scales

In Vega-Lite, displaying temporal-values on a scale is separate from parsing temporal-values. Regardless of the parsing behavior, Vega-Lite's default temporal-scale uses the browser's local time-zone. You can specify a scale in an encoding, for example:

list(
  ...,
  encoding = list(
    x = list(field = "date", type = "temporal", scale = list(type = "utc")),
    ...
  ),
  ...
)

Vega-Lite time unit

For an interactive version of this section on time units, please see this Observable notebook.

Let's say you have daily weather data for, say, Seattle over the course of four years. In this case, we want to load the data into vega using a URL, making this page a little lighter by not carrying all the data around in each of the following charts. Let's have a look at the first few lines:

url_seattle_daily <- "https://vega.github.io/vega-datasets/data/seattle-weather.csv"

url_seattle_daily %>%
  read_lines(n_max = 6) %>%
  print()

As above, this is daily data covering the years 2012-2015 inclusive. Our goal in this section is to show how Vega-Lite time units work by creating charts that:

To be clear, you could prepare such variables in a pre-processing step, creating new variables in R, using dplyr and lubridate. You can also do this in Vega-Lite, using time unit, a transformation mechanism for temporal values.

Let's start specification for a time-based scatterplot for the daily maximum-temperature, showing every observation in the dataset. For purposes of illustrating how to work with UTC values, we parse the dates as UTC. In the vegaspec below, note the format entry: parse = list(date = "utc:'%Y/%m/%d'"):

spec_daily <-
  list(
    `$schema` = vega_schema(),
    width = 600,
    height = 200,
    data = list(
      url = url_seattle_daily,
      format = list(
        parse = list(date = "utc:'%Y-%m-%d'")
      )
    ),
    mark = "point",
    encoding = list(
      x = list(
        field = "date",
        type = "temporal",
        scale = list(type = "utc")
      ),
      y = list(
        field = "temp_max",
        type = "quantitative",
        scale = list(domain = list(-5, 40))
      ),
      tooltip = list(
        list(
          field = "date", 
          type = "temporal", 
          scale  = list(type = "utc"),
          title = "date",
          format = "%Y-%m-%d"
        ),
        list(field = "temp_max", type = "quantitative")
      )
    )
  ) %>%
  as_vegaspec()

spec_daily

We use point-marks here not because it is optimal for the visualization (it probably isn't), but because it may help make clearer what is happening behind the scenes. Seeing what Vega-Lite actually does can help us to understand its time-unit API.

Let's say we want to look at the maximum temperature for each year-month in the dataset - 48 months in all. If we were to do this in R, we might make a preprocessing step like this:

data_seattle_daily %>%
  mutate(yearmonth = floor_date(date, "month")) %>%
  group_by(yearmonth) %>%
  summarise(temp_max = max(temp_max)) %>%
  glimpse()

We could then plot this data. However, we can make these data transformations within Vega-Lite. We will show this as a series of steps; the first step is to map the values of date to the beginning of the month.

# copy the previous spec
spec_yearmonth <- spec_daily

# modify the x-encoding
spec_yearmonth$encoding$x$scale <- NULL
spec_yearmonth$encoding$x$timeUnit <- "utcyearmonth"

# modify the tooltip-encoding
spec_yearmonth$encoding$tooltip <- 
  append(
    spec_yearmonth$encoding$tooltip, 
    list(
      list(
        field = "date", 
        type = "temporal", 
        timeUnit = "utcyearmonth", 
        title = "yearmonth", 
        format = "%Y-%m-%d"
      )      
    ), 
    after = 1
  )


spec_yearmonth

The main change we made to the specification was to tweak the x-encoding: removing the UTC scale and adding a timeUnit, setting it to "utcyearmonth". The specification of a time unit did three things:

For me, the essence of the concept is that a time unit defines a mapping of an existing temporal-variable to a new temporal-variable; it is a dplyr::mutate() that returns another datetime.

We removed the UTC scale from the x-encoding because this is taken care-of with the specification of "utc" within the time unit. If we were to specify it twice, the UTC-offset would be applied twice.

We modified the tooltip to demonstrate that we have the same underlying data. On the x-axis, we use the "mutated" year-month rather than the date.

The next step is to apply the Vega-Lite equivalents to dplyr's group_by() and summarise() operations.

# copy the previous spec
spec_yearmonth_aggregate <- spec_yearmonth

# add an aggretation directive to the y-encoding
spec_yearmonth_aggregate$encoding$y$aggregate = "max"

# remove the original date from the tooltip, add temperature-aggretation
spec_yearmonth_aggregate$encoding$tooltip[[1]] <- NULL 
spec_yearmonth_aggregate$encoding$tooltip[[2]]$aggregate <- "max"

spec_yearmonth_aggregate

These two operations are undertaken by adding a single specification: the y-encoding directive aggregate = "max". Vega-Lite carries out an implied group_by() on all the encodings are not aggregated. In this case, y (temp_max) is aggregated; we group by our remaining encoding, x, which is now the "year-month".

Also, note that we need to take care with our tooltip definitions. Tooltips are encodings, so the aggregation rule applies to tooltip encodings no different that it applies to other encodings. If something "weird" happens when building a chart that uses implicit aggregation, it can be useful to check the tooltip-definitions for encodings and aggregations that are inconsistent with the rest of the encodings (ask me how I found this out).

We can add an color-encoding for the year, again using a time-unit:

# copy previous spec
spec_yearmonth_aggregate_color <- spec_yearmonth_aggregate

# create encoding for color using "utcyear" time-unit of date
spec_yearmonth_aggregate_color$encoding$color <- 
  list(field = "date", type = "nominal", timeUnit = "utcyear")

# insert a tooltip element to indicate year
spec_yearmonth_aggregate_color$encoding$tooltip <- 
  append(
    spec_yearmonth_aggregate_color$encoding$tooltip,
    list(
      list(
        field = "date", 
        type = "temporal", 
        timeUnit = "utcyear", 
        title = "year", 
        format = "%Y-%m-%d"
      )
    ),
    after = 0
  )


spec_yearmonth_aggregate_color

Again, this is another intermediate step; creating a new internal variable using a "utcyear" time-unit. We see that Vega-Lite does the "right" thing with the color-legend. Using the tooltip, we also see that the values of year are the datetimes corresponding to the first instant in a given year.

In the previous chart, we noted that Vega-Lite will group-by any encoded variable that is not aggregated, then perform the aggregations. In this case, we are grouping by "year" and "year-month", so the end-result of the groupings is the same as the previous example.

Our final chart in this series of examples will be to put each of the years on the same scale; we do this by using a time-unit of "utcmonth" for the x-encoding.

spec_month_aggregate_color <- spec_yearmonth_aggregate_color

spec_month_aggregate_color$encoding$x$timeUnit <- "utcmonth"

spec_month_aggregate_color$encoding$tooltip[[2]]$timeUnit <- "utcmonth"

spec_month_aggregate_color

Looking at the chart itself, we can see that it is has now "arrived" at our destination: we can more-easily compare years because the months are on the same scale. Using the tooltip, you can see that the time-unit "utcmonth" has projected all of the different years into the year 1900. We are showing this projected year on the x-axis, but the axis labels indicate only the month. The specification of the "utcmonth" time-unit did four things:

Summary

A Vega-Lite timeUnit maps a datetime to another datetime.

There are a lot of choices among time units; each choice, e.g. "monthdate", "utcyear", etc., has two dimensions:

Finally, a timeUnit specification will set the default formatting (used for axes) for that variable.



vegawidget/vegawidget documentation built on Jan. 27, 2024, 10:48 a.m.