knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The purpose of this article is to demonstrate how to use data provided by the Vega datasets Python package.
Here's the short version:
We can access the Vega datasets using the import_vega_data()
function. This package's convention is to assign this to an object named vega_data
.
To access the data for a specific dataset, call the method for that dataset: e.g. vega_data$cars()
. Where you see a hyphen in a dataset name in the Altair documentation, use an underscore instead, e.g. vega_data$sf_temps()
.
These datasets have metadata that include a description, references, and a URL: e.g. vega_data$cars$references
.
When you create an alt$Chart()
, the data
argument need not be a data frame; it can be a reference to a data frame, such as a URL.
If your chart contains a reference to an external resource, such as vega_data$cars$url
, it will not render in the RStudio IDE due to RStudio's (well founded) security policy. If you view such a chart using an external browser, it should work OK if your computer can access the remote data.
In the Altair documentation, you will see this code used often:
from vega_datasets import data cars = data.cars()
The Altair convention is to use the name data
to refer to the data
object in the vega_datasets
package. This package offers a similar convention:
library("altair") vega_data <- import_vega_data() cars <- vega_data$cars()
Instead, our convention is to use an object called vega_data
.
Our vega_data
object has a method to list all its datasets:
vega_data$list_datasets() %>% head()
Each dataset can be accessed using a method whose name is an element returned from vega_data$list_datasets()
.
library("tibble") vega_data$anscombe() %>% as_tibble()
It is useful to keep in mind that reticulate changes the names of the datasets, and presumably, Python objects in general. Where you see a -
in a name of a Python object, a _
will be used in the name of the reticulated object in R. For example, in Python: data.sf-temps()
; in R:
vega_data$sf_temps() %>% as_tibble()
Each dataset has some metadata, such as a description and references.
wrapcat <- function(x) { x %>% strwrap() %>% cat(sep = "\n") } vega_data$anscombe$description %>% wrapcat()
vega_data$anscombe$references %>% wrapcat()
Some of the datasets are stored locally as a part of the Vega datasets Python package, others are not. The method that returns the data, e.g. vega_data$anscombe()
will do the right thing. You can use the is_local
property to find out what the right thing is for a dataset.
vega_data$anscombe$is_local
Each dataset has a remote URL, which you can use instead of a data frame in any Altair data
argument.
vega_data$anscombe$url
You can specify data
using a URL that points to a dataset, rather than using a data frame explicitly.
cars_url <- vega_data$cars$url chart_cars <- alt$Chart(cars_url)$ encode( x = "Weight_in_lbs:Q", y = "Miles_per_Gallon:Q", color = "Cylinders:N" )$ mark_point() chart_cars
This works in your browser, but not might not work in the RStudio IDE. This is because, for security reasons, the RStudio IDE may not let you refer external URLs that are not on their allow-list (such as YouTube and Vimeo). If you open this up in a browser, it works just fine (as long as you have access to the internet).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.