The rdatamarket
package is an R client for the DataMarket.com
API, fetching the contents and metadata
of datasets on DataMarket.com into R.
To install the package:
> install.packages("rdatamarket")
(If you are on Linux and get error messages involving RCurl, you may need
to install a package called libcurl4-openssl-dev
or similar, to get RCurl
working.)
... and then load the package:
> library(rdatamarket)
Just find the data you want on datamarket.com, then copy the URL from your
browser (or a short URL to it) into dmlist
or dmseries
:
> plot(dmseries("http://datamarket.com/data/set/17tm/#ds=17tm!kqc=17.v.i"))
> plot(dmseries("http://data.is/nyFeP9"))
> l <- dmlist("http://data.is/nyFeP9"))
If you need to go through an HTTP proxy, set it up this way:
> dmCurlOptions(proxy="http://outproxy.mycompany.com")
Get a dataset object (find the ID in a datamarket URL, or just paste in the whole URL if you like):
> oil <- dminfo("17tm")
> oil <- dminfo("http://datamarket.com/data/set/17tm/#ds=17tm!kqc=17.v.i"))
> print(oil)
Title: "Oil: Production tonnes"
Provider: "BP"
Dimensions:
"Country" (60 values):
"Algeria"
"Angola"
"Argentina"
"Australia"
"Azerbaijan"
[...]
See all the values of the Country
dimension:
> oil$dimensions[[1]]$values
a "Algeria"
17 "Angola"
d "Argentina"
z "Australia"
1l "Azerbaijan"
1b "Brazil"
v "Brunei"
1h "Cameroon"
13 "Canada"
1o "Chad"
[...]
Here's a dataset with two dimensions (besides time):
> p<-dminfo("http://datamarket.com/data/set/12r9/male-population-thousands")
> print(p)
Title: "Male population (thousands)"
Provider: "United Nations" (citing "United Nations Population Division")
Dimensions:
"Country or Area" (229 values):
"Afghanistan"
"Africa"
"Albania"
"Algeria"
"Angola"
[...]
"Variant" (5 values):
"Constant-fertility scenario"
"Estimate variant"
"High variant"
"Low variant"
"Medium variant"
From that last dataset, fetch the UN's population prediction for Sweden and Somalia in the constant-fertility scenario (note the “(thousands)” in the dataset title):
> dmseries(p, 'Country or Area'=c("Somalia", "Sweden"),
Variant="Constant-fertility scenario")
Somalia Sweden
2010-07-01 4642.070 4613.551
2015-07-01 5357.233 4725.918
2020-07-01 6211.305 4840.434
2025-07-01 7243.572 4942.865
2030-07-01 8490.929 5021.646
2035-07-01 9990.910 5083.680
2040-07-01 11793.524 5144.685
2045-07-01 13966.319 5211.212
2050-07-01 16597.110 5281.437
> dmlist(p, 'Country or Area'=c("Somalia", "Sweden"),
Variant="Constant-fertility scenario")
Country.or.Area Variant Year Value
1 Somalia Constant-fertility scenario 2010 4642.070
2 Somalia Constant-fertility scenario 2015 5357.233
3 Somalia Constant-fertility scenario 2020 6211.305
4 Somalia Constant-fertility scenario 2025 7243.572
5 Somalia Constant-fertility scenario 2030 8490.929
6 Somalia Constant-fertility scenario 2035 9990.910
7 Somalia Constant-fertility scenario 2040 11793.524
8 Somalia Constant-fertility scenario 2045 13966.319
9 Somalia Constant-fertility scenario 2050 16597.110
10 Sweden Constant-fertility scenario 2010 4613.551
11 Sweden Constant-fertility scenario 2015 4725.918
12 Sweden Constant-fertility scenario 2020 4840.434
13 Sweden Constant-fertility scenario 2025 4942.865
14 Sweden Constant-fertility scenario 2030 5021.646
15 Sweden Constant-fertility scenario 2035 5083.680
16 Sweden Constant-fertility scenario 2040 5144.685
17 Sweden Constant-fertility scenario 2045 5211.212
18 Sweden Constant-fertility scenario 2050 5281.437
The above demonstrates dimension filtering; dimensions and their values can
be specified by their $id
or their $title
, to fetch the data filtered to
specific values of a dimension. If no filtering is specified, all of the
dataset is fetched (careful: some datasets are enormous, and the DataMarket.com
API may truncate extremely large responses).
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.