Partial table data download

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = FALSE,
  fig.width = 7,
  fig.height = 5,
  eval = nzchar(Sys.getenv("COMPILE_VIG"))
)

Sometimes one is only interested in a relatively small subset of a large table, and instead of having to download the entire table, or just a handful of vectors, one might want to access a substantial number of coordinates without having to download the entire table. The package facilitates this by

We demonstrate this process by taking the large Table 34-10-0285 on building permits.

library(cansim)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

bp_template <- get_cansim_table_template("34-10-0285")

head(bp_template)

The template gives information on the possible combinations of data (although some combinations might not be available in the table and have no data). We can use this to pinpoint data series we are interested in. For the purpose of this example we want to look at the number of housing units created and lost in the four largest metro areas in Canada.

bp_template_filtered <- bp_template %>%
  filter(Geography %in% c("Toronto, Ontario", "Montréal, Quebec", "Vancouver, British Columbia", "Calgary, Alberta"),
         `Type of building` %in% c("Total residential","Total demolitions"),
         `Type of work` %in% c("Demolitions for residential dwellings","Deconversion total","Conversions total","New dwelling units total"),
         Variables %in% c("Number of dwelling-units created", "Number of dwelling-units lost", "Number of dwelling-units demolished"),
         `Seasonal adjustment, value type` == "Unadjusted, current"
  )

bp_template_filtered

This leaves us with r if (exists("bp_template_filtered")) nrow(bp_template_filtered) else 96 coordinates from the table. To get data for just this subset of the table we feed that into the get_cansim_data_for_table_coord_periods function. By default this will get the data for all available times, we could narrow it down to the latest $n$ periods using the periods argument, but here we want to get all available data.

bp_data <- get_cansim_data_for_table_coord_periods(bp_template_filtered)

bp_data

We then aggregate it to annual data and plot it by year and area, separating out dwellings newly created, created through conversion, those lost to deconversions and those demolished.

bp_data |>
  mutate(Value=case_when( # count demolitions and deconversions as negative
    Variables %in% c("Number of dwelling-units demolished","Number of dwelling-units lost") ~ - val_norm,
    TRUE ~ val_norm
  )) |>
  mutate(Name=gsub(", .+","",GEO),
         Year=strftime(Date,"%Y")) |>
  summarize(Value=sum(Value),n=n(),.by=c(Name,Year,`Type of work`)) |>
  filter(n==12,!is.na(Value)) |> # only show years with complete 12 months of data
  ggplot(aes(x=Year,y=Value,fill=`Type of work`)) +
  geom_bar(stat="identity") +
  facet_wrap(~Name,scales="free_y") +
  scale_y_continuous(labels=scales::comma) +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  labs(title="Building permits for residential structures in Canadian metro areas",
       y="Number of dwelling units",
       x=NULL,
       fill="Metric",
       caption="StatCan Table 34-10-0285") 

Using vectors instead of coordinates

The same can be achieved with downloading data by vectors. For this we need to add the vector information to the table template. Vector information is only available for coordinates, so it also gives an effective way to filter out invalid coordinate combinations from the template. Vector information is not available for census data tables.

bp_template_filtered_vecotrs <- bp_template_filtered |>
  add_cansim_vectors_to_template()

bp_data_vector <- bp_template_filtered_vecotrs$VECTOR |>
  na.omit() |>
  get_cansim_vector()

This gives us the same data as above, with possibly shorter time series for some coordinates as querying data by vector will not pull data for times when the specific vector was not available. When accessed by vector or by coordinate the data will only differ in this limited way, and the values on the difference will be NA and won't affect the results. For completeness we plot the vector data and obtain an identical graph.

bp_data_vector |>
  mutate(Value=case_when( # count demolitions and deconversions as negative
    Variables %in% c("Number of dwelling-units demolished","Number of dwelling-units lost") ~ - val_norm,
    TRUE ~ val_norm
  )) |>
  mutate(Name=gsub(", .+","",GEO),
         Year=strftime(Date,"%Y")) |>
  summarize(Value=sum(Value),n=n(),.by=c(Name,Year,`Type of work`)) |>
  filter(n==12,!is.na(Value)) |> # only show years with complete 12 months of data
  ggplot(aes(x=Year,y=Value,fill=`Type of work`)) +
  geom_bar(stat="identity") +
  facet_wrap(~Name,scales="free_y") +
  scale_y_continuous(labels=scales::comma) +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  labs(title="Building permits for residential structures in Canadian metro areas",
       y="Number of dwelling units",
       x=NULL,
       fill="Metric",
       caption="StatCan Table 34-10-0285") 


Try the cansim package in your browser

Any scripts or data that you put into this service are public.

cansim documentation built on June 8, 2025, 1:12 p.m.