knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(nzcensr)
library(sf)
library(knitr)
suppressPackageStartupMessages(library(tidyverse))
The nzcensr package is a data package that makes it easy to import New Zealand Census data as either regular or spatial data frames, without having to download the data for each project and perform the various joins yourself.
The package contains the following data sets:
All of these data sets are provided at the meshblock, area unit, local board, territorial authority ("tas") and regional levels. They all follow the same naming convention, the data set name followed by the spatial area, e.g. dwelling_area_units or individual_part_3b
All of the data sets are lazily loaded, which means that they are only brought into memory when they are first used, not when the package itself is loaded.
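To get a feel for what "lazy" means here, the following is a minimal base R sketch of the same idea using delayedAssign. It is only an illustration, not how nzcensr is implemented, and toy_census is a made-up object rather than one of the package's data sets.

```r
# Illustration only: `toy_census` is a hypothetical object, not package data.
loaded <- FALSE

# Register a promise: the expression is not evaluated yet.
delayedAssign("toy_census", {
  loaded <- TRUE  # side effect so we can see when evaluation happens
  data.frame(area = c("A", "B"), people = c(10, 20))
})

loaded_before <- loaded     # the promise has not been forced yet
n_rows <- nrow(toy_census)  # first use forces the promise and builds the data
loaded_after <- loaded      # the side effect has now happened
```

The data frame is only built at the point where toy_census is first touched, which is the behaviour you get from the package's data sets.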
A key aim of this package is to make it easier to find out what data is available, to access it, and then to transform it.
The function nz_census_tables makes it easy to explore the data. Calling it without any arguments returns a table of all the available data frames, with a short note explaining what each one is:
kable(nz_census_tables())
The function also accepts the name of a table, in which case it returns all of the unique topics in that data set. The variables argument controls whether the variables are listed as well.
kable(nz_census_tables("dwelling_area_units"))
or, with the variables included (only the first ten rows are shown for presentation):
nz_census_tables("dwelling_area_units", variables = TRUE) %>% slice(1:10) %>% kable()
Let's have a look at religious association in Auckland, New Zealand. First, we select the topic using the keyword "religious", filter to the Auckland region, and then transform the result into a long format that is 'cleaned' and has the confidential values replaced with 1. By 'cleaned', I mean that the census columns are split up into year, topic and variable columns.
religious_association_nz_region <- select_by_topic(individual_part_2_regions, "religious") %>%
  filter_by_census_area("regions", "regions", "Auckland") %>%
  transform_census(gis = FALSE, long = TRUE, clean = TRUE)
The code above captures the workflow that the nzcensr package is built around: access the data, select the topics you want, filter by region, and transform the result into a 'tidy' table.
The output of this is the following table (top five rows only):
religious_association_nz_region %>% head(5) %>% kable()
Cool, that is all tidy.
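To make the 'cleaned' long format a bit more concrete, here is a small sketch of the same kind of reshaping done by hand with tidyr. The wide table and its column names (such as 2013_Religion_Christian) are made up for illustration, and this is not necessarily how transform_census works internally.

```r
library(tidyr)

# Hypothetical wide table: one column per year/topic/variable combination.
toy_wide <- data.frame(
  Description = c("Region A", "Region B"),
  `2006_Religion_Christian` = c(100, 200),
  `2013_Religion_Christian` = c(90, 180),
  `2013_Religion_Buddhist` = c(5, 8),
  check.names = FALSE
)

# Split each column name into year, topic and variable, one row per value.
toy_long <- pivot_longer(
  toy_wide,
  cols = -Description,
  names_to = c("year", "topic", "variable"),
  names_sep = "_",
  values_to = "value"
)
```

Each row of toy_long now holds a single value together with its region, year, topic and variable, which is the shape the rest of this analysis relies on.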
Let's make some plots!
For interest's sake, let us create a variable for the percentage of people rather than the absolute number. To do this, we create two tables, one of the religions and one of the totals, and then join them back together by region and year. Then we can simply divide each religion's count by the total to get a percentage.
# Keep only the individual religions of interest
religious_association_nz_region_wanted_variables <- filter(
  religious_association_nz_region,
  !(variable %in% c("Not Elsewhere Included(3)", "Total people", "Total people stated",
                    "Object to Answering", "Other Religions",
                    "Spiritualism and New Age Religions"))
) %>%
  rename(number_people = value)

# Totals of people who stated a religion, per area and year
religious_association_nz_region_total_stated <- religious_association_nz_region %>%
  filter(variable == "Total people stated") %>%
  select(Area_Code_and_Description, year, total_number_people = value)

# Join the totals back on and compute the percentage
religious_association_nz_region_percent <- left_join(
  religious_association_nz_region_wanted_variables,
  religious_association_nz_region_total_stated
) %>%
  mutate(percentage_people = round(number_people / total_number_people, 4) * 100,
         Description = str_replace(Description, " Region", "")) %>%
  select(-topic)
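The split, join and divide steps are easier to check on a tiny example. The sketch below repeats the same pattern on a made-up table (the regions, counts and the column name region are invented for illustration) so the arithmetic can be verified by eye.

```r
library(dplyr)

# Made-up long table with a total row per region (illustration only).
toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  year = 2013,
  variable = rep(c("Christian", "No Religion", "Total people stated"), 2),
  value = c(40, 60, 100, 30, 20, 50)
)

# Table 1: the individual religions
counts <- toy %>%
  filter(variable != "Total people stated") %>%
  rename(number_people = value)

# Table 2: the totals, one row per region and year
totals <- toy %>%
  filter(variable == "Total people stated") %>%
  select(region, year, total_number_people = value)

# Join the totals back on and compute the percentage
toy_percent <- left_join(counts, totals, by = c("region", "year")) %>%
  mutate(percentage_people = round(number_people / total_number_people, 4) * 100)
```

Here the by argument is spelled out explicitly, which avoids the message left_join prints when it has to guess the join columns.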
Now to create a line plot to investigate the trends across the three census years.
ggplot(religious_association_nz_region_percent) +
  geom_line(aes(x = year, y = percentage_people, colour = variable, group = variable)) +
  scale_colour_discrete(name = "Religion") +
  ggtitle("Religion as a percentage of population in Auckland from\n2001 to 2006 to 2013") +
  ylab("Percent") +
  xlab("Year") +
  theme_minimal() +
  theme(legend.position = "top")
Interesting: in this plot you can clearly see the 'decline' of Christianity and the increasing 'lack of religion'. These two categories dwarf the others, so it would be interesting to drop them and see more of the minor religions.
religious_association_nz_region_percent_minors <- filter(
  religious_association_nz_region_percent,
  !(variable %in% c("Christian", "No Religion"))
)

ggplot(religious_association_nz_region_percent_minors) +
  geom_line(aes(x = year, y = percentage_people, colour = variable, group = variable)) +
  facet_wrap(~Description, scales = "free_y") +
  scale_colour_discrete(name = "Religion") +
  ggtitle("Religion as a percentage of population in Auckland from\n2001 to 2006 to 2013") +
  ylab("Percent") +
  xlab("Year") +
  theme_minimal() +
  theme(legend.position = "top")
Some interesting things going on here!
Anyway, I hope I have demonstrated how easy it is to analyse NZ census data with the nzcensr package.
It's brand new, and my first package, so please let me know if you find any issues or bugs.