knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Please read our article for the full context of this project (Open Access):
Vargas Sepúlveda, Mauricio and Barkai, Lital. 2025. "The REDATAM format and its challenges for data access and information creation in public policy." Data & Policy 7 (January): e18. https://dx.doi.org/10.1017/dap.2025.4.
We start by downloading the 2017 Chilean Census from the ECLAC website (link valid as of 2024-10-01):
url <- "https://redatam.org/cdr/descargas/censos/poblacion/CP2017CHL.zip"
zip <- "CP2017CHL.zip"

if (!file.exists(zip)) {
  download.file(url, zip, method = "wget")
}
Now we can extract the files:
# install.packages("archive")
dout <- basename(zip)
dout <- sub("\\.zip$", "", dout)
archive::archive_extract(zip, dir = dout)
You can use unzip() from base R, but this file in particular gave me an error. The archive package is a wrapper around 'libarchive' that provides multi-format archive and compression support.
The REDATAM files are now stored in the CP2017CHL directory. We can read the REDATAM dictionary file (DIC or DICX; here, a DICX file):
library(redatam)

fout <- "chile2017.rds"

if (!file.exists(fout)) {
  chile2017 <- read_redatam("CP2017CHL/BaseOrg16/CPV2017-16.dicx")
  saveRDS(chile2017, fout)
} else {
  chile2017 <- readRDS(fout)
}
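The read-once-then-cache pattern used here can be wrapped in a small helper. This is a minimal sketch in base R; `read_cached()` is a hypothetical name and is not part of the redatam package. The demonstration uses a temporary file and a toy data frame so it runs without the census files.

```r
# Hypothetical helper (not part of redatam): run `compute()` once, store the
# result as an RDS file, and read the stored copy on later calls.
read_cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Toy demonstration with a temporary file instead of the census
cache_file <- tempfile(fileext = ".rds")
calls <- 0
slow_read <- function() {
  calls <<- calls + 1 # count how many times the expensive step runs
  data.frame(id = 1:3, value = c("a", "b", "c"))
}

first <- read_cached(cache_file, slow_read)  # computes and writes the cache
second <- read_cached(cache_file, slow_read) # reads the cache, no recompute
```

With the census, the equivalent call would pass `"chile2017.rds"` as the path and a function that calls `read_redatam()` on the DICX file.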
One of the many possibilities with this census is to obtain the number of dwellings with overcrowding. For this, the Ministry of Social Development and Family (Ministerio de Desarrollo Social y Familia) divides the number of people residing in a dwelling by the number of bedrooms in the dwelling, adding one bedroom in the special case of studio apartments and similar units. The result is then discretized: below 2.5 is "No Overcrowding", from 2.5 to below 3.5 is "Mean", from 3.5 to below 5 is "High", and 5 or more is "Critical".
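A small worked example of the index on made-up dwellings may help. This is a sketch in base R, assuming the thresholds and labels used in the dplyr code later in this vignette (2.5, 3.5 and 5); `overcrowding_index()` and `classify_overcrowding()` are hypothetical helper names.

```r
# Hypothetical helper: people per bedroom, counting dwellings with zero
# bedrooms (for example studio apartments) as having one bedroom.
overcrowding_index <- function(people, bedrooms) {
  people / ifelse(bedrooms == 0, 1, bedrooms)
}

# Thresholds and labels copied from the dplyr code in this vignette
classify_overcrowding <- function(index) {
  cut(
    index,
    breaks = c(-Inf, 2.5, 3.5, 5, Inf),
    labels = c("No Overcrowding", "Mean", "High", "Critical"),
    right = FALSE # intervals are closed on the left: [lower, upper)
  )
}

# Made-up dwellings, not census data
toy <- data.frame(
  people = c(2, 5, 4, 6),
  bedrooms = c(0, 2, 1, 1)
)
toy$index <- overcrowding_index(toy$people, toy$bedrooms)
toy$class <- classify_overcrowding(toy$index)
toy
```

The second dwelling has an index of exactly 2.5, so it falls in the "Mean" category because each interval includes its lower bound.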
According to the census documentation in the previous ZIP file, this consists of dividing the variable cant_per by p04, both from the vivienda (housing) table, and then discretizing the result. The documentation also states that we must join the vivienda table with zonaloc (zones), area, distrito (district) and comuna (municipality) to match each dwelling with its corresponding municipality. This can be done with dplyr:
library(dplyr)

# Walk down the hierarchy: comuna > distrito > area > zonaloc > vivienda
overcrowding <- chile2017$comuna %>%
  select(ncomuna, comuna_ref_id) %>%
  inner_join(
    chile2017$distrito %>%
      select(distrito_ref_id, comuna_ref_id)
  ) %>%
  inner_join(
    chile2017$area %>%
      select(area_ref_id, distrito_ref_id)
  ) %>%
  inner_join(
    chile2017$zonaloc %>%
      select(zonaloc_ref_id, area_ref_id)
  ) %>%
  inner_join(
    chile2017$vivienda %>%
      select(zonaloc_ref_id, vivienda_ref_id, cant_per, p04) %>%
      # recode 98 and 99 in p04 as missing, then drop those dwellings
      mutate(
        p04 = case_when(
          p04 == 98 ~ NA_integer_,
          p04 == 99 ~ NA_integer_,
          TRUE ~ p04
        )
      ) %>%
      filter(!is.na(p04))
  ) %>%
  # people per bedroom; dwellings with zero bedrooms count as one bedroom
  mutate(
    overcrowding = case_when(
      p04 >= 1 ~ cant_per / p04,
      p04 == 0 ~ cant_per / (p04 + 1)
    )
  ) %>%
  mutate(
    overcrowding_discrete = case_when(
      overcrowding < 2.5 ~ "No Overcrowding",
      overcrowding >= 2.5 & overcrowding < 3.5 ~ "Mean",
      overcrowding >= 3.5 & overcrowding < 5 ~ "High",
      overcrowding >= 5 ~ "Critical"
    )
  ) %>%
  # number of dwellings per municipality and overcrowding category
  group_by(comuna = ncomuna, overcrowding_discrete) %>%
  count()
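To see how the chain of joins carries the municipality name down to each dwelling, here is a toy sketch with made-up tables that reuse the key names from the code above. It uses base R's `merge()` rather than dplyr so that it runs without extra packages; with the default `all = FALSE`, `merge()` keeps only matching rows, like `inner_join()`. Naming the key in `by` at each step is an illustrative choice here, not something the original code does.

```r
# Toy tables mimicking the key structure (all values are made up)
comuna <- data.frame(comuna_ref_id = 1:2, ncomuna = c("A", "B"))
distrito <- data.frame(distrito_ref_id = 1:3, comuna_ref_id = c(1L, 1L, 2L))
area <- data.frame(area_ref_id = 1:3, distrito_ref_id = 1:3)
zonaloc <- data.frame(zonaloc_ref_id = 1:3, area_ref_id = 1:3)
vivienda <- data.frame(
  vivienda_ref_id = 1:5,
  zonaloc_ref_id = c(1L, 1L, 2L, 3L, 3L)
)

# Walk down the hierarchy one level at a time, naming the key explicitly
joined <- merge(comuna, distrito, by = "comuna_ref_id")
joined <- merge(joined, area, by = "distrito_ref_id")
joined <- merge(joined, zonaloc, by = "area_ref_id")
joined <- merge(joined, vivienda, by = "zonaloc_ref_id")

# Each dwelling now carries its municipality name
dwellings_per_comuna <- table(joined$ncomuna)
dwellings_per_comuna
```

Every row of the toy vivienda table survives the joins because each key has a match one level up; a dwelling whose zone had no matching area would be dropped by an inner join.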
Now we can filter for any municipality of interest, for example:
overcrowding %>%
  filter(comuna == "VITACURA")

overcrowding %>%
  filter(comuna == "LA PINTANA")
# A tibble: 4 x 3
# Groups:   comuna, overcrowding_discrete [4]
  comuna   overcrowding_discrete     n
  <fct>    <chr>                 <int>
1 VITACURA Critical                  9
2 VITACURA High                     18
3 VITACURA Mean                    174
4 VITACURA No Overcrowding       26752

# A tibble: 4 x 3
# Groups:   comuna, overcrowding_discrete [4]
  comuna     overcrowding_discrete     n
  <fct>      <chr>                 <int>
1 LA PINTANA Critical                497
2 LA PINTANA High                   1112
3 LA PINTANA Mean                   4522
4 LA PINTANA No Overcrowding       39163
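Because the two municipalities differ in size, shares are easier to compare than raw counts. The following is a minimal sketch in base R that re-enters the counts printed above by hand; the object names are illustrative only.

```r
# Counts copied from the printed output above
counts <- data.frame(
  comuna = rep(c("VITACURA", "LA PINTANA"), each = 4),
  overcrowding_discrete = rep(
    c("Critical", "High", "Mean", "No Overcrowding"), 2
  ),
  n = c(9, 18, 174, 26752, 497, 1112, 4522, 39163)
)

# Share of each category within its municipality
counts$share <- counts$n / ave(counts$n, counts$comuna, FUN = sum)

# Share of dwellings with any degree of overcrowding, per municipality
overcrowded <- counts[counts$overcrowding_discrete != "No Overcrowding", ]
any_overcrowding <- tapply(overcrowded$share, overcrowded$comuna, sum)
round(100 * any_overcrowding, 2)
```

From these counts, about 0.75% of the dwellings in Vitacura show some degree of overcrowding, against about 13.5% in La Pintana.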