knitr::opts_chunk$set(echo = TRUE)

sdcSpatial: Privacy protected maps

knitr::include_graphics("fig/privacy.jpg")

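Take-home message: sdcSpatial has methods for creating raster maps (sdc_raster), protecting them (protect_smooth, protect_quadtree), and removing sensitive locations (remove_sensitive).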

Who am I and why sdcSpatial?

Motivation for sdcSpatial

SDC in sdcSpatial?

SDC = "Statistical Disclosure Control"

Collection of statistical methods to:

Data

data(dwellings, package="sdcSpatial")
nrow(dwellings)
head(dwellings) # consumption/unemployed are simulated!

Let's create an sdc_raster

Creation:

\scriptsize

library(sdcSpatial)
unemployed <- sdc_raster( dwellings[c("x", "y")] # realistic locations
                        , dwellings$unemployed # simulated data!
                        , r = 500 # raster resolution of 500m
                        , min_count = 10 # min support
                        )

What has been created?

\scriptsize

print(unemployed)

42% of the data on this map is sensitive!

What is sensitivity?

Binary score (logical) per raster cell indicating if it's unsafe to publish.

Calculated:

a) Per location $(x_i,y_i)$ (raster cell).
b) Using the risk function disclosure_risk, $r(x,y) \in [0,1]$: how accurately can an attacker estimate the value of an individual? If $r(x_i,y_i) >$ max_risk, then $(x_i,y_i)$ is sensitive.
c) Using a minimum number of observations: if $\textsf{count}_i <$ min_count, then $(x_i,y_i)$ is sensitive.
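The count rule and the risk rule can be sketched in base R on toy data. The points, the cell size, and the thresholds below are made up for illustration, and the binning is a simplification, not how sdcSpatial performs the check internally.

```r
# Toy data: 8 individuals with a location and a 0/1 value (all made up).
pts <- data.frame(
  x = c(10, 20, 30, 40, 160, 170, 20, 180),
  y = c(10, 20, 30, 40, 20, 30, 160, 170),
  v = c(1, 0, 1, 0, 1, 1, 0, 1)
)

r         <- 100   # cell size (hypothetical)
min_count <- 3     # hypothetical threshold
max_risk  <- 0.95  # hypothetical threshold

# Assign each point to a raster cell by integer division of its coordinates.
cell <- paste(pts$x %/% r, pts$y %/% r, sep = ",")

# Observations per cell and discrete risk (share of 1s) per cell.
count <- vapply(split(pts$v, cell), length, integer(1))
risk  <- vapply(split(pts$v, cell), mean, numeric(1))

# A cell is sensitive if it has too few observations or too high a risk.
sensitive <- count < min_count | risk > max_risk
print(data.frame(count, risk, sensitive))
```

Only cell "0,0" (4 observations, risk 0.5) passes both rules; the other three cells fail the count rule.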

Disclosure risks

External (numeric)

$$ r(x,y) = \max_{i \in (x,y)} \frac{v_i}{\sum_{j \in (x,y)} v_j} \textrm{ with } v_i \in \mathbb{R} $$

Discrete (logical)

$$ r(x,y) = \frac{1}{n} \sum_{i \in (x,y)} v_i \textrm{ with } v_i \in \{0,1\} $$
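Both risk measures can be tried on the values of a single raster cell. The helper names risk_external and risk_discrete and the input vectors are hypothetical; they are not part of sdcSpatial.

```r
# External risk: largest share a single contributor has in the cell total.
# Assumption: values are non-negative and their sum is positive.
risk_external <- function(v) max(v) / sum(v)

# Discrete risk: fraction of individuals in the cell with v = 1.
risk_discrete <- function(v) mean(v)

income          <- c(10, 20, 70)  # made-up numeric values in one cell
unemployed_flag <- c(1, 1, 1, 0)  # made-up 0/1 values in one cell

risk_external(income)           # 70 / 100
risk_discrete(unemployed_flag)  # 3 / 4
```

A cell dominated by one large contributor gets an external risk close to 1, and a cell in which (almost) everyone shares the same 0/1 value gets a discrete risk close to 1.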

Types of raster density maps:

(Stored in unemployed$value):

Density can be area-based:

Or density can be population-based:

Note: All density types are valid, but the (total) value per square strongly interacts with population density (see e.g. https://xkcd.com/1138).

Plotting an sdc_raster

plot(unemployed, "mean")

How to reduce sensitivity?

Options:

a) Use a coarser raster: sdc_raster.
b) Apply spatial smoothing: protect_smooth [@WolfJonge2018; @JongeWolf2016].
c) Aggregate sensitive cells hierarchically with a quad tree until not sensitive: protect_quadtree [@Sune2017].
d) Remove sensitive locations: remove_sensitive.

Option: coarser raster

\scriptsize

unemployed_1km <- sdc_raster( dwellings[c("x", "y")]
                            , dwellings$unemployed, r = 1e3) # 1 km!
plot(unemployed_1km, "mean")
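Why a coarser raster helps with the minimum-count rule can be seen on a toy grid. The counts below are made up, and merging 2 x 2 blocks is only a stand-in for choosing a larger r.

```r
# Made-up counts on a 4 x 4 raster of fine cells.
counts <- matrix(c(2, 4, 12,  9,
                   3, 5, 11, 15,
                   1, 1,  6,  7,
                   2, 3,  4,  8), nrow = 4, byrow = TRUE)
min_count <- 10

# Map every fine row/column to the coarse block it falls in (blocks of 2).
block <- (seq_len(4) - 1) %/% 2 + 1

coarse <- rowsum(counts, block)        # add up rows within each block
coarse <- t(rowsum(t(coarse), block))  # add up columns within each block

sum(counts < min_count)  # fine cells below the threshold
sum(coarse < min_count)  # coarse cells below the threshold
```

In this toy grid, 13 of the 16 fine cells fall below the threshold, against 1 of the 4 coarse cells; the price is a loss of spatial detail.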

Option: Coarsening

Pros

Cons

Option: protect_smooth

\scriptsize

unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
plot(unemployed_smoothed, "mean")
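The general idea behind smoothing, pooling each cell with its neighbours before computing the mean, can be sketched in base R. This sketch uses a uniform 3 x 3 window on made-up numbers; it is an assumption for illustration and not the kernel that protect_smooth applies with bw.

```r
# Made-up counts and 0/1 totals on a 3 x 3 raster.
count <- matrix(c(4, 12, 3,
                  9,  2, 8,
                  5, 11, 6), nrow = 3, byrow = TRUE)
total <- matrix(c(4,  3, 0,
                  2,  2, 1,
                  1,  4, 3), nrow = 3, byrow = TRUE)

# Sum each cell with its direct neighbours (3 x 3 window, zero padding).
window_sum <- function(m) {
  padded <- matrix(0, nrow(m) + 2, ncol(m) + 2)
  padded[2:(nrow(m) + 1), 2:(ncol(m) + 1)] <- m
  out <- matrix(0, nrow(m), ncol(m))
  for (di in 0:2) for (dj in 0:2) {
    out <- out + padded[di + seq_len(nrow(m)), dj + seq_len(ncol(m))]
  }
  out
}

raw_mean      <- total / count
smoothed_mean <- window_sum(total) / window_sum(count)

raw_mean[2, 2]       # centre cell: 2 of 2 individuals, fully revealing
smoothed_mean[2, 2]  # centre cell after pooling with its neighbours
```

The centre cell goes from a mean of 1 based on 2 observations to a mean of 1/3 based on 60 observations, so it no longer reveals the value of its two individuals.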

Option: protect_smooth

Pros

Cons

Option: protect_quadtree

\scriptsize

unemployed_100m <- sdc_raster( dwellings[c("x","y")], dwellings$unemployed
                             , r = 100) # use a finer raster
unemployed_qt <- protect_quadtree(unemployed_100m)
plot(unemployed_qt)

Option: protect_quadtree

Pro

Cons

Publish: visual interpolation

In 5 lines we create a visually attractive map that is safe:

\scriptsize

unemployed <- sdc_raster(dwellings[c("x","y")], dwellings$unemployed, r=500)
unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
unemployed_safe <- remove_sensitive(unemployed_smoothed)
mean_unemployed <- mean(unemployed_safe)
raster::filledContour(mean_unemployed, main="Unemployment rate")

The end

Thank you for your attention!

Questions?

Curious?

install.packages("sdcSpatial")

Feedback and suggestions?

https://github.com/edwindj/sdcSpatial/issues

References



edwindj/sdcSpatial documentation built on April 13, 2025, 1:57 a.m.