knitr::opts_chunk$set(echo = TRUE)
knitr::include_graphics("fig/privacy.jpg")
sdcSpatial has methods for:
sdc_raster for population density, value density and mean density, using the excellent raster package [@Hijmans2019].
plot_sensitive, is_sensitive.
protect_smooth, protect_quadtree.
Statistics Netherlands is the main producer of official statistics in the Netherlands:
sdcSpatial?
SDC = "Statistical Disclosure Control"
sdcSpatial works upon locations.
data(dwellings, package="sdcSpatial")
nrow(dwellings)
head(dwellings) # consumption/unemployed are simulated!
sdc_raster
\scriptsize
library(sdcSpatial)
unemployed <- sdc_raster(
    dwellings[c("x", "y")]   # realistic locations
  , dwellings$unemployed     # simulated data!
  , r = 500                  # raster resolution of 500m
  , min_count = 10           # min support
)
\scriptsize
print(unemployed)
42% of the data on this map is sensitive!
Binary score (logical) per raster cell indicating if it's unsafe to publish.
a) Per location $(x_i,y_i)$ (raster cell)
b) Using risk function disclosure_risk $r(x,y) \in [0,1]$: how accurately can an attacker estimate the value of an individual? If $r(x_i,y_i) >$ max_risk, then $(x_i,y_i)$ is sensitive.
c) Using a minimum number of observations. If $\textsf{count}_i <$ min_count, then $(x_i,y_i)$ is sensitive.
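The two rules above can be sketched in base R. This is only an illustration: the per-cell counts and risks are made-up numbers, the thresholds are example values, and combining the two rules with a logical "or" is an assumption about how they interact, not a statement about the package internals.

```r
# Hypothetical per-cell summaries (not taken from the dwellings data)
cells <- data.frame(
  count = c(3, 25, 40, 12),          # number of observations per cell
  risk  = c(0.50, 0.97, 0.30, 0.10)  # disclosure risk r(x, y) per cell
)

min_count <- 10    # example threshold: minimum number of observations
max_risk  <- 0.95  # example threshold: maximum tolerated risk

# Assumption: a cell is sensitive if it violates either rule
cells$sensitive <- cells$count < min_count | cells$risk > max_risk

print(cells)
mean(cells$sensitive)  # fraction of cells flagged as sensitive
```

The first cell is flagged because it has too few observations, the second because its risk exceeds max_risk.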
Values of type numeric:
$$ r(x,y) = \max_{i \in (x,y)} \frac{v_i}{\sum_{j \in (x,y)} v_j} \quad \textrm{with } v_i \in \mathbb{R} $$
Values of type logical:
$$ r(x,y) = \frac{1}{n} \sum_{i \in (x,y)} v_i \quad \textrm{with } v_i \in \{0,1\} $$
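As a worked example of the two risk measures above, the following base R sketch evaluates them for a single raster cell. The values and the helper names risk_numeric and risk_logical are hypothetical and are not part of sdcSpatial.

```r
# Hypothetical values of the individuals that fall in one raster cell
v_numeric <- c(120, 30, 25, 25)                  # e.g. a simulated amount
v_logical <- c(TRUE, FALSE, FALSE, TRUE, FALSE)  # e.g. unemployed yes/no

# Numeric values: share of the largest contributor in the cell total
risk_numeric <- function(v) max(v) / sum(v)

# Logical values: fraction of individuals in the cell with the attribute
risk_logical <- function(v) mean(v)

risk_numeric(v_numeric)  # 120 / 200 = 0.6
risk_logical(v_logical)  # 2 / 5 = 0.4
```

In the numeric case one individual contributes 60% of the cell total, so an attacker who knows the published total can estimate that individual's value fairly well.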
(Stored in unemployed$value):
Density can be area-based:
$count: population density.
$sum: number of unemployed per square.
Or density can be population-based:
$mean: unemployment rate per square.
Note: All density types are valid, but (total) value per square strongly interacts with population density (e.g. https://xkcd.com/1138).
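To make the three density types concrete, here is a minimal base R sketch that bins a handful of made-up points into square cells and computes count, sum and mean per cell. The data frame pts and the cell labelling are assumptions for illustration; sdc_raster itself builds on the raster package rather than on this approach.

```r
# Hypothetical point data: coordinates in metres and a logical attribute
pts <- data.frame(
  x = c(120, 480, 510, 950, 990, 130),
  y = c(100, 450, 100, 900, 980, 470),
  unemployed = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

r <- 500  # cell size in metres

# Label each point with the index of the r-by-r square it falls in
pts$cell <- paste(floor(pts$x / r), floor(pts$y / r), sep = "_")

count <- tapply(pts$unemployed, pts$cell, length)  # area-based: population
total <- tapply(pts$unemployed, pts$cell, sum)     # area-based: unemployed
rate  <- total / count                             # population-based: rate

data.frame(cell = names(count), count = as.vector(count),
           sum = as.vector(total), mean = as.vector(rate))
```

The cell with the highest sum is also the one with the most people, which is the interaction with population density that the note above warns about.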
sdc_raster
plot(unemployed, "mean")
a) Use a coarser raster: sdc_raster.
b) Apply spatial smoothing: protect_smooth [@WolfJonge2018; @JongeWolf2016].
c) Aggregate sensitive cells hierarchically with a quad tree until not sensitive: protect_quadtree [@Sune2017].
d) Remove sensitive locations: remove_sensitive.
\scriptsize
unemployed_1km <- sdc_raster( dwellings[c("x", "y")]
                            , dwellings$unemployed, r = 1e3) # 1km!
plot(unemployed_1km, "mean")
protect_smooth
\scriptsize
unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
plot(unemployed_smoothed, "mean")
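The intuition behind smoothing can be sketched in base R on a one-dimensional strip of cells. This is not how protect_smooth is implemented: the 3-cell moving average stands in for a proper spatial kernel, the numbers are made up, and smoothing the counts and sums separately before taking their ratio is an assumption made for this sketch.

```r
# Hypothetical strip of five raster cells
count <- c(2, 30, 4, 25, 1)  # observations per cell
total <- c(1, 6, 2, 5, 1)    # unemployed per cell
min_count <- 10              # example threshold

# 3-cell moving average, repeating the edge values as padding
smooth3 <- function(v) {
  n <- length(v)
  p <- c(v[1], v, v[n])
  (p[1:n] + p[2:(n + 1)] + p[3:(n + 2)]) / 3
}

count_s <- smooth3(count)
total_s <- smooth3(total)
mean_s  <- total_s / count_s  # smoothed rate per cell

sum(count < min_count)    # cells below min_count before smoothing
sum(count_s < min_count)  # cells below min_count after smoothing
```

Smoothing borrows observations from neighbouring cells, so sparsely populated cells are less likely to fall below min_count, at the cost of some spatial detail.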
protect_smooth (bw)
protect_quadtree
\scriptsize
unemployed_100m <- sdc_raster( dwellings[c("x","y")], dwellings$unemployed
                             , r = 100) # use a finer raster
unemployed_qt <- protect_quadtree(unemployed_100m)
plot(unemployed_qt)
protect_quadtree
In 5 lines we create a visually attractive map that is safe:
\scriptsize
unemployed <- sdc_raster(dwellings[c("x","y")], dwellings$unemployed, r=500)
unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
unemployed_safe <- remove_sensitive(unemployed_smoothed)
mean_unemployed <- mean(unemployed_safe)
raster::filledContour(mean_unemployed, main="Unemployment rate")
install.packages("sdcSpatial")
https://github.com/edwindj/sdcSpatial/issues