# Introducing sdcSpatial

sdcSpatial: Statistical Disclosure Control for Spatial Data

raster::plot(production$value[[1:3]])

The important question is: can we publish this map, or does it contain sensitive values?

## Sensitive locations

Let us see how many of the values are sensitive:

print(production)

Printing the production object shows that when we demand that a raster cell should have at least 10 observations (min_count) and that its value should not be dominated by one enterprise (max_risk), then round(100*sensitivity_score(production))% of the data in the map is sensitive!

For a 500m by 500m block, a threshold of 10 enterprises is on the high side, so let us lower min_count to 5 (and set max_risk to 0.9):

production$min_count <- 5
production$max_risk <- 0.9
# or equally
production <- sdc_raster(enterprises, "production"
, r = 500, min_count = 5, max_risk = 0.9)
sensitivity_score(production)


The score dropped, but which cells are we talking about?

plot(production)
sensitive_cells <- is_sensitive(production)


sensitive_cells is a raster which can be used for further inspection.
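To make the two criteria concrete, here is a minimal base-R sketch of a per-cell check. It assumes the rule reads "a cell is sensitive if it has fewer than min_count observations, or if its largest single contribution exceeds max_risk of the cell total". The helper cell_is_sensitive and the example values are hypothetical, sdcSpatial's own risk calculation may differ, and the share of flagged cells is only a simplified stand-in for sensitivity_score.

```r
# Minimal sketch (hypothetical helper, not sdcSpatial's implementation):
# a cell is sensitive when it has too few observations (min_count) or when
# one contributor dominates the cell total (max_risk).
cell_is_sensitive <- function(values, min_count = 5, max_risk = 0.9) {
  if (length(values) < min_count) return(TRUE)
  total <- sum(values)
  # share of the largest single contribution in the cell total
  risk <- if (total > 0) max(values) / total else 0
  risk > max_risk
}

# Hypothetical production values of the enterprises in three cells
cells <- list(
  a = c(10, 12, 9, 11, 8),     # 5 enterprises, none dominant
  b = c(3, 4),                 # too few enterprises
  c = c(1000, 5, 4, 6, 5, 3)   # one enterprise dominates
)
flags <- vapply(cells, cell_is_sensitive, logical(1))
print(flags)        # a: FALSE, b: TRUE, c: TRUE
# simplified score: the share of cells that are sensitive
mean(flags)         # 2/3
```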

## Reducing sensitivity

Let us try to reduce the sensitivity of the map using a smoothing method:

production_smoothed <- protect_smooth(production, bw = 500)
plot(production_smoothed)
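Smoothing spreads the counts and values of each cell over its neighbours, so that every smoothed cell mixes the contributions of more enterprises. The base-R sketch below illustrates that idea, assuming a Gaussian kernel on a regular grid with the bandwidth measured in cells (with r = 500, bw = 500 would correspond to one cell). The helpers gaussian_kernel and smooth_grid are hypothetical; protect_smooth's actual kernel and border handling may differ.

```r
# Minimal sketch (hypothetical helpers): Gaussian kernel smoothing of a grid.
gaussian_kernel <- function(bw_cells, radius = ceiling(3 * bw_cells)) {
  d <- -radius:radius
  k <- outer(d, d, function(i, j) exp(-(i^2 + j^2) / (2 * bw_cells^2)))
  k / sum(k)   # normalise so the weights sum to 1
}

smooth_grid <- function(m, k) {
  rad <- (nrow(k) - 1) / 2
  # pad with zeros so the kernel can be applied at the borders
  padded <- matrix(0, nrow(m) + 2 * rad, ncol(m) + 2 * rad)
  padded[rad + seq_len(nrow(m)), rad + seq_len(ncol(m))] <- m
  out <- matrix(0, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      # weighted sum over the window centred on cell (i, j)
      out[i, j] <- sum(padded[i:(i + 2 * rad), j:(j + 2 * rad)] * k)
    }
  }
  out
}

# One enterprise in the centre cell of a 9 x 9 grid of counts
counts <- matrix(0, 9, 9)
counts[5, 5] <- 1
smoothed <- smooth_grid(counts, gaussian_kernel(bw_cells = 1))
round(smoothed[4:6, 4:6], 3)   # the single count is spread over neighbours
sum(smoothed)                  # the total stays (numerically) 1
```

Applying the same smoothing to a grid of counts and a grid of sums, and dividing the two, would give a smoothed mean map; that this is how the package combines them is an assumption here.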


In this case smoothing reduced the number of sensitive locations drastically! Let us remove the remaining sensitive cells:

production_safe <- remove_sensitive(production_smoothed)
sensitivity_score(production_safe) # check, double check
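Removing sensitive cells is a form of suppression: their values are simply not published. A minimal sketch, assuming suppression means setting the values of sensitive cells to NA; the helper suppress and the small grids are hypothetical and not the code behind remove_sensitive.

```r
# Minimal sketch (hypothetical helper): suppress sensitive cells by
# replacing their values with NA.
suppress <- function(values, sensitive) {
  values[sensitive] <- NA
  values
}

mean_grid <- matrix(c(4, 7, 2, 9), nrow = 2)                 # cell means
sensitive <- matrix(c(FALSE, TRUE, FALSE, FALSE), nrow = 2)  # cell [2, 1] is sensitive
safe <- suppress(mean_grid, sensitive)
safe
# share of the published (non-NA) cells that are sensitive: 0
mean(sensitive[!is.na(safe)])
```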


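We can improve on the "blocky" look of the map with raster::disaggregate, which splits every cell of the mean map into smaller cells (here by a factor of 10) and interpolates their values bilinearly. We can then plot the result: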

mean_production <- mean(production_safe)
mean_production <- raster::disaggregate(mean_production, 10, "bilinear")
# generated with R >= 3.6
# col <- hcl.colors(10, "YlOrRd", rev = TRUE)
col <- c("#FFFFC8", "#FEF1B2", "#FADC8A", "#F7C252", "#F5A400", "#F18000",
"#EB5500", "#D12D00", "#A90D00", "#7D0025")
raster::plot(mean_production, col=col)

# library(leaflet)
# leaflet() %>%
#   leaflet::addTiles() %>%
#   leaflet::addRasterImage(mean_production, colors = col, opacity = 0.5)
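Disaggregating with "bilinear" interpolates the values of the smaller cells between neighbouring cell centres, which is what softens the blocky look. The base-R sketch below shows that idea. The helper upsample_bilinear is hypothetical, it assumes a grid with at least two rows and columns, and beyond the outermost cell centres it simply repeats the edge value, which may differ from how raster handles the borders.

```r
# Minimal sketch (hypothetical helper): bilinear upsampling of a grid by `fact`.
upsample_bilinear <- function(m, fact) {
  # centres of the original cells, in cell units
  src_r <- seq_len(nrow(m))
  src_c <- seq_len(ncol(m))
  # centres of the smaller cells after splitting each cell fact x fact times
  dst_r <- (seq_len(nrow(m) * fact) - 0.5) / fact + 0.5
  dst_c <- (seq_len(ncol(m) * fact) - 0.5) / fact + 0.5
  # bilinear interpolation is separable: interpolate down each column ...
  tmp <- apply(m, 2, function(v) approx(src_r, v, xout = dst_r, rule = 2)$y)
  # ... then along each row; rule = 2 repeats the edge value at the borders
  t(apply(tmp, 1, function(v) approx(src_c, v, xout = dst_c, rule = 2)$y))
}

m <- matrix(c(0, 2, 4, 6), nrow = 2)   # a 2 x 2 "blocky" map
fine <- upsample_bilinear(m, fact = 2)
fine[1, ]                              # 0 1 3 4
mean(fine)                             # 3, the same as mean(m) here
```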


protect_quadtree is another protection method, which we demonstrate with the variable fined.

First we create a finer-grained (pun not intended) raster for the variable fined, with 200m instead of 500m cells:

fined <- sdc_raster(enterprises, "fined", min_count = 5, r = 200, max_risk = 0.8)
print(fined)
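A raster such as fined is built by assigning each enterprise to an r x r cell and summarising the variable per cell; a smaller r gives more and smaller cells, each typically holding fewer enterprises. A minimal base-R sketch of that gridding step, assuming projected coordinates in metres; the helper grid_points and the five points are hypothetical and not sdcSpatial's code.

```r
# Minimal sketch (hypothetical helper): count, sum and mean per r x r cell.
grid_points <- function(x, y, value, r) {
  # cell id: column and row index of the r x r cell containing each point
  cell <- paste(floor(x / r), floor(y / r), sep = "_")
  count <- tapply(value, cell, length)
  total <- tapply(value, cell, sum)
  data.frame(cell = names(count),
             count = as.vector(count),
             sum = as.vector(total),
             mean = as.vector(total / count))
}

# Five hypothetical enterprises (coordinates in metres)
x <- c(50, 150, 120, 450, 470)
y <- c(30, 80, 190, 20, 60)
value <- c(1, 0, 1, 1, 1)
grid_points(x, y, value, r = 200)
```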


This map is rather sensitive. Let us have a look at the locations:

# col <- hcl.colors(10, rev=TRUE) # generated with R >= 3.6
col <- c("#FDE333", "#BBDD38", "#6CD05E", "#00BE7D", "#00A890"
, "#008E98",  "#007094", "#185086", "#422C70", "#4B0055")
plot(fined, "mean", col=col)


The quadtree method aggregates each sensitive cell with its 3 neighbors (together forming a 2 x 2 block) and repeats this recursively at coarser levels. The result is as follows:

fined_qt <- protect_quadtree(fined)
plot(fined_qt, col=col)


Its sensitivity score is given by sensitivity_score(fined_qt).

The quadtree method has the advantage of locally selecting the resolution needed to suppress sensitive values, whereas protect_smooth uses a fixed bandwidth everywhere.
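The local choice of resolution can be illustrated with a simplified quadtree on counts only. This is a sketch under stated assumptions, not protect_quadtree's implementation: the grid is square with a power-of-two side, only the min_count rule is checked, empty cells are not treated as sensitive, and a merged block reports the total of its original counts in every cell. The helper quadtree_counts and the grid are hypothetical.

```r
# Minimal sketch (hypothetical helper): quadtree-style aggregation of counts.
quadtree_counts <- function(counts, min_count = 5) {
  n <- nrow(counts)
  # assumption: square grid whose side is a power of two
  stopifnot(n == ncol(counts), n == 2^round(log2(n)))
  # empty cells (count 0) are not considered sensitive
  is_sens <- function(v) v > 0 & v < min_count
  reported <- counts
  block <- 1
  while (any(is_sens(reported)) && block < n) {
    block <- block * 2                      # move one level up the quadtree
    for (i in seq(1, n, by = block)) {
      for (j in seq(1, n, by = block)) {
        rows <- i:(i + block - 1)
        cols <- j:(j + block - 1)
        if (any(is_sens(reported[rows, cols]))) {
          # merge the block: every cell reports the block total of the
          # original counts (so a merged empty cell also gets that total)
          reported[rows, cols] <- sum(counts[rows, cols])
        }
      }
    }
  }
  reported
}

counts <- matrix(c(6, 7, 1, 0,
                   8, 9, 2, 3,
                   5, 6, 7, 8,
                   6, 5, 9, 7), nrow = 4, byrow = TRUE)
quadtree_counts(counts)   # only the top-right 2 x 2 block is merged (total 6)
```

Only the block that contains sensitive cells moves to a coarser level; the rest of the grid keeps its original resolution.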

Compared with the smoothing method, the quadtree result is blocky, but it is safer if you look at the sensitive cells in areas with high values of fined:

fined_smooth <- protect_smooth(fined, bw = 500)
plot(fined_smooth, col = col)
sensitivity_score(fined_smooth)


## Thanks to raster

sdcSpatial builds heavily upon the excellent raster package: it creates raster maps and uses the machinery of raster to calculate sensitivity and to apply protection methods to raster maps.


sdcSpatial documentation built on July 20, 2019, 1:04 a.m.