remSmall: Function that will move values from grid cells with small...

View source: R/remSmall.R

remSmallR Documentation

Function that will move values from grid cells with small values to the ones with larger values for disclosure control reasons

Description

Two main confidentiality rules are considered: - Threshold rule (suppression due to a minimum number of counts) - Dominance rule (suppression due to dominance by one or more units)

Usage

remSmall(
  gdl,
  ress,
  ires0,
  mincount = 10,
  ifg,
  var,
  weight,
  nlarge = 2,
  plim = 0.85,
  sampleRandom = TRUE,
  domEstat = TRUE,
  verbose = FALSE,
  nclus = 1,
  clusType,
  outfile = NULL,
  checkDominance = TRUE,
  checkReliability = TRUE
)

Arguments

gdl

A list of gridded data with different resolutions (from a call to gridData

ress

A vector with the different resolutions

ires0

Which resolution level to use as base for the downscaling

mincount

The minimum number of farms for a grid cell (threshold rule)

ifg

Either a data.frame or tibble or sf-object with the locations and the data of the survey or census data, or a list of such objects.

var

Variable of interest that should be aggregated (necessary when ifg is used for individual farm specific confidence rules)

weight

Extrapolation factor (weight) wi of unit i in the sample of units nc falling into a specific cell c. Weights are used for disclosure control measures.

nlarge

Parameter to be used if the nlarge(st) farms should count for maximum plim percent of the total value for the variable in the grid cell (see details of gridData)

plim

See nlarge

sampleRandom

Logical; if the value is TRUE, values from grid cells with values under the limit will be moved to a random neighbour if there are more neighbours above the limit. False will always pick the largest (and the first one in the list if they are equal)

domEstat

Should the dominance rule be applied as in the IFS handbook (TRUE), where the weights are rounded before finding the first nlarge contributors, or should it be the first nlarge contributors*weight, where also fractions are considered (FALSE)?

verbose

Indicates if some extra output should be printed. Usually TRUE/FALSE, but can also have a value of 2 for multiResGrid for even more output.

nclus

Number of clusters to use for parallel processing. No parallelization is used for nclus = 1.

clusType

The type of cluster; see makeCluster for more details. The default of makeCluster is used if type is missing or NA

outfile

File to direct the output in case of parallel processing, see makeCluster for more details.

checkDominance

Logical - should the dominance rule be applied?

checkReliability

Logical - should the prediction variance be checked, and used for the aggregation? This considerably increases computation time

Details

This function uses the hierarchy of gridded data to associate values from grid cells that need to be anonymized to the grid cell with the highest values, within increasingly larger sub-grids.

The parameters nlarge and plim are used for setting value dependent confidentiality rules. If the rule is that the largest two holdings in a grid cell should not count for more than 85 of the total value (UAA, number of livestock, ...), then nlarge = 2 and plim = 0.85

The function will create set the value to NA for the grid cells where the content has been moved to a neighbouring grid cells.

Value

A gridded data set, where each grid cell respects the confidentiality rules.

Examples


library(sf)
library(sf)
if (!require(ggplot2)) print("Plotting of results will not work without installation of ggplot2")
if (!require(viridis)) print("Some of the plots will not work without installation of ggplot2")
if (!require(patchwork)) print("Some of the plots will not work without installation of patchwork")

if (require(giscoR)) {
  useBorder = TRUE 
} else {
  useBorder = FALSE
  print("You need to install giscoR for plotting borders and clipping the gridded maps")
}
# These are SYNTHETIC agricultural FSS data 
data(ifs_dk) # Census data
ifs_weight = ifs_dk %>% dplyr::filter(Sample == 1) # Extract weighted subsample

# Create spatial data
ifg = fssgeo(ifs_dk, locAdj = "LL")
fsg = fssgeo(ifs_weight, locAdj = "LL")

if (useBorder) {
# Read country borders, only used for plotting
  borders = gisco_get_nuts(nuts_level = 0)
  dkb = borders[borders$CNTR_CODE == "DK",] %>% st_transform(crs = 3035)
}

# Set the base resolutions, and create a hierarchical list with gridded data
ress = c(1,5,10,20,40,80, 160, 320, 640, 1280, 2560)*1000
# Create the grid with UAA as variable and EXT_CORE as weight
# These can be dropped if only the number of farms are of interest in the analyses
ifl = gridData(ifg, "UAA", weight = "EXT_CORE", res = ress)

# Run the procedure for the third resolution level (10 km), only using number of holdings
# as confidentiality rule
# himg1 and himg2 should give the same result, but only when sampleRandom = FALSE
himg1 <- remSmall(ifl, ress, 3, sampleRandom = FALSE)
plot(himg1[, "count"])
himg12 <- remSmall(ifl, ress, 3, sampleRandom = FALSE, nclus = 2)
# Run the procedure for UAA, using the defaults for variable
# confidentiality rule (nlarge = 2 and plim = 0.85)

himg2 <- remSmall(ifl, ress, weight = "EXT_CORE", ires0 = 3, var = "UAA", ifg = ifg)
plot(himg2[, "count"])
plot(himg2[, "UAA"])

# Run the procedure for organic UAA, but still requiring 10 holdings of any kind per grid cell
# Using resolution level 5 (40 km)
 iflOuaaAll = gridData(ifg, "UAAXK0000_ORG", res = ress)
himg3 = remSmall(iflOuaaAll, ress, 5, ifg = ifg, var = "UAAXK0000_ORG")
plot(himg3[, "count"])
plot(himg3[, "UAAXK0000_ORG"])

# Run the procedure for organic UAA, but require at least 10 organic holdings per grid cell
# Using resolution level 5 (40 km)
ifgOuaa = ifg[ifg$UAAXK0000_ORG > 0, ]
iflOuaa = list()
iflOuaa = gridData(ifgOuaa, "UAAXK0000_ORG",  res = ress)
himg4 = remSmall(iflOuaa, ress, 5, ifg = ifg, var = "UAAXK0000_ORG")
plot(himg4[, "count"])
plot(himg4[, "UAAXK0000_ORG"])

himg4l = list()
# Run the proceduure for organic UAA for different resolution levels
for (ipl in 1:6) himg4l[[ipl]] = remSmall(iflOuaa, ress, ipl, ifg = ifg, var = "UAAXK0000_ORG")


 # Create proper plots
  breaks = c(1,3,10,30,100)
  labels = breaks
  p1 = ggplot() + geom_sf(data = himg1, aes(fill = count, color = count)) +
  scale_fill_viridis( name = "number of \nholdings", trans = "log10", 
                  breaks = breaks, labels = labels, limits = c(1,100)) +
  scale_color_viridis( name = "number of \nholdings", trans = "log10", 
                 breaks = breaks, labels = labels, limits = c(1,100)) +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("Number of holdings after swapping") +
  theme_bw()

# For comparison the number of organic farms and organic UAA, without taking any
# confidentiality into account
gcompOfarms = ggplot() + geom_sf(data = ifl[[3]], aes(fill = count, color = count)) +
 scale_fill_viridis( name = "number of \nholdings", trans = "log10", 
                         breaks = breaks, labels = labels, limits = c(1,100)) +
 scale_color_viridis( name = "number of \nholdings", trans = "log10", 
                         breaks = breaks, labels = labels, limits = c(1,100)) +
 geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
 coord_sf(crs = 3035) +
 ggtitle("Number of holdings - ordinary gridded data") +
 theme_bw()

  gcompOfarms + p1 + plot_layout(guides = "collect") 

  p2 = ggplot() + geom_sf(data = himg2, aes(fill = count, color = count)) +
  scale_fill_viridis( name = "number of \nholdings", trans = "log10") +
  scale_color_viridis( name = "number of \nholdings", trans = "log10") +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("Number of farms - corrected for farm size") +
  theme_bw()

  p3 = ggplot() + geom_sf(data = himg2, aes(fill = UAA, color = UAA)) +
  scale_fill_viridis( name = "UAA", trans = "log10") +
  scale_color_viridis( name = "UAA", trans = "log10") +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("UAA - corrected for farm size") +
  theme_bw()

  p4 = ggplot() + geom_sf(data = himg3, aes(fill = count, color = count)) +
  scale_fill_viridis( name = "number of \nholdings", trans = "log10") +
  scale_color_viridis( name = "number of \nholdings", trans = "log10") +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("Number of farms - based on number of organic farms and organic farm size") +
  theme_bw()

  p5 = ggplot() + geom_sf(data = himg3, aes(fill = UAAXK0000_ORG, color = UAAXK0000_ORG)) +
  scale_fill_viridis( name = "UAA organic", trans = "log10") +
  scale_color_viridis( name = "UAA organic", trans = "log10") +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("UAA organic - based on organic farm numbers and size") +
  theme_bw()

  p6 = ggplot() + geom_sf(data = himg4, aes(fill = count, color = count)) +
  scale_fill_viridis( name = "number of \nholdings", trans = "log10") +
  scale_color_viridis( name = "number of \nholdings", trans = "log10") +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("Number of organic farms - based on organic farm numbers and size") +
  theme_bw()

  uaalims = c(min(c(himg4$UAAXK0000_ORG, iflOuaa[[5]]$UAAXK0000_ORG), na.rm = TRUE),
              max(c(himg4$UAAXK0000_ORG, iflOuaa[[5]]$UAAXK0000_ORG), na.rm = TRUE))
  p7 = ggplot() + geom_sf(data = himg4, aes(fill = UAAXK0000_ORG, color = UAAXK0000_ORG)) +
  scale_fill_viridis( name = "UAA organic", trans = "log10", limits = uaalims) +
  scale_color_viridis( name = "UAA organic", trans = "log10", limits = uaalims) +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle("UAA organic after swapping  ") +
  theme_bw()

# For comparison the number of organic farms and organic UAA, without taking any
# confidentiality into account

gcompOUAA = ggplot() + geom_sf(data = iflOuaa[[5]], 
                   aes(fill = UAAXK0000_ORG, color = UAAXK0000_ORG)) +
 scale_fill_viridis( name = "UAA organic", trans = "log10", limits = uaalims) +
 scale_color_viridis( name = "UAA organic", trans = "log10", limits = uaalims) +
 geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
 coord_sf(crs = 3035) +
 ggtitle("Organic UAA - ordinary gridded data") +
 theme_bw()

  print(gcompOUAA) + p7 +  plot_layout(guides = "collect") 
  
  ppl = list()
  counts = do.call("rbind", himg4l[1:5])$count
  clim = c(min(counts, na.rm = TRUE), max(counts, na.rm = TRUE)) 
  for (ipl in 1:length(himg4l)) {
    ppl[[ipl]] = ggplot() + geom_sf(data = himg4l[[ipl]], aes(fill = count, color = count)) +
  scale_fill_viridis( name = "number of \nholdings", trans = "log10", limits = clim) +
  scale_color_viridis( name = "number of \nholdings", trans = "log10", limits = clim) +
  geom_sf(data = dkb, fill = NA, colour='black', lwd = 1) +
  coord_sf(crs = 3035) +#, xlim = c(2377294, 6400000), ylim = c(1313597, 5628510)) +
  ggtitle(paste("Base resolution", ress[ipl]/1000, "km")) +
  theme_bw()
  }
  ppl[[1]] + ppl[[2]] + ppl[[3]] + ppl[[4]] + plot_layout(guides = "collect")
  
MRGcluster(action = "stop")


MRG documentation built on Oct. 28, 2024, 5:07 p.m.

Related to remSmall in MRG...