assess_privacy: assess_privacy

View source: R/assess_privacy.r

assess_privacy R Documentation

assess_privacy

Description

At this time, data with privacy considerations must be aggregated such that each polygon has a minimum of 5 unique values for sensitive fields like Licenses, License Holders, and Vessels. This function takes a dataframe and a shapefile and, for each polygon in the shapefile, calculates 1) aggregate values for a number of user-specified fields, and 2) how many unique values exist in each polygon for each of a number of sensitive fields.

Usage

assess_privacy(
  df = NULL,
  grid.shape = "hex",
  lat.field = "LATITUDE",
  lon.field = "LONGITUDE",
  rule.of = 5,
  agg.fields = "KEPT_WT",
  calculate = c("MEAN", "COUNT", "SUM"),
  sens.fields = NULL,
  facet.field = NULL,
  key.fields = NULL,
  for.public = TRUE,
  create.spatial = TRUE,
  create.centroid.csv = FALSE,
  file.id = NULL,
  agg.poly.shp = NULL,
  agg.poly.field = NULL,
  custom.grid = NULL
)

Arguments

df

a dataframe to be analyzed. If left NULL, a value for db should be provided

grid.shape

default is "hex". This identifies the shape of the grid cells you want to aggregate your data into. The options are "hex" or "square".

lat.field

the default is "LATITUDE". This is the name of the field in df holding latitude values (in decimal degrees)

lon.field

the default is "LONGITUDE". This is the name of the field in df holding longitude values (in decimal degrees)

rule.of

default is 5. Whether or not data can be shown (even aggregated) depends on the presence of a threshold number of unique values for certain sensitive fields. This parameter sets that threshold.

agg.fields

the default is "KEPT_WT". These are the fields in the data that contain the values you want to aggregate (e.g. calculate the mean, sum or count of). These fields need to be numeric.

calculate

the default is c("MEAN", "COUNT", "SUM"). These are the analytics which should be performed for every field identified in agg.fields. For example, if KEPT_WT and DISCARD_WT are both identified in agg.fields, then the mean, count and sum of both of these fields are calculated for every resultant aggregated polygon (e.g. hexagon).
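As a rough illustration of what "every statistic for every field, per polygon" means, the following base R sketch computes the three default statistics on toy data. It is not the package's implementation: the POLY_ID column, the toy values, and the FIELD_STAT output naming are all assumptions made for this example.

```r
# Toy data: each row is one fishing set already assigned to a polygon.
# POLY_ID is a hypothetical column name used only for this illustration.
sets <- data.frame(
  POLY_ID    = c("A", "A", "A", "B", "B"),
  KEPT_WT    = c(10, 20, 30, 5, 15),
  DISCARD_WT = c(1, 2, 3, 4, 6)
)
agg.fields <- c("KEPT_WT", "DISCARD_WT")

# One summary function per entry in the default `calculate` vector.
stats <- list(
  MEAN  = function(x) mean(x, na.rm = TRUE),
  COUNT = function(x) sum(!is.na(x)),
  SUM   = function(x) sum(x, na.rm = TRUE)
)

# Apply every statistic to every aggregation field, grouped by polygon.
pieces <- lapply(names(stats), function(s) {
  out <- aggregate(sets[agg.fields],
                   by = list(POLY_ID = sets$POLY_ID),
                   FUN = stats[[s]])
  # Assumed naming scheme: <field>_<statistic>
  names(out)[-1] <- paste(agg.fields, s, sep = "_")
  out
})

# Join the per-statistic tables into one row per polygon.
agg <- Reduce(function(a, b) merge(a, b, by = "POLY_ID"), pieces)
print(agg)
```

With two aggregation fields and three statistics, the result has six value columns plus the polygon identifier.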

sens.fields

default is NULL. These are the fields to which the "rule of 5" should be applied. The Treasury Secretariat states that when data is shown to the public, each aggregated area must contain at least 5 unique values for certain fields. When run, this function will look at these fields and calculate how many unique values exist for each. It will then populate a field 'TOTUNIQUE' with the minimum number of unique values across all the assessed fields. If this is 5 or more, a field called 'CAN_SHOW' will be marked as 'YES' (otherwise it will be 'NO').
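The check described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual code: POLY_ID, LICENCE_ID and VESSEL_ID are hypothetical column names, and the data are made up. Only TOTUNIQUE and CAN_SHOW come from the documentation.

```r
# Toy data: each row is a fishing set already assigned to a polygon.
# POLY_ID, LICENCE_ID and VESSEL_ID are hypothetical names for this sketch.
sets <- data.frame(
  POLY_ID    = c(rep("A", 6), rep("B", 4)),
  LICENCE_ID = c(1, 2, 3, 4, 5, 6, 1, 1, 2, 3),
  VESSEL_ID  = c(11, 12, 13, 14, 15, 15, 21, 22, 23, 24)
)
sens.fields <- c("LICENCE_ID", "VESSEL_ID")
rule.of <- 5

# Count distinct values of each sensitive field within each polygon.
n_unique <- aggregate(sets[sens.fields],
                      by = list(POLY_ID = sets$POLY_ID),
                      FUN = function(x) length(unique(x)))

# The most restrictive field decides whether a polygon can be shown.
n_unique$TOTUNIQUE <- do.call(pmin, n_unique[sens.fields])
n_unique$CAN_SHOW  <- ifelse(n_unique$TOTUNIQUE >= rule.of, "YES", "NO")
print(n_unique)
```

Taking the minimum across fields means a polygon with many licences but only a few vessels is still withheld.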

facet.field

default is NULL. In cases like bycatch data, you may have a dataframe where each row represents a different species. You probably want a breakdown by individual species, rather than summing them all up to get some generic weight of all species combined. This is the field that will be used to aggregate data by common values (like Species_Code).

key.fields

default is NULL. This is a vector of fields that are required to uniquely identify each fishing set. If a facet.field is provided, the facet.field, key.fields and agg.fields are all pulled off of the original data and then merged back onto it. The key.fields are instrumental in ensuring that the data is able to get rejoined back to the original sets.

for.public

default is TRUE. While calculating the aggregated values within each 2 min cell, this script first establishes whether or not cells within an area have enough unique values of sensitive fields to be allowed to show any data at all. If this parameter is TRUE, the calculated values for areas that cannot be shown will be wiped prior to generating the output files.
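The effect of for.public can be pictured with a small base R sketch. It assumes that "wiped" means the values are set to NA, which the documentation does not specify; the polygon table and its value column names are also made up for illustration.

```r
# Toy polygon summary; column names other than CAN_SHOW are hypothetical.
polys <- data.frame(
  POLY_ID      = c("A", "B", "C"),
  KEPT_WT_SUM  = c(60, 20, 45),
  KEPT_WT_MEAN = c(20, 10, 15),
  CAN_SHOW     = c("YES", "NO", "YES")
)
for.public <- TRUE
value_cols <- c("KEPT_WT_SUM", "KEPT_WT_MEAN")

# Assumption: "wiping" is modelled here as replacing values with NA
# for every polygon that does not pass the privacy check.
if (for.public) {
  polys[polys$CAN_SHOW != "YES", value_cols] <- NA
}
print(polys)
```

The polygon itself stays in the table with its CAN_SHOW flag, so a reader can still see that data existed there but was withheld.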

create.spatial

default is TRUE. This indicates whether or not to create a gpkg file containing spatial files for 1) the polygon file (with aggregated values for each polygon and an indication of whether or not each polygon meets the privacy constraints), and 2) the 2 min gridded data (only for within those polygons that meet the privacy constraints).

create.centroid.csv

default is FALSE. This indicates whether or not a csv should be created for the 2 min gridded data (only for within those polygons that meet the privacy constraints). This is a more portable option than the gpkg file created by the create.spatial parameter, and is usable without a GIS. If this is TRUE AND create.spatial is TRUE, then the centroid file will also be added to the generated gpkg file.

file.id

default is NULL. Whatever is entered here will be used to name the output shapefiles and/or plots. If nothing is entered, the output files will just be named using timestamps.

agg.poly.shp

default is NULL. This is the shapefile that has polygons that should be checked for sufficient unique values of the sens.fields. If NULL, NAFO zones will be used. Otherwise, a path to any polygon shapefile can be provided.

agg.poly.field

default is NULL. This identifies the field within the shapefile provided to agg.poly.shp that should be used to check for sufficient unique values of the sens.fields.

custom.grid

default is NULL. If there is a need to use a custom grid to apply to the data,

Value

a list containing an sf grid layer, an sf overlay layer, and, if create.spatial = TRUE, a gpkg spatial file containing these same objects. Additionally, if create.centroid.csv = TRUE, it can also produce a csv of the centroids of the grid layer (which will also be loaded into the gpkg file).

Note

If sensitive fields have names that differ from those provided in sens.fields, they will not be detected or included in the checks. Please make very sure you correctly identify such fields.

It should also be noted that this function can easily produce spatial files with hundreds of columns when a facet.field is provided (e.g. for bycatch species). For example, if all 3 default calculate statistics are requested on 3 different agg.fields, and there are 30 unique values in the facet.field, this will result in (3*3*30 =) 270 fields plus 3 or 4 additional housekeeping fields.
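The arithmetic in the example above can be checked by enumerating the combinations in base R. The third aggregation field (EST_WT) and the facet codes are hypothetical placeholders; only the counts matter.

```r
calculate  <- c("MEAN", "COUNT", "SUM")
# EST_WT is a hypothetical third field, added only to reach 3 agg.fields.
agg.fields <- c("KEPT_WT", "DISCARD_WT", "EST_WT")
# 30 made-up facet values, e.g. species codes.
facets     <- sprintf("SP%02d", 1:30)

# One output column per (field, statistic, facet value) combination.
combos <- expand.grid(agg = agg.fields, calc = calculate, facet = facets,
                      stringsAsFactors = FALSE)
n_cols <- nrow(combos)
print(n_cols)
```

Running a similar count on your own inputs before calling the function gives a quick sense of how wide the output will be.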

Author(s)

Mike McMahon, Mike.McMahon@dfo-mpo.gc.ca

See Also

Other privacy: plot_hex_data()


Maritimes/Mar.utils documentation built on May 5, 2024, 9:44 p.m.