assess_privacy: assess_privacy
In Maritimes/Mar.utils: A suite of functions used by a variety of Maritimes packages

assess_privacy

Description

At this time, data with privacy considerations must be aggregated such that each polygon contains a minimum of 5 unique values for sensitive fields such as Licenses, License Holders and Vessels. This function takes a dataframe and a polygon shapefile and, for each polygon in the shapefile, calculates 1) aggregate values for a number of user-specified fields, and 2) how many unique values exist for each of a number of sensitive fields.
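The two calculations described above can be sketched in base R as follows. This is a minimal, hypothetical illustration, not the Mar.utils implementation: it assumes each row has already been assigned to a polygon (here a `POLY_ID` column), treats `COUNT` as the number of non-missing values, and uses made-up `LICENCE_ID` and `VESSEL_ID` columns as the sensitive fields. The `summarise_polygons()` helper and the `FIELD_STAT` column naming are illustrative assumptions; `rule.of` and `for.public` only mimic the documented arguments.

```r
# Hypothetical sketch of the per-polygon logic; NOT the Mar.utils implementation.
# Rows are assumed to be already assigned to a polygon via `poly.field`.
summarise_polygons <- function(df, poly.field, agg.fields, sens.fields,
                               calculate = c("MEAN", "COUNT", "SUM"),
                               rule.of = 5, for.public = TRUE) {
  funs <- list(
    MEAN  = function(x) mean(x, na.rm = TRUE),
    COUNT = function(x) sum(!is.na(x)),      # assumption: count of non-NA values
    SUM   = function(x) sum(x, na.rm = TRUE)
  )
  rows <- lapply(split(df, df[[poly.field]]), function(d) {
    out <- list()
    # 1) every requested statistic for every aggregation field
    for (f in agg.fields) {
      for (s in calculate) out[[paste(f, s, sep = "_")]] <- funs[[s]](d[[f]])
    }
    # 2) number of distinct non-NA values of each sensitive field
    n_unique <- vapply(sens.fields,
                       function(f) length(unique(d[[f]][!is.na(d[[f]])])),
                       integer(1))
    tot <- min(n_unique)                     # the weakest field decides
    can_show <- tot >= rule.of
    # optionally wipe the calculated values of polygons that fail the check
    if (for.public && !can_show) out[] <- NA_real_
    data.frame(POLY = d[[poly.field]][1], out,
               TOTUNIQUE = tot, CAN_SHOW = if (can_show) "YES" else "NO")
  })
  do.call(rbind, rows)
}

# Hypothetical example data: seven sets in two polygons
sets <- data.frame(
  POLY_ID    = c("A", "A", "A", "A", "A", "B", "B"),
  LICENCE_ID = c(1, 2, 3, 4, 5, 6, 7),
  VESSEL_ID  = c("v1", "v2", "v3", "v4", "v5", "v6", "v6"),
  KEPT_WT    = c(10, 20, 30, 40, 50, 5, 15)
)
res <- summarise_polygons(sets, "POLY_ID", agg.fields = "KEPT_WT",
                          sens.fields = c("LICENCE_ID", "VESSEL_ID"))
res
```

With this example data, polygon A has 5 distinct licences and 5 distinct vessels and passes, while polygon B has only 1 distinct vessel, so its calculated values are wiped.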

Usage

assess_privacy(
  df = NULL,
  grid.shape = "hex",
  lat.field = "LATITUDE",
  lon.field = "LONGITUDE",
  rule.of = 5,
  agg.fields = "KEPT_WT",
  calculate = c("MEAN", "COUNT", "SUM"),
  sens.fields = NULL,
  facet.field = NULL,
  key.fields = NULL,
  for.public = TRUE,
  create.spatial = TRUE,
  create.centroid.csv = FALSE,
  file.id = NULL,
  agg.poly.shp = NULL,
  agg.poly.field = NULL,
  custom.grid = NULL
)

Arguments

`df`	a dataframe to be analyzed.
`grid.shape`	default is `"hex"`. This identifies the shape of the grid cells you want to aggregate your data into. The options are `"hex"` or `"square"`.
`lat.field`	the default is `"LATITUDE"`. This is the name of the field in `df` holding latitude values (in decimal degrees)
`lon.field`	the default is `"LONGITUDE"`. This is the name of the field in `df` holding longitude values (in decimal degrees)
`rule.of`	default is `5`. Whether or not data can be shown (even aggregated) depends on the presence of a threshold number of unique values for certain sensitive fields. This parameter sets that threshold.
`agg.fields`	the default is `"KEPT_WT"`. These are the fields in the data that contain the values you want to aggregate (e.g. calculate the mean, sum or count of). These fields must be numeric.
`calculate`	the default is `c("MEAN", "COUNT", "SUM")`. These are the analytics that should be performed for every field identified in `agg.fields`. For example, if KEPT_WT and DISCARD_WT are both identified in `agg.fields`, then the mean, count and sum of both of these fields are calculated for every resultant aggregated polygon (e.g. hexagon).
`sens.fields`	default is `NULL`. These are the fields to which the "rule of 5" should be applied. The Treasury Secretariat states that when data is shown to the public, each aggregation must include at least 5 unique values of each of these fields. When run, this function looks at these fields and calculates how many unique values exist for each. It then populates a field 'TOTUNIQUE' with the minimum number of unique values across all of the assessed fields. If this is 5 or more (i.e. it meets the `rule.of` threshold), a field called 'CAN_SHOW' is marked as 'YES'; otherwise it is 'NO'.
`facet.field`	default is `NULL`. In cases like bycatch data, each row of the dataframe might represent a different species. You probably want a breakdown by individual species rather than a single combined weight for all species. This is the field whose common values (e.g. Species_Code) will be used to break down the aggregated data.
`key.fields`	default is `NULL`. This is a vector of the fields required to uniquely identify each fishing set. If a `facet.field` is provided, the `facet.field`, `key.fields` and `agg.fields` are all pulled off of the original data and then merged back onto it. The `key.fields` ensure that the data can be rejoined to the original sets.
`for.public`	default is `TRUE`. While calculating the aggregated values within each 2 min cell, this function first establishes whether or not the cells within an area have enough unique values of the sensitive fields to allow any data to be shown at all. If this parameter is `TRUE`, the calculated values for areas that cannot be shown are wiped before the output files are generated.
`create.spatial`	default is `TRUE`. This indicates whether or not to create a gpkg file containing spatial layers for 1) the polygon file (with aggregated values for each polygon and an indication of whether or not each polygon meets the privacy constraints), and 2) the 2 min gridded data (only within those polygons that meet the privacy constraints).
`create.centroid.csv`	default is `FALSE`. This indicates whether or not a csv should be created for the 2 min gridded data (only within those polygons that meet the privacy constraints). This is a more portable option than the gpkg file created by the `create.spatial` parameter, and is usable without a GIS. If this is `TRUE` AND `create.spatial` is `TRUE`, the centroid file will also be added to the generated gpkg file.
`file.id`	default is `NULL`. Whatever is entered here will be used to name the output shapefiles and/or plots. If nothing is entered, the output files will be named using timestamps.
`agg.poly.shp`	default is `NULL`. This is the shapefile containing the polygons that should be checked for sufficient unique values of the `sens.fields`. If `NULL`, NAFO zones will be used. Otherwise, a path to any polygon shapefile can be provided.
`agg.poly.field`	default is `NULL`. This identifies the field within the shapefile provided to `agg.poly.shp` that should be used to check for sufficient unique values of the `sens.fields`.
`custom.grid`	default is `NULL`. If there is a need to use a custom grid to apply to the data,
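
The `facet.field`/`key.fields` step described above (pull the fields off, then merge them back onto the sets) can be sketched with base R's `reshape()` and `merge()`. The column names (`TRIP`, `SET`, `SPECIES_CODE`) and example values are hypothetical, and spreading facet values into columns this way is only one possible approach, not necessarily what the package does.

```r
# Hypothetical set-level data (one row per set) and bycatch data (one row per
# species per set). TRIP, SET and SPECIES_CODE are illustrative names only.
sets <- data.frame(TRIP = c(1, 1, 2), SET = c(1, 2, 1),
                   LATITUDE  = c(44.10, 44.20, 43.90),
                   LONGITUDE = c(-63.10, -63.20, -62.90))
catch <- data.frame(TRIP = c(1, 1, 1, 2), SET = c(1, 1, 2, 1),
                    SPECIES_CODE = c(10, 11, 10, 11),
                    KEPT_WT = c(5, 2, 7, 3))

key.fields  <- c("TRIP", "SET")
facet.field <- "SPECIES_CODE"
agg.fields  <- "KEPT_WT"

# Pull the key, facet and agg fields off the data and spread each facet value
# into its own column (KEPT_WT.10, KEPT_WT.11, ...), giving one row per set
wide <- reshape(catch[, c(key.fields, facet.field, agg.fields)],
                idvar = key.fields, timevar = facet.field,
                direction = "wide")

# Merge back onto the sets using the key fields; a set with no catch of a
# given species gets NA in that species' column
merged <- merge(sets, wide, by = key.fields, all.x = TRUE)
merged
```

Without the key fields there would be no reliable way to line the per-species columns back up with the set-level positions used for gridding.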

Value

a list containing an sf grid layer and an sf overlay layer. If `create.spatial = TRUE`, a gpkg spatial file containing these same objects is also created. Additionally, if `create.centroid.csv = TRUE`, a csv of the centroids of the grid layer is produced (and also loaded into the gpkg file when `create.spatial = TRUE`).
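A rough sketch of what a centroid csv for gridded data might contain, assuming a square grid of 2 min (1/30 degree) cells aligned on whole minutes. The `cell_centroid()` helper, the `CELL_LAT`/`CELL_LON` column names and the temporary output path are hypothetical; the package's own grids (including the default hex grid) are built differently.

```r
# Hypothetical: snap a coordinate to the centre of its 2-minute square cell,
# assuming cells are aligned on whole minutes. Not how Mar.utils builds grids.
cell_centroid <- function(x, cell.deg = 2 / 60) {
  floor(x / cell.deg) * cell.deg + cell.deg / 2
}

pts <- data.frame(LATITUDE  = c(44.01, 44.02, 44.05),
                  LONGITUDE = c(-63.51, -63.52, -63.49))
pts$CELL_LAT <- cell_centroid(pts$LATITUDE)
pts$CELL_LON <- cell_centroid(pts$LONGITUDE)

# One row per occupied cell; the first two points share a cell
centroids <- unique(pts[, c("CELL_LAT", "CELL_LON")])
out_file <- tempfile(fileext = ".csv")   # stand-in for the real output path
write.csv(centroids, out_file, row.names = FALSE)
```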

Note

If sensitive fields have names that differ from those provided in `sens.fields`, they will not be detected or included in the checks. Make very sure that such fields are correctly identified.
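Because mismatched names are not detected, a quick pre-flight check before calling `assess_privacy()` may be worthwhile. The `check_sens_fields()` helper below is hypothetical and not part of Mar.utils; it simply stops if any requested field is absent from the dataframe, using exact, case-sensitive matching.

```r
# Hypothetical pre-flight check: stop if any requested sensitive field is
# missing from the data (matching is exact and case-sensitive).
check_sens_fields <- function(df, sens.fields) {
  missing <- setdiff(sens.fields, names(df))
  if (length(missing) > 0) {
    stop("sens.fields not found in df: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

dat <- data.frame(LICENCE_ID = 1:3, VESSEL_ID = 4:6)
check_sens_fields(dat, c("LICENCE_ID", "VESSEL_ID"))   # passes silently
```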

It should also be noted that this function can easily produce spatial files with hundreds of columns when a `facet.field` is provided (e.g. for bycatch species). For example, if all 3 default `calculate` options are requested for 3 different `agg.fields`, and there are 30 unique values in the `facet.field`, the result will have (3*3*30 =) 270 fields plus 3 or 4 additional housekeeping fields.
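The column arithmetic above can be checked before a run. The `n_calc_columns()` helper is a hypothetical convenience, and `FIELD_3` is a placeholder for any third aggregation field; the 3 or 4 housekeeping fields are not included in the count.

```r
# Hypothetical helper: number of calculated columns a faceted run would produce
n_calc_columns <- function(calculate, agg.fields, facet.values) {
  length(calculate) * length(agg.fields) * length(unique(facet.values))
}

n_calc_columns(calculate    = c("MEAN", "COUNT", "SUM"),
               agg.fields   = c("KEPT_WT", "DISCARD_WT", "FIELD_3"),
               facet.values = 1:30)
# returns 270
```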

Author(s)

Mike McMahon, Mike.McMahon@dfo-mpo.gc.ca
