In ptaconet/rtunaatlas: IRD Sardara Tuna Atlas R facilities

map_codelist

R Documentation

Maps and replaces one dimension of a fact dataset using a dataset of mappings between code lists

Description

This function maps one dimension (i.e. a column of codes) of a fact dataset to a target code list using a dataset of mappings between code lists. In other words, it establishes the correspondence between two code lists for a given dimension and, in the fact dataset, replaces the old (source) codes of this dimension with the new (target) codes given in the dataset of mappings between code lists.

Usage

map_codelist(df_input, df_mapping, dimension_to_map, keep_src_code = FALSE)

Arguments

`df_input`	a data.frame of facts
`df_mapping`	a data.frame of code list mappings
`dimension_to_map`	the name (string) of the dimension to map
`keep_src_code`	logical: keep the source coding system column? If TRUE, the output dataset keeps both the source and the target coding system columns; if FALSE, it keeps only the target (i.e. mapped) coding system column. Default is FALSE.

Details

The data.frames of facts and of code list mappings must be properly structured. The mapping data.frame must have the following two columns:

"src_code": The source codes for the dimension to map (i.e. the codes used in df_input for the considered dimension)
"trg_code": The target codes for the dimension to map

Some codes might not be mapped because no correspondence exists between the source code(s) and the target code(s). In the output dataset of map_codelist, these unmapped codes are set to "UNK".

If keep_src_code is FALSE, the source coding system column is dropped and the target coding system column is named dimension_to_map. If keep_src_code is TRUE, the source coding system column is kept under its original name (dimension_to_map), and the target coding system column is named "dimension_to_map"_mapping (e.g. gear_mapping).
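The behaviour described above can be approximated in base R with match(). This is a minimal sketch under stated assumptions, not the rtunaatlas implementation: map_codelist_sketch() is a hypothetical name, it assumes each src_code appears at most once in df_mapping, and the toy codes are invented.

```r
# Hypothetical sketch of the mapping step (not the rtunaatlas source code).
# Assumes each src_code appears at most once in df_mapping.
map_codelist_sketch <- function(df_input, df_mapping, dimension_to_map,
                                keep_src_code = FALSE) {
  # Row of df_mapping matching each source code (NA when there is no correspondence)
  idx <- match(as.character(df_input[[dimension_to_map]]),
               as.character(df_mapping$src_code))
  mapped <- as.character(df_mapping$trg_code)[idx]
  # Unmapped codes are set to "UNK"
  mapped[is.na(mapped)] <- "UNK"
  if (keep_src_code) {
    # Source column keeps its name; target codes go to "<dimension>_mapping"
    df_input[[paste0(dimension_to_map, "_mapping")]] <- mapped
  } else {
    # Target codes replace the source codes under the original column name
    df_input[[dimension_to_map]] <- mapped
  }
  df_input
}

# Toy data (invented codes, for illustration only)
df_input <- data.frame(gear = c("LL", "PS", "XX"), value = c(10, 5, 2),
                       stringsAsFactors = FALSE)
df_mapping <- data.frame(src_code = c("LL", "PS"), trg_code = c("G1", "G2"),
                         stringsAsFactors = FALSE)
out <- map_codelist_sketch(df_input, df_mapping, "gear")
out$gear  # c("G1", "G2", "UNK")
```

With keep_src_code = TRUE, the same call would leave gear as c("LL", "PS", "XX") and add a gear_mapping column equal to c("G1", "G2", "UNK"). match() is used here rather than merge() because it keeps the row order of df_input unchanged.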

Value

A list with two elements:

"df": The input data.frame of facts, in which the dimension_to_map column has been mapped using df_mapping
"stats": A data.frame with information on the data that could not be mapped. For each unit of measure available in the input dataset, it gives the sum and the percentage of the data that could not be mapped because no correspondence is available in the dataset of mappings between code lists

Author(s)

Paul Taconet, paul.taconet@ird.fr

Examples


# Load the package (provides db_connection_tunaatlas_world(), extract_dataset(),
# list_metadata_datasets() and map_codelist())
library(rtunaatlas)

# Connect to the Tuna Atlas database
con <- db_connection_tunaatlas_world()

# Read the IOTC nominal catch dataset (2017 release)
iotc_nominal_catch <- extract_dataset(con, list_metadata_datasets(con, identifier = "indian_ocean_nominal_catch_1950_01_01_2015_01_01_tunaatlasIOTC_2017_level0"))
head(iotc_nominal_catch)

# Read a mapping between code lists (here, between the fishing gear codes used by the
# tuna RFMOs and the International Standard Statistical Classification of Fishing Gear)
df_mapping <- extract_dataset(con, list_metadata_datasets(con, identifier = "codelist_mapping_gear_iotc_isscfg_revision_1"))
head(df_mapping)

# Map the code lists. The output is a list with two elements (see section "Value").
# keep_src_code = FALSE keeps only the target coding system in the output dataset;
# set keep_src_code = TRUE to keep both the source and target coding systems.
df_mapped <- map_codelist(iotc_nominal_catch, df_mapping, "gear", keep_src_code = FALSE)

# Get the mapped data.frame: dimension "gear" is now coded in ISSCFG, so the values of
# column "gear" differ from those before the call. The codes were mapped on both
# "gear" and "source_authority", since the dataset of mappings between code lists
# contains both dimensions.
df_mapped_df <- df_mapped$df
head(df_mapped_df)

# Get information about the data that could not be mapped
df_mapped$stats
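The example notes that the gear codes were mapped on both "gear" and "source_authority". A minimal sketch of such a two-key lookup follows, assuming the mapping data.frame also carries a source_authority column; the data and target codes are invented, and rtunaatlas does not necessarily implement it this way.

```r
# Hypothetical sketch: the same source code may correspond to different target
# codes depending on the authority, so the lookup key is (gear, source_authority).
df_input <- data.frame(gear = c("LL", "LL", "PS"),
                       source_authority = c("IOTC", "ICCAT", "IOTC"),
                       value = c(10, 7, 5), stringsAsFactors = FALSE)
df_mapping <- data.frame(src_code = c("LL", "LL"),
                         source_authority = c("IOTC", "ICCAT"),
                         trg_code = c("G1", "G3"), stringsAsFactors = FALSE)

# Composite keys; the separator "|" is assumed not to occur in the codes
key_input <- paste(df_input$gear, df_input$source_authority, sep = "|")
key_mapping <- paste(df_mapping$src_code, df_mapping$source_authority, sep = "|")

# Look up each (code, authority) pair; pairs without a correspondence become "UNK"
mapped <- df_mapping$trg_code[match(key_input, key_mapping)]
mapped[is.na(mapped)] <- "UNK"
df_input$gear <- mapped
df_input$gear  # c("G1", "G3", "UNK")
```

A composite key with match() keeps the original row order; merge() on both columns would work too but may reorder rows and drops unmatched ones unless all.x = TRUE is set.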

ptaconet/rtunaatlas documentation built on June 23, 2024, 9:35 p.m.