convert_units: Convert units of measure in a fact dataset
In ptaconet/rtunaatlas: IRD Sardara Tuna Atlas R facilities

convert_units

R Documentation

Convert units of measure in a fact dataset

Description

This function converts the units of measure in a fact dataset using a dataset of conversion factors between units. Both the fact data frame and the conversion factor data frame must be properly structured. For the expected structures, see Details and here: http://.

Usage

convert_units(con, df_input, df_conversion_factor, codelist_geoidentifiers_df_input, codelist_geoidentifiers_conversion_factors)

Arguments

`con`	a wrapper of an RPostgreSQL connection (connection to a database) where the geospatial codelists are stored.
`df_input`	a data.frame of fact
`df_conversion_factor`	a data.frame of factors of conversion between units
`codelist_geoidentifiers_df_input`	string. The name of the coding system used for the spatial dimension in df_input (i.e. table name in Sardara database).
`codelist_geoidentifiers_conversion_factors`	string. The name of the coding system used for the spatial dimension in df_conversion_factor (i.e. table name in Sardara database), or NULL if the coding system for the spatial dimension is the same as the one used in df_input. See the "Details" section.

Details

The data frame of factors of conversion between units (df_conversion_factor) must have the following structure:
- several columns of dimension, that stratify the factor of conversion (e.g. species, gear, etc.),
- one column named 'unit', that provides the code of the source unit of the conversion factor,
- one column named 'unit_target', that provides the code of the target unit of the conversion factor,
- one column named 'conversion_factor', that provides the numerical factor of conversion to convert the measure from the unit stated in the column 'unit' to the unit stated in the column 'unit_target'.

See an example of dataset of factors of conversion here: https://goo.gl/KriwxV.

Example: These are the first rows of a dataset of factors of conversions:

source_authority	species	gear	geographic_identifier	time_start	time_end	unit	unit_target	conversion_factor
IOTC	YFT	LL	1	1952-01-01	1953-01-01	NO	MT	0.048060001
IOTC	YFT	LL	2	1952-01-01	1953-01-01	NO	MT	0.048680000
IOTC	YFT	LL	3	1952-01-01	1953-01-01	NO	MT	0.058639999
IOTC	BET	LL	0	1952-01-01	1953-01-01	NO	MT	0.044340000

The first row means that for the combination of dimensions: source_authority=IOTC, species=YFT, gear=LL, geographic_identifier=1, starting date of validity of the factor of conversion (time_start)=1952-01-01, ending date of validity of the factor of conversion (time_end)=1953-01-01, the factor of conversion to convert a measure from unit=NO to unit_target=MT is equal to 0.048060001.
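
As a minimal sketch of what such a row expresses, the base R snippet below rebuilds the four example rows and applies the first factor to a hypothetical catch of 1000 fish. The object names (conversion_factors, catch_in_number) and the catch value are assumptions made for illustration, and the snippet assumes that a factor is applied by multiplication; it is not the internal code of convert_units.

```r
# Rebuild the example rows shown above (values copied from the table).
conversion_factors <- data.frame(
  source_authority      = c("IOTC", "IOTC", "IOTC", "IOTC"),
  species               = c("YFT", "YFT", "YFT", "BET"),
  gear                  = c("LL", "LL", "LL", "LL"),
  geographic_identifier = c("1", "2", "3", "0"),
  time_start            = "1952-01-01",
  time_end              = "1953-01-01",
  unit                  = "NO",
  unit_target           = "MT",
  conversion_factor     = c(0.048060001, 0.048680000, 0.058639999, 0.044340000),
  stringsAsFactors      = FALSE
)

# Hypothetical measure: 1000 fish (unit NO) in the stratum of the first row.
catch_in_number <- 1000

# Assumption: value in target unit (MT) = value in source unit (NO) * conversion_factor.
catch_in_weight <- catch_in_number * conversion_factors$conversion_factor[1]
catch_in_weight
```

Under that assumption, 1000 yellowfin tuna caught by longline in area 1 during 1952 would correspond to roughly 48 metric tons.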

The codes used in the dimensions of the dataset of factors of conversion (df_conversion_factor) must be the same as the ones used in the dataset of fact with units to convert (df_input), except for the spatial dimension (geographic_identifier); see below for more details. The only mandatory columns of the dataset of factors of conversion are "unit", "unit_target" and "conversion_factor". All the other columns are there to stratify the factors of conversion (by species, gear, time, space, etc.). In particular, the columns "time_start", "time_end" and "geographic_identifier" make it possible to stratify the factors of conversion spatially and temporally.

The columns "time_start" and "time_end" provide respectively the starting date and the ending date of validity of the factor of conversion.

The column "geographic_identifier" provides the spatial stratification of the factor of conversion. If the coding system for spatial stratification used in df_conversion_factor is the same as the one used in df_input, then the parameter codelist_geoidentifiers_conversion_factors must be set to NULL. Otherwise, the spatial coding system used in df_conversion_factor must be stored in the Sardara database, and the parameter codelist_geoidentifiers_conversion_factors must be set to the name of the spatial coding system (table) in the Sardara database.

If df_conversion_factor mixes factors of conversion that have and do not have a spatial stratification, the rows that do not have a spatial stratification must be set to geographic_identifier = 0.

Columns of time (time_start and time_end) must be of type character (not POSIX) and they must have the same resolution (e.g. day, second, etc.).
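
One possible way to meet this requirement, assuming the time columns are stored as POSIXct, is to format them as character strings at day resolution. The data frame df_fact below is a hypothetical stand-in for a fact dataset.

```r
# Hypothetical fact data whose time columns are POSIXct (date-time) values.
df_fact <- data.frame(
  time_start = as.POSIXct(c("2017-01-01 00:00:00", "2017-02-01 00:00:00"), tz = "UTC"),
  time_end   = as.POSIXct(c("2017-02-01 00:00:00", "2017-03-01 00:00:00"), tz = "UTC")
)

# Convert both columns to character at day resolution (YYYY-MM-DD), so that
# they have the same type and resolution as values such as "1952-01-01".
df_fact$time_start <- format(df_fact$time_start, "%Y-%m-%d", tz = "UTC")
df_fact$time_end   <- format(df_fact$time_end, "%Y-%m-%d", tz = "UTC")

str(df_fact)
```

Setting the time zone explicitly avoids a date shifting by one day when the session time zone differs from the one in which the data were recorded.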

Value

a list with two objects:

"df": The input data.frame of fact, where the measures and related units have been converted when factors of conversion were available. Some data might not be converted at all because no conversion factor exists for the stratum: these data are kept in their source unit (i.e. they are not removed from the dataset).
"stats": A data.frame with some information regarding the conversion. It provides, for each unit of measure available in the input dataset, the sum and percentage of the data that could not be converted because no conversion factor is available for the stratum.

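The behaviour described for "df" (rows without a matching factor are kept in their source unit) can be illustrated with a left join in base R. This is only a sketch under simplifying assumptions: the data frames df_fact and df_factors are hypothetical, stratification is reduced to species and unit, and the factor is assumed to be applied by multiplication. It does not reproduce the package's implementation, which also handles the time and spatial dimensions.

```r
# Hypothetical fact data: two strata in number of fish (NO), one already in weight (MT).
df_fact <- data.frame(
  species = c("YFT", "BET", "SKJ"),
  unit    = c("NO", "NO", "MT"),
  value   = c(1000, 500, 20),
  stringsAsFactors = FALSE
)

# Hypothetical conversion factors: only YFT has a factor from NO to MT.
df_factors <- data.frame(
  species           = "YFT",
  unit              = "NO",
  unit_target       = "MT",
  conversion_factor = 0.048060001,
  stringsAsFactors  = FALSE
)

# Left join: every fact row is kept; rows without a matching factor get NA.
joined <- merge(df_fact, df_factors, by = c("species", "unit"), all.x = TRUE)

# Convert only the rows for which a factor was found.
has_factor <- !is.na(joined$conversion_factor)
joined$value[has_factor] <- joined$value[has_factor] * joined$conversion_factor[has_factor]
joined$unit[has_factor]  <- joined$unit_target[has_factor]

# Drop the helper columns to return to the structure of the input.
df_out <- joined[, c("species", "unit", "value")]
df_out
```

In this sketch the YFT row ends up in MT, while the BET row stays in NO because no factor matches it, and the SKJ row is untouched because it was already in MT.
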
Author(s)

Paul Taconet, paul.taconet@ird.fr

Examples


# Connect to Tuna atlas database
con<-db_connection_tunaatlas_world()

# Retrieve IOTC georeferenced catch data from 2017
df_input<-iotc_catch_level0(2017)

# some curation before use of the functions
df_input$time_start<-substr(as.character(df_input$time_start), 1, 10)
df_input$time_end<-substr(as.character(df_input$time_end), 1, 10)

# Open a dataset of factors of conversion (the one used to convert units of catch in the IRD Tuna Atlas)
conversion_factors_dataset<-"https://goo.gl/KriwxV"
df_conversion_factor<-read.csv(conversion_factors_dataset, stringsAsFactors = FALSE, colClasses = "character")
head(df_conversion_factor)


# Convert units MTNO to MT and remove NOMT (we do not keep the data that were expressed in number with corresponding value in weight)
df_input$unit[which(df_input$unit == "MTNO")]<-"MT"
df_input<-df_input[!(df_input$unit=="NOMT"),]

# Convert units from numbers to weight using the dataset of factors of conversion.
# The spatial coding system used in df_conversion_factor (column geographic_identifier) is not the same as the one used in df_input.
# Hence, we set the parameter codelist_geoidentifiers_conversion_factors to the name of the spatial coding system used in df_conversion_factor ("areas_conversion_factors_numtoweigth_ird").
df_converted<-convert_units(con = con, df_input = df_input, df_conversion_factor = df_conversion_factor, codelist_geoidentifiers_df_input = "areas_tuna_rfmos_task2", codelist_geoidentifiers_conversion_factors = "areas_conversion_factors_numtoweigth_ird")

# Get the dataframe with units converted: data that were expressed in number are converted to metric tons. Some data might not be converted at all because no conversion factor exists for the stratum: these data are kept in their original unit (in this case, number).
df_converted_df<-df_converted$df
head(df_converted_df)

# Get information regarding the conversion (data converted, data not converted because no factor of conversion existed, etc.)
df_converted$stats

dbDisconnect(con)

ptaconet/rtunaatlas documentation built on June 23, 2024, 9:35 p.m.