In ptaconet/rtunaatlas: IRD Sardara Tuna Atlas R facilities

View source: R/raise_datasets_by_dimension.R

raise_datasets_by_dimension

R Documentation

Create a dataset with the most complete information from two datasets with partial information

Description

This function creates a dataset with the most complete information from two datasets with partial information, by crossing the information available in both. Each input dataset lacks one dimension, but the full information is available when both datasets are considered together: the dimension missing in dataset 1 is available in dataset 2, and vice versa. The function uses both datasets to create a new dataset with the full information (i.e. with values filled in for both dimensions). The fact data frames must be properly structured. For the structure of the data frames, see Details and here: http://.

Usage

raise_datasets_by_dimension(df1, df2, dimension_missing_df1, dimension_missing_df2)

Arguments

`df1`	a data.frame of fact with one dimension missing (i.e. for this dimension all values are set to UNK), the latter being available in df2.
`df2`	a data.frame of the same fact with another dimension missing (i.e. for this dimension all values are set to UNK), the latter being available in df1.
`dimension_missing_df1`	string. The name of the dimension missing in df1.
`dimension_missing_df2`	string. The name of the dimension missing in df2.
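To make the input contract concrete, here is a minimal sketch of a check that a dimension is "missing", i.e. set to UNK in every row. The helper `dimension_is_missing()` and the toy data frames are hypothetical (they are not part of rtunaatlas), and the `value` column holding the catches is an assumption.

```r
# Hypothetical helper (not part of rtunaatlas): returns TRUE when every value
# of the column `dimension` is "UNK", i.e. the dimension is missing in `df`.
dimension_is_missing <- function(df, dimension) {
  if (!dimension %in% names(df)) {
    stop("column '", dimension, "' not found in the data.frame")
  }
  all(df[[dimension]] == "UNK")
}

# Toy inputs mirroring the description of df1 and df2 ("value" is assumed)
df1 <- data.frame(flag = c("FRA", "ESP"), schooltype = "UNK", value = c(30, 70))
df2 <- data.frame(flag = "UNK", schooltype = c("LS", "FS"), value = c(40, 60))

dimension_is_missing(df1, "schooltype")  # TRUE
dimension_is_missing(df2, "flag")        # TRUE
dimension_is_missing(df1, "flag")        # FALSE
```

Such a check could be run on both inputs before raising, with `dimension_missing_df1` and `dimension_missing_df2` as the `dimension` argument.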

Details

In this function, we make the hypothesis that, within a stratum, the breakdown of the catches along one dimension is the same for every value of the other dimension. Example:

df1: a dataset where the "flag" dimension is available but the "schooltype" dimension is missing (i.e. in this dataset, all the values of the column "schooltype" are set to UNK)
df2: a dataset where the "schooltype" dimension is available but the "flag" dimension is missing (i.e. in this dataset, all the values of the column "flag" are set to UNK)

The concept of raising the data is the following: in a given stratum S, a fishing country (flag) F has caught a proportion RF of the total catches made in this stratum (this information is extracted from the dataset with flag detail, i.e. df1; RF is called the raising factor). In the same stratum S, a total of T tons were caught on log schools by all fishing countries combined (this information is extracted from the dataset with schooltype detail, i.e. df2). Raising the data means estimating that the fishing country F has caught RF * T tons on log schools in the stratum S. RF * T is the raised value.
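The raising described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function `raise_sketch()`, the single `stratum` column (standing in for all the other dimensions of a stratum, such as time, area or species), the `value` column holding catches in tons and the schooltype codes "LS"/"FS" are all assumptions. With the toy data, in stratum S1 the flag ESP has RF = 70 / 100 = 0.7 and 40 t were caught on "LS", so 0.7 * 40 = 28 t are attributed to ESP on "LS".

```r
# Hypothetical toy inputs: "stratum" stands in for all the other dimensions,
# "value" holds catches in tons (both column names are assumptions).
df1 <- data.frame(                       # flag known, schooltype missing
  stratum    = c("S1", "S1", "S2"),
  flag       = c("FRA", "ESP", "FRA"),
  schooltype = "UNK",
  value      = c(30, 70, 50)
)
df2 <- data.frame(                       # schooltype known, flag missing
  stratum    = c("S1", "S1", "S2"),
  flag       = "UNK",
  schooltype = c("LS", "FS", "LS"),
  value      = c(40, 60, 50)
)

# Hypothetical sketch of the raising (not the rtunaatlas code)
raise_sketch <- function(df1, df2, strata_cols) {
  # Raising factor RF: share of each flag in the total catch of its stratum (df1)
  totals <- aggregate(df1["value"], by = df1[strata_cols], FUN = sum)
  names(totals)[names(totals) == "value"] <- "total"
  rf <- merge(df1[c(strata_cols, "flag", "value")], totals, by = strata_cols)
  rf$rf <- rf$value / rf$total

  # Cross every flag of a stratum with every schooltype of the same stratum (df2)
  crossed <- merge(rf[c(strata_cols, "flag", "rf")],
                   df2[c(strata_cols, "schooltype", "value")],
                   by = strata_cols)

  # Raised value = RF * T, T being the stratum catch for that schooltype
  crossed$value <- crossed$rf * crossed$value
  crossed[c(strata_cols, "flag", "schooltype", "value")]
}

raised <- raise_sketch(df1, df2, strata_cols = "stratum")
print(raised)
```

`merge()` on the stratum columns alone performs the many-to-many crossing: each flag of a stratum is paired with each schooltype of the same stratum, which is exactly where the hypothesis stated below comes in.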

In the output dataset, both flag and school type dimensions are available for each stratum.

We make the hypothesis that, within a given stratum, the proportion of catches by schooltype is the same for all the fishing countries.
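Under this hypothesis, the raising reduces to the formulas below. The notation is introduced here for illustration: C_{S,F} is the catch of flag F in stratum S (from df1), and T_{S,k} is the catch on school type k in stratum S (from df2).

```latex
\begin{aligned}
RF_{S,F} &= \frac{C_{S,F}}{\sum_{F'} C_{S,F'}}, &
\hat{C}_{S,F,k} &= RF_{S,F} \, T_{S,k}, \\
\sum_{F} \hat{C}_{S,F,k} &= T_{S,k}, &
\sum_{k} \hat{C}_{S,F,k} &= RF_{S,F} \sum_{k} T_{S,k}.
\end{aligned}
```

Because the raising factors of a stratum sum to 1, summing the raised values over flags gives back the catches by school type of df2 exactly. Summing over school types gives back C_{S,F} only when both datasets report the same total catch for the stratum, i.e. when the sum of T_{S,k} over k equals the sum of C_{S,F'} over F'.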

Value

a list with one object:

"df": a data frame in which df1 and df2 have been crossed to get a dataset with both dimensions filled.
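Since the raising factors of a stratum sum to 1, summing the output over flag should give back the catches by schooltype of df2. Below is a minimal sketch of such a consistency check, assuming the dimension names of the example. The helper `check_schooltype_totals()`, the `stratum` and `value` columns and the hand-built `raised` table (standing in for the `df` element of the returned list) are hypothetical.

```r
# Hypothetical helper: sums the raised catches over flag and compares them with
# the catches by schooltype of df2, stratum by stratum.
check_schooltype_totals <- function(raised, df2, strata_cols) {
  keys <- c(strata_cols, "schooltype")
  out  <- aggregate(raised["value"], by = raised[keys], FUN = sum)
  ref  <- aggregate(df2["value"], by = df2[keys], FUN = sum)
  # all = TRUE keeps unmatched combinations (as NA) so they make the check fail
  both <- merge(out, ref, by = keys, suffixes = c("_raised", "_df2"), all = TRUE)
  isTRUE(all.equal(both$value_raised, both$value_df2))
}

# Hand-built toy output for one stratum (flag x schooltype, catches in tons)
raised <- data.frame(
  stratum    = "S1",
  flag       = c("FRA", "FRA", "ESP", "ESP"),
  schooltype = c("LS", "FS", "LS", "FS"),
  value      = c(12, 18, 28, 42)
)
df2 <- data.frame(stratum = "S1", flag = "UNK",
                  schooltype = c("LS", "FS"), value = c(40, 60))

check_schooltype_totals(raised, df2, strata_cols = "stratum")  # TRUE
```

A FALSE result would point to a stratum or schooltype present in only one of the tables, or to raised values that do not add up to the df2 totals.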

Author(s)

Paul Taconet, paul.taconet@ird.fr

Examples


# Load the packages (dbDisconnect() comes from DBI)
library(rtunaatlas)
library(DBI)

# Connect to Tuna atlas database
con<-db_connection_tunaatlas_world()

# Retrieve some IATTC georeferenced time series of catch

# IATTC dataset stratified by schooltype (and not flag)
dataset_iattc_ce_PSSetType<-extract_dataset(con,list_metadata_datasets(con,identifier="east_pacific_ocean_catch_1958_12_01_2016_01_01_1deg_1m_ps_tunaatlasIATTC_2017_level0__tuna_bySchool"))
head(dataset_iattc_ce_PSSetType)
unique(dataset_iattc_ce_PSSetType$flag) # Note that the column "flag" is set to "UNK" in every row
unique(dataset_iattc_ce_PSSetType$schooltype) # Note that the column "schooltype" is detailed

# Same IATTC dataset, but stratified by flag (and not schooltype)
dataset_iattc_ce_PSFlag<-extract_dataset(con,list_metadata_datasets(con,identifier="east_pacific_ocean_catch_1958_12_01_2016_01_01_1deg_1m_ps_tunaatlasIATTC_2017_level0__tuna_byFlag"))
head(dataset_iattc_ce_PSFlag)
unique(dataset_iattc_ce_PSFlag$schooltype) # Note that the column "schooltype" is set to "UNK" in every row
unique(dataset_iattc_ce_PSFlag$flag) # Note that the column "flag" is detailed

## Raise both datasets. In the output dataset, both flag and school type information are available for each stratum. 
 
dataset_iattc_flag_raised_to_schooltype<-raise_datasets_by_dimension(
df1=dataset_iattc_ce_PSFlag,
df2=dataset_iattc_ce_PSSetType,
dimension_missing_df1="schooltype",
dimension_missing_df2="flag")

head(dataset_iattc_flag_raised_to_schooltype$df)
unique(dataset_iattc_flag_raised_to_schooltype$df$schooltype)
unique(dataset_iattc_flag_raised_to_schooltype$df$flag) # Note that both columns "flag" and "schooltype" are detailed

dbDisconnect(con)

ptaconet/rtunaatlas documentation built on June 23, 2024, 9:35 p.m.