ratioDT: ratioDT


View source: R/Ratios.R

Description

The function calculates ratios of corresponding variables and corresponding rows between two data sets, DT1 and DT2. The result is a data set with the same dimensions as DT1. The variables can be specified by vars; if vars is not specified, the subfunction select.VarsElements matches column names with element abbreviations. Which row of DT1 corresponds to which row of DT2 has to be specified by the variable(s) group1.vars (and, optionally, group2.vars). If DT2 has a different number of rows than DT1, a 'new DT2' with the same dimensions as DT1 is prepared by the function preparationDT2. The method for calculating the ratios is selected with ratio_type.

For more details please refer to preparationDT2 and section Details.

Usage

ratioDT(DT1, DT2, vars = NULL, group1.vars, group2.vars = NULL,
  ratio_type = "simple", vars.ref, id.vars, Errors = FALSE,
  Error_method = "gauss", var_subgroup = NULL, use_only_DT2 = FALSE,
  DT2_replace = NULL, STD_DT1, STD_DT2, minNr_DT1 = 50, minNr_DT2 = 50,
  return_all = FALSE, return_as_list = FALSE)

Arguments

DT1

data.frame or data.table, samples in rows and variables in columns

DT2

data.frame or data.table, samples in rows and variables in columns.

vars

optional, character vector of column names of DT1 and DT2, default is function select.VarsElements. Please make sure the columns given in vars are of class numeric.

group1.vars

character vector, column name(s) for subsetting DT1 and DT2

group2.vars

optional, column name for subsetting DT1 and DT2 if some entries in group1.vars are empty.

ratio_type

character, one of "simple", "log", "ar", "alr", "cr" and "clr". Default is "simple". Please refer to Details for explanations.

vars.ref

reference variable, one out of vars. Only for ratio_type "ar" or "alr".

id.vars

column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs). If missing, all columns but vars will be assigned to it. Please note: Function is faster and more stable if id.vars is provided.

Errors

logical, should absolute errors be calculated and appended to the list output? Default is FALSE. If Errors is set to TRUE it overrides the option return_as_list and always returns a list.

Error_method

method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation.

var_subgroup

optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if Errors is set to FALSE. If provided, DT1 is split into subsets by group1.vars and var_subgroup, and the error is calculated for each of these subsets. Please see Details for further information.

use_only_DT2

logical, default is FALSE. If there are not enough DT2 data for a location, should the DT2 data of the region be used? If use_only_DT2 is set to FALSE, the Upper Crust is used for the correction.

DT2_replace

mandatory if use_only_DT2 is set to FALSE; serves as a substitute for DT2 where DT2 has no rows corresponding to DT1. A named vector or a one-row data.table/data.frame with all vars present. A column for group1.vars is not necessary.

STD_DT1

optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.

STD_DT2

optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.

minNr_DT1

minimum number of samples/observations in DT1 for calculating a relative error of observations. If the number of observations of DT1 is smaller than minNr_DT1, the error is calculated via the data set STD_DT1. Default is 50.

minNr_DT2

minimum number of samples/observations in DT2 for calculating a relative error of observations. If the number of observations of DT2 is smaller than minNr_DT2, the error is calculated via the data set STD_DT2. Default is 50.

return_all

logical, should all used data sets be returned as a list? Default is FALSE. If set to TRUE the list contains DT1, DT2, vars, ratios and, if Errors is set to TRUE, additionally ratios_error, DT1_error and DT2_error.

return_as_list

logical, should the result be returned as a list? Default is FALSE. If set to FALSE and Errors is set to TRUE, a column type_of_data is appended. This option is ignored if the option return_all is set to TRUE.

Details

To calculate the ratios the function internally calls preparationDT2 to create a data set 'new DT2' from the variables vars of DT2, which has the same number of rows as DT1. The ratio between the now corresponding data sets is then calculated by the method given in ratio_type.

The method "simple" is a simple division between DT1 and DT2:

\frac{DT1[vars]}{DT2[vars]}

The method "log" is the logarithm of the simple ratio:

\ln \left( \frac{DT1[vars]}{DT2[vars]} \right)
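The two formulas above can be sketched in base R as follows. This is only an illustration of the arithmetic, not the package's implementation; the data frames, column names and values are hypothetical, and DT2 is assumed to be already row-matched to DT1.

```r
# Hypothetical data: DT2 is assumed to be already row-matched to DT1
DT1 <- data.frame(ID = c("p1", "p2"), Cu = c(10, 20), Zn = c(40, 80))
DT2 <- data.frame(ID = c("s1", "s2"), Cu = c(5, 40), Zn = c(20, 20))
vars <- c("Cu", "Zn")

# "simple": element-wise division of corresponding rows and columns
ratio_simple <- DT1[, vars] / DT2[, vars]

# "log": natural logarithm of the simple ratio
ratio_log <- log(ratio_simple)

print(ratio_simple)
print(ratio_log)
```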

The methods "ar" and "alr" normalize all ratios to one reference column: ar:

\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}_{i=1,…, n, …, D}

alr:

\ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}\right)_{i=1,…, n, …, D}
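A minimal base R sketch of the "ar" and "alr" formulas, assuming hypothetical data with "Ti" as the reference column (vars.ref) and DT2 already row-matched to DT1. It illustrates the formulas only and is not taken from the package source.

```r
# Hypothetical, already row-matched data; "Ti" serves as reference column
DT1 <- data.frame(Cu = c(10, 20), Zn = c(40, 80), Ti = c(2, 4))
DT2 <- data.frame(Cu = c(5, 40), Zn = c(20, 20), Ti = c(1, 8))
vars <- c("Cu", "Zn", "Ti")
vars.ref <- "Ti"

m1 <- as.matrix(DT1[, vars])
m2 <- as.matrix(DT2[, vars])

# Simple ratio per row and column
simple <- m1 / m2

# "ar": multiplying by DT2[ref] / DT1[ref] equals dividing each row of the
# simple ratio by the simple ratio of the reference column in that row
ratio_ar <- simple / simple[, vars.ref]

# "alr": natural logarithm of "ar"
ratio_alr <- log(ratio_ar)

print(ratio_ar)
print(ratio_alr)
```

By construction the reference column is 1 in "ar" and 0 in "alr".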

The methods "cr" and "clr" normalize all ratios to the geometric mean of all columns included by vars: "cr" is calculated by:

\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}_{i=1,…, D}

where g(x) is the geometric mean of the columns included by vars:

g(x) = \sqrt[D]{DT[vars_1] \cdot DT[vars_2] \cdots DT[vars_D]}

and "clr" is calculated by:

\ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}\right)_{i=1,…, D}
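The "cr" and "clr" formulas can be sketched in base R as below. The helper geo_mean and the data are hypothetical, DT2 is assumed to be already row-matched to DT1, and g(x) is read as the row-wise geometric mean over the columns in vars. This is an illustration of the formulas, not the package's code.

```r
# Hypothetical, already row-matched data
DT1 <- data.frame(Cu = c(10, 20), Zn = c(40, 80), Ti = c(2, 4))
DT2 <- data.frame(Cu = c(5, 40), Zn = c(20, 20), Ti = c(1, 8))
vars <- c("Cu", "Zn", "Ti")

m1 <- as.matrix(DT1[, vars])
m2 <- as.matrix(DT2[, vars])

# Row-wise geometric mean g(x) over all columns in vars (hypothetical helper)
geo_mean <- function(m) exp(rowMeans(log(m)))

# "cr": simple ratio scaled by g(DT2) / g(DT1), row by row
ratio_cr <- (m1 / m2) * geo_mean(m2) / geo_mean(m1)

# "clr": natural logarithm of "cr"
ratio_clr <- log(ratio_cr)

print(ratio_cr)
print(rowSums(ratio_clr))
```

A quick sanity check for this reading of the formula is that every row of the "clr" result sums to zero, up to floating point error.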

The methods "clr" and "alr" should be considered if the data are so-called compositional data as defined by Aitchison, J. (1986): "The statistical analysis of compositional data". The names correspond to the names used in the package compositions by K. Gerald van den Boogaart, Raimon Tolosana and Matevz Bren.

Calculating the absolute error for the ratios requires calculating the absolute errors of DT1 and DT2, too. For calculating the errors of DT1 and DT2 the function relError_dataset is used. Accordingly the options for STD_DT1 and STD_DT2 are passed to the option STD in relError_dataset. If STD_DT1 and/or STD_DT2 are left empty the default of 5.2% relative error is used. Also the options minNr_DT1 and minNr_DT2 are passed to the option minNr in relError_dataset.

The Error_method determines how the absolute error of the ratios is calculated. The error method "gauss" refers to the error propagation after Gauss:

Δ x = \frac{Δ DT1}{DT2} - DT1 * \frac{Δ DT2}{DT2^2}

The error method "biggest" refers to the maximum error after Gauss:

Δ x = \frac{Δ DT1}{DT2} + DT1 * \frac{Δ DT2}{DT2^2}
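As an illustration of the "biggest" formula, the following base R sketch computes the maximum error of a simple ratio. The values are hypothetical, and the absolute errors of DT1 and DT2 are assumed to come from the default relative error of 5.2% mentioned above rather than from relError_dataset.

```r
# Hypothetical, already matched values for two variables
DT1 <- c(Cu = 10, Zn = 40)
DT2 <- c(Cu = 5, Zn = 20)

# Assumption: absolute errors derived from the default 5.2% relative error
rel_err <- 0.052
dDT1 <- rel_err * DT1
dDT2 <- rel_err * DT2

# "biggest": both error contributions are added
err_biggest <- dDT1 / DT2 + DT1 * dDT2 / DT2^2

print(DT1 / DT2)      # the simple ratios
print(err_biggest)    # their maximum absolute errors
```

With equal relative errors on both data sets, this amounts to the ratio multiplied by the sum of the two relative errors.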

For example: if DT1 contains plant samples and group1.vars = "Location", the error function calculates the relative standard deviation for all plants of one location. If the plants within one location are very different, setting var_subgroup = "Species" makes the error function calculate the relative standard deviation for each plant species per location, provided a subset contains enough samples (see minNr_DT1). Suppose DT2 contains soil data with several samples per location. If group1.vars = "Location", then the function calls preparationDT2 and calculates a mean for each location from the data set. The ratio from plant to soil and the absolute error of the ratio are then calculated for each plant sample relative to the mean of the soils from the same location.
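The plant and soil example can be sketched in base R as below. The data and column names are hypothetical, and aggregate plus match only stand in for the per-location averaging described for preparationDT2; the actual function may handle further cases such as missing locations.

```r
# Hypothetical plant (DT1) and soil (DT2) data
plants <- data.frame(ID = c("p1", "p2", "p3"),
                     Location = c("A", "A", "B"),
                     Cu = c(10, 20, 30))
soils <- data.frame(Location = c("A", "A", "B", "B"),
                    Cu = c(4, 6, 10, 20))

# Mean soil value per location
soil_means <- aggregate(Cu ~ Location, data = soils, FUN = mean)

# 'new DT2': one soil row per plant row, matched by Location
new_DT2 <- soil_means[match(plants$Location, soil_means$Location), ]

# Simple ratio of each plant sample to the mean soil of its location
ratios <- plants$Cu / new_DT2$Cu
print(data.frame(ID = plants$ID, ratio = ratios))
```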

Value

The function returns a data.frame, a data.table or a list, controlled by the option return_as_list. If return_as_list is set to FALSE, a data.frame (or a data.table if DT1 is of class data.table) is returned. If, in addition, Errors is set to TRUE, ratios and errors are combined into one object and a column type_of_data is appended with the entries ratio and ratio_error, respectively. If return_as_list is set to TRUE, the DT1-DT2 ratios are named "ratios" in the list and, if Errors is set to TRUE, the absolute errors of the ratios are saved in the list as "ratios_error". If return_all is set to TRUE, a list with the following entries is returned:

[[1]] "DT1", [[2]] "DT2", [[3]] "vars", [[4]] "ratios" and if Errors is set to TRUE additionally [[5]] "ratios_error", [[6]] "DT1_error", [[7]] "DT2_error".

Author(s)

Solveig Pospiech

See Also

Other ratio functions: Correction.AdheringParticles, preparationDT2, ratio_append_smallest


ratios documentation built on May 2, 2019, 3:29 p.m.