check_database: Verify the integrity of a sediment unmixing database

check_database    R Documentation

Description

This function infers the type of a sediment database ("raw", "averaged", or "isotopic") from its column names and verifies its integrity. Isotopic databases come in both a raw and an averaged variant, so four formats are described below. The function checks the column names and their order so that the data are correctly structured for subsequent package functions.

To retain conservative tracers for subsequent analyses, a minimal cleaning of the dataset is recommended before running the check:

  • Replace BDL (below detection limit) entries with a small positive number.

  • Exclude tracers whose mixture value is BDL or zero.

  • Optionally, remove tracers with predominantly BDL values.
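The cleaning steps above could be sketched in base R as follows. This is only an illustration under stated assumptions: `clean_bdl()` is a hypothetical helper that is not part of fingerPro, BDL entries are assumed to be encoded as `NA`, and the replacement value (`1e-6`) and the threshold for "predominantly BDL" (more than half of the rows) are arbitrary choices.

```r
# Hypothetical helper (not part of fingerPro). Assumes BDL entries are stored as NA.
clean_bdl <- function(db, tracer_cols, mixture_name,
                      bdl_value = 1e-6, max_bdl_frac = 0.5) {
  is_mix <- db$samples == mixture_name
  # Decide which tracers to keep before any value is replaced
  keep <- vapply(tracer_cols, function(col) {
    x <- db[[col]]
    bdl <- is.na(x)
    # Drop the tracer if any mixture value is BDL or zero
    mix_ok <- !any(bdl[is_mix]) && all(x[is_mix] != 0)
    # Optionally drop tracers that are predominantly BDL
    mix_ok && mean(bdl) <= max_bdl_frac
  }, logical(1))
  db <- db[, setdiff(names(db), tracer_cols[!keep]), drop = FALSE]
  # Replace the remaining BDL entries with a small positive number
  for (col in tracer_cols[keep]) {
    x <- db[[col]]
    x[is.na(x)] <- bdl_value
    db[[col]] <- x
  }
  db
}

# Invented placeholder values, for illustration only
raw_db <- data.frame(
  ID      = 1:5,
  samples = c("S1", "S1", "S2", "S2", "Mix"),
  Pb      = c(12.1, NA, 15.3, 14.8, 13.0),  # one BDL entry in a source
  Cd      = c(NA, NA, NA, 0.4, 0.2),        # predominantly BDL
  Hg      = c(0.1, 0.2, 0.1, 0.3, NA)       # BDL in the mixture
)
cleaned <- clean_bdl(raw_db, c("Pb", "Cd", "Hg"), mixture_name = "Mix")
```

In this sketch only `Pb` survives: `Cd` is dropped for being mostly BDL and `Hg` because its mixture value is BDL.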

**Database 'raw' format:** This database contains individual measurements for scalar tracers. It must have the following columns in order:

  • ID: Unique identifier for each sample.

  • samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.

  • tracer1, tracer2, ...: Columns for each tracer measurement.
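As an illustration of this layout, a small 'raw' data frame might be assembled as below. The tracer values and the sample labels (`S1`, `S2`, `S3`, `Mix`) are invented placeholders; the reordering step is just one way to make sure the mixture appears last.

```r
# Invented placeholder values, for illustration only
mixture_name <- "Mix"
raw_db <- data.frame(
  ID      = 1:6,
  samples = c("S1", "Mix", "S1", "S2", "S2", "S3"),
  tracer1 = c(10.2, 14.0, 11.1, 19.8, 21.3, 30.5),
  tracer2 = c(0.51, 0.70, 0.48, 0.92, 0.88, 1.40)
)

# Move the mixture rows to the end while keeping the sources in their original order
raw_db <- raw_db[order(raw_db$samples == mixture_name), ]
rownames(raw_db) <- NULL

# Basic sanity checks on the layout described above
stopifnot(anyDuplicated(raw_db$ID) == 0)
stopifnot(tail(unique(raw_db$samples), 1) == mixture_name)
```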

**Database 'isotopic raw' format:** This database contains individual measurements for isotopic tracers, which require both ratio and content data. It must have the following columns in order:

  • ID: Unique identifier for each sample.

  • samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.

  • ratio1, ratio2, ...: Columns with the isotopic ratio values for each tracer.

  • cont_ratio1, cont_ratio2, ...: Columns with the corresponding content (concentration) values for each tracer.
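The ordering matters here: all ratio columns come first, followed by all content columns. A minimal sketch of arranging an 'isotopic raw' data frame that way is shown below; the values are invented placeholders and the column names simply follow the generic names used in the list above.

```r
# Invented placeholder values, for illustration only
ratios <- c("ratio1", "ratio2")
iso_db <- data.frame(
  ID          = 1:5,
  samples     = c("S1", "S1", "S2", "S2", "Mix"),
  cont_ratio1 = c(2.1, 2.3, 3.0, 3.2, 2.6),
  ratio1      = c(-26.1, -25.8, -27.4, -27.0, -26.5),
  cont_ratio2 = c(0.20, 0.22, 0.31, 0.29, 0.25),
  ratio2      = c(5.1, 5.4, 6.8, 6.5, 5.9)
)

# Reorder: ID, samples, every ratio column, then every content column
iso_db <- iso_db[, c("ID", "samples", ratios, paste0("cont_", ratios))]
```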

**Database 'averaged' format:** This database contains statistical summaries of the scalar tracer data. It must have the following columns in order:

  • ID: Unique identifier for each sample.

  • samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.

  • mean_tracer1, mean_tracer2, ...: Columns with the mean value for each tracer.

  • sd_tracer1, sd_tracer2, ...: Columns with the standard deviation for each tracer.

  • n: The number of measurements used to calculate the mean and standard deviation.
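One way to obtain a table with this column layout is to summarise a 'raw' data frame per group. The sketch below uses base R only; `summarise_raw()` is a hypothetical helper rather than a fingerPro function, the values are invented, and a group with a single measurement will get an `NA` standard deviation, which may need handling before further use.

```r
# Hypothetical helper (not part of fingerPro): per-group mean, sd and count
summarise_raw <- function(raw_db, tracer_cols) {
  groups <- unique(raw_db$samples)  # keeps the original order, mixture last
  rows <- lapply(groups, function(g) {
    block <- raw_db[raw_db$samples == g, tracer_cols, drop = FALSE]
    means <- as.data.frame(lapply(block, mean))
    sds   <- as.data.frame(lapply(block, sd))  # NA when the group has one row
    names(means) <- paste0("mean_", tracer_cols)
    names(sds)   <- paste0("sd_", tracer_cols)
    cbind(data.frame(samples = g), means, sds, n = nrow(block))
  })
  out <- do.call(rbind, rows)
  cbind(ID = seq_len(nrow(out)), out)
}

# Invented placeholder values, for illustration only
raw_db <- data.frame(
  ID      = 1:7,
  samples = c("S1", "S1", "S2", "S2", "S3", "S3", "Mix"),
  tracer1 = c(10, 12, 20, 22, 30, 32, 21),
  tracer2 = c(0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.0)
)
avg_db <- summarise_raw(raw_db, c("tracer1", "tracer2"))
```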

**Database 'isotopic averaged' format:** This database contains statistical summaries for isotopic tracers. It must have the following columns in order:

  • ID: Unique identifier for each sample.

  • samples: A categorical column identifying each source and mixture. The unique value representing the mixture must appear last. In cases with multiple mixture samples, they must all share the same mixture name but will be distinguished by unique entries in the ID column.

  • mean_ratio1, mean_ratio2, ...: Columns with the mean isotopic ratio values.

  • mean_cont_ratio1, mean_cont_ratio2, ...: Columns with the mean isotopic content values.

  • sd_ratio1, sd_ratio2, ...: Columns with the standard deviation of the isotopic ratio values.

  • sd_cont_ratio1, sd_cont_ratio2, ...: Columns with the standard deviation of the isotopic content values.

  • n: The number of measurements.
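Because this format has the most column groups, it can help to generate the expected column order from the ratio names and reorder the data frame accordingly. The helper below is a hypothetical sketch based only on the list above; it is not part of fingerPro, and `check_database()` remains the authority on what is accepted.

```r
# Hypothetical helper (not part of fingerPro): column order taken from the list above
expected_iso_avg_cols <- function(ratios) {
  c("ID", "samples",
    paste0("mean_", ratios), paste0("mean_cont_", ratios),
    paste0("sd_", ratios),   paste0("sd_cont_", ratios),
    "n")
}

cols <- expected_iso_avg_cols(c("ratio1", "ratio2"))

# Invented placeholder values, with columns deliberately out of order
iso_avg <- data.frame(
  n = c(3, 3, 1), ID = 1:3, samples = c("S1", "S2", "Mix"),
  sd_ratio1 = c(0.2, 0.3, 0.1), mean_ratio1 = c(-26.0, -27.2, -26.5),
  sd_ratio2 = c(0.4, 0.2, 0.1), mean_ratio2 = c(5.2, 6.6, 5.9),
  sd_cont_ratio1 = c(0.1, 0.1, 0.1), mean_cont_ratio1 = c(2.2, 3.1, 2.6),
  sd_cont_ratio2 = c(0.02, 0.01, 0.01), mean_cont_ratio2 = c(0.21, 0.30, 0.25)
)
iso_avg <- iso_avg[, cols]  # errors if an expected column is missing
```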

Usage

check_database(data)

Arguments

data

A data frame to be checked.

Value

A logical value ('TRUE' if the database is valid, 'FALSE' otherwise). If the check fails, the function will also print a descriptive error message.
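A typical call might look like the sketch below, which branches on the returned logical value. It assumes that fingerPro is installed and that `check_database()` is exported from its namespace; the data frame holds invented placeholder values, and the call is guarded so the snippet still runs when the package is absent.

```r
# Invented placeholder values in the 'raw' layout, for illustration only
raw_db <- data.frame(
  ID      = 1:5,
  samples = c("S1", "S1", "S2", "S2", "Mix"),
  tracer1 = c(10.2, 11.1, 19.8, 21.3, 14.0),
  tracer2 = c(0.51, 0.48, 0.92, 0.88, 0.70)
)

# Assumption: fingerPro is installed and exports check_database()
if (requireNamespace("fingerPro", quietly = TRUE)) {
  ok <- fingerPro::check_database(raw_db)
  if (isTRUE(ok)) {
    message("Database passed the integrity check.")
  } else {
    message("Database failed the integrity check; see the message printed above.")
  }
}
```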


fingerPro documentation built on Aug. 27, 2025, 5:11 p.m.