bdc_standardize_datasets: Standardize dataset columns based on metadata

Description

This function merges and standardizes different datasets into a single new dataset whose column names follow the Darwin Core terminology. The entire process is driven by a metadata file provided by the user.

Usage

bdc_standardize_datasets(
  metadata,
  format = "csv",
  overwrite = FALSE,
  save_database = FALSE
)

Arguments

metadata

A data frame with metadata containing the name, path, and columns of each original dataset that need to be renamed. See Details.

format

A character string setting the output file type. Available options are "csv" and "qs" (recommended for saving large datasets). Default = "csv".

overwrite

A logical value indicating whether the final merged dataset should be overwritten. Default = FALSE.

save_database

logical. Should the standardized database be locally saved? Default = FALSE.

Details

bdc_standardize_datasets() facilitates the standardization of datasets with different column names by converting them into a new dataset following the Darwin Core terminology. The standardization process relies on a metadata file containing the name, path, and columns that need to be renamed. The metadata file can be constructed with built-in functions (e.g., data.frame()) or by storing the information in a CSV file and importing it into R. Regardless of the method chosen, the data frame with metadata must contain the following column names (this list covers only the required columns; for a comprehensive list of column names following Darwin Core terminology, see the Darwin Core documentation):

  • datasetName: A short name identifying the dataset (e.g., GBIF)

  • fileName: The relative path containing the name of the input dataset (e.g., Input_files/GBIF.csv)

  • scientificName: Name of the column in the original database presenting the taxon scientific names with or without authorship information, depending on the format of the source dataset (e.g., Myrcia acuminata)

  • decimalLatitude: Name of the column in the original database presenting the geographic latitude in decimal degrees (e.g., -6.370833)

  • decimalLongitude: Name of the column in the original database presenting the geographic longitude in decimal degrees (e.g., -3.25500)
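
As an illustration of the two construction routes described above, the following minimal sketch builds a metadata data frame with data.frame() and then round-trips it through a CSV file using base R. The second dataset name ("my_herbarium"), its file path, and the source column names ("species", "lat", "lon") are hypothetical and only stand in for whatever the original files actually use.

```r
# Required metadata columns, as listed above.
required <- c("datasetName", "fileName", "scientificName",
              "decimalLatitude", "decimalLongitude")

# One row per source dataset. The last three columns hold the *column names*
# found in each original file (hypothetical values for the second dataset).
metadata <- data.frame(
  datasetName      = c("GBIF", "my_herbarium"),
  fileName         = c("Input_files/GBIF.csv", "Input_files/my_herbarium.csv"),
  scientificName   = c("scientificName", "species"),
  decimalLatitude  = c("decimalLatitude", "lat"),
  decimalLongitude = c("decimalLongitude", "lon"),
  stringsAsFactors = FALSE
)

# Alternative route: keep the same information in a CSV file and import it.
metadata_path <- tempfile(fileext = ".csv")
write.csv(metadata, metadata_path, row.names = FALSE)
metadata_from_csv <- read.csv(metadata_path, stringsAsFactors = FALSE)

# Sanity check before calling bdc_standardize_datasets():
# every required column must be present.
missing_cols <- setdiff(required, names(metadata_from_csv))
if (length(missing_cols) > 0) {
  stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
}
```

Either object (metadata or metadata_from_csv) could then be passed as the metadata argument, provided the paths in fileName point to files that exist relative to the working directory.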

Value

A merged data.frame with column names following Darwin Core terminology.

Examples

## Not run: 
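# Load the example metadata file shipped with the bdc package
metadata <- readr::read_csv(
  system.file("extdata/Config/DatabaseInfo.csv", package = "bdc")
)

# Merge and standardize the datasets described in the metadata
db_standardized <- bdc_standardize_datasets(
  metadata = metadata,
  format = "csv",
  overwrite = TRUE,
  save_database = FALSE
)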

## End(Not run)

brunobrr/bdc documentation built on Nov. 21, 2024, 4:18 a.m.