finalize_database: Finalize the database


View source: R/3_finalize_database.R

Description

This function selects only relevant data and builds one large final data frame.

Usage

finalize_database(
  data_folder = "data",
  out_folder = "data",
  database_folder = NULL,
  as_CSV = TRUE
)

Arguments

data_folder

This is the folder that contains the additions database. Default is 'data', as created by the construct_database function.

out_folder

This is the folder where you want your finalized species and stations databases to be stored. Default is 'data'.

database_folder

This is the folder where you want your final database (combined stations and species data) to be stored. Default is NULL, which will store the database in your working directory.

Details

This workflow describes which data from species_additions and stations_additions are selected to form a final, clean database that can be used in the Shiny app and in ecological analyses.

Value

This function does not return an object, but stores a finalized stations data frame in the out_folder, a finalized species data frame in the out_folder, and a final database with stations and species data combined in the database_folder.

Specimens

Specimen entries are removed if:

Specimen counts are scaled-up according to the reported fraction.
Only the original filename, the StationID, the scaled count, the valid name, the taxonomy (phylum, class, order, family, and genus), and whether or not the name was fuzzy-matched against WoRMS are stored. The latter information (isFuzzy) can help track down a wrongly assigned valid name if results look suspicious.
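
As a rough illustration of the count scaling, the sketch below assumes that the fraction is the proportion of the sample that was actually counted, so the scaled count is the raw count divided by the fraction. The column names (Count, Fraction, Count_scaled) and the example values are hypothetical and not taken from the package.

```r
# Hypothetical specimen records; column names are assumptions for illustration.
specimens <- data.frame(
  StationID = c("ST01", "ST01", "ST02"),
  valid_name = c("Abra alba", "Amphiura filiformis", "Abra alba"),
  Count = c(10, 4, 7),
  Fraction = c(0.5, 1, 0.25)  # assumed: proportion of the sample that was counted
)

# Scale counts up to the whole sample: a count of 10 in half the sample becomes 20.
specimens$Count_scaled <- specimens$Count / specimens$Fraction
```

If the package defines the fraction differently (for example as a percentage), the division would need to be adjusted accordingly.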

Stations

Columns with final latitude and longitude are created, by first taking all originally reported midpoints, and filling any empty data points with the midpoint calculated from the track start and stop coordinates.
A column with final water depth is created, by first taking the water depths as measured during the cruise, and filling any empty data points with the water depth from bathymetry.
A column with final track length is created, by first taking the reported track length from the cruise, and filling any empty data points with track length calculated from the odometer.
Sampled track area in square meters and sampled track volume in cubic meters are calculated from the finalized track length and the reported blade width and depth.
The final data frame contains the original file names, vessel names, CruiseIDs, StationIDs, station names, sampling dates, start and stop times, blade depth and width, midpoint coordinates and the source of these coordinates (reported or measured), water depth and the source of the depth (reported or bathymetry), track lengths and the source of the length (reported or calculated), and the track area and volume.
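
The fill-then-calculate logic for stations can be sketched as below, using track length as the example. All column names and values are hypothetical. The sketch also assumes that track length, blade width, and blade depth are already expressed in meters, and that area is length times width and volume is area times depth; the package may apply unit conversions that are not shown here.

```r
# Hypothetical station records; column names and values are assumptions.
stations <- data.frame(
  StationID = c("ST01", "ST02"),
  Track_length_m_reported = c(100, NA),
  Track_length_m_odometer = c(98, 120),
  Blade_width_m = c(0.2, 0.2),
  Blade_depth_m = c(0.1, 0.1)
)

# Prefer the reported value; fall back to the calculated one where it is missing.
reported <- stations$Track_length_m_reported
stations$Track_length_m <- ifelse(
  is.na(reported), stations$Track_length_m_odometer, reported
)

# Record where each final value came from.
stations$Track_length_source <- ifelse(is.na(reported), "calculated", "reported")

# Sampled area (m^2) and volume (m^3), assuming all dimensions are in meters.
stations$Track_area_m2 <- stations$Track_length_m * stations$Blade_width_m
stations$Track_volume_m3 <- stations$Track_area_m2 * stations$Blade_depth_m
```

The same prefer-reported, fall-back-to-derived pattern would apply to the midpoint coordinates and the water depth, each with its own source column.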


dswdejonge/TripleD documentation built on June 18, 2020, 12:24 p.m.