read_zooplankton_data: Read NPI zooplankton data from an Excel sheet

View source: R/read_zooplankton_data.R

Description

Reads IOPAN and NPI standard format zooplankton data

Usage

read_zooplankton_data(
  data_file,
  sheet = 1,
  dataStart = NULL,
  dataEnd = 1000,
  dataCols = NULL,
  output_format = "as.Date",
  control_species = c("species", "stage", "size_op", "length"),
  lookup_cols = "biomass_conv",
  species_info_cols = NULL,
  remove_missing = TRUE,
  control_stations = FALSE,
  add_coordinates = FALSE,
  control_sample_names = TRUE,
  round2ceiling = FALSE
)
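
A minimal call might look like the sketch below. The file name "zooplankton_2019.xlsx" and the sheet name "ALL" are hypothetical placeholders, and the snippet assumes the function is exported from the MarineDatabase package. The guard turns the example into a no-op when the package or the file is not available.

```r
# Hypothetical input file; replace with the path to your own Excel file.
data_file <- "zooplankton_2019.xlsx"

if (requireNamespace("MarineDatabase", quietly = TRUE) && file.exists(data_file)) {
  zp <- MarineDatabase::read_zooplankton_data(
    data_file,
    sheet = "ALL",   # assumed sheet name, varies between files
    dataEnd = 2000   # raise if the sheet has more than 1000 rows
  )
  str(zp$data)    # abundance data
  str(zp$meta)    # meta-data
  str(zp$splist)  # species information
} else {
  message("MarineDatabase or the example file is not available; nothing was read.")
}
```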

Arguments

data_file

Path to the Excel file containing zooplankton data

sheet

The name or index of the sheet to read the zooplankton data from. See read.xlsx

dataStart

The row number where zooplankton data starts. If NULL (default), the starting row number is guessed based on the first record of "Calanus finmarchicus".

dataEnd

The row number where zooplankton data ends. Values larger than the actual number of rows in the sheet are ignored. The default is 1000. Set to a higher value if your dataset has more rows than that.

dataCols

Optional numeric index indicating the column numbers that contain zooplankton data. Not implemented yet.

output_format

Output format for date. See convert_dates.

control_species

A character vector giving the names for species, stage, size operator and length columns from the Excel sheet. These names will be used as column names in the R output. The size operator (size_op) element is used to match old type zooplankton files (most of them) and can be ignored for the new standardized files.

lookup_cols

Character vector specifying the names of columns from the zooplankton lookup list (ZOOPL) that should be returned together with species_info_cols. The default is "biomass_conv". If NULL, only species_info_cols will be returned. Has no effect if control_species = FALSE.

species_info_cols

Character vector specifying the names of species information columns that should be preserved. Required only if control_species = NULL, otherwise ignored. Adds some flexibility if species names are messed up, but use of control_species list is recommended.

remove_missing

Logical indicating whether species with column sums of 0 should be removed from the output.

control_stations

Logical indicating whether station names should be controlled against a list of standardized station names (see STATIONS). Should be FALSE for any other dataset than the standard monitoring (MOSJ) datasets.

add_coordinates

If TRUE, coordinates will be added to the meta data from the list of standardized station names.

control_sample_names

Logical indicating whether non-standard symbols in sample names should be replaced by standardized equivalents. May fix problems when trying to merge zooplankton samples with meta data from another file. These names tend to have typos.

round2ceiling

Logical indicating whether decimals should be rounded up to the nearest integer (ceiling): some Polish data come rounded this way. It is recommended to ask for unrounded values instead, as using this parameter may lead to very large biases in biomass estimates of deep samples. This argument is included only to make it easier to test the impact of rounding.
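
The effect of ceiling rounding described for round2ceiling can be illustrated with a small base R sketch. The taxon names and abundance values are made up for illustration and the code does not call the package.

```r
# Made-up abundances for three rare taxa in one sample
abundance <- c(taxon_a = 0.2, taxon_b = 0.4, taxon_c = 1.3)

# Ceiling rounding, as some received data are rounded
rounded <- ceiling(abundance)

total_raw <- sum(abundance)
total_rounded <- sum(rounded)

# Relative overestimate of the sample total caused by rounding up
bias <- (total_rounded - total_raw) / total_raw
print(bias)
```

In this example every value below 1 is pushed up to 1, so the rounded total is roughly twice the unrounded one.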

Details

Zooplankton taxonomy data from IOPAN are received in (more or less) standard format on MS Excel sheets. This function attempts to read that format and enable passing data to further manipulation in R. The structure of the Excel sheet is explained in Figure 1.

Figure 1. Example how zooplankton Excel sheets tend to be arranged.

  1. Meta data are arranged row-wise (with headers on rows) and should contain the following fields: "expedition", "station", "sample_name", "date", "from", "to", "unit", and "comment". The field names will be guessed. If the function does not guess the names correctly, try changing the names to the required field names. The dataCols argument is intended to help the function by specifying the column indices that contain data (currently not implemented).

  2. Data are listed column-wise for each station. Make sure that there are no blank data columns with meta data (entirely blank columns are OK) as the function cannot separate such columns yet. Specify the row number where the data section begins (dataStart). Rows < dataStart will be considered as meta data. The dataEnd argument can be used in cases where the sheet contains scrap data. Rows > dataEnd will be dropped.

  3. Species list is arranged column-wise and the field headers should be listed in the control_species argument.

  4. The correct Excel sheet containing all data is often named "ALL...", but this varies (purple text).
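
The row layout described in the list above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the sheet contents are made up, and it assumes that meta data occupy the rows above dataStart and that species names sit in the first column.

```r
# Made-up sheet: meta data rows on top, species rows below.
# Column V1 holds field names and species names, V2 and V3 are two samples.
sheet <- data.frame(
  V1 = c("expedition", "station", "date",
         "Calanus finmarchicus", "Calanus glacialis", "Oithona similis"),
  V2 = c("EXP1", "ST1", "2019-07-20", "120", "35", "900"),
  V3 = c("EXP1", "ST2", "2019-07-21", "80", "0", "450"),
  stringsAsFactors = FALSE
)

# Guess the first data row from the first record of "Calanus finmarchicus"
dataStart <- which(sheet$V1 == "Calanus finmarchicus")[1]
dataEnd <- 1000  # larger than the sheet, so it is capped below

meta <- sheet[seq_len(dataStart - 1), ]               # rows above dataStart
dat <- sheet[dataStart:min(dataEnd, nrow(sheet)), ]   # rows up to dataEnd
```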

The function sums up duplicate species entries for each sample. The function attempts to match the species names in data_file with the accepted ones listed in ZOOPL. Sometimes this routine fails and manual fixes are required.
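
Summing duplicate species entries per sample can be sketched with base R as below. The data frame and its column names (sample, species, abundance) are made up for illustration, and the function's internal approach may differ.

```r
# Made-up long-format records with a duplicated species entry in sample "S1"
counts <- data.frame(
  sample = c("S1", "S1", "S1", "S2"),
  species = c("Oithona similis", "Oithona similis",
              "Calanus glacialis", "Oithona similis"),
  abundance = c(10, 5, 3, 7),
  stringsAsFactors = FALSE
)

# Sum abundance over each sample and species combination
summed <- aggregate(abundance ~ sample + species, data = counts, FUN = sum)
print(summed)
```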

The function is currently relatively unstable and most likely requires manual debugging for each dataset.

Value

Returns a list of class ZooplanktonData. The list contains 3 data frames: $data (abundance data), $meta (meta-data), and $splist (species information).

Author(s)

Mikko Vihtakari, Anette Wold

See Also

Other ZooplanktonData: merge_zooplankton_data(), print.ZooplanktonData(), subset.ZooplanktonData(), summarize_zooplankton_data()


MikkoVihtakari/MarineDatabase documentation built on July 7, 2020, 2:16 a.m.