read_metadata: Read meta-data from Excel sheets to NPI database input format

Description Usage Arguments Details Value Examples

View source: R/read_metadata.R


Reads meta-data from Excel files to the format accepted by NPI data managers


read_metadata(meta_file, sheet = 1, additional = NULL, add_time = 0,
  date_origin = "1899-12-30", expedition = NULL, station = NULL,
  type = NULL, sample_name = NULL, longitude = NULL,
  latitude = NULL, date = NULL, bottom_depth = NULL, gear = NULL,
  from = NULL, to = NULL, responsible = NULL, comment = NULL,
  guess_colnames = TRUE)



File name of the meta-data table or a data frame containing meta-data. If file name, the file has to have .xlsx extension.


sheet number or name where meta-data are located. See read.xlsx


Additional columns to be included in meta-data. Must be specified as a named character vector, which lists the exact column names to be included together with exported column names as names of the vector. See Examples.


Hours to be added to the ISO 8601 dates. See Details.


The origin for recorded dates in the Excel sheet in "YYYY-MM-DD" format. See Details.


Column name specifying the name of the expedition. See Details.


Column name specifying the station name. See Details.


Column name specifying the sample type column. See Details.


Column name specifying the sample name column. See Details.


Column name specifying the longitude column. Coordinates should be given in decimal degrees. See Details.


Column name specifying the latitude column. Coordinates should be given in decimal degrees. See Details.


Column name specifying the sampling time. The information will be transformed to ISO 8601 standard format. See note about date conversion in Details. See Details.


Column name specifying the bottom depth during sampling. See Details.


Name of the column specifying the sampling gear. See Details.


Name of the column specifying the depth from which sampling was started. See Details.


Name of the column specifying the depth from which sampling was ended. See Details.


Name of the column specifying the responsible persons for the sampling. See Details.


Name of the column specifying comments. Any non-numeric values in from and to will be transferred to this column. See Details.


Logical indicating whether guess_colname function should be used to search for required column names. Defaults to TRUE. See Details.


The guess_colname function is used to search for column names in meta_file by default. This procedure saves the user from specifying all the required column names (expedition, station, type, sample_name, longitude, latitude, date, bottom_depth, gear, from, to, responsible and comment). The function works well with tested Excel sheets, but might cause an error if column names are far from tested names. Modify coln_search_words in such case (guessing is done using regular expressions).

The easiest solution is to change the column names close to column names listed in Usage (or close to those listed in coln_search_words). Alternatively you can list all column names manually. Note that any white space in column names must be denoted by period (.). All column names should be specified as character strings of length 1. The column names refer to the meta-data table (dt).

Date conversion between Excel and R is fairly unstable business as your current locale (i.e. to what time zone your computer is set to) might affect the outcome. The easiest way to convert dates is to enter meta_file as a character string specifying the Excel file name (and location). This allows the usage of openxlsx package's convertToDateTime function, which often reads date-times correctly, given that the dates are entered in a consistent format (this is not always the case in sample logs).

Another option is to input a data frame already opened from Excel. This option might be necessary when doing changes to the data frame prior passing it through the read_metadata function. In this case, you have to make sure that date_origin is correct for your operating system (see

The add_time argument can be used to add or subtract hours from the output, if the times do not match with those in the Excel sheet. This can be helpful in either cases, if your locale causes an offset between recorded dates.


Returns a list of class MetaData that contains modified meta-data records ($meta), removed meta-data records ($deleted), file name to be used when saved for database import (file_id) and a data frame containing row numbers of duplicates in the original Excel sheet ($duplicates).

Dates ($meta$date) are returned as UTC date-time in ISO 8601 format.


## Read meta-data and let the function to find errors in it:
x <- read_metadata("Samplelog MOSJ 2015.xlsx")

## Meta-data reading follows openxlsx syntax. 
## You can add columns using the additional argument:
x <- read_metadata("GlacierFront_2017_Samplelog_20171024.xlsx", 
sheet = "SAMPLELOG", additional = c(Conveyance = "Sampled.from"))

MikkoVihtakari/MarineDatabase documentation built on Nov. 15, 2018, 5:37 p.m.