HIC.Continuous.Data.Import.Format: Batch process: Format data exported from the HIC database


View source: R/HICFunctionsForCleaningContinuousBioParameters.R

Description

Batch process function for importing continuous water quality data from the Hydrological Information Center (HIC) database. This function takes the csv files exported from the HIC database and converts them into a format that can be used easily in R. The header of the HIC csv file has horizontally oriented metadata. These metadata are taken from the header and put into columns in the dataset. The dates and times are converted into an R-friendly datetime format and into UNIX numeric format for easier handling in R.

Usage

HIC.Continuous.Data.Import.Format(
  InputDirectory,
  OutputDirectory,
  Data.sep = "\t",
  Meta.sep = ":",
  Data.header.line = NULL,
  Dec = ".",
  DateFormat = "%d/%m/%Y",
  TimeZone = "Etc/GMT-1",
  OneYearDataSet = F,
  ValueColumnNum = 3,
  ParamNameColmn = "Parameter.Name",
  StationNoColmn = "Station.Number",
  DateColmn = "Date",
  TimeColmn = "Time"
)

Arguments

InputDirectory

Path, in quotation marks, to the directory containing all the csv tables that you wish to format. The path must not contain single backslashes (\); use forward slashes (/) or double backslashes (\\) instead.

OutputDirectory

Path, in quotation marks, to the directory where you wish to save the formatted data. The path must not contain single backslashes (\); use forward slashes (/) or double backslashes (\\) instead.

Data.sep

The field separator character for the value data. The columns are separated by this character. The default is tab separated "\t".

Meta.sep

The field separator for the metadata values. The columns are separated by this character. The default is colon separated ":".

Data.header.line

It is assumed that the data header is the first line with the most separations. But if not, then the line number of the data header can be specified.
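The heuristic described above can be illustrated with a minimal sketch. This is not the package's actual code; the sample lines and the variable names (`lines`, `n_fields`, `header_line`) are assumptions made for the illustration.

```r
# Hypothetical file content: metadata lines followed by a tab-separated table
lines <- c(
  "Station.Number: RTZ25",
  "Parameter.Name: temp",
  "",
  "Date\tValue\tState.of.Value",
  "25/01/2018\t5.6\t110"
)

# Count how many fields each line has when split on the data separator
n_fields <- lengths(strsplit(lines, "\t", fixed = TRUE))

# which.max() returns the FIRST line that reaches the maximum field count
header_line <- which.max(n_fields)
print(header_line)
```

If a file has a metadata line that happens to contain more separators than the data header, this kind of heuristic picks the wrong line, which is the case where setting Data.header.line explicitly would help.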

Dec

The decimal character. By default ".".

DateFormat

Character string giving the date format. See the strptime() help file for additional help.
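As a short illustration of how a format string of this kind is interpreted by base R (the date value is taken from the sample table in the Details section; the variable names are assumptions):

```r
# Parse a day/month/year date using the default DateFormat
d <- as.Date("25/01/2018", format = "%d/%m/%Y")
print(format(d, "%Y-%m-%d"))

# A format that does not match the data gives NA instead of an error,
# so a column of NA dates usually points to a wrong DateFormat
bad <- as.Date("25/01/2018", format = "%m/%d/%Y")
print(is.na(bad))
```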

TimeZone

The time zone is by default UTC+1 "Etc/GMT-1". Use OlsonNames() for a list of all time zone names.
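The sign in "Etc/GMT-1" is easy to misread: in the Olson database, "Etc/GMT-1" means one hour ahead of UTC. A small sketch in base R, with an assumed example timestamp, shows the offset and the corresponding UNIX seconds:

```r
# 12:00 local time in the default time zone (hypothetical timestamp)
dt <- as.POSIXct("25/01/2018 12:00:00",
                 format = "%d/%m/%Y %H:%M:%S",
                 tz = "Etc/GMT-1")

# The same instant expressed in UTC is one hour earlier
utc_text <- format(dt, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(utc_text)

# Seconds since 1970-01-01 00:00:00 UTC, independent of the time zone label
unix_seconds <- as.numeric(dt)
print(unix_seconds)
```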

OneYearDataSet

If the dataset lies within a single calendar year, you can change this to TRUE and the year will be added to the ID and the file name. If the dataset spans more than one calendar year, a warning message will appear and the year will not be added to the ID or file name.

ValueColumnNum

The column number of the data values. This column is named inconsistently and thus must be referred to by column number.

ParamNameColmn

The name of the parameter name column in the metadata. If you enter new names, replace all spaces and special characters with ".".

StationNoColmn

The name of the station number column in the metadata. If you enter new names, replace all spaces and special characters with ".".
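The dot-replacement convention for column names matches what base R's `make.names()` produces. Whether the package uses `make.names()` internally is an assumption; the sketch below only shows a convenient way to derive names in that form from hypothetical raw labels.

```r
# Hypothetical raw metadata labels as they might appear in a file header
raw_names <- c("Station Number", "Parameter Name", "Parameter-Unit")

# make.names() replaces spaces and other invalid characters with "."
clean_names <- make.names(raw_names)
print(clean_names)
```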

DateColmn

Date column name in quotations.

TimeColmn

Time column name in quotations.

Details

Place all HIC csv files into one directory and pass its path, in quotes, as the InputDirectory. Pass the directory where the formatted data should be exported, also in quotes, as the OutputDirectory. Both paths must use forward slashes (/) or double backslashes (\\), never single backslashes (\). A directory path copied from Windows contains single backslashes, which need to be changed to forward slashes or double backslashes. If you don't write the full path for the OutputDirectory, the directory will be created in your working directory.
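A minimal sketch of converting a Windows-style path in base R. The path reuses the one from the Examples section; the variable names are assumptions.

```r
# A path copied from Windows. Inside an R string every literal
# backslash has to be typed as "\\".
win_path <- "C:\\Rdata\\originaldata\\HICdata"

# Replace each backslash with a forward slash
# (fixed = TRUE treats the pattern as plain text, not a regular expression)
input_dir <- gsub("\\", "/", win_path, fixed = TRUE)
print(input_dir)
```

The resulting string can then be passed as InputDirectory or OutputDirectory.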

Assumed input data file structure

The input file is assumed to consist of a vertical list of the metadata on top of the horizontally oriented data table.
Example of the assumed structure of the input data:

Station.Number: RTZ25
Parameter.Name: temp
Parameter.Unit: C

Date Value State.of.Value
25/01/2018 5.6 110
25/01/2018 7.8 110
25/01/2018 4.2 110
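To make the transformation concrete, the following sketch parses the example above in base R and moves the metadata into columns. It is an illustration under stated assumptions, not the package's implementation: the columns are assumed to be tab-separated (the default Data.sep), the header line number is given directly, and all variable names are hypothetical.

```r
# The example file content, with tabs between the data columns
lines <- c(
  "Station.Number: RTZ25",
  "Parameter.Name: temp",
  "Parameter.Unit: C",
  "",
  "Date\tValue\tState.of.Value",
  "25/01/2018\t5.6\t110",
  "25/01/2018\t7.8\t110",
  "25/01/2018\t4.2\t110"
)
header_line <- 5  # assumed known here

# Split the non-empty metadata lines into name and value on ":"
meta_lines <- lines[seq_len(header_line - 1)]
meta_lines <- meta_lines[nzchar(meta_lines)]
meta_parts <- strsplit(meta_lines, ":", fixed = TRUE)
meta <- setNames(
  trimws(vapply(meta_parts, `[`, character(1), 2)),
  trimws(vapply(meta_parts, `[`, character(1), 1))
)

# Read the data table that starts at the header line
dat <- read.table(text = lines[header_line:length(lines)],
                  sep = "\t", header = TRUE, dec = ".",
                  stringsAsFactors = FALSE)

# Append each metadata entry as a constant column to the right of the data
for (nm in names(meta)) dat[[nm]] <- meta[[nm]]
print(dat)
```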

Possible issues

This code can't deal with extremely inconsistent column names between files. It searches for key words to find the columns in the metadata, so if the files share no common words it cannot find them. All output is saved under automatically generated names of the form Station.ParameterName.Year.SystemTimeInSecondsFileNumber.csv. There is therefore a risk of overwriting older data if you run this batch process in a loop and several files with the same station and parameter, and the same file number within their folder, are processed less than a second apart. This is unlikely to occur but is possible in theory.
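The naming pattern can be sketched as follows. How the package actually joins the pieces is an assumption here, and the station, parameter, year and file number are hypothetical values; the point is only that the system time has one-second resolution, so two names built within the same second from the same inputs are identical.

```r
# Hypothetical inputs for one file
station     <- "RTZ25"
param       <- "temp"
year        <- 2018
file_number <- 1

# System time in whole seconds
stamp <- as.integer(Sys.time())

# Assumed assembly of Station.ParameterName.Year.SystemTimeInSecondsFileNumber.csv
file_name <- paste0(station, ".", param, ".", year, ".",
                    stamp, file_number, ".csv")
print(file_name)
```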

Value

This function writes each csv file in the input directory as a separate comma-separated csv file in the output directory, with all the metadata placed into columns to the right of the data, date and time merged into one datetime column ("DateTime"), a numeric datetime column ("DateTimeUnix") in UNIX seconds, and all parameter values in the column "Value".
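As an illustration of why the numeric column is convenient, the sketch below builds a small data frame with the documented column names "DateTimeUnix" and "Value" and works with it in base R. The values are hypothetical, and the default time zone "Etc/GMT-1" is assumed.

```r
# Hypothetical excerpt of a formatted output file
out <- data.frame(
  DateTimeUnix = c(1516878000, 1516878900, 1516879800),
  Value        = c(5.6, 7.8, 4.2)
)

# Sampling interval in minutes, computed directly on the numeric column
interval_min <- diff(out$DateTimeUnix) / 60
print(interval_min)

# Convert UNIX seconds back to a datetime in the chosen time zone
out$DateTime <- as.POSIXct(out$DateTimeUnix,
                           origin = "1970-01-01", tz = "Etc/GMT-1")
print(format(out$DateTime, "%Y-%m-%d %H:%M:%S"))
```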

Examples

HIC.Continuous.Data.Import.Format(
    InputDirectory = "C:/Rdata/originaldata/HICdata",
    OutputDirectory = "FormattedHICdata")
# The folder "FormattedHICdata" will be created in your working directory because it is not a full path.

pgelsomini/HICbioclean documentation built on Dec. 28, 2021, 5:22 p.m.