#HICbioclean: Auto and Manual Validation and Calibration of Continuous Biological Parameters from the Flemish Hydrological Information Center (HIC) Database


#Automated Spike Removal
#Pali Gelsomini, ECOBE, University of Antwerp 2020



#function for mode -------------------------------------------

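#' Mode
#'
#' Returns the most frequent value of a vector. If several values are equally
#' frequent, the one that appears first in the vector is returned.
#'
#' @param v A vector. May be character, numeric, etc.
#'
#' @return The mode of the input vector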
#' @export
dspk.getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
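
# A minimal sketch, not part of the HICbioclean API: dspk.Spikefilter() estimates the
# sampling interval as the mode of the positive time differences between consecutive
# samples, so a single data gap (one large difference) does not distort the estimate.
# The name sketch_sampling_interval() is hypothetical; it repeats the dspk.getmode()
# logic so that it runs on its own.
sketch_sampling_interval <- function(times) {
  d <- diff(times)                      # differences between consecutive sampling times
  d <- d[!is.na(d) & d > 0]             # ignore missing times and non-positive steps
  uniqd <- unique(d)
  uniqd[which.max(tabulate(match(d, uniqd)))]  # most frequent difference (first one on ties)
}
# Usage: despite the gap between times 9 and 22, the interval is estimated as 1
# sketch_sampling_interval(c(1,2,3,4,5,6,7,8,9,22,23,24,25,26))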

#Function for vectorizing data input----------------------------------

#' Vectorize utility function
#'
#' This function is not exported
#'
#' Vector data can be processed much faster than data in a data frame.
#' Accepts a vector directly, a data frame plus a column name, or a data frame plus a column number.
#'
#' @param Value Either the vector itself, the column name as a quoted string, or the column number as a number.
#' A single string or a single number is always interpreted as a column name or column number.
#' @param Data A data frame. Optional.
#' @param na.strings Optional NA-code replacement given as c(code.to.replace, replacement). For example,
#' na.strings = c(-777, NA) replaces all values of -777 with NA.
#' @param sup.warn If TRUE, suppresses the warning issued when a data frame is given but Value is not a column name or number
#'
#' @return A vector object of the input
#'
#' @examples
#' #Data can be vectorized from the vector itself, or from a data frame by column name or column number
#' #    ex: a <- c(1,2,3,4)
#' #        b <- c('a','b','c','d')
#' #        c <- data.frame(a,b)
#' #
#' #        dspk.vectorize(b)  #selecting the vector directly
#' #        > [1] 'a' 'b' 'c' 'd'
#' #        dspk.vectorize(Data = c, Value = 'b')  #selecting the column from a data frame by column name
#' #        > [1] 'a' 'b' 'c' 'd'
#' #        dspk.vectorize(Data = c, Value = 2)  #selecting the column from a data frame by column number
#' #        > [1] 'a' 'b' 'c' 'd'
#' #        WARNING: If you want to select a column from a data frame but do not place Value in quotes,
#' #                 and the entered name is an already existing variable, then that variable is vectorized
#' #                 and the given data frame is ignored (with a warning). See example below.
#' #        dspk.vectorize(Data = c, Value = b)  #the existing variable b is used, not column 'b' of c
#' #        > [1] 'a' 'b' 'c' 'd'

dspk.vectorize <- function(Value, Data = NULL, na.strings = NULL, sup.warn = FALSE) {
  #vectorize Value
  if(is.character(Value) && length(Value)==1){ #if Value is a single string then select the column with that name from Data
    if(!is.data.frame(Data)) stop('Data must be a dataframe') #check that Data is in fact a dataframe
    if(!(Value %in% names(Data))) stop(paste0('Column "', Value, '" not found in Data'))
    vec <- as.vector(Data[[Value]]) #vectorize value column
  }else if(is.numeric(Value) && length(Value)==1){ #if Value is a single number then select that column number
    if(!is.data.frame(Data)) stop('Data must be a dataframe') #check that Data is in fact a dataframe
    vec <- as.vector(Data[[Value]]) #vectorize value column number; [[ ]] returns the column itself, not a one-column table
  }else{ #otherwise assume the given Value is the vector itself and use it as is
    vec <- as.vector(Value)
    if(!sup.warn && !is.null(Data)) warning('A dataframe was given (Data =), but no column was selected from it because the entered value (Value =) is an already existing variable. Place Value in quotes (Value = "name") if you wished to select a column from the dataframe.')
  }

  #replace NA value codes
  #syntax: na.strings = c(na.code.to.be.replaced, na.code.to.be.replaced.with)
  if(!is.null(na.strings) && !is.na(na.strings[1])){
    vec[which(vec == na.strings[1])] <- na.strings[2]
  }

  if(!is.vector(vec)) stop('Cannot be saved as a vector')

  return(vec)
}
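
# A minimal sketch, assuming a plain data.frame input: the column lookup and NA-code
# replacement described for dspk.vectorize() can be written with [[ ]] (which accepts
# both a column name and a column number) and %in%. sketch_vectorize() is a
# hypothetical illustration, not an exported HICbioclean function, and it does not
# reproduce the warnings of dspk.vectorize().
sketch_vectorize <- function(Value, Data = NULL, na.code = NULL) {
  if (length(Value) == 1 && (is.character(Value) || is.numeric(Value))) {
    if (!is.data.frame(Data)) stop("Data must be a dataframe")
    vec <- Data[[Value]]                # column by name or by number
  } else {
    vec <- as.vector(Value)             # Value is already the data vector
  }
  if (!is.null(na.code)) vec[vec %in% na.code] <- NA  # e.g. na.code = -777
  vec
}
# Usage:
# df <- data.frame(a = c(1, -777, 3))
# sketch_vectorize("a", df, na.code = -777)  # returns c(1, NA, 3)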



#Function for pre-formatting the data tables: adds a state-of-value column for quality control and a despiked values column ---------------------------------------------------------------------

#' Pre-format data tables for use with other HICbioclean functions
#'
#' This function is not exported
#'
#' Puts the data values and datetimes in their own columns and creates a state-of-value column. If the state-of-value column "dspk.StateOfValue" already
#' exists, its values are kept as the states of value. If the datetime is numeric, those values are kept. If the datetime is in
#' character format and datetime.format is given, it is converted into UNIX seconds; otherwise it is coerced to numeric. If no datetime is given, the records are numbered consecutively.
#'
#' @param Data A data table or NULL
#' @param Value A vector, column number or column name
#' @param DateTime A vector, column number, column name or NULL. If left NULL, a time is generated by numbering the records consecutively.
#' @param NAvalue NULL or the value to be turned into NA
#' @param datetime.format Character string giving the datetime format. Needed when the datetime is in character format so that it can be converted to POSIX format. See the strptime() documentation for additional help.
#' @param datetime.timezone Character string giving the time zone of the datetime. By default "GMT". See the strptime() documentation for additional help.
#' @param state.of.value.code State of value given to all non-NA values
#' @param state.of.value.code.na State of value given to all NA values
#' @param add.original.data If TRUE, all original data in the provided data table are included in the output data table
#'
#' @return A data table with the values in "dspk.Values", the numeric datetime in "dspk.DateTimeNum", and the state of value in "dspk.StateOfValue".
#' If add.original.data is TRUE, all the data in the input data table are included as well.
#'
dspk.TableFormatting <- function(Data = NULL, Value, DateTime = NULL, NAvalue = NULL, datetime.format = NULL, datetime.timezone = "GMT", state.of.value.code = 110,state.of.value.code.na = 255, add.original.data = T){

  #vectorize Value
  dspk.Values <- dspk.vectorize(Data = Data, Value = Value, na.strings = c(NAvalue,NA),sup.warn = T)

  #vectorize datetime
  if(is.null(DateTime)){ #if no datetime given
    message("no datetime: a datetime was automatically generated by numbering the samples consecutively")
    dspk.DateTimeNum <- seq_along(dspk.Values) #the functions need a time column, so number the samples consecutively as a substitute
  }else{ #if datetime given
    dspk.DateTimeNum <- dspk.vectorize(Data = Data, Value = DateTime, na.strings = c(NAvalue,NA), sup.warn = T)
  }

  #check that the entered dataframe and the entered vectors are the same length
  if(!is.null(Data)){
    if(nrow(Data)!=length(dspk.Values)) stop('length of entered dataframe and value vector are not equal')
    if(nrow(Data)!=length(dspk.DateTimeNum)) stop('length of entered dataframe and date time vector are not equal')
  }


  #Check formatting and reformat where needed
  #Error messages, warnings and reformatting
  #check that value is numeric and if not then make it so
  if(!is.numeric(dspk.Values)){
    warning('Value is not numeric, NAs may be introduced through coercion into numeric')
    dspk.Values <- as.numeric(dspk.Values)
  }
  #check that Value and DateTime are the same length
  if(length(dspk.Values)!=length(dspk.DateTimeNum)) stop('Value and Datetime vectors must be of the same length')

  #check that datetime is numeric and if not make it so
  if(is.character(dspk.DateTimeNum) && !is.null(datetime.format)){ #if datetime is character and a format is given
    pre <- sum(is.na(dspk.DateTimeNum))
    dspk.DateTimeNum <- as.numeric(as.POSIXct(dspk.DateTimeNum, format = datetime.format, tz = datetime.timezone))
    post <- sum(is.na(dspk.DateTimeNum))
    if(pre<post) warning('NAs introduced during conversion to datetime POSIX class, check formatting') #check if there are more NAs after the conversion
    if(post == length(dspk.DateTimeNum)) stop('Datetimes did not convert correctly to POSIX class, check formatting') #check if all the data are NA
  }
  if(is.character(dspk.DateTimeNum)){
    warning('Datetime is not numeric, NAs may be introduced through coercion into numeric')
    dspk.DateTimeNum <- as.numeric(dspk.DateTimeNum)
    if(all(is.na(dspk.DateTimeNum))) stop('Datetimes did not convert correctly to numeric class; if they are written as characters, check that you entered the correct datetime.format') #check if all the data are NA
  }
  #check if some values are missing time stamps
  if(any(is.na(dspk.DateTimeNum)&!is.na(dspk.Values))) warning('Some values are missing timestamps')
  #check that all times are unique
  if(anyDuplicated(na.omit(dspk.DateTimeNum))>0) stop('Times are not all unique')


  #combine vectors into a dataframe
  tb <- data.frame(dspk.Values,dspk.DateTimeNum)
  #create a state of value vector
  if("dspk.StateOfValue" %in% names(Data)){ #if there is already a dspk.StateOfValue column in the data table
    tb$dspk.StateOfValue <- Data$dspk.StateOfValue  #then just keep those values
  }else{ #else use state.of.value.code (unchecked) for all values and state.of.value.code.na for NA values
    tb$dspk.StateOfValue <- state.of.value.code
    tb$dspk.StateOfValue[is.na(tb$dspk.Values)] <- state.of.value.code.na
  }


  #adding original data to output
  if(add.original.data) {
    if(!is.null(DateTime) && length(DateTime)!=1) {tb <- cbind(DateTime,tb)}
    if(length(Value)!=1) {tb <- cbind(Value,tb)}
    if(!is.null(Data)) {
      if(any(names(Data) %in% c("Value","DateTime")) && any(names(tb) %in% c("Value","DateTime"))){
        con <- which(names(tb) %in% c("Value","DateTime"))
        names(tb)[con] = paste0(names(tb)[con],".1")
      }
      if(any(names(Data) %in% c("dspk.Values","dspk.DateTimeNum","dspk.StateOfValue"))){
        #if the entered data table already has columns with these names, they are omitted to avoid duplicate column names (the old data are overwritten)
        tb <- cbind(Data[,-which(names(Data) %in% c("dspk.Values","dspk.DateTimeNum","dspk.StateOfValue")), drop = FALSE],tb)
        warning('Data columns were overwritten. The entered data table had columns named "dspk.Values", "dspk.DateTimeNum" or "dspk.StateOfValue", which are the names of the automatically generated columns')
      }else{tb <- cbind(Data,tb)}
    }
  }
  return(tb)
}
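
# A minimal sketch of the datetime handling described for dspk.TableFormatting():
# character datetimes are parsed with as.POSIXct() using an explicit format and time
# zone, converted to UNIX seconds, and duplicated timestamps are rejected.
# sketch_datetime_to_num() is hypothetical and only mirrors those two steps.
sketch_datetime_to_num <- function(x, format, tz = "GMT") {
  num <- as.numeric(as.POSIXct(x, format = format, tz = tz))  # seconds since 1970-01-01
  if (sum(is.na(num)) > sum(is.na(x))) warning("NAs introduced during conversion, check the format")
  if (anyDuplicated(num[!is.na(num)]) > 0) stop("Times are not all unique")
  num
}
# Usage:
# sketch_datetime_to_num(c("1970-01-01 00:00:00", "1970-01-01 00:10:00"),
#                        format = "%Y-%m-%d %H:%M:%S")  # returns c(0, 600)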


#Function min max filter---------------------------------------------------------------------

#' Min max filter
#'
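#' Minimum/maximum filter. Values below Min or above Max are deleted (set to NA) and their state of value is set to state.of.value.code.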
#'
#' @param Data A datatable or NULL
#' @param Value A vector, datatable column name as a string or datatable column number as numeric
#' @param Min Minimum value to keep
#' @param Max Maximum value to keep
#' @param State.of.value.data A vector, datatable column name as a string or datatable column number as numeric
#' @param state.of.value.code Number that deleted values will be tagged with
#' @param default.state.of.value.code number that values will be tagged with if no State.of.value.data is given
#' @param NAvalue Value that should be read as NA
#' @param logoutput TRUE if you want to have a logged record of what the function did
#'
#' @return returns a datatable with columns dspk.Values and dspk.StateOfValue containing the filtered data.
#' If logoutput is TRUE, then $data contains the datatable and $logdata contains the info for the log file
#'
#' @export
#'
#' @examples
#' SomeValues <- c(5,NA,NA,NA,5,3,2,2,3,NA,8,2,3,3)
#' SomeTimes <- c(1,2,3,4,5,6,7,8,9,22,23,24,25,26)
#' ADataframe <- data.frame(SomeValues,SomeTimes)
#'
#' #entering in a vector into the function.
#' dspk.MinMaxfilter(Value = SomeValues, Min = 3,Max = 5)
#' #entering a dataframe and column name into the function
#' dspk.MinMaxfilter(Data = ADataframe, Value = "SomeValues", Min = 3,Max = 5)
#' #entering a dataframe and column number into the function
#' dspk.MinMaxfilter(Data = ADataframe, Value = 1, Min = 3,Max = 5)
dspk.MinMaxfilter <- function(Data = NULL, Value, Min, Max, State.of.value.data = NULL, state.of.value.code = 91, default.state.of.value.code = 110, NAvalue = NULL, logoutput = F){
  #you must enter a "min = " and a "max = " value
  if(!(is.numeric(state.of.value.code)&length(state.of.value.code)==1)) stop('State of value codes must be a number')

  #vectorize Value
  dspk.Values <- dspk.vectorize(Data = Data, Value = Value, na.strings = c(NAvalue,NA),sup.warn = T)
  #vectorize State of Value
  if(is.null(State.of.value.data)){
    dspk.StateOfValue <- rep(default.state.of.value.code,length(dspk.Values)) #if no state of value is provided, label all values with default.state.of.value.code (110, not evaluated)
  }else{dspk.StateOfValue <- dspk.vectorize(Data = Data, Value = State.of.value.data, na.strings = c(NAvalue,NA),sup.warn = T)}
  #quality control
  if (length(dspk.Values)!=length(dspk.StateOfValue)) stop('Values vector and State of Value vector need to be the same length')
  if(!is.numeric(dspk.StateOfValue)) stop('State of value must be in numeric format')
  if(!is.numeric(dspk.Values)) stop('Values must be in numeric format')

  #conditional vector for finding values that are too high or too low
  #value must be greater than Max or less than Min
  condition <- ((dspk.Values>Max|dspk.Values<Min)&!is.na(dspk.Values))
  #if value meets the conditional criteria then set state of value to 'min max filter' (default 91) and delete value
  n<-length(dspk.Values) #get length of the vector
  dspk.StateOfValue[condition] <- state.of.value.code
  dspk.Values[condition] <- NA
  logdata <- paste('Removed',sum(condition),'out of',n,'data points')
  message(logdata)

  #combine vectors into a dataframe for output
  tb <- data.frame(dspk.Values,dspk.StateOfValue)

  #add activity log output if asked for
  if(logoutput){
    tb<-list(tb,logdata)
    names(tb)<-c('data','logdata')
  }

  return(tb) #output
}
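
# A minimal sketch of the min/max filter bookkeeping, assuming the default state-of-value
# codes (110 = not evaluated, 91 = removed by the min/max filter). sketch_minmax() is a
# hypothetical, vectorized illustration and not the exported dspk.MinMaxfilter().
sketch_minmax <- function(values, Min, Max, state = rep(110, length(values)), code = 91) {
  out <- !is.na(values) & (values < Min | values > Max)  # out-of-range, non-missing values
  values[out] <- NA
  state[out] <- code
  data.frame(dspk.Values = values, dspk.StateOfValue = state)
}
# Usage with the example data from the dspk.MinMaxfilter() documentation:
# res <- sketch_minmax(c(5,NA,NA,NA,5,3,2,2,3,NA,8,2,3,3), Min = 3, Max = 5)
# sum(res$dspk.StateOfValue == 91)  # 4 values removed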



#§Function spike removal---------------------------------------------------------------------
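
# A minimal sketch, not part of the HICbioclean API: when no precision is given,
# dspk.Spikefilter() and dspk.DataGapInterpolation() take the precision to be
# 10^-(largest number of decimals in the data), and the spike filter uses four times
# that value as the minimum spike size. sketch_precision() is hypothetical; like the
# package code it relies on format(), which shows at most 7 significant digits by default.
sketch_precision <- function(x) {
  x <- x[!is.na(x)]
  ndec <- vapply(x, function(v) {
    s <- format(v, scientific = FALSE)
    if (grepl(".", s, fixed = TRUE)) nchar(sub("^[^.]*\\.", "", s)) else 0L  # digits after the point
  }, integer(1))
  10^-max(c(0L, ndec))  # c(0L, ...) keeps an empty or all-NA input at precision 1
}
# Usage: sketch_precision(c(1.5, 2.25, 3)) gives 0.01, so the minimum spike would be 0.04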

#' Spike removal algorithm
#'
#' Outlier removal function using standard deviation or median absolute deviation thresholds
#' calculated from the 10 surrounding data points.
#'
#' With the default Method "median"
#' and the default threshold 3, all data points that are more than 3
#' median absolute deviations away from the median of the 10 surrounding data
#' points (5 before and 5 after) are deleted. At least 5 surrounding data points
#' are required for a sample to be evaluated. To handle data gaps, only surrounding data points
#' less than 5 sampling intervals (plus one time unit) before or after the data point are used.
#' If no "sampling.interval" is provided, it is calculated as the mode of
#' the intervals between samples.
#'
#' Spikes must be greater than 4 times the precision of the data. For example, if the data have
#' one decimal place, a value must differ by more than 0.4 from the median/mean
#' of the surrounding data to count as a spike. This safeguard handles series with very little
#' change, where most values are exactly the same: the median absolute deviation is then 0,
#' and any change whatsoever would otherwise be taken as a spike (e.g. in 4,4,4,5,4,4,4 the 5
#' would otherwise be flagged).
#'
#' To keep track of which values were evaluated and removed, the output "dspk.StateOfValue" is generated.
#' The state of value of deleted values is set to "state.of.value.code" (default 92). The
#' state of value of values that passed the despike test is set to
#' "good.state.of.value.code" (default 80). If a value was not checked (too few surrounding points),
#' its state of value is left as is.
#' If no "State.of.value.data" was provided, the state of value for unchecked values is
#' "default.state.of.value.code" (default 110).
#'
#' @param Data A dataframe. Leave as NULL if you are entering in vectors directly.
#' @param Value A vector, datatable column name as a string or datatable column number as numeric
#' @param NumDateTime Numeric datetime. A vector, datatable column name as a string or datatable column number as numeric
#' @param sampling.interval Time between samples for use in handling data gaps. May be left as NULL and the function will calculate it.
#' @param State.of.value.data A vector, datatable column name as a string or datatable column number as numeric
#' @param state.of.value.code Number that deleted values are marked with
#' @param good.state.of.value.code Number that checked values are marked with
#' @param default.state.of.value.code Number that unchecked values are marked with if no State.of.value.data was provided.
#' @param NAvalue Value that is read as NA
#' @param threshold Number indicating the threshold for defining a spike. By default 3, which corresponds to 3 median absolute deviations or 3 standard deviations. Note that stats::mad() scales the median absolute deviation by 1.4826 by default.
#' @param precision The precision of the data, i.e. the smallest measurable unit (e.g. 0.1 for data with one decimal place). If left as NULL, it is calculated from the number of decimals in the data.
#' @param Method Character string "median" or "mean" indicating the method to use for the despiking. By default "median".
#' @param logoutput TRUE if you want to have a logged record of what the function did
#'
#' @return returns a datatable with columns dspk.Values and dspk.StateOfValue containing the filtered data.
#' If logoutput is TRUE, then $data contains the datatable and $logdata contains the info for the log file
#' @export
#'
#' @examples
#' SomeValues <- c(5,6,2,3,5,66,2,2,3,69,8,2,3,3)
#' SomeTimes <- c(1,2,3,4,5,6,7,8,9,22,23,24,25,26)
#' ADataframe <- data.frame(SomeValues,SomeTimes)
#'
#' #entering in a vector into the function.
#' dspk.Spikefilter(Value = SomeValues)
#' #entering a dataframe and column name into the function
#' dspk.Spikefilter(Data = ADataframe, Value = "SomeValues")
#' #entering a dataframe and column number into the function
#' dspk.Spikefilter(Data = ADataframe, Value = 1)
#'
#' #If you have data gaps then provide sampling times so that the
#' #function won't compare data that isn't actually next to each
#' #other in time. The time must be provided as a numeric class.
#' dspk.Spikefilter(Value = SomeValues, NumDateTime = SomeTimes)
#'
#'
#' #------------------------------------------------------------------------
#' # If you enter in no "Method" or "threshold" it will evaluate the
#' #despiking using the default method of a threshold of 3 median absolute
#' #deviations from the median.
#'
#' #Running the despiking using the threshold of 5 median absolute deviations from the median.
#' dspk.Spikefilter(Value = SomeValues, threshold = 5)
#' #Running the despiking using the threshold of 5 standard deviations from the mean.
#' dspk.Spikefilter(Value = SomeValues, Method = "mean", threshold = 5)
#'
dspk.Spikefilter = function(Data = NULL, Value, NumDateTime=NULL, sampling.interval = NULL, State.of.value.data = NULL, state.of.value.code = 92, good.state.of.value.code = 80, default.state.of.value.code = 110,  NAvalue = NULL, threshold = 3, precision = NULL, Method = "median",logoutput = F){
  #message("Spike removal")
  if(!(is.numeric(good.state.of.value.code)&length(good.state.of.value.code)==1)) stop('State of value codes must be a number')
  if(!(is.numeric(state.of.value.code)&length(state.of.value.code)==1)) stop('State of value codes must be a number')

  if(!(Method == "median"|Method == "mean")) stop('Method must be "median" or "mean"')

  #vectorize Value
  dspk.Values <- dspk.vectorize(Data = Data, Value = Value, na.strings = c(NAvalue,NA),sup.warn = T)
  #vectorize State of Value
  if(is.null(State.of.value.data)){
    dspk.StateOfValue <- rep(default.state.of.value.code,length(dspk.Values)) #if no state of value is provided, label all values with default.state.of.value.code (110, not evaluated)
  }else{dspk.StateOfValue <- dspk.vectorize(Data = Data, Value = State.of.value.data, na.strings = c(NAvalue,NA),sup.warn = T)}
  #vectorize numeric date time
  if(is.null(NumDateTime)){ #if no datetime given
    message("no datetime: a datetime was automatically generated by numbering the samples consecutively")
    DateTimeNum <- seq_along(dspk.Values) #the functions need a time column, so number the samples consecutively as a substitute
  }else{ #if datetime given
    DateTimeNum <- dspk.vectorize(Data = Data, Value = NumDateTime, na.strings = c(NAvalue,NA),sup.warn = T)
  }

  #quality control
  if (length(dspk.Values)!=length(dspk.StateOfValue)) stop('Values vector and State of Value vector need to be the same length')
  if (length(dspk.Values)!=length(DateTimeNum)) stop('Values vector and numeric date time vector need to be the same length')
  if(!is.numeric(DateTimeNum)) stop('Date time must be in numeric format')
  if(!is.numeric(dspk.StateOfValue)) stop('State of value must be in numeric format')
  if(!is.numeric(dspk.Values)) stop('Values must be in numeric format')

  #make the activity log element
  logdata <- NULL

  #getting decimal precision
  if(is.null(precision)){ #if no precision was provided
    havedecimals <- grepl('.',format(dspk.Values,scientific = F),fixed = T)
    if(any(havedecimals)){#do any of the values have decimal points?
      ndecimals <- function(x){tryCatch({nchar(strsplit(format(x,scientific = F), ".", fixed = TRUE)[[1]][[2]])},error=function(e){return(0)})}

      precision <- 10^-max(unlist(lapply(dspk.Values[havedecimals],ndecimals))) #e.g. two decimals gives 0.01
    }else{
      precision<-1 #whole numbers: precision of 1
    }
    msg <- paste('No data precision provided for determining minimum spike size. Data precision taken to be',precision,'and minimum spike size will be four times this value.')
    message(msg)
    logdata <- rbind(logdata,msg)
  }
  #setting the minimum spike size
  minimumSpike <- precision*4

  #calculating sampling interval if sampling interval is NULL
  if(is.null(sampling.interval)){
    a <- na.omit(DateTimeNum[-1]-DateTimeNum[-length(DateTimeNum)]) #time differences between consecutive samples
    sampling.interval2<-dspk.getmode(a[a>0]) #most frequent time difference greater than 0
    message(paste('sampling interval calculated to be:',sampling.interval2))
    logdata <- rbind(logdata,paste('sampling interval calculated to be:',sampling.interval2))
  }else{sampling.interval2<-sampling.interval}
  if(is.null(NumDateTime)){sampling.interval2 <- 1} #sampling interval set to 1 if no datetime data was provided, since the records are just numbered consecutively
  #despiking-------------
  #only values that are not NA are evaluated
  condition <- (!is.na(dspk.Values))

  n<-length(dspk.Values) #number of data points in the vector

  #maximum range (5 times the sampling interval plus one) for collecting the surrounding data points, to handle data gaps
  maxrange <- sampling.interval2*5+1

  #despike all non-NA data points; near the start and end of the series fewer than 5 records exist on one side
  for (i in which(condition)) {
    t<-DateTimeNum[i] #time that the sample was taken

    #indices of the 5 previous and 5 following records, clipped to the bounds of the vector
    nb <- c(i-(5:1), i+(1:5))
    nb <- nb[nb>=1 & nb<=n]

    #collect the surrounding values that lie within maxrange of this sample
    v <- dspk.Values[nb][abs(t-DateTimeNum[nb])<maxrange]
    v <- as.vector(na.omit(v)) #remove NAs (missing values and records outside maxrange)

    if (length(v)>4){ #at least 5 surrounding values are required
      x <- dspk.Values[i] #value of this sample
      if(Method == "median"){
        spike <- abs(median(v)-x)>max(threshold*mad(v),minimumSpike)
      }else{
        spike <- abs(mean(v)-x)>max(threshold*sd(v),minimumSpike)
      }
      dspk.StateOfValue[i] <- if(spike) state.of.value.code else good.state.of.value.code #mark as spike or as checked
    }
  }

  #remove value if it is a spike
  Condition <- dspk.StateOfValue %in% state.of.value.code
  dspk.Values[Condition] <- NA
  message(paste('Removed',sum(Condition),'out of',n,'data points'))
  logdata <- rbind(logdata,paste('Removed',sum(Condition),'out of',n,'data points'))


  #combine vectors into a dataframe for output
  tb <- data.frame(dspk.Values,dspk.StateOfValue)
  #add activity log output if asked for
  if(logoutput){
    tb<-list(tb,logdata)
    names(tb)<-c('data','logdata')
  }

  return(tb) #output
}
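
# A minimal sketch of the spike test described for dspk.Spikefilter() (Method = "median"):
# for each non-missing value, take the 5 previous and 5 following records, keep only
# those less than 5 sampling intervals plus one time unit away, and flag the value if it
# lies more than max(threshold * mad, min.spike) from their median; at least 5 neighbours
# are required. sketch_spike_flags() is hypothetical; it returns TRUE (spike),
# FALSE (passed) or NA (not evaluated) instead of state-of-value codes.
sketch_spike_flags <- function(values, times = seq_along(values), interval = 1,
                               threshold = 3, min.spike = 4) {
  n <- length(values)
  maxrange <- interval * 5 + 1
  flags <- rep(NA, n)
  for (i in which(!is.na(values))) {
    nb <- c(i - (5:1), i + (1:5))         # indices of the 10 surrounding records
    nb <- nb[nb >= 1 & nb <= n]           # clip to the vector bounds
    v <- values[nb][abs(times[i] - times[nb]) < maxrange]  # drop records across a data gap
    v <- v[!is.na(v)]
    if (length(v) >= 5) {
      flags[i] <- abs(median(v) - values[i]) > max(threshold * mad(v), min.spike)
    }
  }
  flags
}
# Usage: only the 50 is flagged; with integer data (precision 1) min.spike is 4
# which(sketch_spike_flags(c(5,5,6,5,5,50,5,6,5,5,5)))  # 6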



#§Function Data gap interpolation-------------------------------------------------------------
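
# A minimal sketch of the gap interpolation described for dspk.DataGapInterpolation(),
# assuming the times are sorted in increasing order: each missing value is linearly
# interpolated between the nearest non-missing values before and after it, but only if
# those bounding values are at most max.gap apart; results are rounded to the precision.
# Leading and trailing missing values are not extrapolated. sketch_interpolate_gaps()
# is hypothetical and does not set state-of-value codes.
sketch_interpolate_gaps <- function(values, times = seq_along(values), max.gap = Inf, precision = 1) {
  ok <- !is.na(values) & !is.na(times)
  known_t <- times[ok]
  known_v <- values[ok]
  out <- values
  for (i in which(is.na(values) & !is.na(times))) {
    before <- which(known_t < times[i])
    after <- which(known_t > times[i])
    if (length(before) == 0 || length(after) == 0) next  # no bounding value on one side
    p <- max(before)
    f <- min(after)
    gap <- known_t[f] - known_t[p]
    if (gap <= max.gap) {
      est <- known_v[p] + (times[i] - known_t[p]) / gap * (known_v[f] - known_v[p])
      out[i] <- round(est / precision) * precision  # round to the data precision
    }
  }
  out
}
# Usage: without times, max.gap counts sampling steps between the bounding values
# sketch_interpolate_gaps(c(1, NA, NA, 4))               # returns c(1, 2, 3, 4)
# sketch_interpolate_gaps(c(1, NA, NA, 4), max.gap = 2)  # gap of 3 steps: unchanged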

#' Linear interpolation
#'
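#' Linear interpolation between the values bounding a data gap. Gaps wider than "max.gap" are not interpolated; the gap width is the
#' difference in time between the non-missing values before and after the gap. If no "NumDateTime" is provided, the records are numbered
#' consecutively and max.gap is in sampling steps (max.gap = 2 fills single missing values). If a "NumDateTime" is provided, max.gap must be in the same unit.
#' The entered "NumDateTime" must be of numeric class.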
#'
#' @param Data A dataframe. Leave as NULL if you are entering in vectors.
#' @param Value A vector, datatable column name as a string or datatable column number as numeric
#' @param precision Interpolated values are rounded to this precision, the smallest measurable unit on the scale, e.g. 13000 would have precision = 1000 and 0.23 would have precision = 0.01. If NULL, it is estimated from the number of decimals in the data.
#' @param NumDateTime A numeric datetime. A vector, datatable column name as a string or datatable column number as numeric
#' @param max.gap The largest data gap (in the unit of NumDateTime, or in sampling steps if NumDateTime is NULL) to perform linear interpolation on.
#' @param State.of.value.data A vector, datatable column name as a string or datatable column number as numeric
#' @param state.of.value.code A number that labels all values that were interpolated
#' @param default.state.of.value.code Number that values are marked with if no State.of.value.data was provided.
#' @param NAvalue Value that is read as NA
#' @param logoutput TRUE if you want to have a logged record of what the function did
#'
#' @return returns a datatable with columns dspk.Values and dspk.StateOfValue containing the interpolated data.
#' If logoutput is TRUE, then $data contains the datatable and $logdata contains the info for the log file
#'
#' @export
#'
#' @examples
#' SomeValues <- c(5,NA,NA,NA,5,3,2,2,3,NA,8,2,3,3)
#' SomeTimes <- c(1,2,3,4,5,6,7,8,9,22,23,24,25,26)
#' ADataframe <- data.frame(SomeValues,SomeTimes)
#'
#' #entering in a vector into the function.
#' dspk.DataGapInterpolation(Value = SomeValues)
#' #entering a dataframe and column name into the function
#' dspk.DataGapInterpolation(Data = ADataframe, Value = "SomeValues")
#' #entering a dataframe and column number into the function
#' dspk.DataGapInterpolation(Data = ADataframe, Value = 1)
#'
#' #Only interpolate single missing values (bounding values at most 2 samplings apart)
#' dspk.DataGapInterpolation(Value = SomeValues, max.gap = 2)
#'
#' #If you have data gaps then provide sampling times so that the
#' #function will take the sampling times into account when assessing max.gap.
#' #The time must be provided as a numeric class and max.gap must be in the
#' #same unit as your provided time.
#' dspk.DataGapInterpolation(Value = SomeValues, NumDateTime = SomeTimes, max.gap = 2)
dspk.DataGapInterpolation = function(Data = NULL, Value, precision = NULL, NumDateTime=NULL, max.gap = Inf, State.of.value.data = NULL, state.of.value.code = 93, default.state.of.value.code = 110, NAvalue = NULL,logoutput = F){
  #message("gap interpolation")
  if(!(is.numeric(state.of.value.code)&length(state.of.value.code)==1)) stop('State of value codes must be a number')
  if(!is.null(precision) && !(is.numeric(precision) && length(precision)==1)) stop('The precision must be a number or NULL. This is the smallest measurable unit on the scale. e.g. 13000 would have precision = 1000 and 0.23 would have precision = 0.01')

  #vectorize Value
  dspk.Values <- dspk.vectorize(Data = Data, Value = Value, na.strings = c(NAvalue,NA),sup.warn = T)
  #vectorize State of Value
  if(is.null(State.of.value.data)){
    dspk.StateOfValue <- rep(default.state.of.value.code,length(dspk.Values)) #if no state of value is provided, label all values with default.state.of.value.code (110, not evaluated)
  }else{dspk.StateOfValue <- dspk.vectorize(Data = Data, Value = State.of.value.data, na.strings = c(NAvalue,NA),sup.warn = T)}
  #vectorize numeric date time
  if(is.null(NumDateTime)){ #if no datetime given
    message("no datetime: a datetime was automatically generated by numbering the samples consecutively")
    DateTimeNum <- seq_along(dspk.Values) #the functions need a time column, so number the samples consecutively as a substitute
  }else{ #if datetime given
    DateTimeNum <- dspk.vectorize(Data = Data, Value = NumDateTime, na.strings = c(NAvalue,NA),sup.warn = T)
  }

  #quality control
  if (length(dspk.Values)!=length(dspk.StateOfValue)) stop('Values vector and State of Value vector need to be the same length')
  if (length(dspk.Values)!=length(DateTimeNum)) stop('Values vector and numeric date time vector need to be the same length')
  if(!is.numeric(DateTimeNum)) stop('Date time must be in numeric format')
  if(!is.numeric(dspk.StateOfValue)) stop('State of value must be in numeric format')
  if(!is.numeric(dspk.Values)) stop('Values must be in numeric format')

  #creating activity log object
  logdata <- NULL

  #getting decimal precision
  if(is.null(precision)){ #if no precision was provided
    havedecimals <- grepl('.',format(dspk.Values,scientific = F),fixed = T)
    if(any(havedecimals)){#do any of the values have decimal points?
      ndecimals <- function(x){tryCatch({nchar(strsplit(format(x,scientific = F), ".", fixed = TRUE)[[1]][[2]])},error=function(e){return(0)})}

      precision <- 10^-max(unlist(lapply(dspk.Values[havedecimals],ndecimals))) #e.g. two decimals gives 0.01
    }else{
      precision<-1 #round to no decimal places
    }
    msg <- paste('No data precision provided for rounding. Data precision taken to be',precision)
    message(msg)
    logdata <- rbind(logdata,msg)
  }

  #gap interpolation-------------
  #conditional vector for finding values to interpolate
  #value must be NA
  condition <- (is.na(dspk.Values))
  message(paste("     n missing values:", sum(condition)))
  #value must not be NA
  convalue <- !is.na(dspk.Values)
  #loop through the missing values and interpolate them where possible
  n <- length(dspk.Values) #number of samples in the vector
  gapendt <- 0
  for (i in (2:(n-1))[condition[-c(1,n)]]) { #loop through all values that are NA except the first and last value of the vector

    #find previous time and following time where there is data that isn't interpolated
    t <- DateTimeNum[i]
    if(t>gapendt){ #check if sample time is after the gap that is larger than maximum gap width

      a<-1
      prevv <- dspk.Values[i-a] #get previous record
      while (is.na(prevv)&(i-a-1)>0) { #as long as previous record is NA and it is not the first record then grab the record before it
        a<-a+1
        prevv <- dspk.Values[i-a]
      }
      prevt <- DateTimeNum[i-a] #the time of that previous record

      b<-1
      folv <- dspk.Values[i+b] #get following record
      while (is.na(folv)&(i+b)<n) { #as long as following record is NA and it is not the last record then grab the record after it
        b<-b+1
        folv <- dspk.Values[i+b]
      }
      folt <- DateTimeNum[i+b] #the time of that following record

      #if data gap is less than or equal to max.gap and no values are NA
      if (folt-prevt<=max.gap&!(is.na(prevv)|is.na(folv)|is.na(prevt)|is.na(folt))) {
        dspk.Values[i] <- round((prevv+((t-prevt)/(folt-prevt)*(folv-prevv)))/precision)*precision #linear interpolation and rounding to the precision
        dspk.StateOfValue[i] <- state.of.value.code
      }else if(!is.na(folt)){gapendt <- folt} #saves the sample time of the end of the gap if the gap is too big. This allows us to jump to the end of that large gap and speed up computations
    }
  }

  Condition <- dspk.StateOfValue == state.of.value.code
  message(paste('Interpolated',sum(Condition),'out of',sum(condition),'data gaps out of',n,'data points'))
  logdata <- rbind(logdata,paste('Interpolated',sum(Condition),'out of',sum(condition),'data gaps out of',n,'data points'))


  #combine vectors into a dataframe for output
  tb <- data.frame(dspk.Values,dspk.StateOfValue)

  #add activity log output if asked for
  if(logoutput){
    tb<-list(tb,logdata)
    names(tb)<-c('data','logdata')
  }

  return(tb) #output
}



#Function Batch process directory of CSV files------------------------------------------------



#' Batch process directory of CSV files
#'
#' This function is not exported
#'
#' !!!!!!!!!! Use CSV.table as the data source table in the quoted process !!!!!!!!!!
#' Each file is loaded into an object called CSV.table before the process is run. The process that you wish to batch must be written between double quotes (") and, within the process script, only single quotes (') may be used, not double quotes (").
#' e.g.: batched.process = "dspk.TableFormatting(Data = CSV.table, Value = 'Value', DateTime =  'DateTime', NAvalue = -777, datetime.format = '%Y-%m-%d %H:%M:%S', datetime.timezone = 'Etc/GMT-1', add.original.data = T)"
#'
#' Make sure that the directory path uses forward slashes (/) or double backslashes (\\\\), not single backslashes (\).
#' A path copied from Windows will contain single backslashes (\); these need to be changed to forward slashes (/) or double backslashes (\\\\).
#'
#' By default the NA value is a blank string (""). You can set your own NA value with "NAvalue = ".
#'
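#' @param directory Character string. Path to the folder containing the CSV files to process. Every file listed in this folder is read.
#' @param output.directory Character string. Path to the folder where the processed files are written. It is created if it does not exist.
#' @param file.name.note Character string appended to each input file name to build the output file name (the ".csv" extension is added after it). Empty by default.
#' @param sep The field separator character of the input files, passed to read.table(). By default a comma ",".
#' @param dec The decimal point character of the input files, passed to read.table(). By default a period ".".
#' @param header Logical value indicating whether the first line of each file contains the column names. TRUE by default.
#' @param NAvalue The value used for missing data in the input files. Together with "NA" it is passed to the na.strings argument of read.table(). Blank ("") by default.
#' @param batched.process Character string containing the R expression to run on each file, using CSV.table as the data source table (see above).
#'
#' @return Called for its side effects: one CSV file is written to output.directory for every input file. Errors are reported per file with message() and do not stop the batch.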
#'
#' @examples
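#' # Illustration with hypothetical data: each file is read into an object called
#' # CSV.table and the quoted process is then evaluated with
#' # eval(parse(text = batched.process)), as in the function body.
#' # Single quotes would be used inside the double-quoted process string.
#' CSV.table <- data.frame(DateTime = c('2018-04-23 15:30:00', '2018-04-23 15:31:00'),
#'                         Value = c(12.5, NA))
#' batched.process <- "subset(CSV.table, !is.na(Value))"
#' New.table <- eval(parse(text = batched.process))
#' New.table
#'
#' # Applying the same process to every csv file in a folder
#' # (the folder names are hypothetical):
#' # dspk.BatchProcess(directory = 'data/raw', output.directory = 'data/processed',
#' #                   NAvalue = -777, batched.process = "subset(CSV.table, !is.na(Value))")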
dspk.BatchProcess <- function(directory, output.directory, file.name.note = '', sep = ',', dec = '.', header = T, NAvalue = '', batched.process){
  message("!!!! Use CSV.table as the data source table in the batched process !!!!")
  #list files in directory
  files <- list.files(directory)
  #number of files in the directory
  f <- length(files)
  for(J in seq_len(f)){  #load all the files in the directory, files 1 to f (no iterations if the folder is empty)
    tryCatch({   #in case of error, catch error and tell me the file name
      message(paste("file",J,"of",f))

      #path for file number J
      CONfile <- paste(directory,files[J],sep = "/")

      #--file data--
      #load the Data table from the text file
      CSV.table <- read.table(CONfile, sep= sep, na.strings = c(NAvalue, "NA"), header = header,dec = dec)

      #--run the batched process and save it to a new table--
      New.table <- eval(parse(text = batched.process))

      #--save data to csv's--
      #create output.directory if it doesn't exist
      if(!dir.exists(output.directory)){
        dir.create(output.directory)
        message(paste("created directory:",output.directory))
      }
      write.csv(New.table, file = paste0(output.directory,"/",files[J],file.name.note,".csv"),row.names = F)

    },
    #in case of error give file name and error message
    error=function(cond){
      message(paste("File caused error:",files[J]))
      message("Original error:")
      message(cond)
      return(NULL)
    }
    )#end of try catch
  }#end of for loop
}



#Function Full work flow saving into CSV files--------------------------------------------


#' Batch process: Despike and autovalidate continuous data
#'
#' A full workflow for auto-validation of continuous data. It may be run as a batch
#' process or on individual R objects. It performs min max filtering, despiking and
#' linear gap interpolation. You may run all of these steps or only a selection.
#' At each step, the results are written as csv files to your working directory.
#'
#' Each csv file will be processed separately and saved into new csv files at each
#' step in the despiking process (pre-process: formatting, step 1: min/max filter,
#' step 2: despiking, step 3: gap interpolation). The final data will be found
#' in the folder “autodespikeYYYYMMDDHHMMSS” within the subfolder
#' “step3Interpol.FinalData”. Use the function
#' getwd() to find your working directory. State of value codes are added to the
#' data to keep track of how each value was handled during the auto-validation
#' process (110 unchecked, 255 missing, 80 auto good value, 91 deleted during min/max filter,
#' 92 deleted during despiking). The original data will still be
#' in the newly generated csv files, with the processed data saved in new
#' columns.
#'
#' See the generated FunctionLogFile.txt for any error messages and for details about the selected and calculated settings.
#'
#' \subsection{Quick Start}
#' Place all the data you wish to auto-despike into one folder as csv files. The
#' data values must be numeric. Date and time must both be in the same
#' column, without time zone offsets (e.g. in "13:20 +2" the "+2" is a time zone
#' offset). The datetime may also be numeric. If there are no interruptions in the
#' sampling causing data gaps, then a datetime is not needed.
#'
#' \subsection{The default state of values codes}
#' 110 Unchecked \cr
#' 255 Missing value \cr
#' 80 Auto good \cr
#' 91 Auto delete min max filter \cr
#' 92 Auto delete Spike \cr
#'
#' \subsection{Algorithm overview and workflow}
#' Pre-processing: Formatting and compatibility check: If an
#' ‘input.directory’ containing multiple csv files was provided, then each file will
#' be processed and saved separately. The ‘Value’ data are checked to be
#' numeric and are then saved into a new column "dspk.Values" so that the
#' original data are not overwritten. NA codes ‘val.NAvalue’ will be replaced with the value NA. The
#' ‘DateTime’ data will be checked to be numeric and saved into a new column
#' "dspk.DateTimeNum", again without overwriting the original data. If they are character strings, then
#' they will be converted to numeric using the provided ‘datetime.format’ and
#' ‘datetime.timezone’. If no datetime is provided then the samples will be
#' numbered consecutively and saved as the datetime. A new column
#' "dspk.StateOfValue" will be generated with all values equal to the
#'  ‘unchecked.state.of.value.code’ (default 110). NA values will be given the
#'  'NA.state.of.value.code' (default 255). If the entered csv data table
#' already has a "dspk.StateOfValue" column, then the original state of values
#' from that column will be used. If ‘add.original.data’ is equal to TRUE (this is the
#' default) then the original data will be included in the formatted data table. The
#' formatted data table will be saved to a new csv file in the directory
#'  “autodespikeYYYYMMDDHHMMSS/preprocFormat” within your working
#' directory.
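#'
#' A minimal sketch of this formatting step for a plain vector, assuming -777 is
#' the NA code and no ‘DateTime’ is supplied. The input values are made up and
#' this is an illustration only, not the package's internal code:
#' \preformatted{
#' raw <- c(5.1, -777, 5.3, 5.2)                 # hypothetical input values
#' dspk.Values <- replace(raw, raw == -777, NA)  # NA code becomes NA
#' dspk.DateTimeNum <- seq_along(dspk.Values)    # samples numbered 1, 2, 3, ...
#' dspk.StateOfValue <- ifelse(is.na(dspk.Values), 255, 110)  # missing / unchecked
#' data.frame(Value = raw, dspk.Values, dspk.DateTimeNum, dspk.StateOfValue)
#' }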
#'
#' Step 1: Min/Max filter: Each csv file generated from the previous step will be
#' processed and saved separately. All data points that are above the entered
#'  ‘Max’ or below entered ‘Min’ will be deleted. The "dspk.StateOfValue" of the
#' deleted values will be set to ‘minmax.state.of.value.code’ (default 91). The
#' data will be saved in a new csv file in the directory
#' “autodespikeYYYYMMDDHHMMSS/step1MinMax” within your working
#' directory.
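#'
#' The min/max filter, including the conditional limits set with
#' ‘ConditionalMinMaxColumn’, can be sketched as below. This is an illustration
#' under stated assumptions, not the package's code: the helper name
#' minmax_sketch() is hypothetical, and rows whose condition is not listed fall
#' back to ‘Min’ and ‘Max’, as described in the examples.
#' \preformatted{
#' minmax_sketch <- function(x, state, Min = -Inf, Max = Inf, cond = NULL,
#'                           cond.values = NULL, cond.min = NULL, cond.max = NULL,
#'                           code = 91) {
#'   lo <- rep(Min, length(x))
#'   hi <- rep(Max, length(x))
#'   if (!is.null(cond)) {
#'     k <- match(cond, cond.values)       # which condition applies to each row
#'     hit <- !is.na(k)
#'     lo[hit] <- cond.min[k[hit]]
#'     hi[hit] <- cond.max[k[hit]]
#'   }
#'   out <- !is.na(x) & (x < lo | x > hi)  # values outside the reasonable range
#'   x[out] <- NA
#'   state[out] <- code
#'   list(values = x, state = state)
#' }
#' minmax_sketch(c(5, -1, 120, 50), rep(110, 4), Min = 0, Max = 100)
#' }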
#'
#' Step 2: Despiking: Each csv file generated from the previous step will be
#' processed and saved separately. With the default “despike.Method” median
#' and the default “despike.threshold” 3: all data points that are more than 3
#' median absolute deviations away from the median of the 10 surrounding data
#' points (5 before and 5 after) will be deleted. At least 5 surrounding data points
#' are required for a sample to be evaluated. The algorithm will not look farther
#' than 5 sampling intervals before and after the data point, for handling data gaps.
#' If a “sampling.interval” is not provided then it will be calculated as the mode of
#' the interval between samples. The "dspk.StateOfValue" of the
#' deleted values will be set to “despiked.state.of.value.code” (default 92). The
#' "dspk.StateOfValue" of the values that passed the despike test will be set to
#' “good.state.of.value.code” (default 80). The data will be saved in a new csv file
#' in the directory  “autodespikeYYYYMMDDHHMMSS/step2Despike” within your
#' working directory.
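#'
#' A minimal sketch of this test, assuming that the neighbours of a sample are
#' the up to 5 valid samples on each side that lie within 5 sampling intervals,
#' and that each sample is compared with the original (not already despiked)
#' data. The helper name despike_sketch() is hypothetical; it illustrates the
#' median/MAD rule described above and is not the package's implementation.
#' Samples whose neighbours have a MAD of 0 are skipped here, which is also an
#' assumption.
#' \preformatted{
#' despike_sketch <- function(x, t = seq_along(x), interval = 1,
#'                            threshold = 3, half.window = 5, min.n = 5) {
#'   spike <- rep(FALSE, length(x))
#'   for (i in which(!is.na(x))) {
#'     # up to 5 samples before and after i, excluding i itself
#'     idx <- setdiff(max(1, i - half.window):min(length(x), i + half.window), i)
#'     # drop missing neighbours and those more than 5 sampling intervals away
#'     idx <- idx[!is.na(x[idx]) & abs(t[idx] - t[i]) <= half.window * interval]
#'     if (length(idx) < min.n) next  # too few neighbours: not evaluated
#'     centre <- median(x[idx])
#'     spread <- mad(x[idx])          # mad() applies the 1.4826 constant by default
#'     if (spread > 0 && abs(x[i] - centre) > threshold * spread) spike[i] <- TRUE
#'   }
#'   spike
#' }
#' which(despike_sketch(c(10, 11, 10, 12, 11, 50, 10, 11, 10, 12, 11)))  # 6
#' }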
#'
#' Step 3: Data gap interpolation: Each csv file generated from the previous step
#' will be processed and saved separately. All data gaps will be linearly
#' interpolated unless a ‘max.gap’ length for interpolation is given. If a ‘precision’
#' is given, then the interpolated values will be rounded to that precision. The
#' state of values are not changed, so the original state of each value is preserved.
#' If a value has a deleted or missing state of value code but is no longer NA, then it can
#' be assumed that it was interpolated. The data will be saved in a new
#' csv file in the directory
#'  “autodespikeYYYYMMDDHHMMSS/step3Interpol.FinalData” within your
#' working directory.
#'
#' \subsection{Details}
#' Make sure that when you enter the path name for the directory it has
#' forward-slashes(/) or double-back-slashes(\\\\) and not back-slashes(\). If you copy the directory path from
#' windows, it will have back-slashes(\) and these need to be changed to
#' forward-slashes(/) or double-back-slashes(\\\\).
#'
#' Interpolated values will be rounded to the same number of decimal places
#' as the original data. If you wish, you may enter a custom precision instead.
#' Examples: data such as 0.566 would use ‘precision = 0.001’, data such as 1200
#' would use ‘precision = 100’, measurements in steps of 5 would use ‘precision = 5’,
#' and measurements to the nearest half unit would use ‘precision = 0.5’.
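#'
#' The rounding can be illustrated as rounding to the nearest multiple of
#' ‘precision’, which is how the gap interpolation code in this file rounds
#' interpolated values. The helper name round_to() and the numbers are made up:
#' \preformatted{
#' round_to <- function(x, precision) round(x / precision) * precision
#' round_to(0.5664, 0.001)  # 0.566
#' round_to(1234, 100)      # 1200
#' round_to(17, 5)          # 15
#' round_to(2.3, 0.5)       # 2.5
#' }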
#'
#' You are not required to supply ‘DateTime’ for this function. This is only
#' necessary for handling data gaps while despiking and interpolating the data. If
#' you omit this argument, then it will generate a datetime column which
#' contains the samples numbered consecutively so you can still indicate with
#'  ‘max.gap’ the maximum data gap you wish to interpolate by filling in the
#' number of missing samples into ‘max.gap’. The ‘DateTime’ data can be
#' numeric values or datetime character strings (e.g. “2018-04-23 15:32:18”).
#' If the datetime data are character strings, then a ‘datetime.format’ must be
#' provided (e.g. datetime.format = "%Y-%m-%d %H:%M:%S"). See the strptime()
#' function documentation for help on syntax. If you enter a date time as a
#' character string, then it will be converted into UNIX seconds. The default time
#' zone is GMT, but it can be changed to GMT+1 with 'datetime.timezone =
#' "Etc/GMT-1"' (the sign in the "Etc/GMT" zone names is inverted). Use the
#' OlsonNames() function for a list of all time zones. If you enter
#' a datetime, then the date and time must be in the same cell; they cannot
#' be in separate columns. Often a UTC offset is appended to the
#' time (e.g. 16:32 +2). This function uses the as.POSIXct() function and cannot
#' handle these offsets (like the "+2" in the example). They need to be
#' removed and dealt with prior to analysis.
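#'
#' A short illustration of the conversion to UNIX seconds with base R, using the
#' example datetime above. It relies on the default formats of as.POSIXct()
#' instead of an explicit ‘datetime.format’. Handling the "+2" offset by
#' stripping it and choosing the matching "Etc/GMT-2" zone is an assumption about
#' how such data could be pre-processed, not something this function does; the
#' date attached to "16:32 +2" is made up:
#' \preformatted{
#' as.numeric(as.POSIXct("2018-04-23 15:32:18", tz = "Etc/GMT-1"))  # 1524493938
#' as.numeric(as.POSIXct("2018-04-23 15:32:18", tz = "GMT"))        # 1524497538
#' stamp <- sub(" [+-][0-9]+$", "", "2018-04-23 16:32 +2")          # "2018-04-23 16:32"
#' as.numeric(as.POSIXct(stamp, tz = "Etc/GMT-2"))                  # 1524493920
#' }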
#'
#' When the data is being formatted the columns "dspk.Values",
#' "dspk.DateTimeNum" and "dspk.StateOfValue" will be generated. If these
#' columns already exist in the data table, then they will be overwritten. This may
#' or may not be desired. If the column "dspk.StateOfValue" is in the original
#' data, then that data will be used for the 'state of values', otherwise the
#' 'unchecked.state.of.value.code' (default 110) will be used for all data signifying 'unchecked' status.
#'
#' By default (despike.Method = "median") the spike removal algorithm
#' automatically deletes all sample points that are more than the threshold of 3
#' median absolute deviations (scaled by the factor 1.4826, which assumes normally
#' distributed data) away from the median of the 10 surrounding data points
#' (5 before and 5 after). The threshold of 3 can be changed with
#' "despike.threshold =". You can set despike.Method = "mean" to use standard
#' deviations and the mean instead of the default median absolute
#' deviations and median. However, the median is a much more robust statistic for
#' handling outliers.
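#'
#' A small worked comparison with made-up numbers shows why the median is more
#' robust: a second spike (100) among the 10 neighbours inflates the standard
#' deviation so much that the candidate value 30 is no longer flagged by the
#' mean method, while it is still flagged by the median method.
#' \preformatted{
#' neighbours <- c(10, 11, 10, 12, 100, 10, 11, 10, 12, 11)
#' candidate <- 30
#' abs(candidate - median(neighbours)) / mad(neighbours)  # about 12.8, above 3
#' abs(candidate - mean(neighbours)) / sd(neighbours)     # about 0.36, below 3
#' }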
#'
#' If you have data gaps and sampling times, and you don't want the
#' despiking algorithm to look past the data gaps, then make sure to supply
#' “DateTime”. The time interval between samples "sampling.interval = " will be
#' calculated as the mode of the difference between consecutive samples. If
#' irregularities in the sampling interval prevent this calculation,
#' then you can enter the sampling interval in "sampling.interval = " in the
#' numeric-datetime unit. If you entered a character datetime column, then it is
#' POSIX time converted to numeric, so the unit is UNIX seconds.
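#'
#' For example, with made-up sampling times in seconds where one sample is
#' missing, the mode of the differences still recovers the regular interval.
#' This assumes the interval is taken as the most frequent difference, computed
#' with the same approach as dspk.getmode():
#' \preformatted{
#' times <- c(0, 60, 120, 180, 300, 360, 420)  # the sample at 240 is missing
#' dt <- diff(times)                           # 60 60 60 120 60 60
#' uniq <- unique(dt)
#' sampling.interval <- uniq[which.max(tabulate(match(dt, uniq)))]  # 60
#' }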
#'
#' The default is to linearly interpolate all data gaps. You can
#' restrict the size of the data gaps that are interpolated with "max.gap". This needs to be in
#' the unit of your numeric datetime. If you entered a datetime column
#' containing character strings, then it is POSIX time converted to numeric, so the
#' unit is UNIX seconds. If you did not enter a time column, then the unit is
#' samples; for example "max.gap = 5" will only interpolate gaps of
#' up to 5 samples long.
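#'
#' A minimal sketch of gap interpolation with a ‘max.gap’ limit. It follows the
#' rule used by the gap interpolation function in this file: a gap is filled only
#' if the time between the last valid value before it and the first valid value
#' after it is at most ‘max.gap’, and interpolated values are rounded to
#' ‘precision’. Under this rule, with consecutively numbered samples, a single
#' missing value corresponds to a gap of 2. The helper name interp_sketch() is
#' hypothetical; applied to the vector of the min/max example in the examples
#' section (after Max = 10 is applied) with max.gap = 2 and precision = 2, it
#' gives the same "dspk.Values" as that example's output.
#' \preformatted{
#' interp_sketch <- function(x, t = seq_along(x), max.gap = Inf, precision = 1) {
#'   ok <- !is.na(x)
#'   filled <- x
#'   for (i in which(!ok)) {
#'     prev <- tail(which(ok[seq_len(i - 1)]), 1)  # last valid sample before i
#'     nxt <- i + head(which(ok[-seq_len(i)]), 1)  # first valid sample after i
#'     if (length(prev) == 0 || length(nxt) == 0) next  # gap at the start or end
#'     if (t[nxt] - t[prev] > max.gap) next             # gap too wide
#'     v <- x[prev] + (t[i] - t[prev]) / (t[nxt] - t[prev]) * (x[nxt] - x[prev])
#'     filled[i] <- round(v / precision) * precision
#'   }
#'   filled
#' }
#' x <- c(2, 2, 4, NA, -4, 2, 0, NA, 8, NA, NA, NA, 2)
#' interp_sketch(x, max.gap = 2, precision = 2)
#' # 2 2 4 0 -4 2 0 4 8 NA NA NA 2
#' }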
#'
#' @param steps Numeric vector containing the values 1, 2 and/or 3 corresponding to
#' step 1 min max filter, step 2 despiking and step 3 gap interpolation. For
#' example ‘steps=c(1,3)’ will run a min max filter and  then interpolate the gaps.
#' Will run all steps by default.
#' @param input.directory Character string of the path to the folder containing all the
#' csv files that you wish to batch process. This argument may be omitted if you
#' are entering vectors directly into the ‘Value’ and ‘DateTime’ arguments.
#' @param sep Argument indicating the formatting of the input csv files.
#' It is the field separator character; values are separated by this character.
#' By default it is a comma ",".
#' @param dec Argument indicating the formatting of the input csv files.
#' It is the character used for decimal points. By default
#' it is a period ".".
#' @param header Argument indicating the formatting of the input csv files.
#' It is a logical value indicating whether the first line contains the column
#' titles. By default it is TRUE.
#' @param Data A dataframe object. If you only wish to process one data frame, then it can
#' be entered directly from the R environment with this argument. If you enter in an
#' input.directory then 'Data = ' will be ignored and the files from the input directory
#' will be processed.
#' @param Value If the data is from a csv file or a dataframe, it is a quoted character string indicating the
#' column name or an integer indicating the column number of the column
#' containing the data values that you wish to despike.  Data may also be entered
#' as a single vector object (unquoted) such as ‘Value = mydata$values’ or ‘Value = values’
#' @param val.NAvalue The value indicating an NA value in your input data. If this value is NA, then this argument can be omitted.
#' @param unchecked.state.of.value.code Number indicating that a given value is unchecked. By default 110.
#' @param NA.state.of.value.code Number indicating that a given value is missing. By default 255.
#' @param add.original.data A logical value indicating if the original input data should
#' be included in the output tables. Note that if you input a csv file, then every
#' column in that file will be kept. TRUE by default.
#' @param DateTime If the data is from a csv file, it is a quoted character string indicating the
#' column name or an integer indicating the column number of the column
#' containing the datetime values of the samples.  Data may also be entered as a
#' single vector object (unquoted) such as ‘DateTime = mydata$time’ or
#'  ‘DateTime = time’
#' @param datetime.format Character string giving the datetime format. See the strptime() help file for additional help.
#' @param datetime.timezone Character string giving the time zone of the datetime. By default “GMT”.  Use OlsonNames() for a list of all time zones.
#' @param ConditionalMinMaxColumn The column name in quotes or column number or vector object that contains the factor variable to base your conditional min max filter on
#' @param ConditionalMinMaxValues A vector containing the factor values to base the conditional min max filter on
#' @param ConditionalMin A vector containing the condition minimums that correspond to the respective values in ConditionalMinMaxValues
#' @param ConditionalMax A vector containing the condition maximums that correspond to the respective values in ConditionalMinMaxValues
#' @param Min Number giving the minimum reasonable value. All values below this will be deleted.
#' @param Max Number giving the maximum reasonable value. All values above this will be deleted.
#' @param minmax.state.of.value.code Number indicating that the value has been deleted during the min max filter. By default 91.
#' @param sampling.interval As numeric, the time between samples. If you enter NULL then it will calculate it for you. By default NULL.
#' @param despiked.state.of.value.code Number indicating that a given value was deleted during the despiking. By default 92.
#' @param good.state.of.value.code Number indicating that a given value has been checked and deemed not a spike during the despiking. By default 80.
#' @param despike.threshold Number indicating the threshold for defining a spike. By default it is 3, which corresponds to 3 median absolute deviations or 3 standard deviations.
#' @param despike.Method Character string "median" or “mean” indicating the method to use for the despiking. By default “median”.
#' @param precision A number indicating the precision of the input values. Interpolated values will be rounded to this precision. If left as NULL then the numbers will be rounded to the largest decimal length found in the data.
#' @param max.gap As numeric, the time span of the maximum data gap you wish to interpolate
#'
#' @return Your final data can be found in “autodespikeYYYYMMDDHHMMSS/step3Interpol.FinalData” within your
#' working directory as comma separated csv files. The cleaned values will be in column "dspk.Values", the state of values in column "dspk.StateOfValue", and the numeric datetime will be in column "dspk.DateTimeNum".
#' @export
#'
#' @examples
#' #HIC data cleaning and validation protocol
#' #Batch process HIC database files that were formatted
#' #with the HIC.Continuous.Data.Import.Format() function.
#' #With default despiking algorithm (threshold of 3 MAD from the median).
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'    input.directory = 'data/HIC.data', #Load csv files from the folder 'HIC.data'
#'    sep = ',', dec = '.', header = T,  #The csv files are separated by commas with
#'    #point decimals and the first line is the column names.
#'    Value = 3, val.NAvalue = -777, #The values are found in the third column.
#'    #-777 stands for no-value
#'    DateTime = "DateTimeUnix",
#'    #The numeric datetime column name.
#'    #Because it is numeric no datetime formatting info is needed
#'    ConditionalMinMaxColumn = 'Parameter.Name',
#'    ConditionalMinMaxValues = c('DO','pH','chfyla','PPFD1','PPFD'),
#'    ConditionalMin = c(0,0,0,0,0), ConditionalMax = c(30,15,1000,2000,2000),
#'    #conditional min max filter based on parameter, with the minimum reasonable value
#'    #for oxygen, pH, chlorophyll a and PPFD being 0 and the maximum reasonable value
#'    #being 30 for oxygen, 15 for pH, 1000 for chlorophyll a and 2000 for PPFD
#'    max.gap = 900)
#'    #The maximum data gap that should be interpolated is 15 minutes or 900 seconds.
#'
#' #Example: Running full despiking work flow with batch process from a folder of
#' #csv files, with default despiking algorithm (threshold of 3 MAD from the median)
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'    input.directory = 'data/PPFD.data', #Load csv files from the folder 'PPFD.data'
#'    sep = ',', dec = '.', header = T,  #The csv files are separated by commas with
#'    #point decimals and the first line is the column names.
#'    Value = 3, val.NAvalue = -777, #The values are found in the third column.
#'    #-777 stands for no-value
#'    DateTime = "DateTime",
#'    #The datetime column name is "DateTime".
#'    datetime.format = "%Y-%m-%d %H:%M:%S",
#'    #The datetimes are character strings with this format "2018-04-23 15:32:18".
#'    datetime.timezone = 'Etc/GMT-1', #The time zone is UTC+1.
#'    Min=(-50), Max=1700, #The minimum reasonable value is -50 and the maximum  is 1700
#'    sampling.interval = NULL,
#'    #Sampling interval is regular and it will be calculated from the provided data
#'    precision = NULL, #The data precision will be calculated from the data.
#'    max.gap = 3600)
#'    #The maximum data gap that should be interpolated is one hour or 3600 seconds.
#'
#' #Example: Max filter on a vector with interpolation of resulting data gaps of up to 2 records
#' example.data = c(2,2,4,16,-4,2,0,96,8,12,26,66,2)
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'    steps = c(1,3), #run step 1 min/max filter and step 3 data gap interpolation
#'    Value = example.data, #The values are found in the vector 'example.data'
#'    Max=10, #Filter out all values above 10
#'    sampling.interval = 60,
#'    #This is extraneous information since no time data was given, it will be ignored
#'    precision = 2, #The data are all even so precision is set to 2.
#'    max.gap = 2)  #The maximum data gap that should be interpolated is 2 missing values.
#' # Output CSV file:
#' # "Value","dspk.Values","dspk.DateTimeNum","dspk.StateOfValue"
#' #  2,         2,                1,                110
#' #  2,         2,                2,                110
#' #  4,         4,                3,                110
#' #  16,        0,                4,                91
#' #  -4,       -4,                5,                110
#' #  2,         2,                6,                110
#' #  0,         0,                7,                110
#' #  96,        4,                8,                91
#' #  8,         8,                9,                110
#' #  12,        NA,               10,               91
#' #  26,        NA,               11,               91
#' #  66,        NA,               12,               91
#' #  2,         2,                13,               110
#'
#' #Example: Full despiking work flow on an R dataframe with no data gaps.
#' #999 is the NA value. Despiking is done with a threshold of 4 using the
#' #standard deviations from the mean.
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'      Value = datatable$values, val.NAvalue = 999,
#'      #The values are found in datatable$values. 999 stands for no-value
#'      Min=(0), Max=200, #The minimum reasonable value is 0 and the maximum is 200
#'      despike.threshold = 4, despike.Method = "mean",
#'      #The despiking algorithm is all data points more than 4 standard deviations
#'      #from the mean of the surrounding 10 data points
#'      precision = 0.01, #The data has a precision to the hundredth decimal place.
#'      max.gap = 10) #the maximum data gap that should be interpolated is 10 samples.
#'
#' #Example: The same example as above but now entering in the dataframe in
#' #the 'Data =' argument.
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'      Data = datatable, Value = 'values', val.NAvalue = 999,
#'      #The values are found in datatable$values. 999 stands for no-value
#'      Min=(0), Max=200, #The minimum reasonable value is 0 and the maximum is 200
#'      despike.threshold = 4, despike.Method = "mean",
#'      #The despiking algorithm is all data points more than 4 standard deviations
#'      #from the mean of the surrounding 10 data points
#'      max.gap = 10)  #the maximum data gap that should be interpolated is 10 samples.
#'
#' #Example: Full despiking work flow with a conditional min max filter. No Min or Max
#' #was given so if the conditions are not met, then the min and max will be set to
#' #-Inf and Inf, so no values are filtered.
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'      Data = datatable, Value = "values", #The values are found in datatable$values.
#'      ConditionalMinMaxColumn = "Parameter.Name",
#'      #The factors for basing the conditional min max are in column "Parameter.Name"
#'      ConditionalMinMaxValues = c('PercentO2','Temp'),
#'      ConditionalMin = c(0,-30), ConditionalMax = c(100,150),
#'      #conditional min max filter based on parameter with minimum reasonable value
#'      #for percent oxygen being 0% and for temperature being -30C and maximum
#'      #reasonable value for percent oxygen being 100% and for temperature being 150C
#'      max.gap = 10) #The maximum data gap that should be interpolated is 10 samples.
#'
#' #Example: Full despiking work flow with a conditional min max filter. A Min and a
#' #Max was now given so if the conditions are not met, then the min and max will be
#' #set to 10 and 100. If you give a Min and a Max in addition to the conditional
#' #values, then if the conditions are not met the min and the max will be set to
#' #those values given.
#' dspk.DespikingWorkflow.CSVfileBatchProcess(
#'      Data = datatable, Value = "values", #The values are found in datatable$values.
#'      Min = 10, Max = 100, #if the conditional min max values below are not met,
#'      #then it will take these values as the min and max
#'      ConditionalMinMaxColumn = "Parameter.Name",
#'      #The factors for basing the conditional min max are in column "Parameter.Name"
#'      ConditionalMinMaxValues = c('PercentO2','Temp'),
#'      ConditionalMin = c(0,-30),
#'      ConditionalMax = c(100,150),
#'      #conditional min max filter based on parameter with minimum reasonable value
#'      #for percent oxygen being 0% and for temperature being -30C and maximum
#'      #reasonable value for percent oxygen being 100% and for temperature being 150C
#'      max.gap = 10) #The maximum data gap that should be interpolated is 10 samples.

dspk.DespikingWorkflow.CSVfileBatchProcess <- function(steps = c(1,2,3),input.directory = NULL,
                                                       sep = ',', dec = '.', header = T,
                                                       #formating function
                                                       Data = NULL, #you can enter a dataframe here but this will be ignored if an input directory is provided
                                                       Value, val.NAvalue = NULL, unchecked.state.of.value.code = 110, NA.state.of.value.code = 255, add.original.data = T,
                                                       DateTime = NULL, datetime.format = NULL, datetime.timezone = "GMT",
                                                       #min max
                                                       ConditionalMinMaxColumn = NULL, ConditionalMinMaxValues = NULL, ConditionalMin = NULL, ConditionalMax = NULL,
                                                       Min = (-Inf), Max = Inf, minmax.state.of.value.code = 91,
                                                       #Despike
                                                       sampling.interval = NULL, despiked.state.of.value.code = 92, good.state.of.value.code = 80, despike.threshold = 3, despike.Method = "median",
                                                       #interpolate
                                                       precision = NULL, max.gap = Inf) {

  #function to identify the type of input and convert it to text, preserving quotes and vectors
  #It is important to log exactly what was entered into the function arguments
  itqv <- function(x){
    fff<-function(x){
      if(is.null(x)){return("NULL")}
      else if(is.na(x)){return("NA")}
      else if(is.character(x)){return(paste0("'",x,"'"))}
      else{return(x)}
    }
    if(length(x)>1){return(paste0("c(",paste(lapply(x, fff),collapse = ","),")"))
    }else if(length(x)<1&!is.null(x)){return("''")
    }else{return(fff(x))}
  }

  logdata <- '----Log file for batch process despiking function dspk.DespikingWorkflow.CSVfileBatchProcess()-----'
  logdata <- rbind(logdata,'---------------------------------------------------------------------------------------------------')
  logdata <- rbind(logdata,'Below you will find a quick overview of the function, a list of the arguments provided to the function and a work log')
  logdata <- rbind(logdata,'showing the output directory, original CSV files processed, summaries of each step, file-specific calculated arguments and error reports.')
  logdata <- rbind(logdata,'For support and bug reporting please contact PaliFelice.Gelsomini@uantwerpen.be')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,paste('Date and time of data processing:',format(Sys.time(),format = '%Y.%m.%d %H:%M:%S %Z')))
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'---Overview---')
  logdata <- rbind(logdata,'Despiking workflow: preprocessing (formatting), step 1 min max filter, step 2 despiking and step 3 data gap interpolation')
  logdata <- rbind(logdata,paste('Performing preprocessing and steps:',paste(steps,collapse = ',')))
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,paste('On CSV files in directory:',input.directory))
  if(is.null(input.directory))logdata <- rbind(logdata,paste('On dataframe from R environment:', ifelse(is.data.frame(Data),deparse(substitute(Data)),itqv(Data))))
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'Despiking algorithm: with the default "despike.Method" median and the default "despike.threshold" 3, all data points that are more than 3 median absolute deviations')
  logdata <- rbind(logdata,'away from the median of the 10 surrounding data points (5 before and 5 after) will be deleted. At least 5 surrounding data points are required for a sample to be evaluated.')
  logdata <- rbind(logdata,'To handle data gaps, the algorithm will not look farther than 5 sampling intervals before and after the data point. If a "sampling.interval" is not provided, it will')
  logdata <- rbind(logdata,'be calculated as the mode of the intervals between samples. The "dspk.StateOfValue" of the deleted values will be set to "despiked.state.of.value.code" (default 92). The')
  logdata <- rbind(logdata,'"dspk.StateOfValue" of the values that passed the despike test will be set to "good.state.of.value.code" (default 80).')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'---Function Arguments:---')
  logdata <- rbind(logdata,paste('steps =',itqv(steps)))
  logdata <- rbind(logdata,paste('input.directory =', itqv(input.directory),', Data =',ifelse(is.data.frame(Data),deparse(substitute(Data)),itqv(Data))))
  logdata <- rbind(logdata,paste('preprocessing:'))
  logdata <- rbind(logdata,paste('sep =',itqv(sep),', dec =', itqv(dec), ', header =', itqv(header),', add.original.data =', itqv(add.original.data)))
  logdata <- rbind(logdata,paste('Value =',itqv(Value), ', val.NAvalue =',itqv(val.NAvalue)))
  logdata <- rbind(logdata,paste('unchecked.state.of.value.code =', itqv(unchecked.state.of.value.code), ', NA.state.of.value.code =', itqv(NA.state.of.value.code)))
  logdata <- rbind(logdata,paste('DateTime =', itqv(DateTime), ', datetime.format =', itqv(datetime.format), ', datetime.timezone =', itqv(datetime.timezone)))
  logdata <- rbind(logdata,paste('Step 1 min max filter:'))
  logdata <- rbind(logdata,paste('ConditionalMinMaxColumn =', itqv(ConditionalMinMaxColumn), ', ConditionalMinMaxValues =', itqv(ConditionalMinMaxValues), ', ConditionalMin =', itqv(ConditionalMin), ', ConditionalMax =', itqv(ConditionalMax)))
  logdata <- rbind(logdata,paste('Min =', itqv(Min), ', Max =', itqv(Max), ', minmax.state.of.value.code =', itqv(minmax.state.of.value.code)))
  logdata <- rbind(logdata,paste('Step 2 despiking:'))
  logdata <- rbind(logdata,paste('sampling.interval =', itqv(sampling.interval), ', despiked.state.of.value.code =', itqv(despiked.state.of.value.code), ', good.state.of.value.code =', itqv(good.state.of.value.code), ', despike.threshold =', itqv(despike.threshold), ', despike.Method =', itqv(despike.Method)))
  logdata <- rbind(logdata,paste('Step 3 data gap interpolation:'))
  logdata <- rbind(logdata,paste('precision =', itqv(precision), ', max.gap =', itqv(max.gap)))

  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'---Work log---')

  #check that steps are entered correctly
  if(!is.vector(steps) || !is.numeric(steps)) stop('steps must be entered as a numeric vector containing one or more of 1, 2 and 3, e.g. c(1,2,3)')
  message(paste('performing steps:', paste(steps, collapse = ',')))


  #create a unique directory with the current time to save data into
  CurrentTime <- format(Sys.time(),"%Y%m%d%H%M%S")
  #create the directory if it doesn't exist, else append a number to the name until it doesn't exist
  dir0 <- paste0("autodespike",CurrentTime)
  if(dir.exists(dir0)){
    dup <- 1
    while(dir.exists(paste0(dir0,'_',dup))){dup <- dup+1}
    dir0 <- paste0(dir0,'_',dup)
  }
  dir.create(dir0)
  message(paste("created directory:",dir0))
  logdata <- rbind(logdata,paste("All data saved into output directory:",dir0))

  dir1 <- paste0(dir0,"/preprocFormat")
  dir2 <- paste0(dir0,"/step1MinMax")
  dir3 <- paste0(dir0,"/step2Despike")
  dir4 <- paste0(dir0,"/step3Interpol.FinalData")

  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')


  #Formatting--------------
  #Note: check whether the table is already formatted
  output.directory <- dir1
  directory<-input.directory
  file.name.note <- '.formatted'
  NAvalue = ''
  if(is.null(directory)){ #if no directory is given
    f<-1                  #then we can only process one file
  }else{
    #list files in directory
    files <- list.files(directory)
    logdata <- rbind(logdata,paste('Files in input directory:',paste(files,collapse = ',')))
    #number of files in the directory
    f <- length(files)
  }

  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')
  message('--------------');message('--------------');message('--------------')

  for(J in seq_len(f)){  #load all the files in the directory. Files 1 to f (no iterations if the directory is empty)
    tryCatch({   #in case of error, catch error and tell me the file name
      message(paste("Pre-process: File formatting: file",J,"of",f))
      logdata <- rbind(logdata,'')


      if(!is.null(directory)){
        #path for file number J
        CONfile <- paste(directory,files[J],sep = "/")

        #--file data--
        #load the Data table from the text file
        CSV.table <- read.table(CONfile, sep= sep, header = header,dec = dec)
        logdata <- rbind(logdata,paste('Preprocess: formatting: file',files[J]))
      }else{
        files <- "InputData"
        CSV.table <- Data
        if(!(is.null(CSV.table)|is.data.frame(CSV.table))){stop('"Data =" input must either be NULL or a dataframe.')}
        if(is.null(input.directory))logdata <- rbind(logdata,paste('Loaded dataframe from R environment:', ifelse(is.data.frame(Data),deparse(substitute(Data)),itqv(Data))))
      }
      #--run the batched process and save it to a new table--
      New.table <- dspk.TableFormatting(Data = CSV.table, Value = Value, DateTime = DateTime, NAvalue = val.NAvalue, datetime.format = datetime.format, datetime.timezone = datetime.timezone, state.of.value.code = unchecked.state.of.value.code, state.of.value.code.na = NA.state.of.value.code, add.original.data = add.original.data)


      #--save data to csv's--
      #create directory output.directory if it doesn't exist
      if(!dir.exists(output.directory)){
        dir.create(output.directory)
        message(paste("created directory:",output.directory))
      }
      write.csv(New.table, file = paste0(output.directory,"/",files[J],file.name.note,".csv"),row.names = F)

    },
    #in case of error give file name and error message
    error=function(cond){
      message(paste("File caused error:",files[J]))
      message("Original error:")
      message(conditionMessage(cond))
      logdata <<- rbind(logdata,paste("File caused error:",files[J]))
      logdata <<- rbind(logdata,paste("Original error:",conditionMessage(cond)))
      #return(NULL)
    }
    )#end of try catch
  }#end of for loop
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')


  #min max-----------
  if(1 %in% steps){
    directory = dir1
    output.directory = dir2
    file.name.note = '.minmax'
    sep = ','
    dec = '.'
    header = T
    NAvalue = ''
    message('--------------');message('--------------');message('--------------')

    #list files in directory
    files <- list.files(directory)
    #number of files in the directory
    f <- length(files)
    for(J in seq_len(f)){  #load all the files in the directory. Files 1 to f (no iterations if the directory is empty)
      tryCatch({   #in case of error, catch error and tell me the file name
        message(paste("Step 1: min/max: file",J,"of",f))
        logdata <- rbind(logdata,'')
        logdata <- rbind(logdata,paste("Step 1: min/max: file ",files[J]))

        #path for file number J
        CONfile <- paste(directory,files[J],sep = "/")

        #--file data--
        #load the Data table from the text file
        CSV.table <- read.table(CONfile, sep= sep, header = header,dec = dec)

        #--conditional min max--
        #if all the conditional variables are filled in and the given vectors are all the same length
        if(is.null(Min)){Min = -Inf}
        if(is.null(Max)){Max = Inf}
        if(!is.null(ConditionalMinMaxColumn)&!is.null(ConditionalMinMaxValues)&!is.null(ConditionalMin)&!is.null(ConditionalMax)&
           length(ConditionalMinMaxValues)==length(ConditionalMin)&length(ConditionalMax)==length(ConditionalMin)){
          conditionalvalue <- as.character(dspk.vectorize(ConditionalMinMaxColumn, Data = CSV.table)[1])#get the conditional value out of the datatable
          minmaxindex <- which(ConditionalMinMaxValues==conditionalvalue)
          if(length(minmaxindex)==1){ #check that there is exactly one match
            Min1<-ConditionalMin[minmaxindex]
            Max1<-ConditionalMax[minmaxindex]
          }else{
            Min1 <- Min
            Max1 <- Max
          }
          if(is.na(Min1)){Min1 = Min}
          if(is.na(Max1)){Max1 = Max}
          message(paste('conditional min max value:',conditionalvalue))
          logdata <- rbind(logdata,paste('conditional min max value:',conditionalvalue))
          message(paste('conditional min:',Min1))
          logdata <- rbind(logdata,paste('conditional min:',Min1))
          message(paste('conditional max:',Max1))
          logdata <- rbind(logdata,paste('conditional max:',Max1))
        }else{
          Min1 = Min
          Max1 = Max
        }
        #if the conditional lookup returns NA (for example because the current parameter is not in the given vectors), fall back to the non-conditional min and max
        if(is.na(Min1)){Min1 = Min}
        if(is.na(Max1)){Max1 = Max}
        Min1 = (as.numeric(Min1))
        Max1 = (as.numeric(Max1))

        #--run the batched process and save it to a new table--
        New.table <- dspk.MinMaxfilter(Data = CSV.table, Value = "dspk.Values", Min = Min1, Max = Max1, State.of.value.data = "dspk.StateOfValue", state.of.value.code = minmax.state.of.value.code, NAvalue = NULL,logoutput = T)
        logdata <- rbind(logdata,t(t(unlist(New.table$logdata))))
        New.table <- as.data.frame(New.table$data)
        #Workaround: the filter output does not carry over all the columns of the input table, so copy the updated columns back into the full table
        CSV.table$dspk.Values <- New.table$dspk.Values
        CSV.table$dspk.StateOfValue <- New.table$dspk.StateOfValue
        New.table <- CSV.table

        #--save data to csv's--
        #create directory output.directory if it doesn't exist
        if(!dir.exists(output.directory)){
          dir.create(output.directory)
          message(paste("created directory:",output.directory))
        }
        write.csv(New.table, file = paste0(output.directory,"/",files[J],file.name.note,".csv"),row.names = F)

      },
      #in case of error give file name and error message
      error=function(cond){
        message(paste("File caused error:",files[J]))
        message("Original error:")
        message(conditionMessage(cond))
        logdata <<- rbind(logdata,paste("File caused error:",files[J]))
        logdata <<- rbind(logdata,paste("Original error:",conditionMessage(cond)))
        #return(NULL)
      }
      )#end of try catch
    }#end of for loop
  }
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')


  #despike-------
  if(2 %in% steps){
    if(1 %in% steps){directory = dir2}else{directory = dir1}
    output.directory = dir3
    file.name.note = '.despiked'
    sep = ','
    dec = '.'
    header = T
    NAvalue = ''

    message('--------------');message('--------------');message('--------------')

    #list files in directory
    files <- list.files(directory)
    #number of files in the directory
    f <- length(files)
    for(J in seq_len(f)){  #load all the files in the directory. Files 1 to f (no iterations if the directory is empty)
      tryCatch({   #in case of error, catch error and tell me the file name
        message(paste("Step 2: despike: file",J,"of",f))
        logdata <- rbind(logdata,'')
        logdata <- rbind(logdata,paste("Step 2: despike: file ",files[J]))

        #path for file number J
        CONfile <- paste(directory,files[J],sep = "/")

        #--file data--
        #load the Data table from the text file
        CSV.table <- read.table(CONfile, sep= sep, header = header,dec = dec)

        #--get sampling interval--
        if(is.null(DateTime)){ #if no datetime was given, the datetime column just numbers the samples, so the interval is 1
          sampling.interval2<-1 #dspk.Spikefilter would normally do this itself, but it must be done here because the preprocessing step already filled in a datetime column for that subfunction
        }else{sampling.interval2<-sampling.interval}

        #--run the batched process and save it to a new table--
        New.table <- dspk.Spikefilter(Data = CSV.table, Value = "dspk.Values", NumDateTime = "dspk.DateTimeNum", sampling.interval = sampling.interval2, State.of.value.data = "dspk.StateOfValue", state.of.value.code = despiked.state.of.value.code, good.state.of.value.code = good.state.of.value.code, NAvalue = NULL, threshold = despike.threshold, precision = precision, Method = despike.Method,logoutput = T)
        logdata <- rbind(logdata,t(t(unlist(New.table$logdata))))
        New.table <- as.data.frame(New.table$data)
        #Workaround: the filter output does not carry over all the columns of the input table, so copy the updated columns back into the full table
        CSV.table$dspk.Values <- New.table$dspk.Values
        CSV.table$dspk.StateOfValue <- New.table$dspk.StateOfValue
        New.table <- CSV.table

        #--save data to csv's--
        #create directory output.directory if it doesn't exist
        if(!dir.exists(output.directory)){
          dir.create(output.directory)
          message(paste("created directory:",output.directory))
        }
        write.csv(New.table, file = paste0(output.directory,"/",files[J],file.name.note,".csv"),row.names = F)

      },
      #in case of error give file name and error message
      error=function(cond){
        message(paste("File caused error:",files[J]))
        message("Original error:")
        message(conditionMessage(cond))
        logdata <<- rbind(logdata,paste("File caused error:",files[J]))
        logdata <<- rbind(logdata,paste("Original error:",conditionMessage(cond)))
        #return(NULL)
      }
      )#end of try catch
    }#end of for loop
  }
  logdata <- rbind(logdata,'')
  logdata <- rbind(logdata,'')


  #interpolate----------
  if(3 %in% steps){

    if(2 %in% steps){
      directory = dir3
    }else if((1 %in% steps)){
      directory = dir2
    }else{
      directory = dir1
    }
    output.directory = dir4
    file.name.note = '.interpol'
    sep = ','
    dec = '.'
    header = T
    NAvalue = ''

    message('--------------');message('--------------');message('--------------')

    #list files in directory
    files <- list.files(directory)
    #number of files in the directory
    f <- length(files)
    for(J in seq_len(f)){  #load all the files in the directory. Files 1 to f (no iterations if the directory is empty)
      tryCatch({   #in case of error, catch error and tell me the file name
        logdata <- rbind(logdata,'')
        message(paste("Step 3: gap interpolation: file",J,"of",f))
        logdata <- rbind(logdata,paste("Step 3: gap interpolation: file ",files[J]))

        #path for file number J
        CONfile <- paste(directory,files[J],sep = "/")

        #--file data--
        #load the Data table from the text file
        CSV.table <- read.table(CONfile, sep= sep, header = header,dec = dec)

        #--run the batched process and save it to a new table--
        New.table <- dspk.DataGapInterpolation(Data = CSV.table, Value = "dspk.Values", precision = precision, NumDateTime = "dspk.DateTimeNum", max.gap = max.gap, State.of.value.data = "dspk.StateOfValue", state.of.value.code = 93, NAvalue = NULL,logoutput = T)
        logdata <- rbind(logdata,t(t(unlist(New.table$logdata))))
        New.table <- as.data.frame(New.table$data)
        #Workaround: the interpolation output does not carry over all the columns of the input table, so copy the updated values back into the full table
        CSV.table$dspk.Values <- New.table$dspk.Values
        #CSV.table$dspk.StateOfValue <- New.table$dspk.StateOfValue #the interpolation state of values are intentionally not copied, so that the original state of each value is kept
        New.table <- CSV.table

        #--save data to csv's--
        #create directory output.directory if it doesn't exist
        if(!dir.exists(output.directory)){
          dir.create(output.directory)
          message(paste("created directory:",output.directory))
        }
        write.csv(New.table, file = paste0(output.directory,"/",files[J],file.name.note,".csv"),row.names = F)

      },
      #in case of error give file name and error message
      error=function(cond){
        message(paste("File caused error:",files[J]))
        message("Original error:")
        message(conditionMessage(cond))
        logdata <<- rbind(logdata,paste("File caused error:",files[J]))
        logdata <<- rbind(logdata,paste("Original error:",conditionMessage(cond)))
        #return(NULL)
      }
      )#end of try catch
    }#end of for loop
  }


  fileConn<-file(paste0(dir0,"/FunctionLogFile.txt"))
  writeLines(logdata, fileConn)
  close(fileConn)


}
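
#Sketch: rolling median/MAD spike test (hypothetical) -----------------------------------
# A minimal sketch of the despiking rule described in the log text of
# dspk.DespikingWorkflow.CSVfileBatchProcess, under stated assumptions. It is NOT the
# package's dspk.Spikefilter; the name sketch.flagSpikes and its arguments are hypothetical.
# Assumptions: tnum is a numeric time vector in the same units as dt; "median absolute
# deviation" is taken as the unscaled MAD (mad(..., constant = 1)); the point under test is
# excluded from its own neighbourhood. If dt is NULL it is estimated, the same way as
# dspk.getmode, as the mode of the intervals between samples.
sketch.flagSpikes <- function(x, tnum, dt = NULL, half.window = 5, threshold = 3,
                              min.neighbours = 5) {
  if (is.null(dt)) {
    d <- diff(tnum)
    u <- unique(d)
    dt <- u[which.max(tabulate(match(d, u)))]  # mode of the sampling intervals
  }
  spike <- rep(FALSE, length(x))
  for (i in seq_along(x)) {
    if (is.na(x[i])) next
    # neighbours: other samples no more than half.window sampling intervals away,
    # so a data gap shrinks the neighbourhood instead of widening it
    idx <- which(abs(tnum - tnum[i]) <= half.window * dt)
    neigh <- x[idx[idx != i]]
    neigh <- neigh[!is.na(neigh)]
    if (length(neigh) < min.neighbours) next  # too few neighbours: not evaluated
    med <- median(neigh)
    spike[i] <- abs(x[i] - med) > threshold * mad(neigh, center = med, constant = 1)
  }
  spike  # TRUE marks a value that would be deleted
}

# Usage (made-up data): the isolated 50 is the only value flagged.
# x <- c(1, 2, 1, 2, 1, 50, 2, 1, 2, 1, 2)
# which(sketch.flagSpikes(x, tnum = 1:11))  # 6
# Using the median and MAD rather than the mean and standard deviation keeps a single
# large spike from inflating the spread it is compared against.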
pgelsomini/HICbioclean documentation built on Dec. 28, 2021, 5:22 p.m.
pgelsomini/HICbioclean
Auto and Manual Validation and Calibration of Continuous Biological Parameters from the Flemish Hydrological Information Center (HIC) Database

R/SpikeRemovalFunctions.R
In pgelsomini/HICbioclean: Auto and Manual Validation and Calibration of Continuous Biological Parameters from the Flemish Hydrological Information Center (HIC) Database

Defines functions dspk.DespikingWorkflow.CSVfileBatchProcess dspk.BatchProcess dspk.DataGapInterpolation dspk.Spikefilter dspk.MinMaxfilter dspk.TableFormatting dspk.vectorize dspk.getmode

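
#Sketch: linear gap interpolation limited by max.gap (hypothetical) ----------------------
# A minimal sketch of step 3 of the workflow, assuming that max.gap is expressed in the
# units of the numeric time vector and is compared with the time between the last value
# before a gap and the first value after it. The name sketch.interpolateGaps is
# hypothetical; this is not the package's dspk.DataGapInterpolation, and its precision
# argument is not modelled.
sketch.interpolateGaps <- function(x, tnum, max.gap) {
  out <- x
  runs <- rle(is.na(x))                 # runs of missing / present values
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (r in which(runs$values)) {       # only the runs of NA values
    s <- starts[r]
    e <- ends[r]
    if (s == 1 || e == length(x)) next  # leading/trailing gap: no value on one side
    left <- s - 1
    right <- e + 1
    if (tnum[right] - tnum[left] > max.gap) next  # gap too long: leave it as NA
    gap <- s:e
    out[gap] <- x[left] + (x[right] - x[left]) *
      (tnum[gap] - tnum[left]) / (tnum[right] - tnum[left])
  }
  out
}

# Usage (made-up data): the one-sample gap is filled, the four-sample gap is not.
# sketch.interpolateGaps(c(1, NA, 3, NA, NA, NA, NA, 8), tnum = 1:8, max.gap = 3)
# returns c(1, 2, 3, NA, NA, NA, NA, 8)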