R/importADMS.R
In openair: Tools for the Analysis of Air Pollution Data

Documented in importADMS importADMSBgd importADMSMet importADMSMop importADMSPst

#' CERC Atmospheric Dispersion Modelling System (ADMS) data import function(s)
#' for openair
#'
#' Function(s) to import various ADMS file types into openair. Currently handles
#' ".met", ".bgd", ".mop" and ".pst" file structures. Uses [utils::read.csv()]
#' to read in data, format for R and openair and apply some file structure
#' testing.
#'
#' The \code{importADMS} function were developed to help import various ADMS
#' file types into openair. In most cases the parent import function should work
#' in default configuration, e.g. \code{mydata <- importADMS()}. The function
#' currently recognises four file formats: \code{.bgd}, \code{.met}, \code{.mop}
#' and \code{.pst}.  Where other file extensions have been set but the file
#' structure is known, the import call can be forced by, e.g, \code{mydata <-
#' importADMS(file.type="bgd")}. Other options can be adjusted to provide fine
#' control of the data structuring and renaming.
#'
#' @aliases importADMS importADMSBgd importADMSMet importADMSMop importADMSPst
#' @param file The ADMS file to be imported. Default, [file.choose()] opens
#'   browser. Use of [utils::read.csv()] also allows this to be a readable
#'   text-mode connection or url (although these options are currently not fully
#'   tested).
#' @param file.type Type of ADMS file to be imported. With default, "unknown",
#'   the import uses the file extension to identify the file type and, where
#'   recognised, uses this to identify the file structure and import method to
#'   be applied. Where file extension is not recognised the choice may be forced
#'   by setting \code{file.type} to one of the known \code{file.type} options:
#'   "bgd", "met", "mop" or "pst".
#' @param drop.case Option to convert all data names to lower case. Default,
#'   \code{TRUE}.  Alternative, \code{FALSE}, returns data with name cases as
#'   defined in file.
#' @param drop.input.dates Option to remove ADMS "hour", "day", and "year" data
#'   columns after generating openair "date" timeseries. Default, \code{TRUE}.
#'   Alternative, \code{FALSE}, returns both "date" and the associated ADMS data
#'   columns as part of openair data frame.
#' @param keep.units Option to retain ADMS data units. Default, \code{TRUE},
#'   retains units (if recoverable) as character vector in data frame comment if
#'   defined in \code{file}.  Alternative, \code{FALSE}, discards units. (NOTE:
#'   currently, only \code{.bgd} and \code{.pst} files assign units. So, this
#'   option is ignored when importing \code{.met} or \code{.mop} files.)
#' @param simplify.names Option to simplify data names in accordance with common
#'   \code{openair} practices. Default, \code{TRUE}. Alternative, \code{FALSE},
#'   returns data with names as interpreted by standard R. (NOTE: Some ADMS file
#'   data names include symbols and structures that R does not allow as part of
#'   a name, so some renaming is automatic regardless of \code{simplify.names}
#'   setting.  For example, brackets or symbols are removed from names or
#'   replaced with ".", and names in the form "1/x" may be returned as "X1.x" or
#'   "recip.x".)
#' @param test.file.structure Option to test file structure before trying to
#'   import. Default, \code{TRUE}, tests for expected file structure and halts
#'   import operation if this is not found.  Alternative, \code{FALSE}, attempts
#'   import regardless of structure.
#' @param drop.delim Option to remove delim columns from the data frame. ADMS
#'   .mop files include two columns, "INPUT_DATA:" and "PROCESSED_DATA:", to
#'   separate model input and output types.  Default, \code{TRUE}, removes
#'   these. Alternative, \code{FALSE}, retains them as part of import.  (Note:
#'   Option ignored when importing \code{.bgd}, \code{.met} or \code{.pst}
#'   files.)
#' @param add.prefixes Option to add prefixes to data names. ADMS .mop files
#'   include a number of input and process data types with shared names.
#'   Prefixes can be automatically added to these so individual data can be
#'   readily identified in the R/openair environment. Default, \code{TRUE}, adds
#'   "process." as a prefix to processed data. Other options include:
#'   \code{FALSE} which uses no prefixes and leave all name rationalisation to
#'   R, and character vectors which are treated as the required prefixes. If one
#'   vector is sent, this is treated as processed data prefix. If two (or more)
#'   vectors are sent, the first and second are treated as the input and
#'   processed data prefixes, respectively. For example, the argument
#'   (\code{add.prefixes="out"}) would add the "out" prefix to processed data
#'   names, while the argument (\code{add.prefixes=c("in","out")}) would add
#'   "in" and "out" prefixes to input and output data names, respectively.
#'   (Note: Option ignored when importing \code{.bgd}, \code{.met} or
#'   \code{.pst} files.)
#' @param names Option applied by \code{simplifyNamesADMS} when
#'   \code{simplify.names} is enabled.  All names are simplified for the default
#'   setting, \code{NULL}.
#' @param all For .MOP files, return all variables or not. If \code{all = TRUE}
#'   a large number of processed variables are returned.
#' @inheritDotParams utils::read.csv
#' @export
#' @return In standard use [importADMS()] returns a data frame for use in
#'   openair. By comparison to the original file, the resulting data frame is
#'   modified as follows:
#'
#'   Time and date information will combined in a single column "date",
#'   formatted as a conventional timeseries (\code{as.POSIX*}). If
#'   \code{drop.input.dates} is enabled data series combined to generated the
#'   new "date" data series will also be removed.
#'
#'   If \code{simplify.names} is enabled common chemical names may be
#'   simplified, and some other parameters may be reset to openair standards
#'   (e.g. "ws", "wd" and "temp") according to operations defined in
#'   \code{simplifyNamesADMS}. A summary of simplification operations can be
#'   obtained using, e.g., the call \code{importADMS(simplify.names)}.
#'
#'   If \code{drop.case} is enabled all upper case characters in names will be
#'   converted to lower case.
#'
#'   If \code{keep.units} is enabled data units information may also be retained
#'   as part of the data frame comment if available.
#'
#'   With \code{.mop} files, input and processed data series names may also been
#'   modified on the basis of \code{drop.delim} and \code{add.prefixes} settings
#' @note Times are assumed to be in GMT. Zero wind directions reset to 360 as
#'   part of \code{.mop} file import.
#' @author Karl Ropkins, David Carslaw and Matthew Williams (CERC).
#' @family import functions
#' @examples
#' ##########
#' #example 1
#' ##########
#' #To be confirmed
#'
#' #all current simplify.names operations
#' importADMS(simplify.names)
#'
#' #to see what simplify.names does to adms data series name PHI
#' new.name <- importADMS(simplify.names, names="PHI")
#' new.name
importADMS <- function(file = file.choose(), file.type = "unknown",
                       drop.case = TRUE, drop.input.dates = TRUE, keep.units = TRUE,
                       simplify.names = TRUE, test.file.structure = TRUE,
                       drop.delim = TRUE, add.prefixes = TRUE,
                       names = NULL,
                       all = FALSE,
                       ...) {

  # importADMS
  # v0.2 kr
  # parent with four daughters (below)
  # Bgd, Mop and Met import methods and simplifyNamesADMS

  if (substitute(file) == "simplify.names") {
    return(simplifyNamesADMS(names))
  }

  if (file.type == "unknown") {
    file.type <- tolower(substr(file, nchar(file) - 3, nchar(file)))
    if (substr(file.type, 1, 1) == ".") {
      file.type <- substr(file.type, 2, 4)
    } else {
      stop(
        "File extension not recognised\n       [If valid ADMS file, try setting file.type to one of: bgd, mop or met]",
        call. = FALSE
      )
    }
  }

  if (file.type == "bgd") {
    return(importADMSBgd(
      file = file, drop.case = drop.case, drop.input.dates = drop.input.dates,
      keep.units = keep.units, simplify.names = simplify.names,
      test.file.structure = test.file.structure,
      drop.delim = drop.delim, add.prefixes = add.prefixes,
      ...
    ))
  }
  if (file.type == "mop") {
    return(importADMSMop(
      file = file, drop.case = drop.case, drop.input.dates = drop.input.dates,
      keep.units = keep.units, simplify.names = simplify.names,
      test.file.structure = test.file.structure,
      drop.delim = drop.delim, add.prefixes = add.prefixes, all = all,
      ...
    ))
  }
  if (file.type == "met") {
    return(importADMSMet(
      file = file, drop.case = drop.case, drop.input.dates = drop.input.dates,
      keep.units = keep.units, simplify.names = simplify.names,
      test.file.structure = test.file.structure,
      drop.delim = drop.delim, add.prefixes = add.prefixes,
      ...
    ))
  }
  if (file.type == "pst") {
    return(importADMSPst(
      file = file, drop.case = drop.case, drop.input.dates = drop.input.dates,
      keep.units = keep.units, simplify.names = simplify.names,
      test.file.structure = test.file.structure,
      drop.delim = drop.delim, add.prefixes = add.prefixes,
      ...
    ))
  }

  stop(
    "File extension not recognised\n       [If valid ADMS file, try setting file.type to one of: bgd, mop, met or pst]",
    call. = FALSE
  )
}

###############
## daughter
## importADMSBgd

importADMSBgd <- function(file = file.choose(),
                          drop.case = TRUE, drop.input.dates = TRUE,
                          keep.units = TRUE, simplify.names = TRUE,
                          test.file.structure = TRUE,
                          drop.delim = TRUE, add.prefixes = TRUE,
                          ...) {
  bgd <- readLines(file, n = -1)
  bgd <- sub("[[:space:]]+$", "", bgd) # strip out tail spaces

  loc.start <- which(tolower(bgd) == "backgroundversion2")
  if (test.file.structure) {
    if (length(loc.start) == 0) {
      stop(
        "File not recognised ADMS.bgd structure\n       [please contact openair if valid]",
        call. = FALSE
      )
    }
  }
  if (length(loc.start) > 1) {
    warning(
      "Multiple possible variable starts, taking last\n       [please contact openair problems encountered]",
      call. = FALSE
    )
    loc.start <- loc.start[length(loc.start)]
  }
  no.var <- suppressWarnings(as.numeric(bgd[loc.start + 1]))[1]
  if (test.file.structure & is.na(no.var)) {
    stop(
      "File not recognised ADMS.bgd structure\n       [please contact openair if valid]",
      call. = FALSE
    )
  }
  variables <- bgd[(loc.start + 2):(loc.start + 1 + no.var)]

  if (simplify.names) {
    variables <- simplifyNamesADMS(variables)
  }

  # drop messy name handling
  variables <- gsub("[.][.]", ".", variables)
  variables <- gsub("^[.]", "", variables)

  if (drop.case) {
    variables <- tolower(variables)
  }

  units.start <- which(substr(bgd, 1, 6) == "UNITS:")
  if (length(units.start) == 0) {
    warning(
      "Data units not extracted from ADMS.bgd\n       [please contact file structure if problems encountered]",
      call. = FALSE
    )
    units <- "units: undefined"
  }
  if (length(units.start) > 1) {
    warning(
      "Multiple possible unit starts, taking last\n       [please contact openair problems encountered]",
      call. = FALSE
    )
    units.start <- units.start[length(units.start)]
  }
  units <- bgd[(units.start + 1):(units.start + no.var)]
  if (length(units) == 0) {
    units <- "units: undefined"
  } else {
    units <- paste("units: ", paste(units, sep = "", collapse = ", "), sep = "")
  }
  data.start <- which(substr(bgd, 1, 5) == "DATA:")
  if (length(data.start) == 0) {
    stop(
      "Data start not not located ADMS.bgd\n       [please contact file structure if problems encountered]",
      call. = FALSE
    )
  }
  if (length(data.start) > 1) {
    warning(
      "Multiple possible data starts, taking last\n       [please contact openair problems encountered]",
      call. = FALSE
    )
    data.start <- data.start[length(data.start)]
  }

  ans <- read.csv(
    file,
    header = FALSE, skip = data.start,
    na.strings = c("", "NA", "-999", "-999.0"),
    ...
  )
  ans[] <- lapply(ans, function(x) {
    replace(x, x == -999, NA)
  })
  ########################
  # screening for missing data
  # confirm formats, if they get any with bgd files, etc.
  # might not be necessary

  date <- paste(ans[, 1], ans[, 2], ans[, 3], sep = "-")
  date <- as.POSIXct(strptime(date, format = "%Y-%j-%H"), "GMT")
  ans <- cbind(date = date, ans)
  if (length(variables) != ncol(ans) - 4) {
    warning(
      "Variable data mismatch, taking shortest\n       [please contact if openair problems encountered]",
      call. = FALSE
    )
    variables <- variables[1:min(c(length(variables), ncol(ans) - 4), na.rm = TRUE)]
    ans <- ans[, 1:(length(variables) + 4)]
  }

  names(ans) <- c("date", "bgd.year", "bgd.day", "bgd.hour", variables)
  if (drop.input.dates == TRUE) {
    ans <- ans[, c(1, 5:ncol(ans))]
  }
  if (keep.units) {
    comment(ans) <- c(comment(ans), units)
  }

  # error handling for bad days
  ids <- which(is.na(ans$date))
  if (length(ids) > 0) {
    if (length(ids) == nrow(ans)) {
      stop(
        "Invalid date (and time) format requested\n       [compare openair import settings and data structure]",
        call. = FALSE
      )
    }
    ans <- ans[-ids, ]
    reply <- paste("Missing dates detected, removing", length(ids), "line", sep = " ")
    if (length(ids) > 1) {
      reply <- paste(reply, "s", sep = "")
    }
    warning(reply, call. = FALSE)
  }
  print(unlist(sapply(ans, class)))
  ans
}

###############
## daughter
## importADMSMet

importADMSMet <- function(file = file.choose(),
                          drop.case = TRUE, drop.input.dates = TRUE,
                          keep.units = TRUE, simplify.names = TRUE,
                          test.file.structure = TRUE,
                          drop.delim = TRUE, add.prefixes = TRUE,
                          ...) {
  met <- readLines(file, n = -1)
  met <- sub("[[:space:]]+$", "", met) # strip out tail spaces
  loc.start <- which(met == "VARIABLES:")
  if (test.file.structure) {
    if (length(loc.start) == 0) {
      stop(
        "File not recognised ADMS.met structure\n       [please contact openair if valid]",
        call. = FALSE
      )
    }
  }
  if (length(loc.start) > 1) {
    warning(
      "Multiple possible variable starts, taking last\n       [please contact openair problems encountered]",
      call. = FALSE
    )
    loc.start <- loc.start[length(loc.start)]
  }
  variables <- suppressWarnings(as.numeric(met[loc.start + 1]))[1]
  if (test.file.structure & is.na(variables)) {
    stop(
      "File not recognised ADMS.met structure\n       [please contact openair if valid]",
      call. = FALSE
    )
  }
  variables <- met[(loc.start + 2):(loc.start + 1 + variables)]

  data.start <- which(met == "DATA:")
  if (test.file.structure) {
    if (length(data.start) == 0) {
      stop(
        "File not recognised ADMS.met structure\n       [please contact openair if valid]",
        call. = FALSE
      )
    }
  }
  if (length(data.start) > 1) {
    warning(
      "Multiple possible data starts, taking last\n       [please contact openair if problems encountered]",
      call. = FALSE
    )
    data.start <- data.start[length(data.start)]
  }

  met <- read.csv(file, skip = data.start, header = FALSE, na.strings = c(
    "-999",
    "-999.0"
  ))
  met[] <- lapply(met, function(x) {
    replace(x, x == -999, NA)
  })

  ## met <- met[, sapply(met, function(x) !all(is.na(x)))]

  if (length(variables) != ncol(met)) {
    warning(
      "Variable data mismatch, taking shortest\n       [please contact if openair problems encountered]",
      call. = FALSE
    )
    variables <- variables[1:min(c(length(variables), ncol(met)), na.rm = TRUE)]
    met <- met[, 1:length(variables)]
  }
  names(met) <- make.names(variables, unique = TRUE)

  # multiple year day hour name options
  fun.temp <- function(x, y, z) {
    if (all(!y %in% names(x))) {
      stop(paste(
        z,
        " not extracted\n       [please contact openair if valid file]",
        sep = ""
      ), call. = FALSE)
    }
    ans <- x[, y[y %in% names(x)]]
    if (!is.null(ncol(ans))) {
      ans <- ans[, 1]
    }
    ans
  }
  year <- fun.temp(met, c("YEAR"), "year")
  day <- fun.temp(met, c("DAY", "TDAY"), "day")
  hour <- fun.temp(met, c("HOUR", "THOUR"), "hour")

  met <- cbind(date = paste(year, day, hour, sep = "-"), met)
  met$date <- as.POSIXct(
    strptime(met$date, format = "%Y-%j-%H"),
    "GMT"
  )
  if (drop.input.dates) {
    met <- met[, !names(met) %in%
      c("YEAR", "TDAY", "THOUR", "DAY", "HOUR", "MONTH", "DAY.OF.MONTH")]
  }

  if (simplify.names) {
    names(met) <- simplifyNamesADMS(names(met))
  }


  # drop messy name handling
  names(met) <- gsub("[.][.]", ".", names(met))
  names(met) <- gsub("^[.]", "", names(met))

  if (drop.case) {
    names(met) <- tolower(names(met))
  }
  met[] <- lapply(met, function(x) {
    replace(x, x == -999, NA)
  })
  ids <- which(is.na(met$date))
  if (length(ids) > 0) {
    if (length(ids) == nrow(met)) {
      stop(
        "Invalid date (and time) format requested\n       [compare openair import settings and data structure]",
        call. = FALSE
      )
    }
    met <- met[-ids, ]
    reply <- paste(
      "Missing dates detected, removing", length(ids),
      "line",
      sep = " "
    )
    if (length(ids) > 1) {
      reply <- paste(reply, "s", sep = "")
    }
    warning(reply, call. = FALSE)
  }
  print(unlist(sapply(met, class)))
  met
}

###############
## daughter
## importADMSMop

importADMSMop <- function(file = file.choose(),
                          drop.case = TRUE, drop.input.dates = TRUE,
                          keep.units = TRUE, simplify.names = TRUE,
                          test.file.structure = TRUE,
                          drop.delim = TRUE, add.prefixes = TRUE, all = FALSE,
                          ...) {

  # problem
  # mismatch in file header line end with lr; data lines end with comma then lr
  #########
  # written a catch for this

  # problem
  # no obvious file structure for testing
  ########
  # provisional tester based on delim names

  # problem
  # r handling of x(y) names and x: names is messy
  ############
  # added tidy to correct for this
  # might need to rationalise names

  # problem
  # file contains lots of same names, input and processed
  ##############
  # added an add.prefixes option to handle this

  # problem
  # no keep.units options
  ##############
  # no option to use

  ######################
  # code

  # read top line/data headers
  check.names <- read.csv(file, header = FALSE, nrow = 1, ...)
  check.names <- make.names(as.vector(apply(check.names, 1, as.character)))
  ## tidy () handling; renaming x(y) as x.y. is messy
  check.names <- ifelse(
    substr(check.names, nchar(check.names), nchar(check.names)) == ".",
    substr(check.names, 1, nchar(check.names) - 1),
    check.names
  )
  ## tidy 1/LMN
  check.names <- gsub("X1.LMO", "RECIP.LMO", check.names)

  x.1 <- which(check.names == "INPUT_DATA")
  x.2 <- which(check.names == "PROCESSED_DATA")

  if (test.file.structure) {
    # check for delim columns
    if (length(x.1) == 0 | length(x.2) == 0) {
      stop(
        "File not recognised ADMS.mop structure\n       [please contact openair if valid]",
        call. = FALSE
      )
    }
  }

  # read in data
  ans <- read.csv(
    file,
    header = FALSE, skip = 1,
    na.strings = c("", "NA", "-999", "-999.0"),
    ...
  )
  ans[] <- lapply(ans, function(x) {
    replace(x, x == -999, NA)
  })

  ## check for extra empty column
  if (length(ans[, ncol(ans)][!is.na(ans[, ncol(ans)])]) == 0) {
    ans <- ans[, 1:(ncol(ans) - 1)]
  }
  if (ncol(ans) != length(check.names)) {
    warning(
      "Unexpected name/data mismatch, handled pragmatically\n       [compare openair import settings and data structure]",
      call. = FALSE
    )
  }

  if (simplify.names) check.names <- simplifyNamesADMS(check.names)

  ## restructure names and data according to arguments and put together
  if (is.logical(add.prefixes) == TRUE) {
    if (add.prefixes == TRUE) {
      check.names[(x.2[1] + 1):length(check.names)] <- paste("PROCESS", check.names[(x.2[1] + 1):length(check.names)], sep = ".")
    }
  } else {
    if (length(add.prefixes) == 1) {
      check.names[(x.2[1] + 1):length(check.names)] <- paste(add.prefixes[1], check.names[(x.2[1] + 1):length(check.names)], sep = ".")
    } else {
      if (length(add.prefixes) > 1) {
        check.names[(x.1[1] + 4):(x.2[1] - 1)] <- paste(add.prefixes[1], check.names[(x.1[1] + 4):(x.2[1] - 1)], sep = ".")
        check.names[(x.2[1] + 1):length(check.names)] <- paste(add.prefixes[2], check.names[(x.2[1] + 1):length(check.names)], sep = ".")
      } else {
        warning(
          "Unexpected add.prefixes option, option treated as FALSE\n       [check openair import settings]",
          call. = FALSE
        )
      }
    }
  }

  names(ans) <- make.names(check.names, unique = TRUE)

  ## reset wd 0 to 360
  ## get current PHI terminology
  temp <- if (simplify.names) simplifyNamesADMS("PHI") else "PHI"
  temp <- if (length(add.prefixes) > 1) paste(add.prefixes[1], temp, sep = ".") else temp
  if (temp %in% names(ans)) {
    ans[, temp][ans[, temp] == 0] <- 360
  }


  # drop messy name handling
  names(ans) <- gsub("[.][.]", ".", names(ans))
  names(ans) <- gsub("^[.]", "", names(ans))

  date <- paste(ans$TYEAR, ans$TDAY, ans$THOUR, sep = "-")
  date <- as.POSIXct(strptime(date, format = "%Y-%j-%H"), "GMT")
  if (drop.input.dates == TRUE) {
    ans <- ans[, !names(ans) %in% c("TYEAR", "TDAY", "THOUR")]
  }

  if (drop.delim == TRUE) {
    ans <- ans[, !names(ans) %in% c("PROCESSED_DATA", "INPUT_DATA")]
  }
  ans <- cbind(date = date, ans)

  if (drop.case == TRUE) {
    names(ans) <- tolower(names(ans))
  }

  # error handling for bad days
  ids <- which(is.na(ans$date))
  if (length(ids) > 0) {
    if (length(ids) == nrow(ans)) {
      stop(
        "Invalid date (and time) format requested\n       [compare openair import settings and data structure]",
        call. = FALSE
      )
    }
    ans <- ans[-ids, ]
    reply <- paste("Missing dates detected, removing", length(ids), "line", sep = " ")
    if (length(ids) > 1) {
      reply <- paste(reply, "s", sep = "")
    }
    warning(reply, call. = FALSE)
  }
  
  # add stability
  
  ans <-  ans %>% 
    mutate(H_LMO = process.recip.lmo * process.h,
  stability = case_when(H_LMO > 2 ~ "Stable",
                        H_LMO < -0.6 ~ "Unstable",
                        is.na(H_LMO) ~ NA,
                        .default = "Neutral"),
  stability = factor(stability, 
                     levels = c("Stable", "Neutral", "Unstable"))) %>% 
    rename(air_temp = temp,
           recip_lmo = process.recip.lmo,
           H = process.h)
  
  if (all) {
    
    return(ans)
    
  } else {
    
    # select variables only
    ans <- ans %>% 
      select(any_of(c("date", "ws", "wd", "air_temp", "rhu", 
                      "cl", "H",
                      "recip_lmo", "H_LMO", "stability")))
  }

  return(tibble(ans))
}


###############
# daughter
# importADMSPst
# kr v0.2
# 08 nov 2010

importADMSPst <- function(file = file.choose(),
                          drop.case = TRUE, drop.input.dates = TRUE,
                          keep.units = TRUE, simplify.names = TRUE,
                          test.file.structure = TRUE,
                          drop.delim = TRUE, add.prefixes = TRUE,
                          ...) {

  # notes
  #########
  # used the header/data dimension mismatcher handler from importADMSMop
  # maybe not be needed.
  #########
  # no obvious file structure for testing
  # provisional tester checks Hour, Day, Year and Receptor.name  in file names
  #########
  # units are recovered from names row
  #########
  # drops Time(s) if empty

  # problems
  #########
  # my name simplifications on Conc terms may need work
  # due to compiler can't catch mu.g/m3 units
  # talk to Matt/David re mug, ug and mg

  ######################
  # code

  # read top line/data headers
  check.names <- read.csv(file, header = FALSE, nrow = 1, ...)
  check.names <- as.vector(apply(check.names, 1, as.character))
  check.names <- sub("[[:space:]]+$", "", check.names) # strip out tail spaces
  check.names <- sub("^[[:space:]]{1,}", "", check.names) # strip leading space (safer?)
  check.names <- make.names(check.names) # after removing front spaces or X.m. conflict

  # test structure
  if (test.file.structure) {
    temp <- c("Hour", "Day", "Year", "Receptor.name")
    test <- temp[temp %in% check.names]
    if (!identical(temp, test)) {
      stop(
        "File not recognised ADMS.pst structure\n       [please contact openair if valid]",
        call. = FALSE
      )
    }
  }

  # read in data
  ans <- read.csv(
    file,
    header = FALSE, skip = 1,
    na.strings = c("", "NA", "-999", "-999.0"),
    ...
  )
  ans[] <- lapply(ans, function(x) {
    replace(x, x == -999, NA)
  })

  # match up data and names
  if (ncol(ans) != length(check.names)) {
    warning(
      "Unexpected name/data mismatch, handled pragmatically\n       [compare openair import settings and data structure]",
      call. = FALSE
    )
  }

  names(ans) <- make.names(check.names, unique = TRUE)

  # setup date/time
  date <- paste(ans$Year, ans$Day, ans$Hour, sep = "-")
  date <- as.POSIXct(strptime(date, format = "%Y-%j-%H"), "GMT")

  if (drop.input.dates == TRUE) {
    ans <- ans[, !names(ans) %in% c("Year", "Day", "Hour")]
  }

  # drop Time.s. if empty
  if (all(is.na(ans$Time.s.))) {
    ans <- ans[, !names(ans) %in% c("Time.s.")]
  }

  # recover units from names
  units <- rep(NA, ncol(ans))
  units[grep("[.]s[.]", names(ans))] <- "s"
  units[grep("^.[.]m", names(ans))] <- "m"
  units[grep("[.]ug.m.", names(ans))] <- "ug/m3" # both 3 and superscript 3
  units[grep("[.]ppb", names(ans))] <- "ppb"
  units[grep("[.]ppm", names(ans))] <- "ppm"
  if (length(na.omit(units)) == 0) {
    units <- "units: unknown"
  } else {
    units <- paste("units: ", paste(units, sep = "", collapse = ", "), sep = "")
  }

  if (simplify.names) {
    names(ans) <- simplifyNamesADMS(names(ans))
  }

  ans <- cbind(date = date, ans)

  if (drop.case == TRUE) {
    names(ans) <- tolower(names(ans))
  }

  comment(ans) <- c(comment(ans), units)

  # error handling for bad days
  ids <- which(is.na(ans$date))
  if (length(ids) > 0) {
    if (length(ids) == nrow(ans)) {
      stop(
        "Invalid date (and time) format requested\n       [compare openair import settings and data structure]",
        call. = FALSE
      )
    }
    ans <- ans[-ids, ]
    reply <- paste("Missing dates detected, removing", length(ids), "line", sep = " ")
    if (length(ids) > 1) {
      reply <- paste(reply, "s", sep = "")
    }
    warning(reply, call. = FALSE)
  }

  print(unlist(sapply(ans, class)))
  names(ans) <- gsub("..", replacement = "", names(ans), fixed = TRUE)
  ans
}


###############
## daughter
## simplifyNamesADMS
# kr v0.5
# 08 nov 2010

# notes
################
# two handlers: fun.temp and fun.temp.2
################
# fun.temp(x,y,z)
# replaces full term y with full term z in x
################
# fun.temp.2(x,y,z, y.names)
# replaces partial term y with partial term 2 in x
# if y.names = TRUE makes y r-friend first
# [make.names(y)...]
#

simplifyNamesADMS <- function(names = NULL) {
  # simplify.names lookup table for import.adms functions
  # v0.2 kr
  # handles as inputs (don't use after drop.case option)
  # names=NULL returns simplification operation summary

  if (is.null(names)) {
    message("Simplification operation summary")
    message("[ADMS => R => OPENAIR]:")
    fun.temp <- function(x, y, z) {
      temp <- c(y, make.names(y), z)
      message(paste("\t", paste(temp, collapse = " => "), sep = ""))
      temp <- data.frame(
        cbind(adms.input = temp[1], r.handling = temp[2], simplify.names = temp[3]),
        stringsAsFactors = FALSE
      )
      x <- rbind(x, temp)
      x
    }
    fun.temp.2 <- function(x, y, z, y.name) x
  } else {
    names <- make.names(names)
    fun.temp <- function(x, y, z) {
      x[which(x == make.names(y))] <- z
      x
    }
    fun.temp.2 <- function(x, y, z, y.name) {
      x <- if (y.name) gsub(make.names(y), z, x) else (gsub(y, z, x))
    }
  }

  ############
  # update list
  ############
  # 1/LMO
  names <- fun.temp(names, "1/LMO", "RECIP.LMO")
  # 1/MONIN-OBUKHOV LENGTH
  names <- fun.temp(names, "1/MONIN-OBUKHOV LENGTH", "RECIP.LMO")
  # ALBEDO(D)
  names <- fun.temp(names, "ALBEDO(D)", "ALBEDO.DISP")
  names <- fun.temp(names, "ALBEDO (D)", "ALBEDO.DISP")
  # ALBEDO(DISP)
  names <- fun.temp(names, "ALBEDO(DISP)", "ALBEDO.DISP")
  names <- fun.temp(names, "ALBEDO (DISP)", "ALBEDO.DISP")
  # ALBEDO (DISPERSION AREA)
  names <- fun.temp(names, "ALBEDO (DISPERSION AREA)", "ALBEDO.DISP")
  # ALBEDO(M)
  names <- fun.temp(names, "ALBEDO(M)", "ALBEDO.MET")
  names <- fun.temp(names, "ALBEDO (M)", "ALBEDO.MET")
  # ALBEDO(MET)
  names <- fun.temp(names, "ALBEDO(MET)", "ALBEDO.MET")
  names <- fun.temp(names, "ALBEDO (MET)", "ALBEDO.MET")
  # ALBEDO (MET SITE)
  names <- fun.temp(names, "ALBEDO (MET SITE)", "ALBEDO.MET")
  # ALPHA
  ##########
  ## conflict
  ##########
  ## both alpha.disp and alpha.met seem to have been abbrev. to alpha
  # ALPHA(D)
  names <- fun.temp(names, "ALPHA(D)", "ALPHA.DISP")
  names <- fun.temp(names, "ALPHA (D)", "ALPHA.DISP")
  # ALPHA(DISP)
  names <- fun.temp(names, "ALPHA(DISP)", "ALPHA.DISP")
  names <- fun.temp(names, "ALPHA (DISP)", "ALPHA.DISP")
  # ALPHA(M)
  names <- fun.temp(names, "ALPHA(M)", "ALPHA.MET")
  names <- fun.temp(names, "ALPHA (M)", "ALPHA.MET")
  # ALPHA(MET)
  names <- fun.temp(names, "ALPHA(MET)", "ALPHA.MET")
  names <- fun.temp(names, "ALPHA (MET)", "ALPHA.MET")
  # BL DEPTH
  names <- fun.temp(names, "BL DEPTH", "H")
  # BOUNDARY LAYER DEPTH
  names <- fun.temp(names, "BOUNDARY LAYER DEPTH", "H")
  # BUOYANCY FREQUENCY ABOVE BOUNDARY LAYER
  names <- fun.temp(names, "BUOYANCY FREQUENCY ABOVE BOUNDARY LAYER", "NU")
  # BUTADIENE
  # CL
  # CLOUD
  names <- fun.temp(names, "CLOUD", "CL")
  # CLOUD AMOUNT (OKTAS)
  names <- fun.temp(names, "CLOUD AMOUNT (OKTAS)", "CL")

  # Conc|ppb|NAME|All sources|-| 1hr
  names <- fun.temp(names, "Conc|ppb|NAME|SOURCES|-| RESOLUTION", "NAME.SOURCES.RESOLUTION")
  # Conc|ppm|NAME|All sources|-| 1hr
  names <- fun.temp(names, "Conc|ppm|NAME|SOURCES|-| RESOLUTION", "NAME.SOURCES.RESOLUTION")
  # Conc|ug/m3|NAME|All sources|-| 1hr
  names <- fun.temp(names, "Conc|ug/m3|NAME|SOURCES|-| RESOLUTION", "NAME.SOURCES.RESOLUTION")
  # NAME.All.sources.1hr
  names <- fun.temp(names, "NAME.All.sources.1hr", "NAME")
  # NAME.All.sources.RESOLUTION
  names <- fun.temp(names, "NAME.All.sources.RESOLUTION", "NAME.RESOLUTION")
  # NAME.SOURCE.1hr
  names <- fun.temp(names, "NAME.SOURCE.1hr", "NAME.SOURCE")

  # general for above
  names <- fun.temp.2(names, "Conc|ppb|", "", TRUE)
  names <- fun.temp.2(names, "Conc|ppm|", "", TRUE)
  names <- fun.temp.2(names, "Conc|.g/m.|", "", TRUE)
  # above covers
  ## u, m and mu for 1st and
  ## 3 and superscript3 for second
  names <- fun.temp.2(names, "[.][.][.][.]", ".", FALSE)
  names <- fun.temp.2(names, "[.][.][.]", ".", FALSE)
  names <- fun.temp.2(names, "[.]All[.]sources", "", FALSE)
  names <- fun.temp.2(names, "[.]1hr", "", FALSE)

  # D(RELATIVE HUMIDITY)/DZ ABOVE BOUNDARY LAYER (PERCENT/M)
  names <- fun.temp(names, "D(RELATIVE HUMIDITY)/DZ ABOVE BOUNDARY LAYER (PERCENT/M)", "DRHDZU")
  # DAY
  # DELTAPHI
  names <- fun.temp(names, "DELTAPHI", "DELTA.WD")
  # DELTAT
  names <- fun.temp(names, "DELTAT", "DELTA.T")
  names <- fun.temp(names, "DELTA T", "DELTA.T")
  # DELTATHETA
  names <- fun.temp(names, "DELTATHETA", "DELTA.THETA")
  names <- fun.temp(names, "DELTA THETA", "DELTA.THETA")
  # DIRN CHANGE
  names <- fun.temp(names, "DIRN CHANGE", "DELTA.WD")
  # DRH/DZ
  names <- fun.temp(names, "DRH/DZ", "DRHDZU")
  # DRHDZU
  # FR
  # FREQUENCY
  # FTHETA0
  # GEOSTROPHIC MINUS SURFACE WIND DIRECTION (DEGREES)
  names <- fun.temp(names, "GEOSTROPHIC MINUS SURFACE WIND DIRECTION (DEGREES)", "DELTA.WD")
  # H
  # HEAT FLUX
  names <- fun.temp(names, "HEAT FLUX", "FTHETA0")
  # HOUR
  # HOURS
  # INCOMING SOLAR RADIATION
  names <- fun.temp(names, "INCOMING SOLAR RADIATION", "K")
  # INPUT_DATA:
  # K
  # LAMBDAE
  # LATENT HEAT FLUX
  names <- fun.temp(names, "LATENT HEAT FLUX", "LAMBDAE")
  # LAT HT FLUX
  names <- fun.temp(names, "LAT HT FLUX", "LAMBDAE")
  # MODIFIED PRIESTLEY-TAYLOR PARAMETER (DISPERSION AREA)
  names <- fun.temp(names, "MODIFIED PRIESTLEY-TAYLOR PARAMETER (DISPERSION AREA)", "ALPHA.DISP")
  # MODIFIED PRIESTLEY-TAYLOR PARAMETER (MET SITE)
  names <- fun.temp(names, "MODIFIED PRIESTLEY-TAYLOR PARAMETER (MET SITE)", "ALPHA.MET")
  # MONTHS
  # N ABOVE BL
  names <- fun.temp(names, "N ABOVE BL", "NU")
  # NO2
  # NOx
  # NU
  # P
  # PM10
  # PM2.5
  # O3
  # Q0
  # PHI
  names <- fun.temp(names, "PHI", "WD")
  # PHI0
  names <- fun.temp(names, "PHI0", "WD.0")
  # PHIG
  names <- fun.temp(names, "PHIG", "WD.G")
  # PHISEC
  names <- fun.temp(names, "PHISEC", "WD.SEC")
  # PRECIP
  names <- fun.temp(names, "PRECIP", "P")
  # PRECIPITATION RATE (MM/HOUR)
  names <- fun.temp(names, "PRECIPITATION RATE (MM/HOUR)", "P")
  # PROCESSED_DATA:
  # R
  names <- fun.temp(names, "R", "ALBEDO.MET")
  # RECIPLMO
  names <- fun.temp(names, "RECIPLMO", "RECIP.LMO")
  # RELATIVE HUMIDITY ABOVE BOUNDARY LAYER (PERCENT)
  names <- fun.temp(names, "RELATIVE HUMIDITY ABOVE BOUNDARY LAYER (PERCENT)", "RHU")
  # RH ABOVE BL
  names <- fun.temp(names, "RH ABOVE BL", "RHU")
  # RHU
  # RHUM
  names <- fun.temp(names, "RHUM", "RHU")
  # ROUGHNESS LENGTH (DISPERSION AREA)
  names <- fun.temp(names, "ROUGHNESS LENGTH (DISPERSION AREA)", "Z0.DISP")
  # ROUGHNESS LENGTH (MET SITE)
  names <- fun.temp(names, "ROUGHNESS LENGTH (MET SITE)", "Z0.MET")
  # RUN
  # S HUMIDITY
  names <- fun.temp(names, "S HUMIDITY", "SHU")
  # SEA SURFACE TEMPERATURE (C)
  names <- fun.temp(names, "SEA SURFACE TEMPERATURE (C)", "TSEA")
  # SEA TEMP
  names <- fun.temp(names, "SEA TEMP", "TSEA")
  # SENSIBLE HEAT FLUX
  names <- fun.temp(names, "SENSIBLE HEAT FLUX", "FTHETA0")
  # SIGMATHETA
  names <- fun.temp(names, "SIGMATHETA", "SIGMA.THETA")
  names <- fun.temp(names, "SIGMA THETA", "SIGMA.THETA")
  # SIGMA THETA (DEGREES)
  names <- fun.temp(names, "SIGMA THETA (DEGREES)", "SIGMA.THETA")
  # SO2
  # SOLAR RAD
  names <- fun.temp(names, "SOLAR RAD", "K")
  # SPECIFIC HUMIDITY
  names <- fun.temp(names, "SPECIFIC HUMIDITY", "SHU")
  # T0C
  names <- fun.temp(names, "T0C", "TEMP")
  # TDAY
  # TEMPERATURE
  names <- fun.temp(names, "TEMPERATURE", "TEMP")
  # TEMPERATURE (C)
  names <- fun.temp(names, "TEMPERATURE (C)", "TEMP")
  # TEMPERATURE JUMP ACROSS BOUNDARY LAYER TOP
  names <- fun.temp(names, "TEMPERATURE JUMP ACROSS BOUNDARY LAYER TOP", "DELTA.THETA")
  # THOUR
  # TEMPERATURE OVER LAND MINUS SEA SURFACE TEMPERATURE
  names <- fun.temp(names, "TEMPERATURE OVER LAND MINUS SEA SURFACE TEMPERATURE", "DELTA.T")
  # Time(s)
  # X(m)
  names <- fun.temp(names, "Time(s)", "Time")
  # TSEA
  # TYEAR
  # U
  names <- fun.temp(names, "U", "WS")
  # UG
  names <- fun.temp(names, "UG", "WS.G")
  # UGSTAR
  names <- fun.temp(names, "UGSTAR", "WS.GSTAR")
  # USTAR
  names <- fun.temp(names, "USTAR", "WS.STAR")
  # WIND DIRN
  names <- fun.temp(names, "WIND DIRN", "WD")
  # WIND DIRECTION (DEGREES)
  names <- fun.temp(names, "WIND DIRECTION (DEGREES)", "WD")
  # WIND HEIGHT
  names <- fun.temp(names, "WIND HEIGHT", "WIND.HEIGHT")
  # WIND MEASUREMENT HEIGHT
  names <- fun.temp(names, "WIND MEASUREMENT HEIGHT", "WIND.HEIGHT")
  # WIND SPEED
  names <- fun.temp(names, "WIND SPEED", "WS")
  # WSTAR
  # X(m)
  names <- fun.temp(names, "X(m)", "X")
  # Y(m)
  names <- fun.temp(names, "Y(m)", "Y")
  # YEAR
  # Z(m)
  names <- fun.temp(names, "Z(m)", "Z")
  # Z0(D)
  names <- fun.temp(names, "Z0(D)", "Z0.DISP")
  names <- fun.temp(names, "Z0 (D)", "Z0.DISP")
  # Z0(DISP)
  names <- fun.temp(names, "Z0(DISP)", "Z0.DISP")
  names <- fun.temp(names, "Z0 (DISP)", "Z0.DISP")
  # Z0(M)
  names <- fun.temp(names, "Z0(M)", "Z0.MET")
  names <- fun.temp(names, "Z0 (M)", "Z0.MET")
  # Z0(MET)
  names <- fun.temp(names, "Z0(MET)", "Z0.MET")
  names <- fun.temp(names, "Z0 (MET)", "Z0.MET")

  ########
  # outputs
  ########
  invisible(names)
}
Any scripts or data that you put into this service are public.
openair documentation built on May 29, 2024, 11:07 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
openair
Tools for the Analysis of Air Pollution Data

R/importADMS.R
In openair: Tools for the Analysis of Air Pollution Data

Defines functions simplifyNamesADMS importADMSPst importADMSMop importADMSMet importADMSBgd importADMS

Documented in importADMS importADMSBgd importADMSMet importADMSMop importADMSPst

Try the openair package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

openair Tools for the Analysis of Air Pollution Data

R/importADMS.R In openair: Tools for the Analysis of Air Pollution Data

Defines functions simplifyNamesADMS importADMSPst importADMSMop importADMSMet importADMSBgd importADMS

Documented in importADMS importADMSBgd importADMSMet importADMSMop importADMSPst

Try the openair package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

openair
Tools for the Analysis of Air Pollution Data

R/importADMS.R
In openair: Tools for the Analysis of Air Pollution Data