prospectr: Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

Documented in get_nircal_comments get_nircal_description get_nircal_indices get_nircal_lengthspc get_nircal_metadata get_nircal_response get_nircal_spectra read_nircal

#' @title Import BUCHI NIRCal files
#' @description
#'
#' \ifelse{html}{\out{<a href='https://www.tidyverse.org/lifecycle/#maturing'><img src='figures/lifecycle-maturing.svg' alt='Maturing lifecycle'></a>}}{\strong{Maturing}}
#'
#' This function imports .nir files generated by BUCHI NIRCal software.
#' @usage
#' read_nircal(file, response = TRUE, spectra = TRUE,
#'             metadata = TRUE, progress = TRUE, verbose = TRUE)
#' @param file the name of the NIRCal (.nir) file which the data are to be read
#' from. For URLs a temporary file is first downloaded and is then read.
#' @param response a logical indicating if the data of the response variables
#' must be returned (default is `TRUE`).
#' @param spectra a logical indicating if the spectral data must be returned
#' (default is `TRUE`).
#' @param metadata a logical indicating if the metadada must be returned
#' (default is `TRUE`).
#' @param progress a logical indicating if a progress bar must be printed
#' (default is `TRUE`).
#' @param verbose a logical indicating if the number of spectra and response
#' variables (an also the ID's of the spectra without gain and/or temperature
#' information) must be printed (default is `TRUE`).
#' @return a data.frame containing the metadata, response variables (if
#' `response = TRUE`) and spectra (if `spectra = TRUE`, embedded in the
#' `data.frame` as a matrix named `...$spc`).
#' @details
#' The extension of the BUCHI NIRCal files is .nir. These files are used to
#' store spectra generated by BUCHI N-500 and BUCHI NIRMaster FT-NIR sensors.
#' See
#' \href{https://assets.buchi.com/image/upload/v1605790933/pdf/Technical-Datasheet/TDS_11593569_NIRCal.pdf}{NIRCal technical data sheet.}
#' @author \href{https://orcid.org/0000-0002-5369-5120}{Leonardo Ramirez-Lopez}
#' @importFrom utils download.file
#' @export

## 23.03.2018 (leo): Since the scan function was crashing with some files, it
##                   was removed and replaced by readBin.
##                   Comment and Description fields are now read and added to
##                   the resulting data.frame
## 17.05.2018 (leo): A bug on comment_s was fixed. Characters with special
##                   accent were not recognized when [0-9 A-Z a-z 0-9A-Za-z [:punct:]]
##                   was used, this was fixed with [ [:punct:][:alnum:]]
## 17.05.2018 (leo): A new sanity check for repeated property names has been added.
## 04.07.2018 (leo): A bug on comment_f was fixed. Characters with special
##                   accent were not recognized when [0-9 A-Z a-z 0-9A-Za-z [:punct:]]
##                   was used, this was fixed with [ [:punct:][:alnum:]].
##                   The same prvious bug as for comment_s
## 16.07.2018 (leo): Now the function also reads the sample temperature
##                   (is only available for transmission measurements, i.e.
##                   measuremnt cell liquids)
## 08.08.2018 (leo): Variables identified as wavelengths are now correctly
##                   identified as wavenumbers
## 08.08.2018 (leo): Heaviliy re-coded for computational eficiency
## 08.08.2018 (leo): It now allows parallel computing by using the foreach
##                   package
## 08.08.2018 (leo): A new argument was added: preschedule
## 17.09.2018 (leo): A bug when repeated property names was corrected
## 19.09.2018 (leo): A new argument was added (metadata = TRUE). This is a
##                   workaround to avoid errors in reading metadata of files
##                   which are
##                   created in NIRCal which are the result of NIRCal computations
##                   and do not store metadata of the original measurements
## 10.01.2019 (leo): Function was crashing when property names contained special
##                   characters
## 23.01.2019 (leo): The collection of the description filed was failing since
##                   not all nircal files have the description right on top of
##                   the sample GUID.
##                   This is now corrected by extracting the description info
##                   from anoher place in the file
## 11.02.2019 (leo): The code was releasing the following warning (now at at
##                   ac <- unlist(strsplit(x = ab[1],
##                   split = "\n", useBytes = TRUE))); Warning in
##                   strsplit(ab[1], "\n") :
##                   input string 1 is invalid in this locale. Now the function
##                   uses the argument useBytes = TRUE to avoid this issue.
##                   If TRUE the matching is done byte-by-byte
##                   rather than character-by-character, and inputs with marked
##                   encodings are not converted. Now useBytes = TRUE is used
##                   whenever the strsplit() is used.
## 20.02.2019 (leo): The metadata was collected based on the position of the
##                   InstrumentType info (assuming it was NIRFlex). Now this is
##                   not asumed as it is possible that this information is
##                   missing or that a different sensor is registered in the
##                   file. Now the only reference is the name of the software
##                   used to create the file (which is always nircal)
## 20.02.2019 (leo): The device name is now an output (e.g. NIRFlex)
## 13.06.2019 (leo): Property names with a slash character "/" (e.g. TMA/ABVT)
##                   were causing the function to crash. This was fixed
##                   Any "/" in a property name will be replaced with "_"
## 13.03.2020 (leo): using now connections in conjunction with raw vectors
##                   (hexView no longer used). Code for reading responses,
##                   spectra and metadata has been vectorized (significantly).
##                   Function compartmentalization.
## 13.03.2020 (leo): bug fix. from 1:n[idxdescription] to (1:n)[idxdescription]
## 13.05.2020 (leo): reads from URLs
## 25.01.2022 (leo): bug fix in get_nircal_description(): it reads the proper
##                   number of recods based on n
read_nircal <- function(file,
                        response = TRUE,
                        spectra = TRUE,
                        metadata = TRUE,
                        progress = TRUE,
                        verbose = TRUE) {
  con <- file(file, "rb")
  
  if ("url" %in% class(con)) {
    close(con)
    tmp <- tempfile(pattern = "", fileext = ".nir")
    download.file(
      url = file, destfile = tmp, method = "auto", quiet = FALSE, mode = "wb",
      cacheOK = TRUE,
      extra = getOption("download.file.extra"),
      headers = NULL
    )
    message("File at: '", tmp, "'\n")
    file <- tmp
    con <- file(file, "rb")
  } else {
    if (!file.exists(file)) {
      stop("File does not exists!")
    }
  }
  
  seek(con, where = 1, origin = "start")
  nircalraw <- readBin(con,
                       what = "raw",
                       n = file.info(file)$size
  )
  
  first_lines <- readLines(con = file, 1:2)
  if (length(first_lines) >= 1) {
    isnircal <- grepRaw("NIRCAL Project File", first_lines, all = TRUE)
  } else {
    isnircal <- NULL
  }
  
  if (length(isnircal) == 0) {
    stop("Ups! This does not look like a BUCHI NIRCal file")
  }
  
  rawcoords <- get_nircal_indices(x = nircalraw)
  
  ids <- get_nircal_ids(connection = con, from = rawcoords$values_s[5], to = rawcoords$values_s2)
  nd <- length(ids)
  
  nms <- c(
    "ID", "GUID", "Scans", "resolution", "nWavenumbers", "WavenumberSteps", "WavenumberStart",
    "Device", "Software Version", "Created", "Modified", "Creator", "Creator login", "Modified by",
    "Modifier login", "Instrument serial", "Measurement cell", "Option serial", "Gain factor",
    "Gain", "Instrument temperature", "Sample temperature", "Comment", "Description"
  )
  
  dextracted <- matrix(NA, length(ids), length(nms))
  dextracted <- data.frame(dextracted)
  colnames(dextracted) <- nms
  
  dextracted$ID <- ids
  
  if (response) {
    responses <- get_nircal_response(x = nircalraw, n = nd)
    dextracted <- cbind(dextracted, responses$properties)
    nms <- colnames(dextracted)
    if (!is.null(responses$warning)) {
      warning(responses$warning)
    }
  }
  
  readmessage <- paste(
    "File contains", nd, ifelse(nd == 1, "spectrum", "spectra"),
    ifelse(response,
           ifelse(length(responses$property_names) == 1 & "<<undef>>" %in% responses$property_names, "and one unnamed response variable",
                  paste("and", length(responses$property_names), "response variables")
           ), ""
    )
  )
  
  readmessage <- paste(readmessage, "\n", sep = "")
  
  if (verbose) {
    cat(readmessage)
  }
  
  if (progress) {
    pb <- txtProgressBar(style = 1)
    g1 <- 0.75
    g2 <- 0.10
    g3 <- 0.07
  }
  
  ## verify that what it was read coincides with the number of samples specified in the file
  if (rawcoords$nsamples != nd) {
    stop("number of samples do not match the number of IDs")
  }
  
  
  if (progress) {
    setTxtProgressBar(pb, g1)
  }
  
  if (!is.null(rawcoords$comment_s) | !is.null(rawcoords$comment_f)) {
    .comment <- get_nircal_comments(
      connection = con,
      metanumbers = rawcoords$metanumbers,
      begin_s = rawcoords$begin_s,
      comment_s = rawcoords$comment_s,
      comment_f = rawcoords$comment_f,
      n = nd
    )
    
    description <- get_nircal_description(
      x = nircalraw,
      begin_s = rawcoords$begin_s,
      spcinfo = rawcoords$spcinfo,
      comment_s = rawcoords$comment_s,
      comment_f = rawcoords$comment_f,
      n = nd
    )
  } else {
    warning("comments and description fields cointain unreadable data, these fields have been skipped")
    .comment <- description <- rep(NA, nd)
  }
  
  dextracted$Comment <- as.character(.comment)
  dextracted$Description <- as.character(description)
  
  if (progress) {
    setTxtProgressBar(pb, g1 + g2)
  }
  
  speclength <- get_nircal_lengthspc(
    connection = con,
    from = rawcoords$values_s[6],
    to = rawcoords$values_s[7] - rawcoords$values_s[6]
  )
  
  
  n_s <- rawcoords$end_s - rawcoords$begin_s - 1
  spctra_start <- rawcoords$begin_s[n_s == speclength * 8]
  ## it is here where each spectrum starts
  spctra_start <- spctra_start[spctra_start > min(rawcoords$spcinfo)]
  
  ## -- Collect the spectra  --
  if (spectra) {
    dspectra <- get_nircal_spectra(
      x = nircalraw,
      values_s = rawcoords$values_s,
      spctra_start = spctra_start,
      speclength = speclength,
      n = nd
    )
    dextracted$spc <- dspectra
  }
  
  if (progress) {
    setTxtProgressBar(pb, g1 + g2 + g3)
    cmpl <- g1 + g2 + g3
    stp <- (1 - cmpl) / nd
  } else {
    stp <- NULL
    cmpl <- NULL
  }
  
  if (metadata) {
    metda <- get_nircal_metadata(
      connection = con,
      n = nd,
      spctra_start = spctra_start,
      spcinfo = rawcoords$spcinfo,
      progress = progress,
      pb = pb,
      progress.start = cmpl,
      progress.steps = stp
    )
    dextracted[, colnames(metda)] <- metda
    
    if (progress) {
      # Close connection
      close(con, type = "rb")
    }
  } else {
    guididx <- unlist(lapply(rawcoords$spcinfo[1:nd],
                             FUN = function(x, l) x:(x + l),
                             l = 41
    ))
    
    if (progress) {
      setTxtProgressBar(pb, 1)
    }
    
    guids <- readChar(nircalraw[guididx], nchars = nd * (41 + 1))
    guids <- strsplit(guids, "\n")[[1]]
    guids <- gsub("[0-9]{2}/", "", guids)
    dextracted$GUID <- guids
    dextracted$nWavenumbers <- speclength
  }
  
  
  
  if (progress) {
    close(pb)
  }
  
  
  numericp <- c(
    "Scans",
    "resolution",
    "nWavenumbers",
    "WavenumberSteps",
    "WavenumberStart",
    "Gain factor",
    "Gain",
    "Instrument temperature",
    "Sample temperature"
  )
  
  dextracted[, numericp] <- as.numeric(unlist(dextracted[, numericp]))
  dextracted$`Gain factor`[dextracted$`Gain factor` == 0] <- NA
  dextracted$`Instrument temperature`[dextracted$`Instrument temperature` == 0] <- NA
  dextracted$Device[dextracted$Device == ""] <- NA
  dextracted$Device[dextracted$Device %in% c("0", "1", "-1")] <- NA
  
  if (verbose) {
    if (sum(is.na(dextracted$`Gain factor`)) > 0) {
      if (nrow(dextracted) == 1) {
        cat("Gain data is not available\n")
      } else {
        cat("Gain data is not available for one or more observations\n")
      }
    }
    if (sum(is.na(dextracted$`Instrument temperature`)) > 0) {
      if (nrow(dextracted) == 1) {
        cat("Instrument temperature data is not available\n")
      } else {
        cat("Instrument temperature data is not available for one or more observations\n")
      }
    }
  }
  
  if (!sum(!is.na(dextracted$spc))) {
    dextracted <- dextracted[, !colnames(dextracted) %in% "spc"]
  }
  
  return(dextracted)
}


#' @title get the positions of relevant data witihi the nircal file
#' @description internal
#' @keywords internal
get_nircal_indices <- function(x) {
  beginchr <- "begin"
  endchr <- "end"
  
  begin_s <- grepRaw(paste("\n", beginchr, "\n", sep = ""), x, all = TRUE) + nchar(beginchr) + 2
  end_s <- grepRaw(paste("\n", endchr, "\n", sep = ""), x, all = TRUE) + 1
  
  
  values_s <- grepRaw("Values", x, all = TRUE)
  values_s2 <- grepRaw("\n1[[:space:]]Values", x, all = TRUE)
  values_s2 <- values_s2[1 + sum(!values_s2 > values_s[5])]
  values_s3 <- grepRaw("[0-9]{1,}[[:space:]]Values", x, all = TRUE)[5]
  
  spcinfo <- 1 + grepRaw("\n38\\/\\{", x, all = TRUE)[-1]
  
  ## The numbers in NIRCal files preceeding the files comment and description (e.g 11/ or 7/)
  metanumbers <- grepRaw("[[0-9]]{0,}\\/Comment\n[[0-9]{0,}\\/Description", x, all = TRUE)[1]
  
  lnss <- values_s[5] - values_s3
  
  nss <- as.numeric(readBin(x[values_s3:(values_s3 + lnss - 1)], "character"))
  
  srchc_s <- "[ [:punct:][:alnum:]]{0,}\n[ [:punct:][:alnum:]]{0,}\\/[ [:alnum:][:punct:]]{0,}\n38\\/\\{"
  srchc_f <- "\n[ [:punct:][:alnum:]]{0,}\\/[ [:punct:][:alnum:]]{0,}\n38\\/\\{"
  comment_s <- grepRaw(srchc_s, x, ignore.case = TRUE, all = TRUE)
  comment_f <- grepRaw(srchc_f, x, ignore.case = TRUE, all = TRUE)
  
  if (length(comment_s) < nss | length(comment_f) < nss) {
    read_comments_description <- FALSE
    comment_s <- comment_f <- NULL
  }
  
  
  return(list(
    nsamples = nss,
    begin_s = begin_s,
    end_s = end_s,
    values_s = values_s,
    values_s2 = values_s2,
    values_s3 = values_s3,
    spcinfo = spcinfo,
    metanumbers = metanumbers,
    comment_s = comment_s,
    comment_f = comment_f
  ))
}

get_nircal_ids <- function(connection, from, to) {
  seek(connection, where = from, origin = "start")
  
  ids <- readBin(
    readBin(connection, what = "raw", n = to - from),
    "character"
  )
  ids <- enc2utf8(ids)
  ids <- strsplit(ids, "\n", useBytes = TRUE)[[1]]
  # ids <- iconv(ids, to = "UTF-8", sub = NA)
  ids2 <- ids[-c(1, length(ids))]

  ids <- try(substr(x = ids2, start = regexpr("/", ids) + 1, stop = 100000))
  if (inherits(ids, "try-error")) {
    ids <- iconv(ids2, from = "Latin1", to = "UTF-8")
  }
  
  flush(connection)
  return(ids)
}


#' @title get the comments of the spectra in the nircal file
#' @description internal
#' @keywords internal
get_nircal_comments <- function(connection, metanumbers, begin_s, comment_s, comment_f, n) {
  if (length(comment_s) < n | length(comment_f) < n) {
    warning("comments cannot be read")
    comment <- rep(NA, n)
    return(comment)
  } else {
    comment_s <- comment_s[1:n]
    comment_f <- comment_f[1:n]
    
    
    seek(connection,
         where = metanumbers + begin_s[sum(metanumbers > begin_s) + 1],
         origin = "start"
    )
    
    readb <- function(..i.., connection, comment_s, comment_f) {
      seek(connection, where = comment_s[..i..], origin = "start")
      
      i.comment <- readBin(
        readBin(connection,
                what = "raw",
                n = comment_f[..i..] - comment_s[..i..]
        ),
        "character"
      )
      # i.comment <- readChar(connection, nchars = comment_f[..i..] - comment_s[..i..])
      i.comment
    }
    
    comment <- description <- rep(NA, n)
    idxcomments <- (comment_f - comment_s) > 2
    i.comment <- sapply(
      X = (1:n)[idxcomments],
      FUN = readb,
      connection = connection,
      comment_s = comment_s,
      comment_f = comment_f
    )
    
    comment[idxcomments] <- i.comment
    comment <- gsub("^[.]{0,}[0-9]{1,}\\/", "", comment)
    
    flush(connection)
    
    return(comment)
  }
}

#' @title get the description of the spectra in the nircal file
#' @description internal
#' @keywords internal
get_nircal_description <- function(x, begin_s, spcinfo, comment_s, comment_f, n) {
  if (length(comment_s) < n | length(comment_f) < n) {
    warning("comments cannot be read")
    description <- rep(NA, )
    return(description)
  } else {
    comment_s <- comment_s[1:n]
    comment_f <- comment_f[1:n]
    ## The numbers in NIRCal files preceeding the files comment and description (e.g 11/ or 7/)
    metanumbers <- grepRaw("[[0-9]]{0,}\\/Comment\n[[0-9]{0,}\\/Description", x, all = TRUE)
    if (length(metanumbers) > 2) {
      metanumbers <- metanumbers[3]
    } else {
      metanumbers <- metanumbers[1]
    }
    
    metanumbersinfo <- readBin(x[metanumbers:(metanumbers + begin_s[sum(metanumbers > begin_s) + 1])], "character")
    
    metanumbersinfo <- strsplit(metanumbersinfo, "\n", useBytes = TRUE)[[1]]
    metadescription <- strsplit(metanumbersinfo[2], "Description", useBytes = TRUE)[[1]]
    
    srchmtd <- paste("[0-9]{1,}/1/cm\n", metadescription, "[A-Z a-z]{1,}\n", sep = "")
    
    srchmtd2 <- "[0-9]{1,}/1/cm\n[0-9]{1,}/[A-Z a-z]{1,}\n"
    
    i_description <- grepRaw(srchmtd,
                             value = TRUE,
                             x,
                             all = TRUE
    )
    
    if (length(i_description) < n) {
      i_description <- grepRaw(srchmtd2,
                               value = TRUE,
                               x,
                               all = TRUE
      )
    }
    
    if (length(i_description) < n) {
      warning("description field could not be read")
      description <- rep(NA, n)
    } else {
      i_description <- i_description[1:n]
      if (length(i_description) > 0) {
        i_description <- sapply(X = i_description, FUN = readBin, what = "character")
        i_description <- strsplit(i_description, paste("[0-9]{1,}/1/cm\n", metadescription, "|\n", sep = ""), useBytes = TRUE)
        description <- sapply(X = i_description, FUN = function(x) x[[2]])
      } else {
        ## This part might fail since not all nircal files have the description right on top of the sample GUID
        readd <- function(..i.., rawf, comment_s, comment_f, spcinfo) {
          rvec <- comment_s[..i..]:(comment_s[..i..] + spcinfo[..i..] - comment_f[..i..])
          i_description <- readChar(rawf[rvec], spcinfo[..i..] - comment_f[..i..] + 1)
          i_description
        }
        
        idxdescription <- (spcinfo[1:n] - comment_f) > 2
        i_description <- sapply(
          X = (1:n)[idxdescription],
          FUN = readd,
          rawf = x,
          comment_s = comment_s,
          comment_f = comment_f,
          spcinfo = spcinfo
        )
        i_description <- iconv(i_description, from = "ASCII", to = "UTF-8", sub = "byte")
        i_description
        
        description <- rep(NA, n)
        
        description[idxdescription] <- i_description
        description[description == ""] <- NA
        description <- gsub("[ [:punct:][:alnum:]]{0,}\n[ [:punct:][:alnum:]]{0,}\\/", "", description)
        description <- gsub("[.]$", "", description)
      }
    }
    return(description)
  }
}

#' @title get the number of spectral variables in the nircaa file
#' @description internal
#' @keywords internal
get_nircal_lengthspc <- function(connection, from, to) {
  seek(connection, where = from, origin = "start")
  speclength <- readChar(connection, nchars = to)
  speclength <- strsplit(speclength, "\n", useBytes = TRUE)[[1]]
  speclength <- as.numeric(speclength[length(speclength)])
  flush(connection)
  return(speclength)
}

#' @title get the response variables in the nircal file
#' @description internal
#' @keywords internal
get_nircal_response <- function(x, n) {
  ## get the data of the response variables
  property_info_start <- grepRaw(
    "10/Properties\n18/Property Selection",
    x,
    all = TRUE
  )[1]
  
  property_info_end <- grepRaw("begin", x, offset = property_info_start, all = TRUE)
  
  property_info_indcs <- property_info_start:(property_info_start + property_info_end[2] - property_info_start)
  
  nproperties_c <- strsplit(readBin(x[property_info_indcs], what = "character"),
                            "\n",
                            useBytes = TRUE
  )[[1]][5]
  nproperties_n <- as.numeric(unlist(strsplit(nproperties_c, "Values", useBytes = TRUE)))
  
  property_names_indcs <- grepRaw(nproperties_c,
                                  x,
                                  offset = property_info_start,
                                  all = TRUE
  )
  
  property_names_indcs <- property_names_indcs[3]:(property_names_indcs[3] + property_info_end[3] - property_names_indcs[3])
  property_names_char <- readBin(
    x[property_names_indcs],
    what = "character"
  )
  
  property_names_utf <- iconv(property_names_char, from = "ASCII", to = "UTF-8", sub = "byte")
  
  property_names_utf <- strsplit(property_names_utf, "\n", useBytes = TRUE)[[1]][1 + c(1:nproperties_n)]
  property_names_char <- strsplit(property_names_char, "\n", useBytes = TRUE)[[1]][1 + c(1:nproperties_n)]
  
  # property_names_search <- sapply(
  #   property_names_char,
  #   FUN = function(x) strsplit(x, "[0-9]{0,}/")[[1]][[2]],
  #   USE.NAMES = FALSE
  # )
  property_names_search <- property_names_char
  
  # property_names_search <- gsub(
  #   pattern = "[[:punct:]]",
  #   replacement = "[[:punct:]]",
  #   x = property_names_search
  # )
  
  # property_names_search <- gsub("[^ -~]", "[^ -~]", property_names_search)
  # property_names_char <- gsub("[^ -~]", "[^ -~]", property_names_char)
  
  property_names <- sapply(
    property_names_utf,
    FUN = function(x) strsplit(x, "[0-9]{0,}/")[[1]][[2]],
    USE.NAMES = FALSE
  )
  
  proppositions <- grepRaw(paste(property_names_char, collapse = "\n"),
                           x,
                           fixed = FALSE,
                           all = TRUE
  )[-1]
  
  
  nval <- paste(length(property_names), "Values")
  
  property_names_search <- paste(
    c(
      nval,
      # paste("[0-9]{0,}\\/", property_names_search, sep = "", collapse = "\n"),
      paste(property_names_search, sep = "", collapse = "\n"),
      "[0-9]{0,}",
      nval,
      "begin"
    ),
    collapse = "\n"
  )
  
  property_indices <- grepRaw(
    property_names_search,
    x,
    all = TRUE
  )
  
  
  lengthproperty_indices <- length(
    grepRaw(
      property_names_search,
      x,
      all = FALSE,
      value = TRUE
    )
  ) + 1
  
  
  if (sum(duplicated(property_names)) > 0) {
    wrn <- c("Some property names are duplicated, please correct the names. Indices have been added to the repeated names")
    dpn <- unique(property_names[duplicated(property_names)])
    for (i in 1:length(dpn)) {
      property_names[property_names == dpn] <- paste(property_names[property_names == dpn],
                                                     1:length(property_names[property_names == dpn]),
                                                     sep = "_"
      )
    }
  } else {
    wrn <- NULL
  }
  property_names <- gsub("/", "_", property_names)
  
  respidx <- unlist(lapply((property_indices + lengthproperty_indices)[1:(n)],
                           FUN = function(x, l) {
                             vec <- x:(x + l)
                             vec
                           },
                           l = 8 * nproperties_n - 1
  ))
  
  properties <- readBin(x[respidx], what = "double", n = n * nproperties_n)
  properties <- t(matrix(properties, nrow = nproperties_n))
  
  
  if (length(property_names) > 0) {
    properties[properties == 0] <- NA
  }
  
  colnames(properties) <- property_names
  
  return(list(
    property_names = property_names,
    properties = properties,
    warning = wrn
  ))
}


#' @title get the metadata of the samples in the nircal file
#' @description internal
#' @keywords internal
get_nircal_metadata <- function(connection, n, spctra_start, spcinfo, progress, pb, progress.start, progress.steps) {
  metadata <- matrix(NA, nrow = n, ncol = 32)
  iprogress <- progress.start
  for (i in 1:n) {
    
    ## read just the segment with the info (including binary data for numeric info)
    seek(connection, where = spcinfo[i], origin = "start")
    ac <- readBin(
      readBin(connection, what = "raw", n = (spctra_start[i] - spcinfo[i])),
      "character"
    )
    
    ac <- iconv(ac, from = "latin1", to = "UTF-8", sub = "byte")
    ac <- strsplit(ac, "\n")[[1]]
    ac <- ac[-length(ac)]
    
    ncac <- 10 + grep("1/cm", ac)
    ncac2 <- grep("NIRCal", ac)
    ncac2 <- ncac2[length(ncac2)]
    ncac <- c(ncac, ncac2)
    
    
    # nflx <- grep("NIRFlex N500", ac)[1]
    nflx <- ncac2 + 11
    
    ## Device
    i.device <- ac[nflx]
    
    ## Created: year, month, day, hours, minutes, seconds
    i.created_ymdhms <- ac[ncac[1] - 1:6]
    
    ## Modified: year, month, day, hours, minutes, seconds
    i.modified_ymdhms <- ac[ncac[2] - 1:6]
    
    ## Software version, creator, creator log in
    i.versioncreatorlogin <- ac[ncac[1] + c(1, 2, 5)]
    
    ## Modified by, modified by log in
    i.moodifiedby <- ac[ncac[2] + c(2, 5)]
    
    ## Instrument serial number, measurement cell, serial number measurement cell
    i.snmcsn <- ac[nflx - c(4, 3, 2)]
    
    
    gain_s <- grep("Gain Factor", ac)
    temp_s <- grep("Instrument Temperature", ac)
    temp_sample <- grep("Sample Temperature", ac)
    
    if (length(gain_s) != 0) {
      i.gain_factor <- ac[gain_s + c(1, 3)]
    } else {
      i.gain_factor <- c(NA, NA)
    }
    
    if (length(temp_s) != 0) {
      i.instrumenttemp <- ac[temp_s + 1]
    } else {
      i.instrumenttemp <- NA
    }
    
    if (length(temp_sample) != 0) {
      i.sampletemp <- ac[temp_sample + 1]
    } else {
      i.sampletemp <- NA
    }
    
    i.metadata <- c(
      i,
      ac[c(1, 7:11)],
      i.device,
      i.versioncreatorlogin,
      i.created_ymdhms,
      i.modified_ymdhms,
      i.moodifiedby,
      i.snmcsn,
      i.gain_factor,
      i.instrumenttemp,
      i.sampletemp
    )
    
    if (progress) {
      rprogress <- (i * progress.steps) + iprogress
      setTxtProgressBar(pb, rprogress)
    }
    
    metadata[i, ] <- i.metadata
  }
  
  # if (!"matrix" %in% class(metadata)) {
  #   metadata <- t(metadata)
  # }
  
  
  
  ## not really necessary
  metadata <- metadata[order(as.numeric(metadata[, 1])), -1, drop = FALSE]
  
  sf <- function(..i.., ex) {
    substring(ex[, ..i..], first = 1 + regexpr(pattern = "\\/", text = ex[, ..i..]))
  }
  
  dataex <- sapply(c(1:10, 23:31), FUN = sf, ex = metadata)
  
  
  datesc <- as.vector(sapply(c(11:22), FUN = sf, ex = metadata))
  
  datesc[nchar(datesc) == 1 & !is.na(datesc)] <- paste("0", datesc[nchar(datesc) == 1 & !is.na(datesc)], sep = "")
  
  datesc <- matrix(datesc, nrow(dataex))
  
  datecreated <- paste(datesc[, 1],
                       "/",
                       datesc[, 2],
                       "/",
                       datesc[, 3],
                       " ",
                       datesc[, 4],
                       ":",
                       datesc[, 5],
                       ":",
                       datesc[, 6],
                       sep = ""
  )
  
  datemodified <- paste(datesc[, 7],
                        "/",
                        datesc[, 8],
                        "/",
                        datesc[, 9],
                        " ",
                        datesc[, 10],
                        ":",
                        datesc[, 11],
                        ":",
                        datesc[, 12],
                        sep = ""
  )
  
  nms <- c(
    "GUID",
    "Scans",
    "resolution",
    "nWavenumbers",
    "WavenumberSteps",
    "WavenumberStart",
    "Device",
    "Software Version",
    "Creator",
    "Creator login",
    "Modified by",
    "Modifier login",
    "Instrument serial",
    "Measurement cell",
    "Option serial",
    "Gain factor",
    "Gain",
    "Instrument temperature",
    "Sample temperature"
  )
  
  colnames(dataex) <- nms
  
  dataex <- cbind(dataex, Created = datecreated, Modified = datemodified)
  flush(connection)
  return(dataex)
}

#' @title get the spectra in the nircal file
#' @description internal
#' @keywords internal
get_nircal_spectra <- function(x, values_s, spctra_start, speclength, n) {
  spcidx <- unlist(lapply(spctra_start[1:n],
                          FUN = function(x, l) x:(x + l),
                          l = 8 * speclength - 1
  ))
  
  dspectra <- readBin(x[spcidx], what = "double", n = n * speclength)
  dspectra <- t(matrix(dspectra, nrow = speclength, ncol = n))
  
  wavref <- paste(speclength, "Values")
  wavstart <- grepRaw(wavref, x, all = TRUE)[3] + nchar(wavref) + 1
  
  
  widx <- wavstart:(wavstart + values_s[values_s > wavstart][1] - wavstart)
  waves <- readBin(x[widx], what = "character")
  waves <- strsplit(waves, "\n")[[1]]
  waves <- waves[grep("[0-9]{1}/", waves)]
  waves <- iconv(waves, from = "ASCII", to = "UTF-8", sub = "byte")
  waves <- as.numeric(gsub("[0-9]/", "", waves))
  
  colnames(dspectra) <- waves
  return(dspectra)
}
l-ramirez-lopez/prospectr documentation built on Feb. 18, 2024, 7:52 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
l-ramirez-lopez/prospectr
Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

R/read_nircal.R
In l-ramirez-lopez/prospectr: Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

Defines functions get_nircal_spectra get_nircal_metadata get_nircal_response get_nircal_lengthspc get_nircal_description get_nircal_comments get_nircal_ids get_nircal_indices read_nircal

Documented in get_nircal_comments get_nircal_description get_nircal_indices get_nircal_lengthspc get_nircal_metadata get_nircal_response get_nircal_spectra read_nircal

R Package Documentation

Browse R Packages

We want your feedback!

l-ramirez-lopez/prospectr Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

R/read_nircal.R In l-ramirez-lopez/prospectr: Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

Defines functions get_nircal_spectra get_nircal_metadata get_nircal_response get_nircal_lengthspc get_nircal_description get_nircal_comments get_nircal_ids get_nircal_indices read_nircal

Documented in get_nircal_comments get_nircal_description get_nircal_indices get_nircal_lengthspc get_nircal_metadata get_nircal_response get_nircal_spectra read_nircal

R Package Documentation

Browse R Packages

We want your feedback!

l-ramirez-lopez/prospectr
Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

R/read_nircal.R
In l-ramirez-lopez/prospectr: Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data