R/organize_libraries_EI_and_MS2.R

Defines functions change_meta reorganize_mona assign_smiles

Documented in assign_smiles change_meta reorganize_mona

#' Assign SMILES
#'
#' \code{assign_smiles} assigns SMILES to a library exported in NIST format
#' (by Lib2NIST, with the structure information stored separately in mol
#' files).
#'
#' The msp file obtained from Lib2NIST has no SMILES; the structure
#' information is stored in multiple mol files. After combining all mol files
#' into a single sdf file with \code{combine_mol2sdf} and retrieving the
#' structure information with \code{extract_structure}, the SMILES become
#' available. This function assigns each SMILES to the corresponding compound
#' in the msp file. On Linux or macOS, matching by "inchikey" is preferable.
#' On Windows, the only option is to match by "name", which is a compromise:
#' some chemicals in the *.mol files lack full chemical names and therefore
#' will not be matched. This function works for both EI and MS2 libraries
#' and supports parallel computing through \pkg{future.apply}.
#'
#' @param lib The library generated by \code{read_lib}.
#' @param structure_data The corresponding structure data generated by
#'   \code{extract_structure}.
#' @param match The field used for matching, either "name" or "inchikey".
#'
#' @return A \code{list} with SMILES assigned.
#' @export
#'
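#' @examples
#' # A minimal sketch on hypothetical toy data (not a real library). It assumes
#' # that each library entry is a list with Name and InChIKey fields and that
#' # structure_data is a data frame with Name, InChIKey and Smiles columns, as
#' # the function body suggests. Names in structure_data are kept lowercase
#' # because library names are lowercased before matching.
#' lib <- list(
#'   list(Name = "Ethanol", InChIKey = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
#'   list(Name = "Unlisted compound", InChIKey = NA_character_)
#' )
#' structure_data <- data.frame(
#'   Name = c("ethanol", "benzene"),
#'   InChIKey = c("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "UHOVQNZJYSORNB-UHFFFAOYSA-N"),
#'   Smiles = c("CCO", "c1ccccc1"),
#'   stringsAsFactors = FALSE
#' )
#'
#' # Match by name; the second entry has no counterpart and gets NA.
#' out <- assign_smiles(lib, structure_data, match = "name")
#' out[[1]]$Smiles
#' out[[2]]$Smiles
#'
#' # Match by InChIKey instead.
#' out_key <- assign_smiles(lib, structure_data, match = "inchikey")
#' out_key[[1]]$Smiles
#'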
#' @import future.apply
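assign_smiles <- function(lib, structure_data, match = "name") {
  future.apply::future_lapply(lib, function(x) {
    # The `match` argument shares its name with base::match(); the explicit
    # namespace keeps the function call unambiguous.
    if (match == "name") {
      # Compare names case-insensitively.
      idx <- base::match(tolower(x$Name), tolower(structure_data$Name))
    } else {
      idx <- base::match(x$InChIKey, structure_data$InChIKey)
    }
    # Compounds without a match receive NA.
    x$Smiles <- structure_data$Smiles[idx]

    return(x)
  })
}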


#' Reorganize MoNA library
#'
#' \code{reorganize_mona} reorganizes a MoNA library, mainly to retrieve
#' SMILES from the "Comments" field.
#'
#' The msp file from MoNA has no "SMILES" field; instead, the SMILES is stored
#' in the "Comments" field. This function extracts it from there into a
#' separate Smiles field. This function supports parallel computing.
#'
#' @param lib The MoNA library generated by \code{read_lib}.
#'
#' @return A \code{list} with SMILES retrieved.
#' @export
#'
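#' @examples
#' # A minimal sketch on a hypothetical one-entry library. It assumes that the
#' # comment is stored as a single string of double-quoted key=value entries,
#' # which is what the extraction code expects; the entry itself is made up.
#' lib <- list(
#'   list(
#'     Name = "Ethanol",
#'     Comment = '"author=hypothetical" "SMILES=CCO"'
#'   )
#' )
#'
#' # The SMILES entry is pulled out of the comment into its own field.
#' out <- reorganize_mona(lib)
#' out[[1]]$Smiles
#'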
#' @import future.apply
#' @importFrom qdapRegex rm_between
reorganize_mona <- function(lib) {
  future.apply::future_lapply(lib, function(x) {
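    # Split the comment into its double-quoted entries.
    tmp <- unlist(qdapRegex::rm_between(x$Comment, '"', '"', extract = TRUE))

    # Keep the entry that starts with "SMILES=" and strip that prefix.
    x$Smiles <- tmp[grepl("^SMILES=", tmp, ignore.case = TRUE)]
    x$Smiles <- gsub("^SMILES=", "", x$Smiles, ignore.case = TRUE)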

    return(x)
  })
}


#' Change meta data
#'
#' \code{change_meta} changes meta data (mainly used for in-house
#' libraries).
#'
#' When you build your own mass spectral library (either EI or MS2),
#' you might want to add or change some meta data, such as collision energy,
#' instrument, and comment. This function provides an easy way to do so.
#'
#' @param lib The in-house library generated by \code{read_lib}.
#' @param CE User-defined collision energy. If no CE is supplied, the
#'   CollisionEnergy field will not be changed.
#' @param instrument User-defined instrument type. If no instrument is
#'   supplied, the InstrumentType field will not be changed.
#' @param comment User-defined comment, e.g., principal investigator, data
#'   collector, laboratory, etc. If no comment is supplied, the Comment field
#'   will not be changed. To append a new comment, set \code{add = TRUE};
#'   the old and new comments will then be separated by ";". Otherwise, the
#'   old comment will be overwritten by the new one.
#' @param add A logical scalar. If TRUE, the old comment is kept and the new
#'   comment is appended to it; if FALSE, the old comment is replaced.
#'
#' @return A \code{list} with meta data assigned.
#' @export
#'
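#' @examples
#' # A minimal sketch on a hypothetical in-house entry. The field names
#' # (CollisionEnergy, InstrumentType, Comment) follow the function body; the
#' # values are made up for illustration.
#' lib <- list(
#'   list(
#'     Name = "Ethanol",
#'     CollisionEnergy = "10",
#'     InstrumentType = "Q-TOF",
#'     Comment = "in-house"
#'   )
#' )
#'
#' # Set a new collision energy and append a comment; the instrument type is
#' # not supplied and therefore stays as it was.
#' out <- change_meta(lib, CE = "20", comment = "lab A", add = TRUE)
#' out[[1]]$CollisionEnergy
#' out[[1]]$InstrumentType
#' out[[1]]$Comment
#'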
#' @import future.apply
change_meta <-
  function(lib, CE = NA, instrument = NA, comment = NA, add = FALSE) {
    future.apply::future_lapply(lib, function(x) {
      # A field is left untouched when its argument is NA.
      if (!is.na(CE)) {
        x$CollisionEnergy <- CE
      }

      if (!is.na(instrument)) {
        x$InstrumentType <- instrument
      }

      if (!is.na(comment)) {
        if (isTRUE(add)) {
          # Append the new comment to the old one, separated by ";".
          x$Comment <- paste(x$Comment, comment, sep = ";")
        } else {
          # Overwrite the old comment.
          x$Comment <- comment
        }
      }

      return(x)
    })
  }
QizhiSu/mspcompiler documentation built on May 7, 2024, 4:25 a.m.