R/makefeature.R

#' @title Generate missing rate matrix and extract 3 features characterizing the missing rate pattern
#' @description
#' \code{makefeature} generates the missing rate matrix from a data frame of metabolomics signals (as produced by \code{\link{read.metfile}}), using the specified window size and slide size.
#' From the missing rate matrix, \code{makefeature} derives 3 features characterizing the missing rate pattern of each metabolite: the variance of the missing rate, the number of switches, and the longest block. The per-metabolite mean missing rate is also returned.
#' @param data A data frame of metabolite signals. Rows are metabolites and columns are samples in injection order. The data frame can be created by \code{\link{read.metfile}}.
#' @param wsize Window size: the number of consecutive samples in each window. Default is 100.
#' @param ssize Slide size, expressed as a fraction of the window size: consecutive windows start \code{wsize * ssize} samples apart. Default is 0.5.
#' @param defswitch Definition of a switch. If the absolute difference in missing rate between two adjacent windows is greater than or equal to \code{defswitch}, it is counted as a switch. Default is 0.2.
#' @return A list containing the features generated from the metabolite signal matrix. The list has the following entries:
#' \itemize{
#' \item mbnames: A vector. Name list of metabolites.
#' \item snames: A vector. Name list of samples.
#' \item wsize: A scalar. Window size.
#' \item ssize: A scalar. Slide size.
#' \item defswitch: A scalar. Definition of switch.
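#' \item mrate: Missing rate matrix. Rows are metabolites and columns are windows; each cell is the missing rate of one metabolite in one window.
#' \item meanmrate: A vector. Mean missing rate across windows for each metabolite.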
#' \item variance: A vector. Missing rate variance for each metabolite.
#' \item nswitches: A vector. Number of switches for each metabolite.
#' \item longestblock: A vector. Number of windows of the longest block for each metabolite.
#' }
#' @author Liu Cao
#'
#' @seealso See \code{\link{read.metfile}} for how to generate the input metabolite signal data frame.
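#' @examples
#' # A minimal sketch on a toy data set. The names `sig` and `feat` are
#' # hypothetical, chosen for illustration only; they are not part of the package.
#' sig <- matrix(1, nrow = 3, ncol = 20,
#'               dimnames = list(paste0("m", 1:3), paste0("s", 1:20)))
#' sig["m2", 1:10] <- NA                # m2 is missing in the first half of the run
#' sig["m3", seq(2, 20, by = 2)] <- NA  # m3 is missing in every other sample
#' sig <- as.data.frame(sig)
#'
#' # With wsize = 10 and ssize = 0.5 the windows cover samples 1-10, 6-15 and 11-20.
#' feat <- makefeature(sig, wsize = 10, ssize = 0.5, defswitch = 0.2)
#' feat$mrate         # m2 should read 1, 0.5, 0; m1 stays at 0 and m3 at 0.5
#' feat$nswitches     # count of adjacent-window changes of at least defswitch
#' feat$longestblock  # longest stretch of windows between switches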
#' @export

makefeature <- function(data, wsize=100,ssize=0.5,defswitch=0.2) {
  mbnames = rownames(data)
  snames = colnames(data)
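  # Calculate the missing rate matrix: one row per metabolite, one column per window.
  sampleSize = dim(data)[2]
  # Offset between consecutive window starts, in samples (e.g. 100 * 0.5 = 50).
  slideStep = wsize*ssize
  # Number of windows; this count assumes sampleSize >= wsize.
  tempResults = matrix(nrow=dim(data)[1],ncol=floor((sampleSize-wsize)/slideStep)+1)
  nstart=1
  iter = 1
  while(nstart <= sampleSize){
    if(nstart+slideStep+wsize-1>sampleSize){
      # The next full window would run past the last sample, so this final
      # window is extended to absorb all remaining samples.
      nend = sampleSize
      nextStart = sampleSize + 1
    }
    else{
      nend = nstart+wsize-1
      nextStart = nstart+slideStep
    }
    # Fraction of missing values per metabolite in this window;
    # drop=FALSE keeps a one-column window two-dimensional.
    tempResults[,iter] = rowMeans(is.na(data[,nstart:nend,drop=FALSE]))
    nstart = nextStart
    iter = iter + 1
  }
  mrate = tempResults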
  print(paste0(dim(mrate)[2], " windows in total"))

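  # 4 features
  meanmrate = apply(mrate,1,mean)
  variance = apply(mrate,1,var)
  # Absolute change in missing rate between adjacent windows (needs at least 2 windows);
  # drop=FALSE keeps dif a matrix even with a single metabolite or only two windows.
  nwin = dim(mrate)[2]
  dif = abs(mrate[,2:nwin,drop=FALSE]-mrate[,1:(nwin-1),drop=FALSE])
  nswitches = integer(nrow(dif))
  longestblock = numeric(nrow(dif))
  for(j in seq_len(nrow(dif))){
    switchpos = which(dif[j,]>=defswitch)
    nswitches[j] = length(switchpos)
    # Breakpoints: the first window, every switch position, and the last window.
    temp = c(1,switchpos,nwin)
    # Longest stretch between consecutive breakpoints, counting both end windows.
    longestblock[j] = max(diff(temp)) + 1
  }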

  return(list(mbnames=mbnames,snames=snames,wsize=wsize,ssize=ssize,defswitch=defswitch,
              mrate=mrate, meanmrate = meanmrate, variance=variance,nswitches=nswitches,longestblock=longestblock))
}
liucaomics/genuMet documentation built on Nov. 11, 2019, 12:13 a.m.