#' @title Generate a missing rate matrix and extract 3 features characterizing the missing rate pattern
#' @description
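#' \code{makefeature} generates a missing rate matrix from a data frame of metabolomics signals (as created by \code{\link{read.metfile}}) by sliding a window of \code{wsize} samples along the injection order in steps of \code{wsize*ssize} samples. Each cell of the matrix is the proportion of missing (\code{NA}) values of one metabolite within one window; the last window is extended to include all remaining samples.
#' From the missing rate matrix, \code{makefeature} computes 3 features characterizing each metabolite's missing rate pattern: the variance of the missing rate across windows, the number of switches, and the length of the longest block (the longest run of adjacent windows with no switch between them).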
#' @param data A data frame of metabolite signals. Rows are metabolites and columns are samples in injection order. The data frame can be created by \code{\link{read.metfile}}.
#' @param wsize Window size, in number of samples. Default is 100.
#' @param ssize Slide size, as a fraction of \code{wsize}: each window starts \code{wsize*ssize} samples after the previous one. Default is 0.5.
#' @param defswitch Threshold defining a switch. If the absolute difference in missing rate between adjacent windows is greater than or equal to \code{defswitch}, it is counted as a switch. Default is 0.2.
#' @return A list containing the features generated from the metabolite signal matrix, with the following entries:
#' \itemize{
#' \item mbnames: A vector of metabolite names.
#' \item snames: A vector of sample names.
#' \item wsize: A scalar. Window size.
#' \item ssize: A scalar. Slide size.
#' \item defswitch: A scalar. Threshold defining a switch.
#' \item mrate: Missing rate matrix. Rows are metabolites and columns are windows; each cell is the missing rate of one metabolite in one window.
#' \item meanmrate: A vector. Mean missing rate across windows for each metabolite.
#' \item variance: A vector. Missing rate variance for each metabolite.
#' \item nswitches: A vector. Number of switches for each metabolite.
#' \item longestblock: A vector. Number of windows in the longest block for each metabolite.
#' }
#' @author Liu Cao
#'
#' @seealso See \code{\link{read.metfile}} for how to generate the input metabolite signal data frame.
#' @export
makefeature <- function(data, wsize=100,ssize=0.5,defswitch=0.2) {
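  mbnames = rownames(data)
  snames = colnames(data)
  # calculating the missing rate matrix: windows of wsize samples that
  # advance by slideStep = wsize*ssize samples along the injection order
  sampleSize = dim(data)[2]
  slideStep = wsize*ssize
  if (slideStep < 1) stop("wsize*ssize must be at least 1")
  if (sampleSize < wsize) stop("wsize must not exceed the number of samples")
  tempResults = matrix(nrow=dim(data)[1],ncol=floor((sampleSize-wsize)/slideStep)+1)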
  nstart=1
  iter = 1
  while(nstart <= sampleSize){
    if(nstart+slideStep+wsize-1>sampleSize){
      # the next window would run past the last sample, so this final
      # window is extended to include all remaining samples
      nend = sampleSize
      # fraction of NA values per metabolite (row) within the window
      tempResults[,iter] = rowMeans(is.na(data[,nstart:nend,drop=FALSE]))
      nstart = sampleSize +1
      iter = iter+1
    }
    else{
      nend = nstart+wsize-1
      tempResults[,iter] = rowMeans(is.na(data[,nstart:nend,drop=FALSE]))
      nstart = nstart+slideStep
      iter = iter + 1
    }
  }
  mrate = tempResults
  message(dim(mrate)[2], " windows in total")
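  # per-metabolite features: mean missing rate plus the 3 pattern features
  meanmrate = apply(mrate,1,mean)
  variance = apply(mrate,1,var)
  # absolute missing rate change between adjacent windows (nwin - 1 columns)
  nwin = dim(mrate)[2]
  dif = abs(mrate[,-1,drop=FALSE]-mrate[,-nwin,drop=FALSE])
  nswitches = integer(dim(mrate)[1])
  longestblock = integer(dim(mrate)[1])
  for(j in seq_len(dim(mrate)[1])){
    # a switch at position k lies between window k and window k+1
    switchpos = which(dif[j,]>=defswitch)
    nswitches[j] = length(switchpos)
    # blocks are the runs of windows between switches:
    # 1..k1, (k1+1)..k2, ..., (kn+1)..nwin
    longestblock[j] = max(diff(c(0,switchpos,nwin)))
  }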
  return(list(mbnames=mbnames,snames=snames,wsize=wsize,ssize=ssize,defswitch=defswitch,
              mrate=mrate, meanmrate = meanmrate, variance=variance,nswitches=nswitches,longestblock=longestblock))
}
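
# Illustrative sketch (hypothetical helper, not part of the package API).
# window_missing_rate_sketch() rebuilds the window layout that makefeature()
# uses, assuming wsize * ssize is a positive whole number of samples: windows
# of `wsize` samples start every `wsize * ssize` samples along the injection
# order, and the last window is stretched to the final sample rather than
# leaving a short tail. Instead of a while loop, it precomputes every
# start/end pair, then takes the fraction of NA values per metabolite
# (row) in each window.
window_missing_rate_sketch <- function(data, wsize = 100, ssize = 0.5) {
  nsample <- ncol(data)
  step <- wsize * ssize
  stopifnot(step >= 1, step == round(step), nsample >= wsize)
  # floor((nsample - wsize) / step) + 1 windows, the same count makefeature() allocates
  nwin <- floor((nsample - wsize) / step) + 1
  starts <- 1 + (seq_len(nwin) - 1) * step
  ends <- starts + wsize - 1
  ends[nwin] <- nsample  # the last window absorbs the remaining samples
  miss <- is.na(as.matrix(data))
  rates <- vapply(seq_len(nwin), function(k) {
    rowMeans(miss[, starts[k]:ends[k], drop = FALSE])
  }, numeric(nrow(miss)))
  # vapply() returns a plain vector for a single metabolite; keep a matrix
  mrate <- matrix(rates, nrow = nrow(miss))
  list(starts = starts, ends = ends, mrate = mrate)
}
# Usage: for a data frame `x` with 300 samples (columns), the default
# settings give floor((300 - 100) / 50) + 1 = 5 windows, so
#   dim(window_missing_rate_sketch(x)$mrate)  # nrow(x) rows, 5 columns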
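
# Illustrative sketch (hypothetical helper, not part of the package API).
# switch_features_sketch() takes one metabolite's row of the missing rate
# matrix and counts switches: adjacent windows whose missing rates differ
# by at least `defswitch` (the same >= comparison makefeature() uses).
# It then measures the longest block, assuming a block is a maximal run of
# consecutive windows with no switch between them. Block membership is
# labelled with cumsum() and block sizes are counted with tabulate().
switch_features_sketch <- function(mrate_row, defswitch = 0.2) {
  is_switch <- abs(diff(mrate_row)) >= defswitch
  # the block id starts at 1 for window 1 and increases by one at every switch
  block_id <- cumsum(c(1L, as.integer(is_switch)))
  list(nswitches = sum(is_switch),
       longestblock = max(tabulate(block_id)))
}
# Usage: switch_features_sketch(c(0, 0, 0.5, 0.5, 0.5, 0.1)) finds switches
# between windows 2/3 and 5/6, so the longest block is windows 3 to 5.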