#' Read Sample Meta-data from Quantification-Software And/Or Sdrf And Align To Experimental Data
#'
#' Sample/experimental annotation meta-data from \href{https://www.maxquant.org/}{MaxQuant}, ProteomeDiscoverer, FragPipe, Proline or similar software can be read using this function, and relevant information can be extracted.
#' Furthermore, annotation in \href{https://github.com/bigbio/proteomics-sample-metadata}{sdrf-format} can be added (the order of the sdrf entries will be adjusted automatically, if possible).
#' This function returns a list with the grouping of samples into replicates and additional information gathered.
#' Input files compressed as .gz can be read as well.
#'
#' @details
#'
#' When initially reading/importing quantitation data, typically very little is known about the setup of the different samples in the underlying experiment.
#' The overall aim is to read and mine the corresponding sample annotation documented by the quantitation software and/or an sdrf repository, and to attach it to the experimental data.
#' This way, in subsequent steps of the analysis (eg PCA, statistical tests) the user does not have to study the experimental setup to figure out which
#' samples should be considered replicates of each other.
#'
#' Sample annotation meta-data can be obtained from two sources:
#' a) from additional files produced (and exported) by the initial quantitation software (so far mainly MaxQuant and ProteomeDiscoverer; Proline and FragPipe are partially supported) or
#' b) from the universal sdrf-format (from PRIDE or user-supplied).
#' Both types can be imported and checked in the same run; if valid sdrf information is found, it is given priority.
#' For more information about the sdrf format please see \href{https://github.com/bigbio/proteomics-sample-metadata}{sdrf on github}.
#'
#'
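#' @param quantMeth (character, length=1) quantification method used; 2-letter abbreviations like 'MQ','PD','PL','FP' etc may be used.
#'  Full software names (case-insensitive), ie 'MaxQuant', 'ProteomeDiscoverer', 'Proline', 'FragPipe', 'MassChroQ', 'AlphaPept', 'ionbot' and 'DIA-NN' (or 'DiaNN'), are converted to their 2-letter abbreviation; use \code{NA} if only \code{sdrf} should be read
#' @param sdrf (character, list or data.frame) optional extraction and adding of experimental meta-data:
#' if character, this may be the ID at ProteomeXchange or a similarly formatted local file. \code{sdrf} will get priority over \code{suplAnnotFile}, if provided.
#' @param suplAnnotFile (logical or character) optional reading of supplemental files produced by the quantitation software (eg MaxQuant); if \code{gr} is provided (via \code{groupPref}), it gets priority for grouping of replicates;
#' if \code{TRUE} in case of \code{quantMeth=='MQ'} (MaxQuant), defaults to files 'summary.txt' (needed to match information of \code{sdrf}) and 'parameters.txt', which can be found in the same folder as the main quantitation results (the name of this folder may also be given directly);
#' if \code{character}, the respective file-names (relative or absolute path): the 1st is expected to correspond to 'summary.txt' (tabulated text, the samples as given to MaxQuant) and the 2nd to 'parameters.txt' (tabulated text, all parameters given to MaxQuant);
#' in case of \code{quantMeth=='PD'} (ProteomeDiscoverer), \code{TRUE} will look in \code{path} for an exported file ending in 'InputFiles.txt' (or 'InputFiles.txt.gz');
#' in case of \code{quantMeth=='FP'} (FragPipe), \code{TRUE} will look in \code{path} for 'fragpipe-files.fp-manifest' and 'fragpipe.workflow';
#' in case of \code{quantMeth=='PL'} (Proline), this argument should contain the initial file-name (.xlsx, for the identification and quantification data) in the first position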
#' @param path (character) optional path of file(s) to be read
#' @param abund (matrix or data.frame) experimental quantitation data; only column-names will be used for aligning order of annotated samples
#' @param groupPref (list) additional parameters for interpreting meta-data to identify the structure of groups (replicates);
#' may contain \code{lowNumberOfGroups=FALSE} for automatically choosing a rather elevated number of groups if possible (defaults to a low number of groups, ie a higher number of samples per group).
#' A vector of custom sample-names may be provided via \code{sampleNames=...} (must be of correct length);
#' if it contains \code{sampleNames="sdrf"}, sample-names will be taken from the trimmed file-names.
#' A custom grouping of replicates may be provided via \code{gr=...} (gets priority over grouping derived from meta-data).
#' @param chUnit (logical or character) optional adjusting of group-labels from sample meta-data, in case multiple different unit-prefixes are used, to a single common prefix
#' (eg adjust '100pMol' and '1nMol' to '100pMol' and '1000pMol') for better downstream analysis. This option will call \code{\link[wrMisc]{adjustUnitPrefix}} and \code{\link[wrMisc]{checkUnitPrefix}} from package \code{wrMisc}.
#' If \code{character}, exactly this/these unit-names will be searched in the sample-names and checked for multiple different decimal prefixes;
#' if \code{TRUE}, the default set of unit-names ('Mol','mol', 'days','day','m','sec','s','h') will be checked in the sample-names for different decimal prefixes
#' @param silent (logical) suppress messages if \code{TRUE}
#' @param debug (logical) additional messages for debugging
#' @param callFrom (character) allows easier tracking of messages produced
#' @return This function returns a list with \code{$level} (grouping of samples given as integer) and \code{$meth} (method by which the grouping was determined).
#' If valid \code{sdrf} was given, the resultant list contains in addition \code{$sdrfDat} (data.frame of annotation).
#' Alternatively it may contain \code{$sdrfExport} if sufficient information has been gathered (so far only for MaxQuant) for a draft sdrf for export (that should be revised and completed by the user).
#' If software annotation has been found, it will be shown in \code{$annotBySoft}.
#' If all entries are invalid or entries do not pass the tests, this function returns an empty \code{list}.
#' @seealso this function is used internally by \code{\link{readMaxQuantFile}}, \code{\link{readProteomeDiscovererFile}} etc; uses \code{\link{readSdrf}} for reading sdrf-files, \code{\link[wrMisc]{replicateStructure}} for mining annotation columns
#' @examples
#' sdrf001819Setup <- readSampleMetaData(quantMeth=NA, sdrf="PXD001819")
#' str(sdrf001819Setup)
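#'
#' ## Hypothetical local example (all paths and the matrices 'myAbundMQ'/'myAbundPD' are placeholders):
#' ## a MaxQuant results folder containing 'summary.txt' and 'parameters.txt';
#' ## the order of samples is checked against the column-names of the quantitation data.
#' \dontrun{
#' setupMQ <- readSampleMetaData(quantMeth="MaxQuant", suplAnnotFile=TRUE,
#'   path="path/to/MaxQuant/txt", abund=myAbundMQ)
#' setupMQ$level     # integer grouping of samples into replicates
#' ## ProteomeDiscoverer : searches 'path' for an exported '*InputFiles.txt'
#' setupPD <- readSampleMetaData(quantMeth="PD", suplAnnotFile=TRUE,
#'   path="path/to/PD/export", abund=myAbundPD)
#' }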
#'
#' @export
readSampleMetaData <- function(quantMeth, sdrf=NULL, suplAnnotFile=NULL, path=".", abund=NULL, groupPref=list(lowNumberOfGroups=TRUE, sampleNames=NULL, gr=NULL), chUnit=TRUE, silent=FALSE, debug=FALSE, callFrom=NULL) {
## sdrf..(character, list or data.frame)
## suplAnnotFile..(character or logical)
## quantMeth..(character)
## abund..(matrix or data.frame) column-names will be used to compare & align sample meta-data
#### CONTENT
## 1.1 SOFTWARE specific META-DATA .. read additional annotation & documentation files
## Aim : extract/build 'summaryD' (& parametersD) allowing to match colnames of 'abund' to suplAnnotFile and/or sdrf
## 1.2 basic check of summaryD to quant data, extract supl info for sdrf
## evaluate summaryD to consistent format
## 1.3 TRY CHECKING/ADJUSTING ORDER of summaryD
## 1.4 replicateStructure
### 2 READ SDRF annotation & pick groups of replicates; has priority over grouping based on summary.txt
## 2.1 basic check (distinguish full $sampleSetup from custom data.frame)
## 2.2 need to match lines (samples) of sdrf (setupDat) to summaryD and/or colnames of abund
## 2.3 ready to make setupSd
fxNa <- wrMisc::.composeCallName(callFrom, newNa="readSampleMetaData")
if(isTRUE(debug)) silent <- FALSE
if(!isTRUE(silent)) silent <- FALSE
summaryD <- parametersD <- setupSdSoft <- setupSd <- sdrfInf <- annSh <- parametersSd <- NULL # initialize (setupSd needed ?)
## checks
if(length(suplAnnotFile) >1) if(is.na(suplAnnotFile[1])) suplAnnotFile <- NULL
datOK <- length(sdrf) >0 || length(suplAnnotFile) >0
if(length(quantMeth) <1) quantMeth <- NA
if(length(abund) >0 && any(length(dim(abund)) !=2, dim(abund) < 1, na.rm=TRUE)) { datOK <- FALSE
warning("Invalid argument 'abund'; must be matrix or data.frame with min 1 line and 1 col")}
if(debug) {message(fxNa,"Ready search & extract sample meta-data rSM0"); rSM0 <- list(sdrf=sdrf,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,path=path,abund=abund)}
.corPathW <- function(x) gsub("\\\\", "/", x)
.adjPat <- function(x) { out <- match(x, unique(x)); names(out) <- if(length(names(x)) >0) names(x) else x; out} # reduce to integer-pattern (with names)
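## illustration of .adjPat() (comment only, not executed):
##   .adjPat(c(s1="ctrl", s2="ctrl", s3="trt"))  gives  c(s1=1, s2=1, s3=2)
##   unnamed input is named by its values:  .adjPat(c("A","B","A"))  gives  c(A=1, B=2, A=1)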
.adjTrimPat <- function(x) { x <- wrMisc::rmEnumeratorName(x, incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"))
out <- match(x, unique(x)); names(out) <- names(x); out} # used
.redLstToDf <- function(lst) { # transform lst to data.frame; in case some list-entries have different length, choose the entries with the most frequent common length
leL <- sapply(lst, length)
if(length(unique(leL)) >1) { # need to reduce : find most frequent length
leL2 <- tabulate(leL)
lst <- lst[which(leL==which.max(leL2))] }
as.data.frame(lst) }
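## illustration of .redLstToDf() (comment only, not executed):
##   .redLstToDf(list(a=1:3, b=4:6, c=1:2))  drops 'c' (length 2) and returns a data.frame with 3 rows and columns 'a','b'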
.replSingleWord <- function(ii, tx, se, replBy="") { ## for single word in all tx; move to wrMisc?
## NOT USED ANY MORE (when using wrMisc::rmSharedWords)
## ii .. word to remove
## tx .. ini char-vector
## se .. possible separators
if(length(se) >1) se <- paste0("(",paste(se,collapse="|"),")")
## need to protect special characters in ii (before using it in any pattern)
ii <- wrMisc::protectSpecChar(ii)
i2 <- grepl(paste0("^",ii), tx) # heading
out <- rep(NA, length(tx))
if(any(i2)) out[which(i2)] <- sub(paste0("^",ii,se), replBy, tx[which(i2)]) # heading : remove with following sep (if avail)
if(any(!i2)) out[which(!i2)] <- sub(paste0("(",se,ii,")|(^",ii,")"), replBy, tx[which(!i2)]) # not heading (may be heading now..): remove preceding sep
out }
.trimRedWord <- function(txt, sep=c(" ","_","-","/"), minLe=3, strict=TRUE, silent=TRUE, callFrom=NULL, debug=FALSE) {
## replaced by wrMisc::rmSharedWords (also used inside .checkSetupGroups () )
## NOT USED ANY MORE (when using wrMisc::rmSharedWords)
## function to trim redundant words (@separator) similar to wrMisc::trimRedundText()
## strict .. (logical) requires separator to occur in each single character-string to be considered
## minLe .. min length for words to be considered (otherwise frequently problem with '1')
##
datOK <- length(txt) >0
if(datOK) { chNA <- is.na(txt)
if(all(chNA)) datOK <- FALSE else tx1 <- txt[which(!chNA)]
}
if(datOK) {
#strict=TRUE
chSe <- sapply(sep, function(x) nchar(tx1) > nchar(gsub(x,"",tx1)))
chS2 <- if(strict) colSums(chSe) ==length(tx1) else colSums(chSe) >0 # if strict, require at least one instance of 'sep' in each element
if(debug) {message(fxNa,"tRW1"); tRW1 <- list(txt=txt,sep=sep,chSe=chSe,chS2=chS2,strict=strict,tx1=tx1)}
if(any(chS2)) sep <- sep[which(chS2)] else datOK <- FALSE # reduce to sep found
}
if(datOK) {
allW <- unique(unlist(strsplit(tx1, paste(sep, collapse="|")), use.names=FALSE))
## keep only words with at least 'minLe' characters
chLe <- nchar(allW) >= minLe
if(any(!chLe)) allW <- allW[which(chLe)]
## check all 'words' for recurring in each char-string
if(length(allW) >0) {
chW <- colSums(sapply(allW, grepl, tx1)) ==length(tx1)
if(any(chW)) {
rmWo <- names(chW[which(chW)])
#chLe <- nchar(rmWo) >0
if(debug) {message(fxNa,"tRW2"); tRW2 <- list()}
if(length(rmWo) >0) {
for(wo in rmWo) tx1 <- .replSingleWord(wo, tx1, sep)
txt[which(!chNA)] <- tx1
}
if(any(chLe)) {
rmWo <- rmWo[which(chLe)]
for(wo in rmWo) tx1 <- .replSingleWord(wo, tx1, sep)
txt[which(!chNA)] <- tx1 }
} }
}
txt }
if( utils::packageVersion("wrMisc") > "1.15.1.1") .trimRedWord <- wrMisc::rmSharedWords
.chColOrder <- function(sdr1, sdr2, colNa=c("comment.file.uri.","comment.data.file.")) { out <- NULL
## use sdr1 as old/initial, sdr2 as new; return vector for re-establishing initial order
## NOT USED ANY MORE !!
for(i in colNa) {
if(i %in% colnames(sdr1) && i %in% colnames(sdr2) && sum(duplicated(sdr1[,i])) <1 && length(out) <1) {out <- match(sdr1[,i], sdr2[,i]); break }
}
out }
## end suppl fx
path <- if(length(path) <1) "." else path[1]
nSamp0 <- if(length(dim(abund)) >1) ncol(abund) else 0
chSoft <- c("MQ", "PD", "PL", "FP","MC","AP","IB","NN")
defUnits <- c("Mol","mol", "days","day","m","sec","s","h") # for unit-conversion of sample/column-names
syncColumns <- c(sdrfDat=NA, annotBySoft=NA)
if(datOK) {
if("maxquant" %in% tolower(quantMeth)) quantMeth <- "MQ"
if("proteomediscoverer" %in% tolower(quantMeth)) quantMeth <- "PD"
if("proline" %in% tolower(quantMeth)) quantMeth <- "PL"
if("fragpipe" %in% tolower(quantMeth)) quantMeth <- "FP"
if("masschroq" %in% tolower(quantMeth)) quantMeth <- "MC"
if("alphapept" %in% tolower(quantMeth)) quantMeth <- "AP"
if("ionbot" %in% tolower(quantMeth)) quantMeth <- "IB"
if("dia-nn" %in% tolower(quantMeth) || "diann" %in% tolower(quantMeth)) quantMeth <- "NN"
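## eg quantMeth="MaxQuant" becomes "MQ" and quantMeth="DIA-NN" becomes "NN"; 2-letter codes (eg "MQ") are kept as given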
}
if(datOK) { if(length(abund) >0) if(is.null(colnames(abund))) { abund <- NULL
if(!silent) message(fxNa,"Invalid 'abund' : has NO colnames !") }
### IMPORT SAMPLE META-DATA, if possible GROUPING OF REPLICATES
if(length(suplAnnotFile) ==1) {
if(isFALSE(suplAnnotFile)) suplAnnotFile <- NULL else if(is.na(suplAnnotFile)) suplAnnotFile <- NULL }
## 1.1 SOFTWARE specific META-DATA : read additional annotation & documentation files produced by various software as summaryD & parametersD
if(length(suplAnnotFile) >0) { # read quant software-generated sample annotation
chFiNa <- NULL # initialize
if(debug) {message(fxNa,"rSM1"); rSM1 <- list(sdrf=sdrf,abund=abund,path=path,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,syncColumns=syncColumns,groupPref=groupPref) }
## option 1 : suplAnnotFile has path (do not use default 'path'), use same path for default suplAnnotFile (if applicable)
## option 2 : suplAnnotFile has no path, use 'path' for sdrf & suplAnnotFile
## Aim : extract/build 'summaryD' (& parametersD) allowing to match colnames of 'abund' to suplAnnotFile and/or sdrf
## MaxQuant : (summary.txt & parameters.txt)
if("MQ" %in% quantMeth && length(suplAnnotFile) >0) {
isDir <- if(is.character(suplAnnotFile)) utils::file_test("-d",suplAnnotFile[1]) else FALSE
if(isDir) { path <- suplAnnotFile[1]; suplAnnotFile <- TRUE}
if(isTRUE(suplAnnotFile)) { # automatic search for standard file-names ('summary.txt','parameters.txt') in same dir as main MaxQuant data
chFiNa <- c("summary.txt","summary.txt.gz","parameters.txt","parameters.txt.gz")
chFi <- file.exists(file.path(path, chFiNa))
if(debug) {message(fxNa,"rSM0mq\n"); rSM0mq <- list(path=path,sdrf=sdrf,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,abund=abund,chFi=chFi,chFiNa=chFiNa )}
if(any(chFi, na.rm=TRUE)) { suplAnnotFile <- c(summary=chFiNa[1:2][which(chFi[1:2])[1]], parameters=chFiNa[3:4][which(chFi[3:4])[1]] )
if(all(names(suplAnnotFile)=="parameters")) suplAnnotFile <- c(NA, parameters=suplAnnotFile[["parameters"]]) # make length=2
chFi <- c(chFi[1] | chFi[2], chFi[3] | chFi[4]) # reduce to length=2 (1st for summary, 2nd for parameters)
} else suplAnnotFile <- NULL
} else { # specific/non-default file given
if(length(suplAnnotFile) >2) suplAnnotFile <- suplAnnotFile[1:2] # use max length=2
chFi <- rep(FALSE, 2)
if(!is.na(suplAnnotFile[1])) chFi[1] <- file.exists(file.path(path, suplAnnotFile[1]))
if(!is.na(suplAnnotFile[2])) chFi[2] <- file.exists(file.path(path, suplAnnotFile[2]))
}
if(debug) {message(fxNa,"rSM1mq"); rSM1mq <- list(path=path,sdrf=sdrf,summaryD=summaryD,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,abund=abund,path=path,nSamp0=nSamp0,chFiNa=chFiNa,chFi=chFi )}
## main reading of MQ sample meta-data
if(chFi[1]) summaryD <- try(utils::read.delim(file.path(path, suplAnnotFile[1]), stringsAsFactors=FALSE), silent=TRUE) # 'summary.txt'
if(chFi[2]) parametersD <- try(utils::read.delim(file.path(path, suplAnnotFile[2]), stringsAsFactors=FALSE), silent=TRUE) # 'parameters.txt'
if(inherits(summaryD, "try-error")) {summaryD <- NULL; if(!silent) message(fxNa,"Meta-data: Failed to read '",suplAnnotFile[1],"' for getting additional information about experiment !")} else if(length(summaryD) >0) {
summaryD <- if(nrow(summaryD) >2) summaryD[-nrow(summaryD),] else matrix(summaryD[-nrow(summaryD),], nrow=1,dimnames=list(NULL,colnames(summaryD))) # need to remove last summary-line
if(debug) message(fxNa,"Successfully read sample annotation from '",suplAnnotFile[1],"'") }
if(inherits(parametersD, "try-error")) {if(!silent) message(fxNa,"Meta-data: Failed to read '",suplAnnotFile[2],"' !")} else {
if(debug && chFi[2]) message(fxNa,"Successfully read ",quantMeth," parameters from '",suplAnnotFile[2],"'") }
syncColumns["annotBySoft"] <- FALSE
if(debug) { message(fxNa,"rSM1mq2"); rSM1mq2 <- list()}
}
## ProteomeDiscoverer
## suplAnnotFile : name of the exported '*InputFiles.txt' file (or TRUE to search for it in 'path')
if("PD" %in% quantMeth && length(suplAnnotFile) >0) {
if(debug) {message(fxNa,"rSM1pd"); rSM1pd <- list(sdrf=sdrf,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth)}
if(length(suplAnnotFile) >1) { if(!silent) message(fxNa,"Only 1st value of argument 'suplAnnotFile' can be used with quantMeth=PD")
suplAnnotFile <- suplAnnotFile[1] }
if(isTRUE(suplAnnotFile)) { # automatic search for standard file-name ('InputFiles.txt') in same dir as main ProteomeDiscoverer data
suplAnnotFile <- list.files(path=path, pattern=".InputFiles\\.txt$|.InputFiles\\.txt\\.gz$")
if(length(suplAnnotFile) >1) { if(!silent) message(fxNa,"Found ",length(suplAnnotFile)," files matching general pattern, using ONLY 1st, ie ",suplAnnotFile[1])
suplAnnotFile <- suplAnnotFile[1] }
chFi <- length(suplAnnotFile) >0
if(!chFi && !silent) message(fxNa,"Note: Unable to (automatically) find sample-annotation file. Maybe it was not exported from ProteomeDiscoverer ?")
} else chFi <- try(file.exists(file.path(path, suplAnnotFile)), silent=TRUE)
if(inherits(chFi, "try-error")) {chFi <- FALSE; if(!silent) message(fxNa,"Meta-data: Failed to see file '",suplAnnotFile[1],"' ! (check if file exists or rights to read directory ?)")}
if(debug) {message(fxNa,"rSM1pd2") }
## main reading of PD sample meta-data
if(chFi) summaryD <- try(utils::read.delim(file.path(path, suplAnnotFile[1]), stringsAsFactors=FALSE), silent=TRUE)
if(inherits(summaryD, "try-error")) {summaryD <- NULL; if(!silent) message(fxNa,"Meta-data: Failed to read '",suplAnnotFile[1],"' !")
} else {
syncColumns["annotBySoft"] <- FALSE
if("File.Name" %in% colnames(summaryD)) {
chRa <- grep("\\.raw$", tolower(summaryD[,"File.Name"]))
if(length(chRa) <1) chRa <- grep("\\.raw", tolower(summaryD[,"File.Name"]))
if(length(chRa) < nrow(summaryD) && length(chRa) >0) {
if(debug) message(fxNa,"Filter summaryD to '.raw' from ",nrow(summaryD)," to ",length(chRa))
summaryD <- if(length(chRa) > 1) summaryD[chRa,] else matrix(summaryD[chRa,], nrow=length(chRa), dimnames=list(rownames(summaryD)[chRa], colnames(summaryD))) }
} else if("Input.Files.Workflow.ID" %in% colnames(summaryD)) {
chNeg <- try(as.integer(summaryD[,"Input.Files.Workflow.ID"]), silent=TRUE)
if(!inherits(chNeg, "try-error")) { chNeg <- chNeg <0
if(any(chNeg)) summaryD <- if(sum(chNeg) > nrow(summaryD) -2) matrix(summaryD[which(!chNeg),], nrow=sum(!chNeg), dimnames=list(rownames(summaryD)[which(!chNeg)], colnames(summaryD))) else summaryD[which(!chNeg),] }
}
if(debug) message(fxNa,"ProteomeDiscoverer Meta-data successfully read '",suplAnnotFile[1],"'")}
if(debug) {message(fxNa,"rSM1pd3"); rSM1pd3 <- list(summaryD=summaryD,parametersD=parametersD,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth, sdrf=sdrf,path=path,nSamp0=nSamp0,chSoft=chSoft,syncColumns=syncColumns)}
}
## Proline
## so far only for reading out of xlsx
if("PL" %in% quantMeth && length(suplAnnotFile) >0) {
if(debug) {message(fxNa,"rSM0pl"); rSM0pl <- list(sdrf=sdrf,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth)}
summaryD <- NULL
## need init filename given via suplAnnotFile
if(length(grep("\\.xlsx$", suplAnnotFile[1])) >0) { # won't enter here if suplAnnotFile==NULL
## Extract out of Excel
reqPa <- c("readxl")
chPa <- sapply(reqPa, requireNamespace, quietly=TRUE)
if(any(!chPa)) message(fxNa,"Package '",paste(reqPa[which(!chPa)], collapse="','"),"' not found ! Please install it first from CRAN") else {
sheets <- if(debug) try(readxl::excel_sheets(suplAnnotFile[1]), silent=TRUE) else suppressMessages(try(readxl::excel_sheets(suplAnnotFile[1]), silent=TRUE))
if(debug) {message(fxNa,"rSM2pl"); rSM2pl <- list()}
if(inherits(sheets, "try-error")) { message(fxNa,"Unable to read file '",suplAnnotFile,"' ! Returning NULL; check format & rights to read")
} else {
annShe <- c("Import and filters", "Search settings and infos") # sheets from xlsx to try reading for sample/meta-information
annSh <- wrMisc::naOmit(match(annShe, sheets))
annSh <- if(length(annSh) <1) grep("Import", sheets) else annSh[grep("Import", sheets[annSh])] # keep indices relative to 'sheets'
if(length(annSh) >1) {
if(!silent) message(fxNa,"Multiple sheets containing 'Import' found, using 1st :",sheets[annSh[1]])
annSh <- annSh[1]
} else if(length(annSh) <1 && !silent) {
message(fxNa,"Note: NONE of ANNOTATION SHEETS (",wrMisc::pasteC(annShe),") in '",suplAnnotFile,"' FOUND ! Can't check matching order of samples to sdrf-annotation !")
}
if(length(annSh) >0) {
summaryD <- as.matrix(as.data.frame(if(debug) readxl::read_xlsx(suplAnnotFile[1], sheet=annSh, col_names=FALSE) else suppressMessages(readxl::read_xlsx(suplAnnotFile[1], sheet=annSh, col_names=FALSE))))
rownames(summaryD) <- summaryD[,1]
summaryD <- t(summaryD[,-1])
rownames(summaryD) <- 1:nrow(summaryD) }
#syncColumns["annotBySoft"] <- FALSE
}
}
} else if(debug) message(fxNa,"Unknown type of sample/experiment annotation file ('",suplAnnotFile[1],"') for Proline, ignoring !!")
} # finish PL
## FragPipe
##
if("FP" %in% quantMeth && length(suplAnnotFile) >0) {
if(debug) { message(fxNa,"rSM1fp1"); rSM1fp1 <- list()}
## option 1 : suplAnnotFile has path (do not use default 'path'), use same path for default suplAnnotFile (if applicable)
## option 2 : suplAnnotFile has no path, use 'path' for sdrf & suplAnnotFile
## Aim : extract/build 'summaryD' allowing to match colnames of 'abund' to suplAnnotFile and/or sdrf
## filelist_ionquant.txt & fragpipe-files.fp-manifest
isDir <- if(is.character(suplAnnotFile)) utils::file_test("-d", suplAnnotFile[1]) else FALSE
if(isDir) { path <- suplAnnotFile[1]; suplAnnotFile <- TRUE}
if(isTRUE(suplAnnotFile)) { # automatic search for standard file-names ('fragpipe-files.fp-manifest','fragpipe.workflow') in same dir as main FragPipe data
chFiNa <- c("doNotUseDoNotUse","doNotUseDoNotUse", "fragpipe-files.fp-manifest","fragpipe-files.fp-manifest.gz", "fragpipe.workflow","fragpipe.workflow.gz")
chFi <- file.exists(file.path(path, chFiNa))
if(debug) {message(fxNa,"rSM1fp2"); rSM1fp2 <- list(path=path,sdrf=sdrf,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,abund=abund,chFi=chFi,chFiNa=chFiNa )}
if(any(chFi, na.rm=TRUE)) { suplAnnotFile <- c(summary=chFiNa[1:4][which(chFi[1:4])[1]], parameters=chFiNa[5:6][which(chFi[5:6])[1]] )
if(all(names(suplAnnotFile)=="parameters")) suplAnnotFile <- c(NA, parameters=suplAnnotFile[["parameters"]]) # make length=2
chFi <- c(chFi[1] || chFi[2] || chFi[3] || chFi[4], chFi[5] || chFi[6]) # reduce to length=2 (1st for summary, 2nd for parameters)
} else suplAnnotFile <- NULL
} else { # specific/non-default file given (1st for summary, 2nd for parameters)
if(length(suplAnnotFile) >2) suplAnnotFile <- suplAnnotFile[1:2] # use max length=2
chFi <- rep(FALSE, 2)
if(!is.na(suplAnnotFile[1])) chFi[1] <- file.exists(file.path(path, suplAnnotFile[1]))
if(!is.na(suplAnnotFile[2])) chFi[2] <- file.exists(file.path(path, suplAnnotFile[2]))
}
if(debug) {message(fxNa,"rSM1fp3"); rSM1fp3 <- list()}
## main reading of FP sample meta-data
if(chFi[1]) summaryD <- try(utils::read.delim(file.path(path, suplAnnotFile[1]), header=FALSE, stringsAsFactors=FALSE), silent=TRUE)
if(chFi[2]) parametersD <- try(utils::read.delim(file.path(path, suplAnnotFile[2]), header=FALSE, stringsAsFactors=FALSE), silent=TRUE)
if(inherits(summaryD, "try-error")) { summaryD <- NULL; if(!silent) message(fxNa,"Meta-data: Failed to read '",suplAnnotFile[1],"' for getting additional information about experiment !")
} else if(!is.null(summaryD)) {
msg <- c("File '",suplAnnotFile[1],"' is NOT a valid annotation file ! Ignoring")
if(identical(as.character(unlist(summaryD[1, 1:min(2, ncol(summaryD))], use.names=FALSE)), c("flag","value")) || sum(dim(summaryD) >1) <2) {
warning(fxNa, msg); summaryD <- NULL }
if(length(summaryD) >0) {
colnames(summaryD) <- c("file","experiment","bioreplicate","dataType")[1:min(ncol(summaryD), 4)]
summaryD <- as.matrix(summaryD)
summaryD[,1] <- .corPathW(summaryD[,1])
}
#syncColumns["annotBySoft"] <- FALSE
if(debug) message(fxNa,"Successfully read sample annotation from '",suplAnnotFile[1],"'") }
if(inherits(parametersD, "try-error")) {if(!silent) message(fxNa,"Meta-data: Failed to read '",suplAnnotFile[2],"' !")
} else if(!is.null(parametersD)) {
parametersD <- sub("\\\\:",":", gsub("\\\\\\\\","/", as.character(as.matrix(parametersD))[-(2:3)]))
if(debug && chFi[2]) message(fxNa,"Successfully read ",quantMeth," parameters from '",suplAnnotFile[2],"'") }
if(debug) { message(fxNa,"rSM1fp4")}
}
## MassChroQ
if("MC" %in% quantMeth && length(suplAnnotFile) >0) {
warning(fxNa,"Reading supplemental meta-data from MassChroQ is currently not implemented") }
## Dia-NN
if("NN" %in% quantMeth && length(suplAnnotFile) >0) {
warning(fxNa,"Reading supplemental meta-data from Dia-NN is currently not implemented") }
## OTHER software ? ..
if(!any(quantMeth %in% chSoft, na.rm=TRUE) && !silent) message(fxNa,"Note: No specific procedure has been implemented so far for gathering meta-data by the analysis-software/method '",quantMeth,"'")
} ## finished main reading of suplAnnotFile into summaryD
if(debug) { message(fxNa,"rSM2"); rSM2 <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,suplAnnotFile=suplAnnotFile,syncColumns=syncColumns) }
## 1.2 basic check of summaryD to quant data, extract supl info for sdrf
if(length(summaryD) >0) { ## more checks
if(length(abund) <1) message(fxNa,"Can't verify/correct names of annotation since 'abund' was not given (ie NULL) or has no colnames") else {
if(!identical(ncol(abund), nrow(summaryD))) { summaryD <- NULL
if(!silent) message(fxNa,"Note : Number of columns of 'abund' does NOT FIT to number of samples in annotation-data !") }
}
if(length(summaryD) >0 && length(dim(summaryD)) !=2) summaryD <- matrix(summaryD, ncol=1, dimnames=list(names(summaryD),NULL))
}
if(debug) { message(fxNa,"rSM3"); rSM3 <- list() }
## continue evaluating summaryD to consistent format
if(length(summaryD) >0) { ## define setupSdSoft
## need to match colnames(abund) to (MQ:) $Raw.file or $Experiment .. need to find best partial match
MStype <- "FTMS" # used for extracting (more) sdrf info out of parametersSd
if("MQ" %in% quantMeth) { ## NOT IN SAME ORDER !!
useMQSuCol <- c("Raw.file","Experiment","Enzyme","Variable.modifications", "Fixed.modifications","Multi.modifications")
summaryD <- summaryD[,wrMisc::naOmit(match(useMQSuCol, colnames(summaryD)))] # cor 21oct22, more cols 7jun23
chSd <- length(abund) >0 && nrow(summaryD) == ncol(abund)
## normally colnames(abund) and summaryD should already be in correct order
if(!isTRUE(chSd)) {
if(!silent && length(abund) >0) if(nrow(summaryD) != ncol(abund)) { message(fxNa,"PROBLEM : Meta-data and abundance data do not match ! ",
"Number of samples from ",suplAnnotFile[1]," (",nrow(summaryD),") and from main data (",ncol(abund),") do NOT match !! .. ignoring") }
}
if(length(parametersD) >0) { ## create 'parametersSd' for sdrf
parametersCol <- paste0(c("MS/MS tol.","MS/MS deisotoping tolerance","MS/MS deisotoping tolerance unit")," (",MStype,")") # also "Top MS/MS peaks per Da interval." ?
parametersCol <- c("Modifications included in protein quantification","Match between runs","Fasta file", parametersCol)
parametersSd <- if(parametersCol[4] %in% parametersD[,1]) parametersD[match(parametersCol[4],parametersD[,1]) ,2] else NA # eg '20 ppm'
if(!is.na(parametersSd)) if(grepl("ppm$", parametersSd)) parametersSd <- paste0(1/as.numeric(sub(" ppm$","",parametersSd))," Da")
fragMassT <- if(all(parametersCol[5:6] %in% parametersD[,1])) paste0( parametersD[match(parametersCol[5:6],parametersD[,1]) ,2], collapse=" ") else NA
supPar <- parametersD[match(c("Modifications included in protein quantification","Match between runs"), parametersD[,1]), 2]
parametersSd <- c(precMassTol=parametersSd, fragMassTol=fragMassT, modifs=supPar[1], matchBetwRun=toupper(supPar[2]) )
} else parametersSd <- c(precMassTol=NA, fragMassTol=NA)
parametersSd <- c(assayName="run1", label="NT=label free sample (check if correct)", instrum=NA, parametersSd, cleavAgent=paste0("NT=",summaryD[2,"Enzyme"]) )
## add PTM modifs ...
if(debug) { message(fxNa," .. rSM4mq"); rSM4mq <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,parametersSd=parametersSd,MStype=MStype)}
}
if("PD" %in% quantMeth) { useCo <- c("Input.Files.","File.ID","File.Name","Instrument.Name") # no suitable 2nd column ...
useCo <- wrMisc::naOmit(match(useCo, colnames(summaryD)))
summaryD <- if(length(useCo) >1) summaryD[,useCo] else matrix(summaryD[,useCo], ncol=1, dimnames=list(rownames(summaryD), colnames(summaryD)[useCo]))
if(debug) { message(fxNa,"rSM4pd"); rSM4pd <- list(sdrf=sdrf,useCo=useCo,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,chFiNa=chFiNa) }
colNa <- wrMisc::trimRedundText(gsub("\\\\","/",as.character(summaryD[,"File.Name"])), silent=debug, debug=debug, callFrom=fxNa)
if(length(colNa) < ncol(abund)) warning(fxNa,"Trouble ahead : Sample annotation data from ProteomeDiscoverer has FEWER samples than data read !") else {
if(length(colNa) > ncol(abund)) { message(fxNa,"Note : Sample annotation data from ProteomeDiscoverer has MORE samples than data read, using only first (might be incorrect)")
colNa <- colNa[1:ncol(abund)]
summaryD <- summaryD[1:ncol(abund),]
} }
## presume that filenames (from summaryD) are in same order as abund, then trim to file-names (if all in same path)
## potential check of order via 'File.ID' to colnames(abund)
coNa1 <- sub("\\.Sample$","", sub("^Abun[[:lower:]]+\\.","", colnames(abund)))
sumDOrd <- if("File.ID" %in% colnames(summaryD)) match(summaryD[,"File.ID"], coNa1) else NA # NA triggers the note below
chNA <- is.na(sumDOrd)
if(any(chNA)) {
if(!silent) message(fxNa,"NOTE : Unable to match colnames of 'abund' to 'summaryD$File.ID' (NAs at attempt to match) !!")
} else if(!identical(sumDOrd, 1:ncol(abund))) {
summaryD <- summaryD[sumDOrd,]
colNa <- colNa[sumDOrd] }
colnames(abund) <- colNa #
summaryD <- cbind(summaryD, filePath= summaryD[,"File.Name"]) # copy filename+path first to new column
summaryD[,"File.Name"] <- basename(.corPathW(summaryD[,"File.Name"])) # correct to filename only
syncColumns["annotBySoft"] <- TRUE
if(debug) { message(fxNa," .. rSM4pd")}
}
if("PL" %in% quantMeth) { ## order OK ?
chSd <- length(abund) >0 && nrow(summaryD) == ncol(abund)
## normally colnames(abund) and summaryD should already be in correct order
if(chSd) {
# still need to develop extra verification ?
chCol <- match(c("result_file_name" ,"quant_channel_name","import_params"), colnames(summaryD))
if(debug) { message(fxNa,"rSM4pl"); rSM4pl <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD) }
if(all(is.na(chCol))) summaryD <- NULL else {
parametersD <- summaryD[1, 3:ncol(summaryD)] # how to integrate this later ??
summaryD <- summaryD[, chCol]
summaryD[,1] <- sub("\\.mzDB\\.t\\.xml", "", summaryD[,1] ) # remove Proline specific file-format extensions
chFiNa <- colnames(summaryD) %in% "result_file_name"
if(any(chFiNa)) colnames(summaryD)[which(chFiNa)] <- "File.Name" # this column should be called 'File.Name'
summaryD <- as.data.frame(summaryD)
} # adjust to original raw names
} else {
if(!silent && length(abund) >0) if(nrow(summaryD) != ncol(abund)) message(fxNa,"PROBLEM : Invalid meta-data ! ", "Number of samples from ",
suplAnnotFile[1]," (",nrow(summaryD),") and from main data (",ncol(abund),") do NOT match !! .. ignoring") }
syncColumns["annotBySoft"] <- TRUE
}
if("FP" %in% quantMeth) { ## NOT IN SAME ORDER !!
mat1 <- match(c("file","experiment"), colnames(summaryD))
if(all(is.na(mat1))) { message(fxNa,"UNABLE to interpret content of ",suplAnnotFile[1]); summaryD <- NULL
} else {
summaryD <- cbind(path=dirname(summaryD[,mat1[1]]), Raw.file= basename(summaryD[,mat1[2]]), Experiment=summaryD[,mat1[2]], trimExp=NA)
summaryD[,4] <- gsub("_+$|-+$|\\.+$| +$|","", sub("[[:digit:]]+$","", wrMisc::trimRedundText(summaryD[,3], side="right", callFrom=fxNa, silent=debug, debug=debug))) # remove tailing numbers (and tailing redundant text to get to numbers)
chSd <- length(abund) >0 && nrow(summaryD) == ncol(abund)
if(length(chSd) <1) chSd <- FALSE
## normally colnames(abund) and summaryD should already be in same/correct order
if(!chSd) {
if(!silent && length(abund) >0) if(nrow(summaryD) != ncol(abund)) { message(fxNa,"PROBLEM : meta-data and abundance data do not match ! ",
"Number of samples from ",suplAnnotFile[1]," (",nrow(summaryD),") and from main data (",ncol(abund),") do NOT match !! .. ignoring") }
#if(debug) save(sdrf,abund,suplAnnotFile,quantMeth,summaryD,quantMeth, file="C:\\E\\projects\\TCAmethods\\wrProteoRamus\\rSM4mq.RData")
}
}
syncColumns["annotBySoft"] <- TRUE
if(debug) { message(fxNa," .. rSM4fp"); rSM4fp <- list()}
}
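## Illustration (sketch, not executed): the 'trimExp' column above turns experiment names into group names
## by removing trailing digits, then trailing separators. Base R only; the experiment names are hypothetical and
## the redundant-text trimming done by wrMisc::trimRedundText() is left out.
#  expNa <- c("ctrl_1", "ctrl_2", "trt-1", "trt-2")
#  gsub("_+$|-+$|\\.+$| +$", "", sub("[[:digit:]]+$", "", expNa))     # "ctrl" "ctrl" "trt" "trt"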
## other software ? ...
if(debug) { message(fxNa,"rSM4d"); rSM4d <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,groupPref=groupPref,syncColumns=syncColumns,setupSdSoft=setupSdSoft) }
## 1.3 TRY CHECKING/ADJUSTING ORDER of summaryD
if(length(abund) >0 && length(summaryD) >0) {
## some software specific options otherwise check if filenames can be matched to colnames ?
## PD not much possible since colnames ".F1.Sample",".F2.Sample",".F3.Sample",...
## most other software has summaryD in same order as abund
if(any(c("MQ","PL","FP") %in% quantMeth, na.rm=TRUE)) { # colnames of abund not necessarily found in summaryD
summaryD <- wrMisc::matchMatrixLinesToRef(mat=summaryD, ref=colnames(abund), inclInfo=TRUE, silent=TRUE, debug=FALSE, callFrom=fxNa)
syncColumns["annotBySoft"] <- length(summaryD$newOrder) >0
summaryD <- summaryD$mat }
}
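## Illustration (sketch, not executed): a simplified base-R stand-in for the re-ordering done by
## wrMisc::matchMatrixLinesToRef(), assuming one column of summaryD holds names identical to colnames(abund)
## (the wrMisc function is more flexible and searches several columns). Object names are hypothetical.
#  summ <- data.frame(Raw.file=c("B","A","C"), Experiment=c("e2","e1","e3"))
#  ref <- c("A","B","C")                               # eg colnames(abund)
#  summ <- summ[match(ref, summ$Raw.file), ]           # lines now in order A, B, C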
if(debug) { message(fxNa,"rSM4e"); rSM4e <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,groupPref=groupPref,syncColumns=syncColumns,setupSdSoft=setupSdSoft) }
## 1.4 replicateStructure (produce setupSdSoft$level)
grp <- NULL
## 1.4.1 (special case) replicate structure based on custom gr (if given as groupPref$gr)
if(length(groupPref$gr) >1) { setupSdSoft$level <- grp <- .adjPat(groupPref$gr)
} else {
if(length(groupPref$gr)==1) {
## now length(groupPref$gr)==1
if(groupPref$gr=="colnames") grp <- .adjPat(wrMisc::rmEnumeratorName(colnames(abund), incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa))
if(grepl("^sdrf", groupPref$gr)) setupSdSoft$level <- grp <- rep(groupPref$gr, ncol(abund)) # temporary placeholder (sdrf not read yet)
#if(grepl("^groupPref\\$[[:alpha:]]", groupPref$gr)) {
#} else if(groupPref$gr=="sdrf")
} }
## 1.4.2 (special case) replicate structure based on custom colNames (if 'sampleNames' given and 'gr' not given)
if(length(grp) <1 && length(summaryD) >0 && length(groupPref$sampleNames)==nrow(summaryD) && length(groupPref$gr) <1) {
grp <- .adjPat(wrMisc::rmEnumeratorName(groupPref$sampleNames, incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa))
if(sum(duplicated(grp), na.rm=TRUE) <1) {
grp <- wrMisc::trimRedundText(txt=groupPref$sampleNames, spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa)
grp <- .adjPat(wrMisc::rmEnumeratorName(grp, incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa))
}
}
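## Illustration (sketch, not executed): how removing enumerators from sample names yields replicate groups.
## This base-R approximation assumes names end in a number preceded by ' ', '-' or '_', and that .adjPat()
## encodes groups as integers in order of first appearance; it is not the implementation of wrMisc::rmEnumeratorName().
#  smpNa <- c("Ctrl_1", "Ctrl_2", "Trt_1", "Trt_2")
#  grpNa <- sub("[ _-]*[[:digit:]]+$", "", smpNa)      # "Ctrl" "Ctrl" "Trt" "Trt"
#  grpInd <- match(grpNa, unique(grpNa))               # 1 1 2 2
#  names(grpInd) <- grpNa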
if(debug) { message(fxNa,"rSM4f"); rSM4f <- list(sdrf=sdrf,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,groupPref=groupPref,syncColumns=syncColumns,setupSdSoft=setupSdSoft) }
## 1.4.3 replicateStructure (of summaryD)
if(length(grp) <1) {
setupSdSoft <- wrMisc::replicateStructure(summaryD, silent=silent, debug=debug, callFrom=fxNa)
chLe <- names(setupSdSoft) %in% "lev"
if(any(chLe)) names(setupSdSoft)[which(chLe)] <- "level" # rename ..
if(debug) { message(fxNa,"rSM4g"); rSM4g <- list() }
## so far no direct information about groups (all filenames are different), need to try to find out (remove enumerators)
if(all(!duplicated(setupSdSoft$level)) && length(abund) >0) {
grpA <- wrMisc::trimRedundText(txt=colnames(abund), spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa) # 26oct22
colNaGrp <- wrMisc::rmEnumeratorName(grpA, incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa)
colNaGrPref <- TRUE ## preferential to colnames for searching groups
if(any(duplicated(colNaGrp)) && colNaGrPref) { # colnames may be used for designing groups
setupSdSoft$level <- grp <- .adjPat(colNaGrp)
} else {
if(all(setupSdSoft$level ==1:ncol(abund))) {
## note : .adjTrimPat() does NOT allow keeping names of levels
grp2 <- if(ncol(summaryD) >1) apply(summaryD, 2, .adjTrimPat) else as.matrix(.adjTrimPat(summaryD))
if(ncol(summaryD) >1) { grp3 <- apply(grp2, 2, function(x) length(unique(x)))
if(any(grp3 < ncol(abund))) {
if(length(grp3) >0) { useCol <- if(isTRUE(groupPref$lowNumberOfGroups)) which.min(grp3) else which.min(abs(grp3 - stats::median(grp3))) # column with number of groups closest to median
setupSdSoft$level <- grp2[,useCol]
names(setupSdSoft$level) <- wrMisc::rmEnumeratorName(wrMisc::trimRedundText(txt=summaryD[,useCol], spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa),
incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa)
} }
} else {
names(grp2) <- wrMisc::rmEnumeratorName(wrMisc::trimRedundText(txt=as.character(summaryD), spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa),
incl=c("anyCase","trim0","rmEnum"), sepEnum=c(" ","-","_"), nameEnum=c("Number","No","#","","Replicate","Sample"), silent=silent, debug=debug, callFrom=fxNa)
grp <- setupSdSoft$level <- grp2
}
}
}
if(debug) { message(fxNa,"rSM4h"); rSM4h <- list() }
} else { if(length(abund) <1 && !silent) message(fxNa,"Note : Abundance data are ABSENT, CANNOT adjust order of annotation to abundance data")}
}
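## Illustration (sketch, not executed): choosing which summaryD column defines the groups when several are candidates.
## With 'lowNumberOfGroups' the column with the fewest distinct values wins; otherwise a column with an intermediate
## number of groups is preferred. Counts and column names below are hypothetical.
#  nLev <- c(Raw.file=6, Condition=3, Experiment=2)
#  which.min(nLev)                                      # Experiment (2 groups)
#  which.min(abs(nLev - stats::median(nLev)))           # Condition (3 groups, equal to the median)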
if(length(grp) >0) { if(length(names(grp)) ==0) names(grp) <- grp
summaryD <- as.data.frame(cbind(summaryD, grp=names(grp), grpInd=grp))} # add presumed grouping to summaryD
} else grp <- NULL
if(debug) { message(fxNa,"rSM5"); rSM5 <- list(sdrf=sdrf,grp=grp,abund=abund,groupPref=groupPref,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft) }
### 2 READ SDRF annotation & pick groups of replicates; has priority over grouping based on summary.txt
###
if(length(sdrf) >0) {
## priority column for groups from sdrf (+ define default colnames for priority)
chGr <- c("sdrfColumn","sdrfCol")
chGrPref <- chGr %in% names(groupPref)
groupPref$sdrfColumn <- if(any(chGrPref) && length(groupPref[[chGr[which(chGrPref)[1]]]]) >0) groupPref[[chGr[which(chGrPref)[1]]]] else c("factor.value.disease.","characteristics.disease.", "factor.value.treatment.","characteristics.treatment.","comment.technical.replicate.")
## 2.0 Check if 'functional' sdrf (ie list) is provided -> use as is
iniSdrfOrder <- sdrfDaIni <- NULL
if(is.list(sdrf) && all(c("sdrfDat","col","level") %in% names(sdrf), na.rm=TRUE)) {
if(debug) message(fxNa,"Custom setupSd provided as sdrf")
if(length(dim(sdrf$sdrfDat)) >1 && all(dim(sdrf$sdrfDat) >0)) {
sdrfDat <- sdrf$sdrfDat
} else { sdrf <- sdrfDat <- NULL
if(!silent) message(fxNa,"PROBLEM : Invalid custom-sdrf (should be list containing sdrf$sdrfDat with matrix or data.frame)")
}
## ? keep initial sdrf ?# sdrf <- "user provided custom object"
} else {
## 'sdrf' may be character vector (length <5) => 1st value as path or sdrf accession, 2nd as sdrf-column to use
## 'sdrf' may be (matrix or) data.frame => to use as table to exploit
if(is.character(sdrf) && length(sdrf) <5 && length(dim(sdrf)) <2) {
## read sdrf from file or github
sdrfDat <- readSdrf(sdrf, silent=silent, debug=debug, callFrom=fxNa)
## check for priority columns, retain 1st of them as sdrf[2]
if(length(groupPref$sdrfColumn) ==1 && length(sdrf) <2) { ch1 <- groupPref$sdrfColumn %in% colnames(sdrfDat)
if(any(ch1)) sdrf[2] <- groupPref$sdrfColumn[which(ch1)[1]]
}
} else {
## user provided custom sample annotation object
if(length(dim(sdrf)) <2 && !silent) message(fxNa,"Note: 'sdrf' looks bizarre (trouble ahead ?), expecting either file, data.frame or complete list")
sdrfDat <- sdrf
sdrf <- "User provided custom object"}
}
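## Illustration (sketch, not executed): the three forms of 'sdrf' distinguished above, rewritten as a small
## hypothetical helper (sdrfType() is not part of the package).
#  sdrfType <- function(x) {
#    if(is.list(x) && all(c("sdrfDat","col","level") %in% names(x))) return("setup-list")   # 2.0 functional list
#    if(is.character(x) && length(x) <5 && length(dim(x)) <2) return("path-or-accession")    # read via readSdrf()
#    "table"                                                                                    # matrix or data.frame
#  }
#  sdrfType("my_sdrf.tsv")                              # "path-or-accession"
#  sdrfType(data.frame(source.name=c("s1","s2")))       # "table"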
if(debug) { message(fxNa,"rSM6 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6 <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns) }
## 2.1 basic check (distinguish full $sampleSetup from custom data.frame)
if(length(sdrfDat) >0) {
syncColumns["sdrfDat"] <- FALSE # initialize
if(is.list(sdrfDat) && "sdrfDat" %in% names(sdrfDat)) {
if("groups" %in% names(sdrfDat)) groupPref$groups <- sdrfDat$groups # read before sdrfDat is replaced by its table
sdrfDat <- sdrfDat$sdrfDat
if(debug) message(fxNa,"It seems a full $sampleSetup may have been given") }
if(length(dim(sdrfDat)) <2) sdrfDat <- as.matrix(sdrfDat)
if(length(abund) >0 && nrow(sdrfDat) != ncol(abund)) {
if(!silent) message(fxNa,"Note : Ignoring 'sdrf' : it does NOT have the expected number of rows (",nrow(sdrfDat)," given but ",ncol(abund)," expected !)")
sdrf <- sdrfDat <- NULL }}
if(debug) {message(fxNa,"rSM6a"); rSM6a <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns) }
## 2.2 need to match lines (samples) of sdrf (setupDat) to summaryD and/or colnames of abund
if(length(sdrfDat) >0) {
if(length(summaryD) >0) { ## summaryD exist try matching by file-names (ie column 'File.Name' and/or col 'filePath')
chFiNames <- c("File.Name","File","FileName",MQ="Raw.file",PL="raw_file_name","Raw.File","rawfile") # search in summaryD
chFiNa <- chFiNames %in% colnames(summaryD)
if(debug) {message(fxNa,"rSM6a1") }
if(any(chFiNa, na.rm=TRUE) && "comment.file.uri." %in% colnames(sdrfDat)) {
## align by filenames
chFi <- match(sub("\\.zip$|\\.gz$","", basename(.corPathW(summaryD[,chFiNames[which(chFiNa)[1]]]))),
sub("\\.zip$|\\.gz$","", basename(.corPathW(sdrfDat[,"comment.file.uri."])))) # new order
if(any(is.na(chFi)) && any(grepl("\\.raw",sdrfDat[,"comment.file.uri."]), na.rm=TRUE)) {
sumDaFiNa <- sub("\\.raw","",sub("\\.zip$|\\.gz$","", basename(.corPathW(summaryD[,chFiNames[which(chFiNa)[1]]]))))
sdrfFiNa <- sub("\\.raw","",sub("\\.zip$|\\.gz$","", basename(.corPathW(sdrfDat[,"comment.file.uri."]))))
chFi <- match(sumDaFiNa, sdrfFiNa) # new order
if(any(is.na(chFi)) && "FP" %in% quantMeth) {
sumDaFiNa <- wrMisc::rmEnumeratorName(wrMisc::trimRedundText(txt=sumDaFiNa, spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa), newSep="_", incl=c("anyCase","trim0"), silent=silent, debug=debug, callFrom=fxNa)
sdrfFiNa <- wrMisc::rmEnumeratorName(wrMisc::trimRedundText(txt=sdrfFiNa, spaceElim=TRUE, silent=debug, debug=debug, callFrom=fxNa), newSep="_", incl=c("anyCase","trim0"), silent=silent, debug=debug, callFrom=fxNa)
chFi <- match(sumDaFiNa, sdrfFiNa) # new order
if(debug) { message(fxNa,"rSM6aa dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat))}
}
rmRaw <- TRUE
} else rmRaw <- FALSE
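## Illustration (sketch, not executed): aligning file names after dropping the directory and the '.zip'/'.gz'
## and '.raw' extensions. This is a simplified, end-anchored variant of the substitutions above; the file names are
## hypothetical and forward slashes are assumed (the package-internal .corPathW() takes care of backslashes).
#  normFi <- function(x) sub("\\.raw$", "", sub("\\.zip$|\\.gz$", "", basename(x)), ignore.case=TRUE)
#  normFi(c("C:/data/run_01.raw", "run_01.raw.gz", "run_02"))     # "run_01" "run_01" "run_02"
#  match(normFi(c("run_02.raw", "run_01.raw")), normFi(c("run_01.raw.zip", "run_02.raw.zip")))   # 2 1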
if(sum(is.na(chFi)) >0) { warning(fxNa,"UNABLE to match all filenames from sdrf and ",basename(.corPathW(suplAnnotFile)),
" ! \n ++ BEWARE : Grouping of replicates may be incorrect !! \n")
} else {
## Adjust order of sdrf to data (summaryD)
if(!silent && rmRaw) message(fxNa,"Note : Some filenames contain '.raw', others do NOT; solved inconsistency ..")
iniSdrfOrder <- (1:nrow(sdrfDat))[chFi]
sdrfDat <- sdrfDat[chFi,]
syncColumns["sdrfDat"] <- TRUE # set only when all filenames could be matched
if(!silent) message(fxNa,"Successfully adjusted order of sdrf to content of ",basename(.corPathW(suplAnnotFile)))
}
} else if(!silent) message(fxNa, if(debug) "rSM6a "," summaryD exists, but unable to find file-names")
if(debug) { message(fxNa,"rSM6a1 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6a1 <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder) }
} else { # no summaryD, try colnames of abund
if(length(abund) >0 && length(dim(abund)) >1) { ## valid abund
if(debug) { message(fxNa,"rSM6b dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6b <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns) }
## requires utils::packageVersion("wrMisc") > "1.11.1"
sdrfDaIni <- sdrfDat
#old#sdrfDat <- wrMisc::matchMatrixLinesToRef(mat=cbind(sdrfDat, iniSdrfOrd=1:nrow(sdrfDat)), ref=colnames(abund), addRef=TRUE, exclCol=ncol(sdrfDat)+1, silent=silent, debug=debug, callFrom=fxNa) # 2way-grep
sdrfDat <- wrMisc::matchMatrixLinesToRef(mat=cbind(sdrfDat, iniSdrfOrd=1:nrow(sdrfDat)), ref=colnames(abund), addRef=TRUE, silent=silent, debug=debug, callFrom=fxNa) # 2way-grep
## check matching ?
if(length(sdrfDat) <1) { ## failed to align - further trim names
if(debug) { message(fxNa,"Failed to align - further trim names rSM6a3 ")}
## now look for bad separator '.' before text and remove
colNaAbund <- colnames(abund)
ch1 <- grep("[[:digit:]]\\.[[:alpha:]]", colNaAbund)
if(any(ch1)) {
selLoc <- sapply(gregexpr("[[:digit:]]\\.[[:alpha:]]", colNaAbund[ch1]), function(x) x[[1]])
colNaAbund[ch1] <- paste0(substr(colNaAbund[ch1],1,selLoc), substring(colNaAbund[ch1], selLoc+2)) }
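## Illustration (sketch, not executed): the '.'-removal above applied to hypothetical column names; only the first
## '<digit>.<letter>' occurrence in each name is edited.
#  x <- c("S1.Ctrl_1", "S2.Trt_1", "noDot")
#  hit <- grep("[[:digit:]]\\.[[:alpha:]]", x)
#  pos <- sapply(gregexpr("[[:digit:]]\\.[[:alpha:]]", x[hit]), function(m) m[[1]])
#  x[hit] <- paste0(substr(x[hit], 1, pos), substring(x[hit], pos +2))
#  x                                                    # "S1Ctrl_1" "S2Trt_1" "noDot"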
sdrfDat <- wrMisc::matchMatrixLinesToRef(mat=cbind(sdrfDaIni, iniSdrfOrd=1:nrow(sdrfDaIni)), ref=colNaAbund, exclCol=ncol(sdrfDaIni)+1, addRef=TRUE, silent=silent, debug=debug, callFrom=fxNa) # 2way-grep (sdrfDat is empty here, so start again from sdrfDaIni)
if(length(sdrfDat) <1) {
colNaEnum <- all(grepl("_[[:digit:]]+$", colNaAbund))
if(colNaEnum) { tm1 <- sub("_[[:digit:]]+$","", colNaAbund)
colNaAbund2 <- sub("\\..+","", substr(colNaAbund, 1, nchar(tm1)))
colNaAbund3 <- paste0(colNaAbund2,substring(colNaAbund, nchar(colNaAbund) -1),"$") # without repeated text after 1st '.'
## Adjust order of sdrf to data (colnames of abund)
sdrfDat <- wrMisc::matchMatrixLinesToRef(mat=cbind(sdrfDaIni, iniSdrfOrd=1:nrow(sdrfDaIni)), ref=colNaAbund3, addRef=TRUE, exclCol=ncol(sdrfDaIni)+1, silent=silent, debug=debug, callFrom=fxNa) # 2way-grep
}
}
if(length(sdrfDat) <1 && !silent) message(fxNa,"PROBLEM : FAILED to align sdrf to actual colnames of data !!!")
}
if(length(sdrfDat) >0) {
iniSdrfOrder <- sdrfDat$iniSdrfOrd
sdrfDat <- sdrfDat[,-ncol(sdrfDat)] # why remove col 'ref' ?
}
if(debug) { message(fxNa,"rSM6a2 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6a2 <- list() }
rm(sdrfDaIni)
syncColumns["sdrfDat"] <- length(sdrfDat) >0 # synchronized only if alignment succeeded
} else if(!silent) message(fxNa,"Note : NO Additional information on filenames-order found, can't correct/adjust sdrf (ie sdrfDat) !!", if(debug) " rSM6a3")
if(debug) { message(fxNa,"rSM6a4 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6a4 <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,quantMeth=quantMeth,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder) }
}
}
if(debug) { message(fxNa,"rSM6d dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6d <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder,setupSd=setupSd) }
## 2.3 ready to make setupSd
chLe <- names(setupSd) %in% "lev"
if(any(chLe)) setupSd <- setupSd[-which(chLe)] # remove $lev - if existing
if(length(sdrfDat) >0 && all(dim(sdrfDat) >0)) {
## check for custom-provided sampleNames (priority)
if(length(groupPref$sampleNames) ==nrow(sdrfDat)) {
setupSd$sampleNames <- groupPref$sampleNames # custom sampleNames => use later (instead of colnames/file-names)
}
newNa <- NULL ## initialize
if(length(groupPref$gr)==1 && grepl("^sdrf", groupPref$gr[1])) {
## check for custom-provided gr (priority) wr modif 25sep24
## check for picking specific column of sdrf
if(grepl("^sdrf\\$[[:alpha:]]", groupPref$gr[1])) {
chColNa <- sub("^sdrf\\$","", groupPref$gr[1])
if(chColNa %in% colnames(sdrfDat)) {
## groupPref$gr points to specific column of sdrf
grp <- wrMisc::rmSharedWords(sdrfDat[,chColNa], sep=c("_"," ",".","=",";")) # try making more compact
chUnit <- if(isTRUE(chUnit) && all(grepl("[[:punct:]]", grp), na.rm=TRUE)) wrMisc::checkUnitPrefix(grp, defUnits, stringentSearch=TRUE) else NULL
newNa <- if(length(chUnit) ==1) try(wrMisc::adjustUnitPrefix(grp, unit=chUnit[1], returnType=c("NAifInvalid"), silent=TRUE, callFrom=fxNa), silent=TRUE) else grp
if(inherits(newNa, "try-error")) {if(!silent) message(fxNa,"Failed to adjust unit-prefixes")} else grp <- newNa
setupSd$gr <- grp
setupSd$level <- .adjPat(grp)
setupSd$meth <- paste0("custom from sdrf-column '",chColNa,"'")
if(debug) { message(fxNa,"rSM6d1 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6d1 <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder,setupSd=setupSd, grp=grp,chUnit=chUnit,newNa=newNa,chColNa=chColNa) }
}
} else {
## regular mining of sdrf
groupPref$gr <- "sdrf"
}
} else if(length(groupPref$gr) >0) {
if(length(abund) <1 && !silent) message(fxNa,"Can't check if length of $gr is OK since 'abund' not given ..")
setupSd$gr <- groupPref$gr
setupSd$level <- grp <- .adjPat(setupSd$gr)
setupSd$meth <- paste0("custom from groupPref$gr ")
}
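## Illustration (sketch, not executed): a 'gr' value of the form 'sdrf$<column>' points to one specific sdrf column.
## The table and its content are hypothetical; match() stands in for .adjPat().
#  gr <- "sdrf$factor.value.disease."
#  grepl("^sdrf\\$[[:alpha:]]", gr)                     # TRUE
#  colNa <- sub("^sdrf\\$", "", gr)                     # "factor.value.disease."
#  sdrfTab <- data.frame(factor.value.disease.=c("normal","normal","tumor","tumor"))
#  if(colNa %in% colnames(sdrfTab)) grpInd <- match(sdrfTab[,colNa], unique(sdrfTab[,colNa]))   # 1 1 2 2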
if(debug) { message(fxNa,"rSM6d2 dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6d2 <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder,setupSd=setupSd,chUnit=chUnit) }
## DEFAULT mining (if $levels and/or $sampleNames not yet set)
#if(length(setupSd$level) != ncol(abund) || length(setupSd$sampleNames) != ncol(abund)) {
if(length(setupSd$sampleNames) <1 || length(setupSd$level) <1) {
## check for column 'comment.technical.replicate.' (to exclude from using)
## look for and/or 'characteristics.spiked.compound.' ?
useSdrfCol <- 1:ncol(sdrfDat)
chTechnReplCol <- colnames(sdrfDat) %in% "comment.technical.replicate."
if(any(chTechnReplCol, na.rm=TRUE)) useSdrfCol <- useSdrfCol[-which(chTechnReplCol)] # exclude designation of technical replicates
## option : use sampleNames from sdrf-fileNames
if("sdrf" %in% groupPref$sampleNames && length(groupPref$sampleNames)==1) {
## use 1st of 'comment.data.file.' and 'comment.file.uri.'
colNa <- c("comment.data.file.", "comment.file.uri.")
chCol <- colNa %in% colnames(sdrfDat)
if(any(chCol)) { sampleNames <- sdrfDat[, colNa[which(chCol)[1]]]
## try to simplify (remove redundant words, try adjusting varying units)
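iniNa <- wrMisc::rmSharedWords(sampleNames, sep=c(" ","_","-","/"), silent=silent, debug=debug, callFrom=fxNa)
setupSd$sampleNames <- groupPref$sampleNames <- iniNa # default, may be refined below by adjusting unit-prefixes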
if(!isFALSE(chUnit)) chUnit <- wrMisc::checkUnitPrefix(iniNa, unit=if(isTRUE(chUnit)) defUnits else as.character(chUnit))
if(length(chUnit) >0) {
newNa <- try(wrMisc::adjustUnitPrefix(iniNa, unit=chUnit[1], returnType=c("NAifInvalid"), silent=silent, debug=debug, callFrom=fxNa), silent=TRUE)
if(inherits(newNa, "try-error")) {if(!silent) message(fxNa,"Failed to adjust unit-prefixes")} else setupSd$sampleNames <- groupPref$sampleNames <- newNa
}
## need adjust to summaryD ??
} else groupPref$sampleNames <- NULL # no file-name column in sdrf : drop the request for sdrf-based sampleNames
} else {
## sampleNames not yet set, need to pick sampleNames from other resources !
}
}
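## Illustration (sketch, not executed): taking sample names from the first available sdrf file-column, then
## reducing them to their informative part. The table is hypothetical, and basename()/sub() only roughly
## approximate what wrMisc::rmSharedWords() does.
#  colNa <- c("comment.data.file.", "comment.file.uri.")
#  sdrfTab <- data.frame(comment.file.uri.=c("ftp://host/dir/ctrl_01.raw", "ftp://host/dir/trt_01.raw"))
#  avail <- colNa[colNa %in% colnames(sdrfTab)]          # "comment.file.uri."
#  if(length(avail) >0) smpNa <- sub("\\.raw$", "", basename(sdrfTab[, avail[1]]))   # "ctrl_01" "trt_01"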
if(debug) { message(fxNa,"rSM6e dim sdrfDat ",nrow(sdrfDat)," ",ncol(sdrfDat)); rSM6e <- list(sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder,setupSd=setupSd) }
setupSd$sdrfDat <- sdrfDat # useful here ?
## now determine gr (if not yet present)
if(length(setupSd$gr) < (if(length(abund) >1) ncol(abund) else 1)) {
#old# if(length(setupSd$gr) <1 || identical(groupPref$gr,"sdrf")) { # && length(dim(sdrf)) >1
replStrOpt <- c("highest","lowest","min","max","median","combAll","combNonOrth")
if(debug) {message(fxNa,if(debug)"rSM6g ","length setupSd ", length(setupSd)); rSM6g <- list(setupSd=setupSd,sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft,quantMeth=quantMeth) } #useSdrfCol=useSdrfCol,
## check for custom provided method for sdrf-mining : (is it risky to search in 2nd value of sdrf ?)
## 'groupPref$useCol' design custom choice of multiple columns
## 'groupPref$combMeth' designs method for choosing from automatic mining
if("useCol" %in% names(groupPref) && all(nchar(groupPref$useCol) >0)) { ## custom choice for sdrf-column
chCol <- which(colnames(sdrfDat) %in% groupPref$useCol)
} else chCol <- which(!colnames(sdrfDat) %in% c("comment.technical.replicate."))
if(length(chCol) >0) { # chCol <- sdrfDat[, useSdrfCol[which(chCol)]]
setupSdIni <- setupSd
replMeth <- "failed" # initialize
## check content of 'combMeth' ?
#old#if(!isTRUE(groupPref$lowNumberOfGroups) && isTRUE(groupPref$combMeth != "lowest")) {
if(length(chCol) >1 && isTRUE(groupPref$combMeth != "lowest")) {
## test 2 methods
tmp <- list(combNonOrth=try(wrMisc::replicateStructure(sdrfDat[,chCol], method=groupPref$combMeth, silent=silent, callFrom=fxNa, debug=debug), silent=TRUE),
lowest=try(wrMisc::replicateStructure(sdrfDat[,chCol], method="lowest", silent=silent, callFrom=fxNa, debug=debug), silent=TRUE))
ch1 <- sapply(tmp, inherits, "try-error")
if(all(ch1)) { message(fxNa,"UNABLE to understand replicate-structure from sdrf !!"); setupSd <- NULL; tmp <- NULL; syncColumns["sdrfDat"] <- FALSE
} else {
## choose among multiple options for grouping (number of groups)
tmp <- tmp[which(!ch1)] # drop attempts that failed
ch1 <- sapply(tmp, function(x) length(x$lev[which(!duplicated(x$lev))]))
lowNumberOfGroups <- if(length(groupPref$lowNumberOfGroups)==1) isTRUE(groupPref$lowNumberOfGroups) else TRUE
useSe <- if(any(ch1 ==1, na.rm=TRUE) && any(ch1 !=1, na.rm=TRUE)) which(ch1 !=1) else if(lowNumberOfGroups) which.min(ch1) else which.max(ch1)
useSe <- useSe[1]
replMeth <- names(useSe)
if(!silent) message(fxNa,"Choosing model '",names(useSe),"' for evaluating replicate-structure (ie ",ch1[useSe[1]]," groups of samples)" )
tmp <- tmp[[useSe]]
} # {message(fxNa,"REMOVING one attempt of understanding replicate-structure") }
} else {
## either just one column to mine or single specific method (eg groupPref$combMeth=="lowest"))
if(length(groupPref$combMeth)==1 && nchar(groupPref$combMeth) <1) groupPref$combMeth <- NULL
replMeth <- if(isTRUE(groupPref$lowNumberOfGroups) || length(groupPref$combMeth) <1) "lowest" else groupPref$combMeth
tmp <- try(wrMisc::replicateStructure(sdrfDat[,chCol], method=replMeth, silent=TRUE, callFrom=fxNa), silent=TRUE)
if(inherits(tmp, "try-error")) {syncColumns["sdrfDat"] <- FALSE; if(!silent) message(fxNa,"UNABLE to understand replicate-structure from sdrf (based on method '",replMeth,"')")}
}
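## Illustration (sketch, not executed): selection among candidate groupings by their number of groups,
## skipping candidates that put all samples into a single group. The counts are hypothetical.
#  nGrp <- c(combNonOrth=4, lowest=2)
#  lowNumberOfGroups <- TRUE
#  useSe <- if(any(nGrp ==1) && any(nGrp !=1)) which(nGrp !=1)[1] else if(lowNumberOfGroups) which.min(nGrp) else which.max(nGrp)
#  names(useSe)                                         # "lowest"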
if(debug) {message(fxNa,if(debug)"rSM6g2 ","length setupSd ", length(setupSd)); rSM6g2 <- list(setupSd=setupSd,setupSdIni=setupSdIni,sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft,quantMeth=quantMeth,tmp=tmp) } #useSdrfCol=useSdrfCol,
if(!inherits(tmp, "try-error") && length(tmp) >0) { # valid mining
if(isTRUE(tmp$col==1) && length(chCol) >1 && all(grepl("^Sample {0,1}[[:digit:]]+$", names(tmp$lev))) ) { # the column with sample-numbers was picked, not very informative; try to find better one
## Try finding better content/text for names of levels, ie tmp$lev
chPa <- apply(sdrfDat[,chCol, drop=FALSE], 2, .adjPat)
ch1 <- apply(chPa[,-1, drop=FALSE], 2, function(x) all(chPa[,1]==x, na.rm=TRUE))
if(any(ch1)) names(tmp$lev) <- wrMisc::rmSharedWords(sdrfDat[,chCol[which(ch1)[1] +1]], sep=c("_"," ",".","=",";"), silent=silent,callFrom=fxNa) else {
## otherwise try to find column/pattern fitting to prev hit after removing redundant text & removing enumerators
ref2 <- apply(sdrfDat[,chCol[-1], drop=FALSE], 2, wrMisc::rmSharedWords, sep=c("_"," ",".","=",";"), silent=silent,callFrom=fxNa)
ref2 <- apply(ref2, 2, wrMisc::rmEnumeratorName, silent=TRUE)
chPa2 <- apply(ref2, 2, .adjPat)
ch1 <- apply(chPa2, 2, function(x) all(chPa[,1]==x, na.rm=TRUE))
if(any(ch1)) {names(tmp$lev) <- ref2[,which(ch1)[1]] } # use cleaned text (not the numeric pattern) as names; also document column used ?
}
}
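## Illustration (sketch, not executed): the check above asks whether another sdrf column partitions the samples
## exactly like the one picked. Encoding each column as integers in order of first appearance (assumed to mirror
## .adjPat()) reduces this to an element-wise comparison; samePartition() is a hypothetical helper.
#  samePartition <- function(a, b) all(match(a, unique(a)) == match(b, unique(b)))
#  samePartition(c("A","A","B"), c("x","x","y"))                              # TRUE
#  samePartition(c("Sample 1","Sample 2","Sample 3"), c("ctrl","trt","trt"))  # FALSE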
if(length(abund) >0 && length(setupSdIni$sampleNames) == ncol(abund)) setupSd$sampleNames <- setupSdIni$sampleNames # priority to specified $sampleNames
setupSd$level <- tmp$lev # grouping correct, names may not be meaningful (if column picked contains 'Sample 1' etc)
if(debug) {message(fxNa,if(debug)"rSM6g3 ","length tmp$lev ", length(tmp$lev)); rSM6g3 <- list(setupSd=setupSd,sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,
groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft,quantMeth=quantMeth,tmp=tmp,chUnit=chUnit) }
## optional adjusting of units to readjust levels
grp <- wrMisc::rmSharedWords(tmp$lev, sep=c("_"," ",".","=",";")) # try making more compact
if(!isFALSE(chUnit[1])) {
if(all(grepl("[[:punct:]]", grp), na.rm=TRUE) && (length(chUnit) <1 || isTRUE(chUnit))) chUnit <- wrMisc::checkUnitPrefix(grp, if(isTRUE(chUnit)) defUnits else as.character(chUnit), stringentSearch=TRUE)
if(length(chUnit) ==1 && is.character(chUnit) && nchar(chUnit) >0) {newNa <- try(wrMisc::adjustUnitPrefix(grp, unit=chUnit[1], returnType=c("NAifInvalid"), silent=TRUE, callFrom=fxNa), silent=TRUE)
if(inherits(newNa, "try-error")) {if(!silent) message(fxNa,"Failed to adjust unit-prefixes")} else setupSd$level <- grp <- .adjPat(newNa) }}
setupSd$col <- tmp$col
setupSd$meth <- replMeth
}
}
if(debug) {message(fxNa,if(debug)"rSM6h ","length setupSd ", length(setupSd)); rSM6h <- list(setupSd=setupSd,sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,
groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft,quantMeth=quantMeth,tmp=tmp,chUnit=chUnit) }
## remove/rename $lev (normally not expected any more)
chLe <- names(setupSd) %in% "lev"
if(any(chLe)) {
if(debug) message(fxNa,"Still found setupSd$lev !! Used to replace setupSd$level")
names(setupSd$lev) <- gsub("^[[:space:]]*","", names(setupSd$lev))
setupSd$level <- setupSd$lev
setupSd <- setupSd[-which(chLe)]
}
if(debug) {message(fxNa,if(debug) "rSM6i ","length setupSd ", length(setupSd)); rSM6i <- list() }
} # finish determining setupSd$gr
if(debug) {message(fxNa,"rSM6j"); rSM6j <- list(setupSd=setupSd,sdrf=sdrf,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,groupPref=groupPref,summaryD=summaryD,parametersD=parametersD,setupSdSoft=setupSdSoft,quantMeth=quantMeth)}
if("setupSd" %in% names(setupSd)) { setupSd <- wrMisc::partUnlist(setupSd, callFrom=fxNa,debug=debug);
if(debug) message(fxNa,"rSM6j2 - not expecting list of list(s) for setupSd ! .. correcting")}
if(debug) {message(fxNa,"rSM6i names setupSd : ", wrMisc::pasteC(names(setupSd))); rSM6g <- list() }
if(!is.list(setupSd)) {setupSd <- as.list(setupSd); if(debug) message(fxNa,"rSM6i 'setupSd' should be list, but was NOT !!")}
if(!"sdrfDat" %in% names(setupSd)) setupSd$sdrfDat <- sdrfDat
if(debug) {message(fxNa, "rSM6k ")
rSM6k <- list(sdrf=sdrf,setupSd=setupSd,sdrfDat=sdrfDat,abund=abund,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,groupPref=groupPref,newNa=newNa,summaryD=summaryD,parametersD=parametersD,syncColumns=syncColumns,iniSdrfOrder=iniSdrfOrder)
}
setupSd$annotBySoft <- as.data.frame(summaryD)
setupSd$syncColumns <- syncColumns
} else { ## sdrf was given but could not be used (eg not matching the data) : fall back to sample annotation from quantification software, if available
if(debug) message(fxNa, if(debug) "rSM6l ","NO valid sdrf found")
## ie single source of info
if(length(summaryD) <1) { ## ie no summaryD
if(!silent) message(fxNa, if(debug) "rSM6m ","Reading of sdrf was NOT successful and no summaryD available => nothing can be done to mine experimental setup...")
} else { # ie summaryD is available
setupSd <- setupSdSoft
if(!silent) message(fxNa, if(debug) "rSM6n ","Reading of sdrf was NOT successful, using sample annotation from quantification software (summaryD) instead")
}
}
if(length(iniSdrfOrder) >0) setupSd$iniSdrfOrder <- iniSdrfOrder
if(debug) { message(fxNa,"rSM7 head of setupSd$level : ",wrMisc::pasteC(utils::head(setupSd$level))); rSM7 <- list(setupSd=setupSd,sdrf=sdrf,sdrfDat=sdrfDat,suplAnnotFile=suplAnnotFile,quantMeth=quantMeth,abund=abund,summaryD=summaryD,nSamp0=nSamp0,iniSdrfOrder=iniSdrfOrder)}
if(length(setupSd) >0) if(length(setupSd$level) != nSamp0 && length(abund) >0) { ## keep this ? - redundant !
if(!silent) warning(fxNa, if(debug) "rSM7 ","Invalid information from sample meta-data or wrong experiment ! Number of samples from sdrf ",
" (",length(setupSd$level),") and from experimental data (",ncol(abund),") don't match !")
setupSd <- NULL } else {
if(length(abund) <1 && !silent) message(fxNa,"Note: Order of lines in sdrf not adjusted since no valid 'abund' given...")
}
} else { setupSd <- setupSdSoft; setupSd$annotBySoft <- summaryD }
## allow export of sdrf-draft
if(length(parametersSd) >0 && length(setupSd$sdrfDat) <1) {
setupSd$sdrfExport <- parametersSd
setupSd$summaryD <- summaryD
}
if(debug) { message(fxNa,"rSM8 head of setupSd$level : ",wrMisc::pasteC(utils::head(setupSd$level))); rSM8 <- list(setupSd=setupSd)}
}
## finished readSampleMetaData
setupSd }