Nothing
#' Read Tabulated Files Exported By ProteomeDiscoverer At Protein Level
#'
#' Protein identification and quantification results from
#' \href{https://www.thermofisher.com/order/catalog/product/OPTON-30812}{Thermo ProteomeDiscoverer}
#' which were exported as tabulated text can be imported and relevant information extracted.
#'
#' @details
#' This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
#' The format of resulting files at export also depends which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export.
#' Using the argument \code{suplAnnotFile} it is possible to specify a specific file (or search for default file) to read for extracting file-names as sample-names and other experiment realted information.
#' If a column named \code{contamCol} is found, the data will be lateron filtered to remove all contaminants, set to \code{NULL} for keeping all contaminants.
#'
#' The final output is a list containing as (main) elements: \code{$annot}, \code{$raw} and optional \code{$quant},
#' or returns data.frame with entire content of file if \code{separateAnnot=FALSE}.
#'
#' This function replaces the depreciated function \code{readProtDiscovFile} which will soon be retracted from this package.
#'
#' @param fileName (character) name of file to be read
#' @param path (character) path of file to be read
#' @param normalizeMeth (character) normalization method, defaults to \code{median}, for more details see \code{\link[wrMisc]{normalizeThis}})
#' @param sampleNames (character) custom column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over \code{suplAnnotFile}
#' @param read0asNA (logical) decide if initial quntifications at 0 should be transformed to NA
#' @param quantCol (character or integer) define ywhich columns should be extracted as quantitation data : The argument may be the exact column-names to be used, or if length=1
#' content of \code{quantCol} will be used as pattern to search among column-names for $quant using \code{grep};
#' if \code{quantCol='allAfter_calc.pI'} all columns to the right of the column 'calc.pI' will be interpreted as quantitation data
#' (may be useful with files that have been manually edited before passing to wrProteo)
#' @param contamCol (character or integer, length=1) which columns should be used for contaminants marked by ProteomeDiscoverer.
#' If a column named \code{contamCol} is found, the data will be lateron filtered to remove all contaminants, set to \code{NULL} for keeping all contaminants
#' @param refLi (character or integer) custom specify which line of data is main species, if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for exact match of the (single) term given
#' @param separateAnnot (logical) if \code{TRUE} output will be organized as list with \code{$annot}, \code{$abund} for initial/raw abundance values and \code{$quant} with final log2 (normalized) quantitations
#' @param annotCol (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") )
#' @param FDRCol (list) optional indication to search for protein FDR information
#' @param specPref (character or list) define characteristic text for recognizing (main) groups of species (1st for comtaminants - will be marked as 'conta', 2nd for main species- marked as 'mainSpe',
#' and optional following ones for supplemental tags/species - maked as 'species2','species3',...);
#' if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument \code{annotCol})
#' @param gr (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from \code{sdrf} and/or \code{suplAnnotFile} (if provided) \code{}
#' @param sdrf (character, list or data.frame) optional extraction and adding of experimenal meta-data: if character, this may be the ID at ProteomeExchange,
#' the second & third elements may give futher indicatations for automatic organization of groups of replicates.
#' Besides, the output from \code{readSdrf} or a list from \code{defineSamples} may be provided;
#' if \code{gr} is provided, \code{gr} gets priority for grouping of replicates;
#' if \code{sdrfOrder=TRUE} the output will be put in order of sdrf
#' @param suplAnnotFile (logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if \code{gr} is provided, \code{gr} gets priority for grouping of replicates;
#' if \code{TRUE} defaults to file '*InputFiles.txt' (needed to match information of \code{sdrf}) which can be exported next to main quantitation results;
#' if \code{character} the respective file-name (relative or absolute path)
#' @param groupPref (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to \code{readSampleMetaData}.
#' May contain \code{lowNumberOfGroups=FALSE} for automatically choosing a rather elevated number of groups if possible (defaults to low number of groups, ie higher number of samples per group)
#' May contain \code{chUnit} (logical or character) to be passed to \code{readSampleMetaData()} for (optional) adjustig of unit-prefixes in meta-data group labels, in case multiple different unit-prefixes
#' are used (eg '100pMol' and '1nMol').
#' @param plotGraph (logical or integer) optional plot of type vioplot of initial and normalized data (using \code{normalizeMeth}); if integer, it will be passed to \code{layout} when plotting
#' @param titGraph (character) custom title to plot of distribution of quantitation values
#' @param wex (integer) relative expansion factor of the violin-plot (will be passed to \code{\link[wrGraph]{vioplotW}})
#' @param silent (logical) suppress messages
#' @param debug (logical) additional messages for debugging
#' @param callFrom (character) allow easier tracking of messages produced
#' @return This function returns a list with \code{$raw} (initial/raw abundance values), \code{$quant} with final normalized quantitations, \code{$annot}, \code{$counts} an array with number of peptides, \code{$quantNotes}
#' and \code{$notes}; or if \code{separateAnnot=FALSE} the function returns a data.frame with annotation and quantitation only
#' @seealso \code{\link[utils]{read.table}}, \code{\link[wrMisc]{normalizeThis}}) , \code{\link{readMaxQuantFile}}, \code{\link{readProlineFile}}, \code{\link{readFragpipeFile}}
#' @examples
#' path1 <- system.file("extdata", package="wrProteo")
#' fiNa <- "tinyPD_allProteins.txt.gz"
#' dataPD <- readProteomeDiscovererFile(file=fiNa, path=path1, suplAnnotFile=FALSE)
#' summary(dataPD$quant)
#'
#' @export
readProteomeDiscovererFile <- function(fileName, path=NULL, normalizeMeth="median", sampleNames=NULL, read0asNA=TRUE, quantCol="^Abundance", annotCol=NULL, contamCol="Contaminant",
refLi=NULL, separateAnnot=TRUE, FDRCol=list(c("^Protein.FDR.Confidence","High"), c("^Found.in.Sample.","High")), gr=NULL, sdrf=NULL, suplAnnotFile=TRUE,
groupPref=list(lowNumberOfGroups=TRUE, chUnit=TRUE), specPref=c(conta="CON_|LYSC_CHICK", mainSpecies="OS=Homo sapiens"), plotGraph=TRUE, wex=1.6, titGraph="Proteome Discoverer",
silent=FALSE, debug=FALSE, callFrom=NULL) {
## read ProteomeDiscoverer exported txt
fxNa <- wrMisc::.composeCallName(callFrom, newNa="readProteomeDiscovererFile")
oparMar <- graphics::par("mar") # old margins, for rest after figure
oparLayout <- graphics::par("mfcol") # old layout, for rest after figure
on.exit(graphics::par(mar=oparMar, mfcol=oparLayout)) # restore old mar settings
reqPa <- c("utils","wrMisc")
chPa <- sapply(reqPa, requireNamespace, quietly=TRUE)
if(any(!chPa)) stop("package(s) '",paste(reqPa[which(!chPa)], collapse="','"),"' not found ! Please install first from CRAN")
if(!isTRUE(silent)) silent <- FALSE
if(isTRUE(debug)) silent <- FALSE else debug <- FALSE
contamFilter <- TRUE # filter contaminants away
#excluCol <- c("^Abundance\\.Count","^Abundances\\.Count","^Abundance\\.Ratio","^Abundances\\.Ratio","^Abundance\\.Grouped","^Abundances\\.Grouped") # exclude this from quantifications columns
cleanDescription <- TRUE # clean 'Description' for artifacts of truncated text (tailing ';' etc)
infoDat <- infoFi <- setupSd <- parametersD <- NULL # initialize
.corPathW <- function(x) gsub("\\\\", "/", x)
## check if path & (tsv) file exist
if(!grepl("\\.txt$|\\.txt\\.gz$", fileName)) message(fxNa,"Trouble ahead, expecting tabulated text file (the file'",fileName,"' might not be right format) !!")
paFi <- wrMisc::checkFilePath(fileName, path, expectExt="txt", compressedOption=TRUE, stopIfNothing=TRUE, callFrom=fxNa, silent=silent, debug=debug)
## note : reading sample-setup from 'suplAnnotFile' at this place won't allow comparing if number of samples/columns corresponds to data; do after reading main data
if(debug) {message(fxNa,"rpd0 .. Ready to read", if(length(path) >0) c(" from path ",path[1])," the file ",fileName[1]);
rpd0 <- list(fileName=fileName,path=path,paFi=paFi,chPa=chPa) }
## read (main) file
## future: look for fast reading of files
tmp <- try(utils::read.delim(paFi, stringsAsFactors=FALSE), silent=TRUE)
if(length(tmp) <1 || inherits(tmp, "try-error") || length(dim(tmp)) <2) {
if(inherits(tmp, "try-error")) warning("Unable to read input file ('",paFi,"')! (check if rights to read)") else {
if(!silent) message(fxNa,"Content of file '",paFi,"' seeps empty or non-conform ! Returning NULL; check if this is really a ProteomeDiscoverer-file") }
NULL
} else {
if(debug) { message(fxNa,"rpd1 ... dims of initial data : ", nrow(tmp)," li and ",ncol(tmp)," col ")
rpd1 <- list(tmp=tmp,paFi=paFi,annotCol=annotCol,fileName=fileName) }
## locate & extract annotation
## default as R-friendly (convert standard cols later to this format)
if(length(annotCol) <1) annotCol <- c("Accession","Description","Gene","GeneName","Gene.Name","Marked.as", "Number.of.Peptides","Number.of.PSMs","Number.of.Unique.Peptides","Number.of.AAs","Coverage.in.Percent")
## also ?? "Exp.q.value.Combined","Sum.PEP.Score"
if(debug) message(fxNa,"rpd1a")
## option for future: also extract column "MarkedAs"
PSMCol <- "^Number\\.of\\.PSMs." # pattern searching tag for PSM-data (was previously .. .by.Search.Engine)
PepCol <- "^Number\\.of\\.Peptides." # pattern searching tag for Number of peptides (was preiously .. .by.Search.Engine)
## future option : lateron rename columns called as "Description" to annotCol[2]
## below use explicit colnames "Accession","Description", rename if tolower() fits
.extrCol <- function(x, mat, asIndex=FALSE, notFound=NA, silent=FALSE, fxNa=NULL) {
## integrate to wrMisc ?
## look for column 'x' in matrix, extract index or content of column
if(length(x) >1) x <- x[1]
isNum <- grepl("^[[:digit:]]+$", x)
if(isTRUE(isNum)) {
x <- as.integer(x)
if(x > ncol(mat)) stop(fxNa,"Invalid column number chosen (",x," but only ",ncol(mat)," available)")
out <- if(isTRUE(asIndex)) x else matrix(mat[,x], ncol=1, dimnames=list(rownames(mat),colnames(mat)[x]))
} else {
ch1 <- colnames(mat) %in% x # direct match
if(any(ch1)) { if(sum(ch1) >1 && !silent) message(fxNa,"Note: ",sum(ch1)," columns at direct match, using 1st")
out <- if(isTRUE(asIndex)) which(ch1)[1] else matrix(mat[,which(ch1)[1]], ncol=1, dimnames=list(rownames(mat),colnames(mat)[which(ch1)[1]]))
} else {
ch1 <- grep(x, mat) # partial match (grep)
if(length(ch1) >0) { if(length(ch1) >1 && !silent) message(fxNa,"Note: ",length(ch1)," columns found by grep, using 1st")
out <- if(isTRUE(asIndex)) which(ch1)[1] else matrix(mat[,ch1[1]], ncol=1, dimnames=list(rownames(mat),colnames(mat)[ch1[1]]))
} else {
ch1 <- grep(tolower(x), tolower(mat)) # case-tolerant grep
if(length(ch1) >0) { if(length(ch1) >1 && !silent) message(fxNa,"Note: ",length(ch1)," columns found by grep (case-independent), using 1st")
out <- if(isTRUE(asIndex)) which(ch1)[1] else matrix(mat[,ch1[1]], ncol=1, dimnames=list(rownames(mat),colnames(mat)[ch1[1]]))
} else {
if(grepl("^error|^stop", notFound)) {
msg <- c(fxNa,"Could NOT find column '",x,"' !!\n (available columns ",wrMisc::pasteC(colnames(mat), quoteC="'"),")")
stop(msg)
} else { if(is.null(notFound)) out <- NULL else {
out <- if(isTRUE(asIndex)) NA else matrix(NA, ncol=1, dimnames=list(rownames(mat),x))
if(!silent) if(grepl("^warning", notFound)) message(msg) else warning(msg)}}
}
}
}
}
out }
## check for essential colnames !
annot <- cbind(Accession=.extrCol(annotCol[1], tmp, silent=silent, fxNa=fxNa), # 1st typically 'Accession'
EntryName=NA, GeneName=NA, Description=.extrCol(annotCol[2], tmp, silent=silent, fxNa=fxNa), Species=NA, Contam=NA) #, iniDescription=NA)
## address different naming/writing of GeneName
GNco <- colnames(tmp) %in% c("Gene.Name","GeneName")
if(any(GNco)) { annot[,"GeneName"] <- sub(" +$","",tmp[,GNco]) # also remove tailing space
annotCol <- annotCol[-which(annotCol %in% c("Gene.Name","GeneName"))]
}
## add other cols form annotCol
if(debug) { message(fxNa,"rpd1b "); rpd1b <- list(tmp=tmp,annot=annot,paFi=paFi,annotCol=annotCol,fileName=fileName) }
#notAnnCol <- c( "Accession","Description","Gene","Number.of.Peptides","Number.of.PSMs","Number.of.Unique.Peptides")
if(length(annotCol) >2) { chSupCol <- match(annotCol[-(1:2)], colnames(tmp))
if(sum(is.na(chSupCol)) < length(annotCol) -2) annot <- cbind(annot, tmp[,wrMisc::naOmit(chSupCol)])
if(!silent) message(fxNa,"Adding supl annotation-columns ",wrMisc::pasteC(annotCol[wrMisc::naOmit(chSupCol)], quoteC="'")) }
## note 'EntryName' (eg 'UBE2C_HUMAN' not avail in PD)
## note 'GeneName' (eg 'UBE2C' can be extracted out of 'Description' after GN=
## extract GN
if(debug) { message(fxNa,"rpd1c "); rpd1c <- list() }
## add more annot cols
## check for R-friendly
Rfriendly <- !(length(grep("^X\\.\\.", colnames(tmp))) >0 || length(grep("Abundance\\.\\.F[[:digit:]]+\\.\\.", colnames(tmp)) >0))
annotColNo <- NULL # initialize for message rpd2
if(length(annotCol) >2) {
if(is.numeric(annotCol)) { message(fxNa,"Extraction by column index not yet finished")
} else {
annotColNo <- match(annotCol[3:length(annotCol)], colnames(tmp))
if(!Rfriendly) {
## assume that annotation is in first 19 columns (example until 16) !!
maxNCol <- min(ncol(tmp), 19)
## covert standard output to R-friendly
colnames(tmp[1:maxNCol]) <- sub("^X\\.\\.","Number.of.", colnames(tmp[1:maxNCol])) # sub("\\.\\.\\.$",".in.Percent",
## define table of specific terms to substitute ..
subst=cbind(std=c("Exp.q.value.Combined","Coverage....","MW..kDa.","calc..pI"),
rfriendly=c("Exp..q.value..Combined","Coverage.in.Percent","MW.in.kDa","calc.pI"))
chSubst <- match(subst[,1], colnames(tmp)[1:maxNCol])
if(any(!is.na(chSubst))) colnames(tmp)[wrMisc::naOmit(chSubst)] <- subst[wrMisc::naOmit(match(colnames(tmp)[1:maxNCol], subst[,1])), 2]
}
## extract contam separately & remove from annotCol
if(contamCol %in% colnames(tmp)) annot[,"Contam"] <- as.logical(gsub(" ","",tmp[,contamCol]))
annotCol <- annotCol[-which(annotCol %in% contamCol)]
}
}
if(length(annotCol) >2) {
annotColNo <- wrMisc::naOmit(match(annotCol[-(1:2)], colnames(tmp)))
if(length(annotColNo) >0) annot <- cbind(annot, tmp[,annotColNo])
}
if(debug) { message(fxNa,"rpd2 .. annotColNo : ", wrMisc::pasteC(annotColNo)) #," contamCol : ",wrMisc::pasteC(contamCol)," ")
rpd2 <- list(tmp=tmp,annot=annot,annotCol=annotCol,fileName=fileName,Rfriendly=Rfriendly)} # PSMCol=PSMCol,PepCol=PepCol,
## extract GN from 'Description' (like ...GN=LIPK )
chGN <- grep(" GN=[[:alpha:]]", annot[,"Description"])
if(length(chGN) >0) annot[chGN,"GeneName"] <- sub(" [[:upper:]]{2}=.*","", sub(".* GN=","", annot[chGN,"Description"]))
## extract Species from 'Description'
chSpe <- grep(" OS=[[:alpha:]]", annot[,"Description"])
if(length(chSpe) >0) { annot[chSpe,"Species"] <- sub(" [[:upper:]]{2}=.*","", sub(".* OS=","", annot[chSpe,"Description"]))
chStr <- grep("^[[:upper:]][[:lower:]]+ [[:lower:]]+ \\(.", annot[chSpe,"Species"]) # check for add'l strain name
if(length(chStr) >0) annot[chSpe[chStr],"Species"] <- substr(annot[chSpe[chStr],"Species"], 1, nchar(annot[chSpe[chStr],"Species"]) -2 -nchar(sub("^[[:upper:]][[:lower:]]+ [[:lower:]]+ \\(","", annot[chSpe[chStr],"Species"] ))) # remove strain no
}
## clean 'Description' entries: remove tailing punctuation or open brackets (ie not closed) at end of (truncated) fasta header
if(cleanDescription) {
if(debug) { message(fxNa,"rpd3a"); rpd3a <- list() }
chD <- grep(" [[:upper:]]{1,2}=", annot[,"Description"])
if(length(chD) >0) annot[chD,"Description"] <- sub(" [[:upper:]]{1,2}=.*","", annot[chD,"Description"]) # remove all add'l fields (eg OS=... OX=... GN=...)
}
if(debug) {message(fxNa,"rpd4 .. dim annot: ", nrow(annot)," li and ",ncol(annot)," cols; colnames : ",wrMisc::pasteC(colnames(annot))," ");
rpd4 <- list(annot=annot,tmp=tmp,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol)}
## identify user defined (main) groups of proteins (eg species, function, etc), create column 'SpecType'
if(length(specPref) >0) {
annot <- .extrSpecPref(specPref=specPref, annot=annot, useColumn=c("Species","EntryName","GeneName","Accession","Marked.as"), suplInp=tmp, silent=silent, debug=debug, callFrom=fxNa) #"Majority.protein.IDs","Fasta.headers",
chCont <- grep("^conta$|^contaminant", tolower(annot[,"SpecType"]))
if(length(chCont) >0) {
if(length(chCont)==nrow(tmp)) {warning(fxNa,"All proteins/peptides were marked as contaminants based on 'specPref' and will be removed ! Trouble ahead ?")
} else if(!silent) message(fxNa,"Note : ",length(chCont)," proteins/peptide(s) were marked (in addition) as contaminants based on 'specPref'")
annot[chCont,"Contam"] <- TRUE
}
}
if(debug) {message(fxNa,"rpd5"); rpd5 <- list(annot=annot,tmp=tmp,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol)}
## filter/remove contaminants unless in SpecTy
if(any(as.logical(annot[,"Contam"]), na.rm=TRUE)) { # filter contam
filt1 <- if("SpecType" %in% colnames(annot)) {
which( !as.logical(annot[,"Contam"]) | !grepl("^conta", tolower(annot[,"SpecType"]))) ## do not filter away proteins/peptides anotated by SpecTy (exept maked as 'conta')
} else which(!as.logical(annot[,"Contam"]))
if(length(filt1)==0) {
warning(fxNa,"All lines will be removed based on contaminant-filter (column '",contamCol,"') - TROUBLE AHEAD ?")
} else if(!silent) message(fxNa,"Note : ",nrow(tmp) - length(filt1)," contaminant protein(s)/peptide(s) will be removed, ",length(filt1)," remain")
tmp <- tmp[filt1,]
annot <- annot[filt1,]
}
if(debug) {message(fxNa,"rpd6b .. "); rpd6b <- list()}
## report by species
if(!silent) { chSp <- is.na(annot[,"Species"])
if(!all(chSp)) { tab <- table(annot[,"Species"])
if(any(chSp)) tab <- c(tab, na=sum(chSp))
tab <- rbind(names(tab),": ",tab," ; ")
tab[length(tab)] <- "" # no separator at last position
message(fxNa,"Count by 'Species' : ",apply(tab, 2, paste)) }} # all lines assigned
if(debug) {message(fxNa,"rpd7 .. "); rpd7 <- list(annot=annot,specPref=specPref,chSp=chSp,tmp=tmp,quantCol=quantCol )}
extraQuantColCheck <- TRUE # avoid 'Abundances.Count.' and 'Abundances.Normalized.' when pattern searching for quant-columns
## locate & extract abundance/quantitation data
msg <- " CANNOT find ANY quantification columns"
if(length(quantCol) <1) {
## default pattern search (for abundance/quantitation data)
quantCol <- "^Abundance"
if(debug) message(fxNa,"Setting argument 'quantCol' to '^Abundance'")
}
## construct quantColInd for index to use
if(length(quantCol) >1) {
## explicit columns (for abundance/quantitation data)
ch1 <- match(quantCol, colnames(tmp)) # look for direct match (rarley ok)
chNA <- is.na(ch1)
if(debug) {message(fxNa,"rpd7a .. "); rpd7a <- list(annot=annot,specPref=specPref,chSp=chSp,tmp=tmp,quantCol=quantCol,ch1=ch1,chNA=chNA )}
if(any(chNA)) {
ch2 <- chNA & grepl("^[[:digit:]]", quantCol)
if(any(ch2)) { quantCol[which(ch2)] <- paste0("X", quantCol[which(ch2)])
ch1 <- match(quantCol, colnames(tmp)) # (update) look for direct match (rarley ok) x
chNA <- is.na(ch1)
if(debug) message(fxNa,"Trying to recuperate ",sum(ch2)," quantCol-names due to start with digits")}
}
if(all(chNA)) { # no direct match found
ch1 <- match(paste0(quantCol,".F.Sample"), sub("\\.F[[:digit:]]+\\.Sample",".F.Sample",colnames(tmp))) # match to simplified "Cancer01.F.Sample" instead of "Cancer01.F1.Sample"
chNA <- is.na(ch1)
if(all(chNA)) { # no composed match found
## run grep on each indiv quantCol
message(fxNa,"grep on each indiv quantCol - Not yet developed")
## develop further ?
if(debug) {message(fxNa,"rpd7b .. "); rpd7b <- list(annot=annot,specPref=specPref,chSp=chSp,tmp=tmp,quantCol=quantCol,ch1=ch1,chNA=chNA )}
stop(fxNa,"None of samples found !!")
}
}
if(any(chNA)) { message(fxNa,"Note : ",sum(chNA)," out of ",length(chNA)," samples NOT found !")
ch1 <- ch1[which(!chNA)]
}
quantColInd <- ch1
} else { ## single value => use as search pattern
if(debug) {message(fxNa,"rpd7c"); rpd7c <- list(annot=annot,specPref=specPref,chSp=chSp,tmp=tmp,quantCol=quantCol) }
if(identical("allAfter_calc.pI", quantCol)) {
calcPIcol <- c("calc.pI","calc..pI","calc pI","calc_pI") # possible variations of 'calc.pI'
chCol <- calcPIcol %in% colnames(tmp)
endCol <- c("Number.of.Protein.Groups","Modifications")
chEnd <- colnames(tmp) %in% endCol
lastCol <- if(any(chEnd)) min(which(chEnd)) -1 else ncol(tmp)
if(any(chCol, na.rm=TRUE)) {
quantColInd <- (which(colnames(tmp)==calcPIcol[which(chCol)[1]]) +1) : lastCol
} else {quantColInd <- NULL; stop(fxNa,"Cannot find column called 'calc_pI' (or 'cal.pI); don't know which columns to choose as quantitation data !")}
} else {
## need to avoid "Abundances.Normalized.F1.Sample" or 'Abundances.Count.*'
ch1 <- grep(paste0(if(substr(quantCol,1,1) !="^") "^",quantCol,"\\.F[[:digit:]]+\\.Sample"), colnames(tmp)) # construct pattern code to "Abundance.F1.Sample"
if(length(ch1) <1) {
ch1 <- grep(paste0(if(substr(quantCol,1,1) !="^") "^",quantCol,"s\\.F[[:digit:]]+\\.Sample"), colnames(tmp)) # construct pattern code to "Abundances.F1.Sample"
if(length(ch1) <1) {
ch1 <- grep(paste0(if(substr(quantCol,1,1) !="^") "^",quantCol,"s{0,1}\\.F[[:digit:]]"), colnames(tmp)) # construct pattern code to "Abundances.F1"
## challange : one may pick too many columns (type "Abundances.Normalized.F1.Sample" or 'Abundances.Count.*')
if(length(ch1) <1) {
qC1 <- paste0(sub("\\\\.","", sub("S$","", quantCol)),"\\.S")
ch1 <- grep(qC1, colnames(tmp)) # construct pattern code to "Abundance.S"
excluPat <- "\\.Normalized|\\.Count|\\.Ratio|\\.Grouped"
if(length(ch1) >0) { ch2 <- grepl(excluPat, colnames(tmp)[ch1]) # avoid '.Normalized' (if not concerning all)
if(length(ch2) >0 && !all(ch2)) ch1 <- ch1[which(!ch2)]
} else {
ch1 <- grep(quantCol, colnames(tmp)) # use directly as pattern (may find too many)
if(length(ch1) >0) { ch2 <- grepl(excluPat, colnames(tmp)[ch1]) # avoid '.Normalized' (if not concerning all)
if(length(ch2) >0 && !all(ch2)) ch1 <- ch1[which(!ch2)]
} else {
warning(fxNa,"Unable to find any matches to '",quantCol,"' !") }
}
}
}
if(debug) message(fxNa,"Found ",length(ch1)," quantitation-columns", if(length(ch1) >0) c(" (eg ",wrMisc::pasteC(colnames(tmp)[utils::head(ch1)], quoteC="'"),")"))
}
quantColInd <- ch1
}
}
if(length(quantColInd) <1) stop(msg," ('",quantCol,"') NOT FOUND !")
## look for columns endig with 'intensity' ?
quantCol <- quantColInd
## extract quantitation
if(length(quantCol) >0) { abund <- if(length(quantCol) >1) tmp[,quantCol] else {
matrix(tmp[,quantCol], ncol=1, dimnames=list(rownames(tmp),NULL))}} # how to know column-name if single sample ?
#rownames(abund) <- wrMisc::correctToUnique(pepSeq, silent=silent, callFrom=fxNa)
if(debug) {message(fxNa,"rpd8 .. "); rpd8 <- list(annot=annot,specPref=specPref,abund=abund,quantCol=quantCol)}
## check & clean abudances
chNorm <- grep("\\.Normalized\\.", colnames(abund))
if(length(chNorm)*2 == ncol(abund)) { # in case Normalized makes 1/2 of columns use non-normalized
abund <- abund[,-chNorm]
}
colnames(abund) <- sub("^Abundances\\.Normalized\\._{0,1}|^abundances\\.Normalized\\._{0,1}|^Abundances{0,1}_{0,1}|^abundances{0,1}_{0,1}","",colnames(abund))
chNum <- is.numeric(abund)
if(!chNum) {abund <- apply(tmp[,quantCol], 2, wrMisc::convToNum, convert="allChar", silent=silent, callFrom=fxNa)}
## remove heading 'X..' from headers (only if header won't get duplicated
chXCol <- grep("^X\\.\\.",colnames(annot))
if(length(chXCol) >0) {
newNa <- sub("^X\\.\\.","",colnames(annot)[chXCol])
chDu <- duplicated(c(newNa, colnames(annot)), fromLast=TRUE)
if(any(chDu, na.rm=TRUE)) newNa[which(chDu)] <- colnames(annot)[chXCol][which(chDu)]
colnames(annot)[chXCol] <- newNa }
## remove heading/tailing spaces (first look which columns might be subject to this treatment)
ch1 <- list(A=grep("^ +",annot[1,]), B=grep("^ +",annot[2,]), C=grep("^ +",annot[floor(mean(nrow(annot))),]), D=grep("^ +",annot[nrow(annot),]) )
chCo <- unique(unlist(ch1))
annot[,chCo] <- sub("^ +","",sub(" +$","",annot[,chCo])) # remove heading/tailing spaces
if(debug) { message(fxNa,"rpd9 .. dim annot ",nrow(annot)," and ",ncol(annot)); rpd9 <- list(annot=annot,tmp=tmp,abund=abund,sampleNames=sampleNames,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,infoDat=infoDat) }
## add custom sample names (if provided)
if(identical("sdrf",sampleNames)) {specPref$sampleNames <- sampleNames; sampleNames <- NULL}
if(length(sampleNames) ==ncol(abund) && ncol(abund) >0) {
if(debug) { message(fxNa,"rpd9b") }
if(length(unique(sampleNames)) < length(sampleNames)) {
if(!silent) message(fxNa,"Custom sample names not unique, correcting to unique")
sampleNames <- wrMisc::correctToUnique(sampleNames, callFrom=fxNa) }
colnames(abund) <- sampleNames
if(debug) { message(fxNa,"rpd9c") }
} else {
colnames(abund) <- sub("Abundance\\.F[[:digit:]]+\\.Sample\\.|Abundances\\.F[[:digit:]]+\\.Sample\\.","Sample.", colnames(abund))
}
## (optional) filter by FDR (so far use 1st of list where matches are found from argument FDRCol)
if(length(FDRCol) >0) {
## stand : "Found.in.Sample...S32..F32..Sample" Rfriendly : "Found.in.Sample.in.S33.F33.Sample"
## stand : "X..Protein.Groups" Rfriendly : "Number.of.Protein.Groups"
chFDR <- lapply(FDRCol, function(x) {z <- grep(x[1], colnames(tmp)); if(length(z) ==ncol(abund)) z else NULL})
names(chFDR) <- sapply(FDRCol, function(x) x[1])
chFDR <- chFDR[which(sapply(chFDR, length) >0)]
if(length(chFDR) >0) {
i <- 1 # so far just use 1st instance matching
searchFor <- FDRCol[[which(sapply(FDRCol, function(x) x[1]) %in% names(chFDR)[i])]]
filtFdrHi <- tmp[,chFDR[[i]]] == searchFor[2] # find occurances of best tag 'High'
roSu <- rowSums(filtFdrHi) <1
if(all(roSu) && !silent) message(fxNa,"NONE of the lines/proteins had any '",searchFor[1],"' in column(s) '",searchFor[2],"' !! This is probably not a good filtering-parameter, ignoring")
if(any(roSu, na.rm=TRUE) && !all(roSu)) { if(!silent) message(fxNa,"Removing ",sum(roSu)," lines/proteins without ANY '",searchFor[2],"' in columns '",searchFor[1],"'")
rmLi <- -1*which(roSu)
annot <- annot[rmLi,]
abund <- abund[rmLi,]
filtFdrHi <- filtFdrHi[rmLi,] # useful lateron ?
tmp <- tmp[rmLi,] }
}
}
if(debug) { message(fxNa,"rpd11 .. length(FDRCol) ",length(FDRCol)," dim annot ",nrow(annot)," and ",ncol(annot))}
## rownames : check if Accession is unique
chAc <- duplicated(annot[,"Accession"], fromLast=FALSE)
if(any(chAc, na.rm=TRUE)) {
getLiToRemove <- function(x,useCol=c("rowNo","Contaminant","SpecType")) { # return index for all lines to remove from matrix ...
if(is.data.frame(x)) x <- as.matrix(x)
spe <- grep("^species", x[,useCol[3]])
if(length(spe) >0) {
rmLi <- x[which(1:nrow(x) != spe[1]), useCol[1]]
} else { ## look for any lines marked as Contaminant="true", then mark other(s) for remove
rmLi <- if(any(tolower(x[,useCol[2]])=="true", na.rm=TRUE)) x[which(tolower(x[,useCol[2]]) !="true") ,useCol[1]] }
as.integer(rmLi) }
## check if one of duplicated lines is marked as Contaminant -> remove non-contaminant, BUT NOT 'speciesX' ?
if(contamFilter) { # ready to correct (if possible) duplicated 'Accession' entries
## elaborate procedure for removing duplicate Accession lines : 'fuse' annot where no NA & use quantification-line with fewest NAs
## need to separate all groups of repeated IDs & treat separately
annot <- cbind(annot, rowNo=1:nrow(tmp))
duplAc <- unique(annot[which(chAc), "Accession"])
## need to remove duplicated lines which are not marked as Contaminant="True"
chAc2 <- duplicated(annot[,"Accession"], fromLast=TRUE)
rmLi <- chAc | chAc2
## find lines where is not Contaminant="True" (and keep contaminant)
annot <- cbind(annot, iniIndex=1:nrow(annot), nNA=rowSums(is.na(abund)))
useCol2 <- c("Accession","GeneName","Species","Contam","SpecType","Description","Contaminant", "iniIndex","nNA") # the last 2 are added within function
useCol2 <- wrMisc::naOmit(match(useCol2,colnames(annot)))
abund <- cbind(abund,iniIndex=1:nrow(abund))
rmAbund <- as.integer(unlist(by(abund[which(rmLi),], annot[which(rmLi),"Accession"], function(x) x[(1:nrow(x))[-which.min(rowSums(is.na(x)))],ncol(x)])))
rmAnnot2 <- as.integer(unlist(by(annot[which(rmLi),], annot[which(rmLi),"Accession"], function(x) x[2:nrow(x),ncol(x) -1])))
rmAnnot <- which(chAc)
for(j in unique(annot[which(rmLi),useCol2][1])) { # need loop for 'fusing' columns with fewes NAs and recording which lines should be removed
x <- annot[which(annot[,"Accession"] %in% j),]
useLi <- apply(0+is.na(x),2,which.min)
if(any(useLi >1, na.rm=TRUE)) for(i in 2:max(useLi)) annot[as.integer(x[1,"iniIndex"]),which(useLi==i)] <- annot[as.integer(x[i,"iniIndex"]),which(useLi==i)]
}
if(length(rmAnnot) >0) {annot <- annot[-rmAnnot,]; tmp <- tmp[-rmAnnot,]
abund <- abund[-rmAbund,]
if(!silent) message(fxNa,"Removing ",length(rmAnnot)," lines due to duplicated Accessions (typically due to contaminants)")
}
annot <- annot[,-ncol(annot) +(1:0)] # remove extra columns (ie "iniIndex","nNA")
abund <- abund[,-ncol(abund)] # remove extra column (ie "iniIndex")
chAc <- duplicated(annot[,"Accession"], fromLast=FALSE)
} }
if(debug) { message(fxNa,"rpd11b .. dim abund ",nrow(abund)," and ",ncol(abund)); rpd11b <- list()}
## Now we are ready to add unique rownames
if(any(chAc, na.rm=TRUE)) {
if(!silent) message(fxNa,sum(chAc)," (out of ",length(chAc),") cases of duplicated 'Accession' exist, adding extensions for use as rownames")
rownames(tmp) <- rownames(annot) <- wrMisc::correctToUnique(annot[,"Accession"], sep="_", atEnd=TRUE, callFrom=fxNa)
} else rownames(abund) <- rownames(annot) <- annot[,"Accession"]
## optional/additional counting results (PSM, no of peptides)
PSMCol <- if(length(PSMCol) ==1) grep(PSMCol, colnames(tmp)) else NULL
PepCol <- if(length(PepCol) ==1) grep(PepCol, colnames(tmp)) else NULL
usTy <- c("PSM","NoOfPeptides")[which(c(length(PSMCol), length(PepCol)) ==ncol(abund))]
if(length(usTy) >0) {
counts <- array(NA,dim=c(nrow(abund), ncol(abund), min(1, length(usTy))), dimnames=list(rownames(abund), colnames(abund),usTy))
if("PSM" %in% usTy) counts[,,"PSM"] <- as.matrix(tmp[,PSMCol])
if("NoOfPeptides" %in% usTy) counts[,,"NoOfPeptides"] <- as.matrix(tmp[,PepCol])
} else counts <- NULL
if(debug) {message(fxNa,"rPD12 .. "); rPD12 <- list(annot=annot,tmp=tmp,abund=abund,sampleNames=sampleNames,specPref=specPref,annotCol=annotCol,refLi=refLi,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,counts=counts,infoDat=infoDat)}
## check for reference for normalization
refLiIni <- refLi
if(is.character(refLi) && length(refLi)==1) {
refLi <- which(annot[,"SpecType"]==refLi)
if(length(refLi) <1 && identical(refLiIni, "mainSpe")) refLi <- which(annot[,"SpecType"] =="mainSpecies") # fix compatibility problem 'mainSpe' to 'mainSpecies'
if(length(refLi) <1 ) { refLi <- 1:nrow(abund)
if(!silent) message(fxNa,"Could not find any proteins matching argument 'refLi=",refLiIni,"', ignoring ...")
} else {
if(!silent) message(fxNa,"Normalize using (custom) subset of ",length(refLi)," lines specified as '",refLiIni,"'")}} # may be "mainSpe"
## take log2 & normalize
quant <- try(wrMisc::normalizeThis(log2(abund), method=normalizeMeth, mode="additive", refLines=refLi, silent=silent, debug=debug, callFrom=fxNa), silent=TRUE)
if(inherits(quant, "try-error")) quant <- NULL
if(debug) { message(fxNa,"rPD13 .. dim quant: ", nrow(quant)," li and ",ncol(quant)," cols; colnames : ",wrMisc::pasteC(colnames(quant))," ");
rPD13 <- list(annot=annot,tmp=tmp,abund=abund,quant=quant,sampleNames=sampleNames,normalizeMeth=normalizeMeth,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,counts=counts,infoDat=infoDat, refLi=refLi,suplAnnotFile=suplAnnotFile,sdrf=sdrf)}
### GROUPING OF REPLICATES AND SAMPLE META-DATA
if(length(suplAnnotFile) >0 || length(sdrf) >0) {
if(length(sampleNames) %in% c(1, ncol(abund))) groupPref$sampleNames <- sampleNames
if(length(gr) %in% c(1, ncol(abund))) groupPref$gr <- gr
if("sampleNames" %in% names(specPref)) groupPref$sampleNames <- specPref$sampleNames
setupSd <- readSampleMetaData(sdrf=sdrf, suplAnnotFile=suplAnnotFile, quantMeth="PD", path=path, abund=utils::head(quant), chUnit=isTRUE(groupPref$chUnit), groupPref=groupPref, silent=silent, debug=debug, callFrom=fxNa)
}
if(debug) {message(fxNa,"rPD13b .."); rPD13b <- list(annot=annot,tmp=tmp,abund=abund,suplAnnotFile=suplAnnotFile,quant=quant,sampleNames=sampleNames,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,counts=counts,infoDat=infoDat, refLi=refLi,setupSd=setupSd, gr=gr)}
## finish groups of replicates & annotation setupSd
setupSd <- .checkSetupGroups(abund=abund, setupSd=setupSd, gr=gr, sampleNames=sampleNames, quantMeth="PD", silent=silent, debug=debug, callFrom=fxNa)
if(debug) {message(fxNa,"rPD13c .."); rPD13c <- list(annot=annot,tmp=tmp,abund=abund,quant=quant,gr=gr,sdrf=sdrf,setupSd=setupSd,sampleNames=sampleNames,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,counts=counts,infoDat=infoDat, refLi=refLi)}
## harmonize sample-names/1
colNa <- if(length(setupSd$sampleNames)==ncol(abund)) setupSd$sampleNames else setupSd$groups
### begin new
## option : choose (re-)naming of levels & columns on most redundance
if(length(setupSd$sampleNames)==ncol(quant) && length(setupSd$sampleNaSdrf)==ncol(quant) && any(duplicated(setupSd$level))) {
## check if setupSd$sampleNaSdrf or setupSd$sampleNames contain better info (some info on replicates)
chRed1 <- sum(duplicated(sub("(_|\\.| )[[:digit:]]+.*","", setupSd$sampleNaSdrf)), na.rm=TRUE)
chRed2 <- sum(duplicated(sub("(_|\\.| )[[:digit:]]+.*","", setupSd$sampleNames)), na.rm=TRUE)
if(chRed1 <1) chRed1 <- sum(duplicated(sub("[[:digit:]]+$","", setupSd$sampleNaSdrf)), na.rm=TRUE) # remove terminal digits
if(chRed2 <1) chRed2 <- sum(duplicated(sub("[[:digit:]]+$","", setupSd$sampleNames)), na.rm=TRUE) # remove terminal digits
if(any(c(chRed1,chRed2) >0)) {
if(chRed2 < chRed1) { # use info for levels depending on where more
colNa <- colnames(abund) <- setupSd$sampleNames <- setupSd$sampleNaSdrf ## take sample names from sdrf via setupSd$sampleNaSdrf
} else {
colNa <- colnames(abund) <- setupSd$sampleNames } ## take sample names from sdrf via setupSd$sampleNaSdrf
if(length(quant) >0) colnames(quant) <- colNa }
#setupSd$level
}
if(debug) {message(fxNa,"rPD13d .."); rPD13d <- list(annot=annot,tmp=tmp,abund=abund,quant=quant,gr=gr,sdrf=sdrf,setupSd=setupSd,sampleNames=sampleNames,specPref=specPref,annotCol=annotCol,Rfriendly=Rfriendly,contamCol=contamCol,PSMCol=PSMCol,PepCol=PepCol,counts=counts,infoDat=infoDat, refLi=refLi)}
## option : set order of samples as (init) sdrf
if(length(quant) >0 && "sdrfOrder" %in% names(sdrf) && isTRUE(as.logical(sdrf["sdrfOrder"])) && length(setupSd$iniSdrfOrder)==ncol(abund) && ncol(abund) >1) { # set order according to sdrf (only if >1 samples)
nOrd <- order(setupSd$iniSdrfOrder)
abund <- abund[,nOrd]
if(length(quant) >0) quant <- quant[,nOrd]
#setupSd$level <- setupSd$level[nOrd]
## rename columns according to sdrf and set order of quant and abund ..
## now adapt order of setupSd, incl init Sdrf
if(length(setupSd) >0) {
is2dim <- sapply(setupSd, function(x,le) length(dim(x))==2 && nrow(x)==le, le=length(nOrd)) # look for matr or df to change order of lines
if(any(is2dim) >0) for(i in which(is2dim)) setupSd[[i]] <- setupSd[[i]][nOrd,]
isVe <- sapply(setupSd, function(x,le) length(x)==le && length(dim(x)) <1, le=length(nOrd)) # look for vector to change order in setupSd
if(any(isVe) >0) for(i in which(isVe)) setupSd[[i]] <- setupSd[[i]][nOrd] }
gr <- gr[nOrd]
colNa <- colNa[nOrd]
if(length(counts) >0 && length(dim(counts))==3) counts <- array(counts[,nOrd,], dim=c(nrow(counts), length(nOrd), dim(counts)[3]),
dimnames=list(rownames(counts), colnames(counts)[nOrd], dimnames(counts)[[3]]))
if(debug) {message(fxNa,"rPD13e .."); rPD13e <- list(path=path,chPa=chPa,tmp=tmp,groupPref=groupPref,quantCol=quantCol,abund=abund,chNum=chNum,
sdrf=sdrf,quant=quant,annot=annot,setupSd=setupSd)} #
## try re-adjusting levels
tm1 <- sub("^[[:alpha:]]+( |_|-|\\.)+[[:alpha:]]+","", colnames(abund)) # remove heading text
if(all(grepl("^[[:digit:]]", tm1))) {
tm1 <- try(as.numeric(sub("( |_|-|\\.)*[[:alpha:]].*","", tm1)), silent=TRUE) # remove tailing text and try converting to numeric
if(!inherits(tm1, "try-error")) {
setupSd$level <- match(tm1, sort(unique(tm1)))
names(setupSd$level) <- tm1
if(!silent) message(fxNa,"Sucessfully re-adjusted levels after bringing in order of Sdrf")}
}
} else { # no sdrf-info
### begin old
## harmonize sample-names/2
colNa <- colnames(abund)
chGr <- grepl("^X[[:digit:]]", colNa) # check & remove heading 'X' from initial column-names starting with digits
if(any(chGr)) colNa[which(chGr)] <- sub("^X","", colNa[which(chGr)]) #
}
if(length(colNa)==ncol(abund)) { colnames(abund) <- colNa
if(length(setupSd$sampleNames)==length(colNa)) setupSd$sampleNames <- colNa
if(length(quant) >0 && ncol(quant)==length(colNa)) colnames(quant) <- colNa }
if(length(dim(counts)) >1 && length(counts) >0) colnames(counts) <- colNa
if(debug) {message(fxNa,"Read sample-meta data, rPD14"); rPD14 <- list(sdrf=sdrf,suplAnnotFile=suplAnnotFile,abund=abund, quant=quant,refLi=refLi,annot=annot,setupSd=setupSd,counts=counts,sampleNames=sampleNames)}
## main plotting of distribution of intensities
custLay <- NULL
if(is.numeric(plotGraph) && length(plotGraph) >0) {custLay <- as.integer(plotGraph); plotGraph <- TRUE} else {
if(!isTRUE(plotGraph)) plotGraph <- FALSE}
if(plotGraph) .plotQuantDistr(abund=abund, quant=quant, custLay=custLay, normalizeMeth=normalizeMeth, softNa="Proteome Discoverer",
refLi=refLi, refLiIni=refLiIni, notLogAbund=TRUE, tit=titGraph, las=NULL, silent=silent, callFrom=fxNa, debug=debug)
## meta-data
notes <- c(inpFile=paFi, qmethod="ProteomeDiscoverer", qMethVersion=if(length(infoDat) >0) unique(infoDat$Software.Revision) else NA,
identType="protein", rawFilePath= if(length(infoDat) >0) infoDat$File.Name[1] else NA, normalizeMeth=normalizeMeth, call=deparse(match.call()),
created=as.character(Sys.time()), wrProteo.version=utils::packageVersion("wrProteo"), machine=Sys.info()["nodename"])
## final output
if(isTRUE(separateAnnot)) list(raw=abund, quant=quant, annot=annot, counts=counts, sampleSetup=setupSd, quantNotes=parametersD, notes=notes) else data.frame(quant,annot)
}
}
#' Additional/final Check And Adjustments To Sample-order After readSampleMetaData()
#'
#' This (low-level) function performs an additional/final chek & adjustments to sample-names after readSampleMetaData()
#'
#' @param abund (matrix or data.frame) abundance data, only the colnames will be used
#' @param setupSd (list) describing sammple-setup, typically produced by from package wrMisc
#' @param gr (factor) optional custom information about replicate-layout, has priority over setuoSd
#' @param sampleNames (character) custom sample-names, has priority over abund and setuoSd
#' @param quantMeth (character) 2-letter abbreviation of name of quantitation-software (eg 'MQ')
#' @param silent (logical) suppress messages
#' @param debug (logical) display additional messages for debugging
#' @param callFrom (character) allow easier tracking of messages produced
#' @return This function returns an enlaged/updated list 'setupSd' (set setupSd$sampleNames, setupSd$groups)
#' @seealso used in \code{readProtDiscovererFile}, \code{\link{readMaxQuantFile}}, \code{\link{readProlineFile}}, \code{\link{readFragpipeFile}}
#' @examples
#' set.seed(2021)
#' @export
.checkSetupGroups <- function(abund, setupSd, gr=NULL, sampleNames=NULL, quantMeth=NULL, silent=FALSE, callFrom=NULL, debug=FALSE) {
## additional/final check & adjustments to sample-names after readSampleMetaData()
## examine
## returns enlaged/updated list 'setupSd' (set setupSd$sampleNames, setupSd$groups)
## assume 'abund' to be valid matrix of data.frame !!
## 'setupSd' as list produced by readSampleMetaData()
## 'gr' .. (char vector or factor) designating who should be considered as replicate/same level, as optional custom entry (has prior over automatic groups/levels)
## 'sampleNames' .. optional custom entry for sample names (has prior over automatic names)
## 'quantMeth' .. (char vect, length=1) design method via abbreviation like 'PD' for ProteomeDiscoverer, 'MQ' for MaxQuant, 'PL' for Proline, etc (used for automatic trimming of default column/sample-names)
fxNa <- wrMisc::.composeCallName(callFrom, newNa=".checkSetupGroups")
delPat <- "_[[:digit:]]+$|\\ [[:digit:]]+$|\\-[[:digit:]]+$" # remove enumerators, ie tailing numbers after separator
rawExt <- "\\.raw$|\\.Raw$|\\.RAW$" # paste(paste0("\\.",c("Raw","raw","RAW"),"$"), collapse="|")
.corPathW <- function(x) gsub("\\\\", "/", x)
.adjPat <- function(x) { out <- match(x, unique(x)); names(out) <- names(x); out} # needed ??
extrSamNaSetup <- function(setS, meth) {
## extract (use-given) sampleNames out of setupSd$annotBySoft (colnames may depend on quant-method)
if(!any(c("PD","MQ","PL","FP","IB") %in% meth, na.rm=TRUE)) meth <- "other"
switch(meth, PD = NULL,
MQ=setupSd$annotBySoft$Experiment , PL=setupSd$annotBySoft$Experiment, FP=setupSd$annotBySoft$Experiment, IB=NULL, other=NULL)}
defColNa <- function(colN, meth) { # check if colN may represent default colnames (ie not useful since wo any indication about samples)
## note MQ : requires setExperiment to be defined by user, set each sample as different for getting quant by sample !
## note FP : requires setExperiment to be defined by user (defining different bioreplictates sufficient for getting quant by sample)
if(!any(c("PD","MQ","PL","FP","IB") %in% meth, na.rm=TRUE)) meth <- "other"
switch(meth, PD = all(grepl("F[[:digit:]]+\\.Sample$", colN), na.rm=TRUE),
MQ=FALSE , PL=FALSE, FP=FALSE, IB=FALSE, other=FALSE)}
.extrColNames <- function(abun, meth, silent=FALSE, callFrom=NULL, debug=FALSE) { ## get sampleNames from file-names/colnames of abund & clean
fxNa <- wrMisc::.composeCallName(callFrom, newNa=".extrColNames")
# abun=abund; meth=quantMeth
grou <- grou2 <- NULL
colNa2 <- sub(rawExt,"", .corPathW(colnames(abun)))
colNa <- basename(colNa2) # trim to filename
chDu <- duplicated(colNa)
if(debug) {message(fxNa,"eCN1"); eCN1 <- list(abun=abun,meth=meth,colNa2=colNa2,colNa=colNa,chDu=chDu)}
if(any(chDu)) {
if(debug) message(fxNa,"Some stripped filenames appear duplicated, need to keep with path ...")
colNa <- colNa2
chDu <- duplicated(colNa) }
if(any(chDu) && !silent) message(fxNa,"Some filenames appear duplicated !! (eg ",wrMisc::pasteC(utils::head(unique(colNa[which(chDu)]), 3))," )")
if(debug) {message(fxNa,"eCN2"); eCN2 <- list(abun=abun,meth=meth,colNa2=colNa2,colNa=colNa,chDu=chDu)}
if(is.null(colNa) || sum(duplicated(colNa)) ==length(colNa) -1) { sampleNames <- NULL
if(debug) message(fxNa,"All colnames of abund are empty or identical ! (can't use to figure out pattern of replicates/levels)")
} else {
## colNa seem usable
if(any(c("FP") %in% meth)) colNa <- sub("MaxLFQ Intensity$|Intensity$","", colNa)
if(any(c("MQ") %in% meth)) colNa <- sub("^LFQ Intensity","", colNa)
if("PL" %in% meth) colNa <- sub("^abundance|^Abundance","", colNa)
if(any(c("IB") %in% meth)) colNa <- sub("^Intensity_{0,1}","", colNa)
colNa <- gsub(" +$|\\.+$|_+$|\\-+$","", colNa) # remove tailing separators (' ','.','_','-')
colNa <- gsub("^ +|^\\.+|^_+|^\\-+","", colNa) # heading tailing separators
chDu <- duplicated(colNa)
if(!silent && any(chDu)) message(fxNa,"NOTE : ",sum(chDu)," DUPLICATED colnames for abund !! (eg ",wrMisc::pasteC(utils::head(unique(colNa[which(chDu)]), 3))," )")
if(debug) {message(fxNa,"eCN3"); eCN3 <- list()}
## now address group names/levels
if("PD" %in% meth) {
grou2 <- sub("rep[[:digit:]]+$","", colnames(abun))
grou2 <- sub(" +$|\\.+$|_+$|\\-+$","", grou2) # remove tailing separators (' ','.','_','-')
if(all(grepl("^\\.F[[:digit:]]+\\.Sample$", colnames(abun)))) grou2 <- NULL # PD default names like '.F1.Sample', '.F2.Sample' etc
} else {
if("PL" %in% meth) colNa <- sub("\\.mzDB\\.t\\.xml$","", colNa)
colNa <- sub("\\.raw$|\\.RAW$","", colNa)
sep <- c("_","\\-","\\.")
#sub("\\.mzDB|\\.t\\.xml","",colNa)
rmTx <- paste0(c("Sample","Samp","Replicate","Repl","Rep"),"$")
rmTx <- paste(paste0(rep(sep, each=length(rmTx)), rep(rmTx, length(sep))), collapse="|")
grou2 <- sub(rmTx,"",sub(delPat,"", colNa)) # remove tailing enumerators..
}
if(debug) message(fxNa,"Based on colnames(abund) : ",length(unique(grou2))," levels for ",ncol(abun)," samples")
}
if(debug) {message(fxNa,"eCN4 done"); eCN4 <- list()}
list(sampleNames=colNa, grou=grou2) }
.replSingleWord <- function(ii, tx, se, replBy="") { ## for single word in all tx; moove to wrMisc?
## ii .. word to remove
## tx .. ini char-vector
## se .. possible separators
if(length(se) >1) se <- paste0("(",paste(se,collapse="|"),")")
i2 <- grepl(paste0("^",ii), tx) # heading
out <- rep(NA, length(tx))
## need to protect special characters in ii
ii <- wrMisc::protectSpecChar(ii)
if(any(i2)) out[which(i2)] <- sub(paste0("^",ii,se), replBy, tx[which(i2)]) # heading : remove with following sep (if avail)
if(any(!i2)) out[which(!i2)] <- sub(paste0("(",se,ii,")|(^",ii,")"), replBy, tx[which(!i2)]) # not heading (may be heding now..): remove preceeding sep
out }
.trimRedWord <- function(txt, sep=c(" ","_","-","/"), minLe=3, strict=TRUE, silent=TRUE, callFrom=NULL, debug=FALSE) {
## moove to wrMisc?
## function to trim redundant words (@separator) similar to wrMisc::trimRedundText()
## strict .. (logical) requires separator to occur in each single character-string to be considered
## minLe .. min length for words to be considered (otherwise frequently problem with '1')
##
datOK <- length(txt) >0
if(datOK) { chNA <- is.na(txt)
if(all(chNA)) datOK <- FALSE else tx1 <- txt[which(!chNA)]
}
if(datOK) {
#strict=TRUE
chSe <- sapply(sep, function(x) nchar(tx1) > nchar(gsub(x,"",tx1)))
chS2 <- if(strict) colSums(chSe) ==length(tx1) else colSums(chSe) >0 # if strict require at least instace of 'sep' in each element
if(debug) {message(fxNa,"tRW1"); tRW1 <- list(txt=txt,sep=sep,chSe=chSe,chS2=chS2,strit=strict,tx1=tx1)}
if(any(chS2)) sep <- sep[which(chS2)] else datOK <- FALSE # reduce to sep found
}
if(datOK) {
allW <- unique(unlist(strsplit(tx1, paste(sep, collapse="|")), use.names=FALSE))
## keep only >2 char words
chLe <- nchar(allW) >= minLe
if(any(!chLe)) allW <- allW[which(chLe)]
## check all 'words' for recurring in each char-string
if(length(allW) >0) {
chW <- colSums(sapply(allW, grepl, tx1)) ==length(tx1)
if(any(chW)) {
rmWo <- names(chW[which(chW)])
#chLe <- nchar(rmWo) >0
if(debug) {message(fxNa,"tRW2"); tRW2 <- list(txt=txt,sep=sep,chS2=chS2,strit=strict,tx1=tx1,chW=chW,allW=allW,rmWo=rmWo,chLe=chLe)}
if(length(rmWo) >0) {
for(wo in rmWo) tx1 <- .replSingleWord(wo, tx1, sep)
txt[which(!chNA)] <- tx1
}
if(any(chLe)) {
rmWo <- rmWo[which(chLe)]
for(wo in rmWo) tx1 <- .replSingleWord(wo, tx1, sep)
txt[which(!chNA)] <- tx1 }
} }
}
txt }
rmSharedWords <- function(x, sep=c("_"," ","."), anySep=TRUE, newSep=NULL, fixed=TRUE, minLe=2) {
## function to trim redundant words (@separator) similar to wrMisc::trimRedundText()
## remove common/repeated words as occuring in each instance of x; words are separarated by sep (ignoring NAs); separators must be consistent (no mixing of separators allowed)
## special characters will be automatically protected, order does NOT matter, multiple repeats will be removed, too
## note : anySep=TRUE will consider all separators at one time (), thus combinations with different separators won't be distinguished
## note : heading separators will be removed in any case
## move to wrMisc ??
#example# x1 <- c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt"); rmSharedWords(x1)
datOK <- length(x) >0 && length(sep) >0
if(datOK) {
chNA <- is.na(x)
x2 <- if(any(chNA)) x[-which(chNA)] else x
redSep <- which(colSums(sapply(wrMisc::protectSpecChar(sep), function(y) nchar(x2) > nchar(sub(y,"", x2))))==length(x2))
} else x2 <- x
if(datOK && length(redSep) >0) {
se2 <- sep[redSep] # reduce to separators appearing in all cases
if(length(newSep) >1) newSep <- newSep[redSep]
if(isTRUE(anySep)) { # combine all separators
iP <- paste(wrMisc::protectSpecChar(sep), collapse="|")
se2 <- paste(sep, collapse="|")
fixed <- FALSE
} else { se2 <- sep
iP <- wrMisc::protectSpecChar(sep) }
for(i in 1:length(se2)) {
i2 <- iP[i]
chSep <- if(i==1) TRUE else all(nchar(x2) > nchar(sub(i2,"", x2))) # check current separator 'se2' if occuring in each instance
if(chSep) {
spl <- strsplit(x2, split=if(isTRUE(fixed)) se2[i] else iP[i], fixed=fixed)
rmW <- Reduce(intersect, spl) # get words common in all instances
if(length(rmW) >0) {
nCh <- nchar(rmW)
chL <- nCh >= minLe | nCh ==0
if(any(chL)) {
spl <- lapply(spl, function(y) y[-which(y %in% unique(c(rmW[which(chL)], if(TRUE) "")))]) # remove repeated/common 'words'
newSe <- if(length(newSep) >0) {if(length(newSep) < length(sep)) rep(newSep,i)[i] else newSep[i]} else sep[1]
x2 <- sapply(spl, paste, collapse=newSe) } } # paste collapse
}
} }
if(datOK && any(chNA)) {out <- x; out[-which(chNA)] <- x2; out} else x2 }
## start main function
if(!is.list(setupSd)) { if(length(setupSd) >0) warning(fxNa,"BIZZARE format of 'setupSd', it's content will be lost")
setupSd <- list()}
## finish groups of replicates & annotation setupSd
if(debug) { message(fxNa,"cSG0"); cSG0 <- list(gr=gr,abund=abund,sampleNames=sampleNames, setupSd=setupSd, quantMeth=quantMeth) } # sampleNames=sampleNames
## grX .. (new) pattern, setupSd$groups .. as index, + names (level names)
## special case : fractionated samples : need to change assignment of sample-names here ???
colNa <- grou <- grou2 <- saNa <- grX <- NULL
iniSaNa <- iniGr <- FALSE
iniSetupSd <- length(setupSd) >0 # check if init 'setupSd' given
## OPTIONS :
## 1) use custom -values (if avail)
## 2) extract from 'setupSd' (if avail)
## 3) from prev mined from setupSd (groups & levels, not for sampleNames)
## 4) pick/mine from colnames of 'abund'
## sampleNames/colnames : if valid user-defied sampleNames is given => use
if(length(sampleNames) == ncol(abund)) {
## 1) use externally given sampleName ...
iniSaNa <- TRUE
setupSd$sampleNames <- sampleNames
} else {
if(length(sampleNames) ==1 && any(grepl("^sdrf\\.",sampleNames)) && "sdrfDat" %in% setupSd$sdrfDat) {
## 2) arguments orients towards sdrf => check for sdrf column to use
chSd <- colnames(setupSd$sdrfDat) %in% sub("^sdrf\\.", "", sampleNames)
if(any(chSd, na.rm=TRUE)) { ## OK to use
iniSaNa <- TRUE
if(debug) message(fxNa,"Setting 'sampleNames' based on sdrf-column ")
setupSd$sampleNames <- setupSd$sdrfDat[,which(chSd)[1]] }
} else {
## 4.0) from prev mined from setupSd (sampleNames)
## special case PD : check if setupSd$annotBySoft$File.Name usable
if("PD" %in% quantMeth && length(setupSd$annotBySoft$File.Name) ==ncol(abund)) {
chOr <- try(match(basename(setupSd$annotBySoft$File.Name), basename(if("sdrfDat" %in% names(setupSd)) setupSd$sdrfDat$comment.data.file. else setupSd$annotBySoft$filePath)), silent=TRUE)
if(inherits(chOr, "try-error") || !any(is.na(chOr))) {
if(all(chOr ==1:ncol(abund), na.rm=TRUE)) {
setupSd$sampleNames <- sub("\\.raw$|\\.RAW$","", setupSd$annotBySoft$File.Name)
}
}
} else {
## 4.1) default names based on colnames
saNa <- .extrColNames(abund, meth=quantMeth, silent=silent,callFrom=fxNa,debug=debug)
setupSd$sampleNames <- if(length(saNa$sampleNames) ==ncol(abund)) saNa$sampleNames else if(length(setupSd$sdrfDat$comment.data.file.)==ncol(abund)) sub(rawExt,"", setupSd$sdrfDat$comment.data.file.)
} }
}
if(debug) {message(fxNa,"cSG1"); cSG1 <- list(abund=abund,setupSd=setupSd,gr=gr,quantMeth=quantMeth,saNa=saNa)}
if(length(gr) == ncol(abund)) {
## 1) if valid user-defied grouping is given => use
iniGr <- TRUE
grX <- gr
} else {
if(debug && length(gr) >1) message(fxNa,"Invalid entry of 'gr' (length= ",length(gr)," but ",ncol(abund)," expected) ...ignoring")
if(length(gr) <1 && "sdrfDat" %in% names(setupSd)) { # && any(grepl("^sdrf\\.",gr))
## 2) argument orients towards sdrf => check for sdrf column to use
chSd <- colnames(setupSd$sdrfDat) %in% sub("^sdrf\\.", "", gr)
if(any(chSd, na.rm=TRUE)) { grX <- setupSd$sdrfDat[,which(chSd)[1]]
iniGr <- TRUE }
if(debug) {message(fxNa,"cSG1a1"); cSG1a1 <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
} else {
if(length(setupSd) >0) {
if(debug) {message(fxNa,"cSG1a2"); cSG1a2 <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd,quantMeth=quantMeth)}
## 3) use from setupSd -if avail
if("PD" %in% quantMeth && length(setupSd$annotBySoft$File.Name) ==ncol(abund)) {
## special case PD : check if setupSd$annotBySoft$File.Name usable
colNa <- setupSd$sampleNames #basename(setupSd$annotBySoft$File.Name) #sub("\\.raw$|\\.RAW$","", basename(sub(rawExt,"", .corPathW(colnames(sampleNames)))))
sep <- c("_","\\-","\\.")
rmTx <- paste0(c("Samples","Sample","Samp","Replicates","Replicate","Rep"),"$")
rmTx <- paste(paste0(rep(sep, each=length(rmTx)), rep(rmTx, length(sep))), collapse="|")
grX <- sub(rmTx,"", sub(delPat,"", setupSd$sampleNames)) # remove tailing enumerators..
if(debug) {message(fxNa,"cSG1a Method ",quantMeth," : Extracted ",length(unique(grX)), " groups of replicates based on meta-data"); cSG1a <- list(grX=grX)}
} else grX <- setupSd$level
} else {
## 4) use/mine sdrf colnames
if(length(saNa) <1) saNa <- .extrColNames(abund, meth=quantMeth, silent=silent,callFrom=fxNa,debug=debug) # extrSamNaSetup(setupSd, quantMeth)
if(length(saNa$grou) ==ncol(abund)) { grX <- saNa$grou
} else {
if(!silent) message(fxNa,"Having TROUBLE finding where sample-groups are defined !!")
} }
if(debug) {message(fxNa,"cSG1b"); cSG1b <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
}
}
if(debug) {message(fxNa,"cSG2"); cSG2 <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
## sampleNames based on sdrf
if("sdrfDat" %in% names(setupSd)) {
useSdrfCol <- "comment.data.file."
if(useSdrfCol %in% colnames(setupSd$sdrfDat)) {
setupSd$sampleNaSdrf <- sub("(\\.RAW$)|(\\.Raw$)|(\\.raw$)", "", setupSd$sdrfDat[,useSdrfCol] )
setupSd$sampleNaSdrf <- wrMisc::rmSharedWords(setupSd$sampleNaSdrf, sep=c("_"," ",".","=",";"), silent=silent, callFrom=fxNa)}
}
## last effort to define groups (non-sdrf like info given) : accept 'lowest','min','highest','max','median','med'
if(length(setupSd$meth)==1 && length(setupSd$sdrfDat) >0 && ncol(setupSd$sdrfDat) >1) {
gr2 <- apply(setupSd$sdrfDat, 2, .adjPat)
gr2un <- apply(gr2, 2, function(x) length(unique(x)))
grX <- if(setupSd$meth %in% c("lowest","min")) gr2[,which.min(gr2un)] else {if(setupSd$meth %in% c("highest","max")) gr2[,which.min(gr2un)] else NULL}
if(setupSd$meth %in% c("med","median")) {chGr <- which(gr2un==stats::median(gr2un, na.rm=TRUE)); if(length(chGr) >1) grX <- gr2[,chGr[1]] }
if(debug) {message(fxNa,"Last effort to find replicate-structure: ",length(grX)," cSG2b"); cSG2b <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
}
if(length(grX) >0) {
if(sum(duplicated(grX)) +1 ==ncol(abund) && ncol(abund) >1) {
if(!silent) message(fxNa,"NO clear distinction of groups found, all samples were put in single group/class")
}
setupSd$level <- match(grX, unique(grX)) # make pattern
setupSd$groups <- names(setupSd$level) <- grX
} else {
}
if(debug) {message(fxNa,"cSG3"); cSG3 <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
## simplify terms ?
if(TRUE) {
#old#setupSd$sampleNames <- .trimRedWord(setupSd$sampleNames, sep=c(" ","_","-","/"), strict=TRUE, silent=silent, callFrom=fxNa, debug=debug)
setupSd$sampleNames <- rmSharedWords(setupSd$sampleNames, sep=c("_"," ","-","/"))
#old#setupSd$groups <- names(setupSd$level) <- .trimRedWord(setupSd$groups, sep=c(" ","_","-","/"), strict=TRUE, silent=silent, callFrom=fxNa, debug=debug)
setupSd$groups <- names(setupSd$level) <- rmSharedWords(setupSd$groups, sep=c("_"," ","-","/"))
}
if(debug) {message(fxNa,"cSG4"); cSG4 <- list(sampleNames=sampleNames,gr=gr,grX=grX,abund=abund,iniSaNa=iniSaNa,iniGr=iniGr,setupSd=setupSd)}
## set result to object
setupSd
}
#' Generic Plotting Of Density Distribution For Quantitation Import-functions
#'
#' This (low-level) function allows (generic) plotting of density distribution for quantitation import-functions
#'
#' @param abund (matrix or data.frame) abundance data, will be plottes as distribution
#' @param quant (matrix or data.frame) optional additional abundance data, to plot 2nd distribution, eg of normalized data
#' @param custLay (matrix) describing sammple-setup, typically produced by
#' @param normalizeMeth (character, length=1) name of normalization method (will be displayed in title of figure)
#' @param softNa (character, length=1) name of quantitation-software (typically 2-letter abbreviation, eg 'MQ')
#' @param refLi (integer) to display number reference lines
#' @param refLiIni (integer) to display initial number reference lines
#' @param notLogAbund (logical) set to \code{TRUE} if \code{abund} is linear but should be plotted as log2
#' @param figMarg (numeric, length=4) custom figure margins (will be passed to \code{\link[graphics]{par}}), defaults to c(3.5, 3.5, 3, 1)
#' @param tit (character) custom title
#' @param las (integer) indicate orientation of text in axes
#' @param cexAxis (numeric) size of numeric axis labels as cex-expansion factor (see also \code{\link[graphics]{par}})
#' @param nameSer (character) custom label for data-sets or columns (length must match number of data-sets)
#' @param cexNameSer (numeric) size of individual data-series labels as cex-expansion factor (see also \code{\link[graphics]{par}})
#' @param silent (logical) suppress messages
#' @param debug (logical) display additional messages for debugging
#' @param callFrom (character) allow easier tracking of messages produced
#' @return This function returns logical value (if data were valid for plotting) and produces a density dustribution figure (if data were found valid)
#' @seealso used in \code{readProtDiscovererFile}, \code{\link{readMaxQuantFile}}, \code{\link{readProlineFile}}, \code{\link{readFragpipeFile}}
#' @examples
#' set.seed(2018); datT8 <- matrix(round(rnorm(800) +3,1), nc=8, dimnames=list(paste(
#' "li",1:100,sep=""), paste(rep(LETTERS[1:3],c(3,3,2)),letters[18:25],sep="")))
#' .plotQuantDistr(datT8, quant=NULL, refLi=NULL, tit="Synthetic Data Distribution")
#' @export
.plotQuantDistr <- function(abund, quant, custLay=NULL, normalizeMeth=NULL, softNa=NULL, refLi=NULL, refLiIni=NULL, notLogAbund=NA, figMarg=c(3.5, 3.5, 3, 1), tit=NULL, las=NULL, cexAxis=0.8, nameSer=NULL, cexNameSer=NULL, silent=FALSE, callFrom=NULL, debug=FALSE) {
## generic plotting of densirt distribution for quantitation import-functions
## assume 'abund' (raw, non-normalized) and 'quant' (final normalized) to be valid matrix of data.frame !!
## 'custLay' ..(matrix) for layout()
## 'normalizeMeth' (character)
## 'softNa' .. (char vect, length=1) design method, used in display only
## 'refLi' .. (integer) reference line
## 'refLiIni' .. (integer) initial reference line(s)
fxNa <- wrMisc::.composeCallName(callFrom, newNa=".plotQuantDistr")
oparMar <- graphics::par("mar") # old margins, for rest after figure
oparLayout <- graphics::par("mfcol") # old layout, for rest after figure
on.exit(graphics::par(mar=oparMar, mfcol=oparLayout)) # restore old mar settings
if(debug) {message(fxNa,"pQD0 .. length custLay ", length(custLay)); pQD0 <- list(abund=abund,quant=quant,custLay=custLay,normalizeMeth=normalizeMeth,softNa=softNa,refLi=refLi)}
plotGraph <- TRUE # useful ? (to report if plot can be drawn as output ?)
## check if abund & quant are redundant (while normalizeMeth="none")
abundQuantRed <- FALSE # see if abund and quant are redundant (ie no need to plot twice)
if("none" %in% normalizeMeth && length(abund) >0 && length(quant) >0) {
if(identical(quant, abund) || identical(quant, log2(abund))) { abundQuantRed <- TRUE
if(debug) message(fxNa,"No need for 2nd plot, 'abund' and 'quant' are identical ..") }
}
if(length(custLay) >0) graphics::layout(custLay) else {
if(!identical(normalizeMeth,"none") && length(quant) >0 && !abundQuantRed) { ch1 <- try(graphics::layout(1:2), silent=TRUE)
if(inherits(ch1, "try-error")) message(fxNa,"Problem with figure, Need to restore layout ..") else if(debug) message(fxNa,"Setting layout to 1:2") }}
if(debug) { message(fxNa,"pQD1"); pQD1 <- list(abund=abund,quant=quant,custLay=custLay,normalizeMeth=normalizeMeth,softNa=softNa,refLi=refLi)}
if(length(las) != 1 || !is.integer(las)) {
ch1 <- c(if(length(abund) >0) ncol(abund) >7 || stats::median(nchar(colnames(abund)), na.rm=TRUE) >8 else FALSE,
if(length(quant) >0) ncol(quant) >7 || stats::median(nchar(colnames(quant)), na.rm=TRUE) >8 else FALSE)
las <- if(any(ch1)) 2 else 1 }
if(length(figMarg) != 4 || !is.numeric(figMarg)) { figMarg <- c(3.5, 3.8, 3, 1) # mar: bot,le,top,ri
if(debug) message(fxNa,"Invalid entry for argument 'figArg' (must be umeric of length=4), setting to default")}
ch1 <- try(graphics::par(mar=figMarg), silent=TRUE)
if(inherits(ch1, "try-error")) message(fxNa,"Problem with figure, Need to restore mar ..")
if(is.null(tit)) tit <- paste(softNa," Quantification ")
titSu <- if(length(refLi) >0) paste0(c(if(length(refLiIni) ==1) c("'",refLiIni,"'") else c(" by ",length(refLi)," selected lines")),collapse="") else NULL #
usePa <- c("wrGraph","sm")
misPa <- !sapply(usePa, function(pkg) requireNamespace(pkg, quietly=TRUE))
titQ <- if(length(abund) >0) paste(tit, "(initial)",sep=" ") else tit
.findSuplLi <- function(x, n=3) {
if(length(x) >2) pretty(stats::quantile(x, c(0.15, 0.85), na.rm=TRUE), n) else NULL
}
## check log-status, ie notLogAbund
if(length(abund) >0 && length(quant) >0) {
if(length(notLogAbund) <1) { notLogAbund <- NA
if(!silent) message(fxNa,"Invalid entry for 'notLogAbund', setting to default =NA") }
if(is.na(notLogAbund)) {
dAb <- diff(range(abund, na.rm=TRUE))
dQa <- diff(range(quant, na.rm=TRUE))
sugLogAbun <- dAb > 2^(signif(dQa,3) -2) && dAb > 3*dQa
if(debug) message(fxNa,"Trying to figure out if log2 should be taken for 'abund': ",dAb > 2^(signif(dQa,3) -2)," and ", dAb > 3*dQa)
if(sugLogAbun) {
notLogAbund <- TRUE } # abund is very likely linear scale, while quant is likely log-scale
} }
msg <- c("Invalid entry for 'notLogAbund',","Unknown log-status,"," assuming data is log2, ie notLogAbund=FALSE")
if(length(notLogAbund) <1 && !silent) { notLogAbund <- FALSE
if(!silent) message(fxNa, msg[-2])
} else if(any(is.na(notLogAbund))) { notLogAbund <- notLogAbund[which(is.na(notLogAbund))] <- FALSE
if(!silent) message(fxNa, msg[-1]) }
colNa <- if(length(abund) >0) colnames(abund) else colnames(quant)
if(length(colNa) >0) {
ncharAx <- stats::quantile(nchar(colNa), c(0.5,0.9), na.rm=TRUE)
if(length(cexNameSer) !=1 || any(is.na(cexNameSer))) cexNameSer <- sort(round( c(7/ncharAx[2], 1.9/log(length(colNa)), 1, 0.48), 2))[2] # adjust cex series-names to length and ncol
suplLineA <- if(length(abund) >0) {.findSuplLi(if(isTRUE(notLogAbund)) log2(abund) else abund)} else NA
suplLineQ <- if(length(quant) >0) .findSuplLi(quant) else NA
} else { ncharAx <- cexNameSer <- suplLineA <- suplLineQ <- NA
if(debug) message(fxNa," Note : BOTH 'abund' and 'quant' are EMPTY - NOTHING TO PLOT")}
## check axis cex
if(debug) { message(fxNa,"pQD2 .. misPa ", wrMisc::pasteC(misPa,quoteC="'"),"; abund is linear ",notLogAbund); pQD2 <- list(abund=abund,quant=quant,custLay=custLay,normalizeMeth=normalizeMeth,softNa=softNa,refLi=refLi,tit=tit,titQ=titQ,suplLineA=suplLineA,suplLineQ=suplLineQ,ncharAx=ncharAx) }
chNeg <- sum(abund <0, na.rm=TRUE)
if(chNeg >0 && notLogAbund) { notLogAbund <- FALSE
if(!silent) message(fxNa,"Data suggest taking log2 might be useful, but due to presence of ",chNeg," negative values this is not possible")
}
if(any(misPa, na.rm=TRUE)) {
if(!silent) message(fxNa,"Please install package(s) ", wrMisc::pasteC(usePa[which(misPa)])," from CRAN for drawing vioplots")
if(length(abund) >0) {
## wrGraph not available : simple boxplot
ch1 <- try(graphics::boxplot(if(isTRUE(notLogAbund)) suppressWarnings(log2(abund)) else abund, main=titQ, las=1, outline=FALSE), silent=TRUE)
if(inherits(ch1, "try-error")) {plotGraph <- FALSE; if(!silent) message(fxNa,"UNABLE to draw boxplot of distribution !! ", sub("^Error in",":",ch1))} else {
graphics::abline(h=suplLineA, lty=2, col=grDevices::grey(0.6))} }
## plot normalized
if(length(quant) >0 && !abundQuantRed) {
if(identical(normalizeMeth,"none") || length(quant) <0) {
if(debug) {message(fxNa,"pQD3 .. dim quant: ", nrow(quant)," li and ",ncol(quant)," cols; colnames : ",wrMisc::pasteC(colnames(quant))," ")}
ch1 <- try(graphics::boxplot(quant, main=paste(tit," (",normalizeMeth,"-normalized",titSu,")"), las=1, outline=FALSE), silent=TRUE)
if(inherits(ch1, "try-error")) if(!silent) message(fxNa,"UNABLE to draw boxplot of normalized distribution !! ", sub("^Error in",":",ch1)) else {
graphics::abline(h=suplLineQ, lty=2, col=grDevices::grey(0.6)) } } }
} else { # wrGraph and sm are available
if(debug) { message(fxNa,"pQD4 draw vioplotW , abund is linear ",notLogAbund); pQD4 <- list(abund=abund,quant=quant,tit=tit,normalizeMeth=normalizeMeth,softNa=softNa,refLi=refLi,titQ=titQ,notLogAbund=notLogAbund,titSu=titSu,las=las, cexAxis=cexAxis, nameSer=nameSer, cexNameSer=cexNameSer,notLogAbund=notLogAbund,abundQuantRed=abundQuantRed ) }
if(length(abund) >0) {
if(length(chNeg) <1) abund <- suppressWarnings(log2(abund)) # presume as true 'raw', ie NOT log2
if(debug) message(fxNa," Try 1st Vioplot , is linear abundance data ",notLogAbund,", cexNameSer ",cexNameSer)
ch1 <- try(wrGraph::vioplotW(if(isTRUE(notLogAbund)) log2(abund) else abund, tit=titQ, wex=NULL, las=las, cexAxis=cexAxis, nameSer=nameSer, cexNameSer=cexNameSer, horizontal=FALSE, silent=debug, debug=debug, callFrom=fxNa), silent=TRUE)
if(inherits(ch1, "try-error")) {plotGraph <- FALSE; if(!silent) message(fxNa,"UNABLE to plot vioplotW !! ", sub("^Error in",":",ch1))
} else graphics::abline(h=suplLineA, lty=2, col=grDevices::grey(0.6))
}
## now normalized (and/or log-scale)
if(length(quant) >0 && !abundQuantRed) {
if(debug) {message(fxNa,"pQD5 draw norm/quant as vioplotW() ", length(quant) >0)}
tit1 <- if(length(normalizeMeth) ==1) paste(tit,", ",normalizeMeth,"-normalized",titSu) else paste(tit,", ",titSu)
if(debug) message(fxNa," Try 2nd Vioplot ")
ch1 <- try(wrGraph::vioplotW(quant, tit=tit1, wex=NULL, las=las, cexAxis=cexAxis, nameSer=nameSer, cexNameSer=cexNameSer, horizontal=FALSE, silent=debug, debug=debug,callFrom=fxNa), silent=TRUE)
if(inherits(ch1, "try-error")) { if(!silent) message(fxNa,"UNABLE to plot vioplotW for normalized data !! ", sub("^Error in",":",ch1))
} else graphics::abline(h=suplLineQ, lty=2, col=grDevices::grey(0.6))
}
}
plotGraph }
#' Extract Additional Information To Construct The Colum 'SpecType'
#'
#' This (low-level) function creates the column annot[,'SpecType'] which may help distinguishing different lines/proteins.
#' This information may, for example, be used to normalize only to all proteins of a common backgroud matrix (species).
#' In order to compare \code{specPref} a species-column will be added to the annotation (\code{annot}) - if not already present
#' If $mainSpecies or $conta: match to annot[,"Species"], annot[,"EntryName"], annot[,"GeneName"], if length==1 grep in annot[,"Species"]
#'
#' @details
#' Different to \code{readSampleMetaData} this function also considers the main annotation as axtracted with main quantification data.
#' For example, this function can complement protein annotation data if columns 'Accession','EntryName' or 'SpecType' are missing
#'
#'
#' @param specPref (list) may contain $mainSpecies, $conta ...
#' @param annot (matrix) main protein annotation
#' @param useColumn (factor) columns from annot to use/mine
#' @param suplInp (matrix) additional custom annotation
#' @param silent (logical) suppress messages
#' @param debug (logical) display additional messages for debugging (starting with 'mainSpecies','conta' and others - later may overwrite prev settings)
#' @param callFrom (character) allow easier tracking of messages produced
#' @return This function returns a matrix with additional column 'SpecType'
#' @seealso used in \code{readProtDiscovererFile}, \code{\link{readMaxQuantFile}}, \code{\link{readProlineFile}}, \code{\link{readFragpipeFile}}
#' @examples
#' annot1 <- cbind( Leading.razor.protein=c("sp|P00925|ENO2_YEAST",
#' "sp|Q3E792|RS25A_YEAST", "sp|P09938|RIR2_YEAST", "sp|P09938|RIR2_YEAST",
#' "sp|Q99186|AP2M_YEAST", "sp|P00915|CAH1_HUMAN"),
#' Species= rep(c("Saccharomyces cerevisiae","Homo sapiens"), c(5,1)))
#' specPref1 <- list(conta="CON_|LYSC_CHICK",
#' mainSpecies="OS=Saccharomyces cerevisiae", spike="P00915") # MQ type
#' .extrSpecPref(specPref1, annot1, useColumn=c("Species","Leading.razor.protein"))
#' @export
.extrSpecPref <- function(specPref, annot, useColumn=c("Species","EntryName","GeneName","Accession"), suplInp=NULL, silent=FALSE, debug=FALSE, callFrom=NULL) {
## create column annot[,'SpecType']
## if $mainSpecies or $conta: match to annot[,"Species"], annot[,"EntryName"], annot[,"GeneName"], if length==1 grep in annot[,"Species"]
## if other : match to annot[,"Species"], annot[,"Accession"], annot[,"EntryName"], annot[,"GeneName"], if length==1 grep in annot[,"EntryName"], annot[,"GeneName"], annot[,"Species"]
## 'suplInp' add'l matrix of annot (really needed ?)
## return results in column annot[,"SpecType"] (starting with 'mainSpecies','conta' and others - later may overwrite prev settings)
## special for PD : optional useColumn[5:6] : look by grep for specPref tags in cols "Majority.protein.IDs" and "Fasta.headers"
## 'specPref' ..(list) may contain $mainSpecies, $conta ...
## 'annot' ..(matrix) main protein annotation
## 'useColumn' ..(character) columns from annot to use/mine
## 'suplInp' ..(matrix) additional custom annotation
## aim : add new column 'SpecType'
fxNa <- wrMisc::.composeCallName(callFrom, newNa=".extrSpecPref")
if(debug) {message(fxNa," eSP0"); eSP0 <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp)}
if(length(annot) <1 || length(dim(annot)) !=2) stop("invalid 'annot' (must be matrix or data.frame)")
## check suplInp & match to useColumn, add to useColumn
if(length(useColumn) >4 && length(suplInp) >0 && length(dim(suplInp))==2) {
chAnn <- useColumn[5:length(useColumn)] %in% colnames(suplInp)
if(any(!chAnn)) useColumn <- c(useColumn[1:4], useColumn[(5:length(useColumn))[which(chAnn)]])
if(length(useColumn) <5) suplInp <- NULL
} else suplInp <- NULL
## check useColumn
chAnn <- useColumn[1:min(length(useColumn), 4)] %in% colnames(annot) # check for length useCol=0 ?? # nolint
if(all(!chAnn)) stop("Unknown/Non-standard 'annot' (missing colnames ",wrMisc::pasteC(useColumn[which(!chAnn)], quoteC="'"),")")
if(debug) {message(fxNa,"eSP1"); eSP1 <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,chAnn=chAnn)}
if(!"SpecType" %in% colnames(annot)) {annot <- cbind(annot, SpecType=rep(NA, nrow(annot))); if(debug) message(fxNa,"Adding column 'SpecType' to 'annot'")}
if(length(specPref) > 0) specPref <- specPref[which(sapply(specPref, length) >0)] # remove empty .
## split Leading.razor.protein if "Accession","EntryName" NOT in annot
if(!all(c("Accession","EntryName") %in% colnames(annot))) {
chSplCol <- c("Leading.razor.protein","Proteins","Protein.Groups")
chSplCol <- which(colnames(annot) %in% chSplCol)
if(length(chSplCol) >0) {
chSplCol <- chSplCol[1]
## note : init column may contain multiple annot
tmpA <- sub("^[[:lower:]]+\\|","", annot[,chSplCol]) # remove DB ID
Accession <- sub("\\|.+","", tmpA)
annot <- cbind(annot[,1:(min(2, ncol(annot)))], Accession, EntryName =sub("(\\|.+)|(;.+)","", substring(tmpA, nchar(sub("\\|.+","", Accession)) +2)),
if(ncol(annot) >2) {if(ncol(annot) >3) annot[,3:ncol(annot)] else matrix(annot[,3:ncol(annot)], ncol=1, dimnames=list(NULL, colnames(annot)[3]))}) #
rm(Accession, tmpA) }
}
if(debug) {message(fxNa,"eSP1a"); eSP1a <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,chAnn=chAnn)}
## inspect specPref
matrixNa <- c("matrix","Matrix", "mainSpecies") # equivalent names for treating as $matrix
if(is.character(specPref)) specPref <- as.list(specPref)
if(any(matrixNa %in% names(specPref))) {
chOthNames <- matrixNa[-1] %in% names(specPref)
if(any(chOthNames)) specPref$matrix <- specPref[which(chOthNames)] # copy content of similar names into specPref$matrix
if(length(specPref$matrix) ==1) { # if length=1 this must design a name for a group/collection of proteins
specPref$IniMatrix <- specPref$matrix
if(grepl("^UPS", specPref$matrix[1])) { specPref$SpeciesMatrix <- "Homo sapiens"
specPref$EntryNameMatrix <- getUPS1acc()$EntryName
specPref$matrix <- getUPS1acc()$ac
names(specPref$matrix) <- getUPS1acc()$UniProt
} # add similar to UPS1 if available ...
} else if(length(specPref$matrix) <6) specPref$matrix <- sapply(specPref$matrix, inspectSpeciesIndic, silent=TRUE, debug=debug)
}
if("spike" %in% names(specPref)) {
if(length(specPref$spike) ==1) {
specPref$IniSpike <- specPref$spike
if(grepl("^UPS", specPref$spike[1])) { specPref$SpeciesSpike <- "Homo sapiens" # add more if available ...
specPref$EntryNameSpike <- getUPS1acc()$EntryName # eg 'CAH1_HUMAN'
specPref$spike <- getUPS1acc()$ac
names(specPref$spike) <- getUPS1acc()$UniProt # eg 'CAH1'
}
} else if(length(specPref$spike) <6) specPref$spike <- sapply(specPref$spike, inspectSpeciesIndic, silent=TRUE, debug=debug)
}
if(debug) {message(fxNa,"eSP1c"); eSP1c <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,chAnn=chAnn)}
## add fasta & integrate to annot as Species
chFasta <- length(specPref) > 0 && "fasta" %in% names(specPref)
if(chFasta) { fasta <- NULL
if(is.list(specPref) && length(dim(specPref$fasta)==2)) fasta <- specPref$fasta
if(file.exists(specPref$fasta)) fasta <- readFasta2(specPref$fasta, delim="|", databaseSign=c("sp","tr","generic","gi"), removeEntries=NULL, tableOut=TRUE, UniprSep=c("OS=","OX=","GN=","PE=","SV="), callFrom=fxNa,silent=silent,debug=debug)
if(length(fasta) >0 && !inherits(fasta,"try-error")) {
if(debug) {message(fxNa,"eSP2"); eSP2 <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,chAnn=chAnn,fasta=fasta)}
fasta0= fasta
## "O76070" is in annot$Accession, thus it should be ok to mark as spike
##
## NEED TO ADD UPS1 proteins (option) to fasta ("O76070" not part of fasta !!)
## check if specPref$spike contains Accessions not in fasta AND neither in ; how to know we're in UPS1 ?
if(length(fasta) >1 && all(c("spike", "IniSpike") %in% names(specPref))) {
if(grepl("^UPS", specPref$IniSpike)) {
chSp <- specPref$spike %in% fasta[,"uniqueIdentifier"] # check if already in spike
if(any(!chSp, na.rm=TRUE)) { ## add to fasta
faNew <- matrix(NA, nrow=sum(!chSp), ncol=ncol(fasta))
## problem : 'O76070' : 'entryName' SYUG_HUMAN; 'GN'=SNCG : entryName frequently same as GN but NOT always
faNew[,match(c("database","uniqueIdentifier","entryName","OS","GN"), colnames(fasta))] <- cbind(database=rep("sp", sum(!chSp)),
uniqueIdentifier=specPref$spike[which(!chSp)], entryName=names(specPref$spike[which(!chSp)]), # 'proteinName' & 'sequence' not available
OS=if(length(specPref$SpeciesSpike) >0) specPref$SpeciesSpike else NA, GN=names(specPref$spike[which(!chSp)]))
fasta <- rbind(fasta, faNew)
if(debug) message(fxNa,sum(!chSp)," Additional ",specPref$IniSpike," proteins were added to Fasta information")
}
}
## add here other collections like UPS1 if available ..
}
eqiCol <- matrix(c("Accession","EntryName","Description","Species","Modif", "uniqueIdentifier","entryName","proteinName","OS","Modif"), ncol=2, dimnames=list(NULL, c("annot","fasta")))
## (future) what is the best way to add protein/peptide modifications ?
useLi <- cbind(match(annot[,"Accession"], fasta[,"uniqueIdentifier"]), match(annot[,"EntryName"], fasta[,"entryName"]))
chNA <- colSums(!is.na(useLi))
fasta <- fasta[useLi[,which.max(chNA)], wrMisc::naOmit(match(eqiCol[1:4,2], colnames(fasta)))] # now in order of main data
if("OS" %in% colnames(fasta)) { fasta[,"OS"] <- sub(" \\(.+","", fasta[,"OS"]) } ## strip strain info
chCol <- !eqiCol[,1] %in% colnames(annot) & eqiCol[,2] %in% colnames(fasta)
if(any(chCol)) annot <- cbind(annot, matrix(NA, ncol=sum(chCol), nrow=nrow(annot), dimnames=list(NULL, eqiCol[which(chCol),1]))) ## add annotation columns to annot if not already present
for(i in 1:nrow(eqiCol)) { # complete annotation (by integrating from fasta) separately for each column
isNA <- is.na(annot[,which(colnames(annot)==eqiCol[i,1])])
if(any(isNA)) { isNA <- which(isNA)
annot[isNA, which(colnames(annot)==eqiCol[i,1])] <- fasta[useLi[isNA], which(colnames(fasta)==eqiCol[i,2])]
}
}
if(debug) {message(fxNa,"eSP2c"); eSP2c <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,chAnn=chAnn,fasta=fasta)}
}
} else {if(debug) fasta <- NULL} ## finish adding fasta
## main use of specPref to assign info to column 'SpecType'
if(length(specPref) > 0) if(is.list(specPref)) { # remove NA from specPref
chNA <- sapply(specPref, is.na)
if(any(unlist(chNA))) specPref <- sapply(specPref, wrMisc::naOmit)
} else { chNA <- is.na(specPref)
if(all(chNA)) specPref <- NULL else { spNames <- names(specPref[which(!chNA)]); specPref <- as.list(specPref[which(!chNA)]); names(specPref) <- spNames}}
if(length(specPref) > 0) {
if(debug) {message(fxNa,"eSP3"); eSP3 <- list(specPref=specPref,annot=annot,useColumn=useColumn)}
## convert "OS=Saccharomyces cerevisiae" to "Saccharomyces cerevisiae"
chPr <- sapply(specPref, function(x) grep("^OS=[[:upper:]][[:lower:]]+", x))
chPr2 <- sapply(chPr, length) > 0
if(any(chPr2)) {for(i in which(chPr2)) specPref[[i]] <- sub("^OS=", "", specPref[[i]]) }
## if specPref$mainSpecies not existing : copy/convert content specPref$matrix to specPref$mainSpecies (redundant)
if("matrix" %in% names(specPref) && !"mainSpecies" %in% names(specPref)) specPref$mainSpecies <- specPref$matrix
chNa <- c("mainSpecies","conta") %in% names(specPref)
if(any(!chNa) && !silent) message(fxNa," ",wrMisc::pasteC(c("mainSpecies","conta")[which(!chNa)], quoteC="'")," Seem absent from 'specPref' !")
if(debug) {message(fxNa,"eSP3a"); eSP3a <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp)}
## add species to annot (if not already present)
if(!"Species" %in% tolower(colnames(annot))) {
spec <- spc2 <- rep(NA, nrow(annot))
if("EntryName" %in% colnames(annot)) { hasSep <- grep("_",annot[,"EntryName"])
if(length(hasSep) >0) {
spc2[hasSep] <- sub(".+_","", annot[hasSep,"EntryName"])
spe2 <- unique(wrMisc::naOmit(spc2))
spe3 <- sapply(spe2, inspectSpeciesIndic, silent=TRUE)
for(i in 1:length(spe2)) spec[which(spc2 %in% spe2[i])] <- spe3[i]
annot <- cbind(annot, species=spec) }}
}
## final assigning content of annot[,"SpecType"]
#old# multGrepFields <- c("mainSpecies","spike","conta") # fields from specPref to consider (now has redundant & other ...)
rmSpecPrefFi <- c("matrix","sampleNames","IniMatrix","IniSpike","SpeciesSpike","EntryNameSpike") # rather define fields which are technical copies (to exclude some terms)
multGrepFields <- -1*which(names(specPref) %in% rmSpecPrefFi)
if(length(multGrepFields) <1) multGrepFields <- 1:length(specPref)
##
.MultGrep2 <- function(pat, y) {y <- as.matrix(y); z <- if(length(pat)==1) grepl(pat, y) else rowSums(sapply(pat, grepl, y)) >0
if(length(dim(y)) >1) rowSums(matrix(z, ncol=ncol(y))) >0 else z } # (multiple) grepl() when length of pattern 'pat' >0
mulP <- lapply(specPref[multGrepFields], .MultGrep2, annot)
chLe <- sapply(mulP, length)
## assign to annot
for(i in which(chLe >0)) { markLi <- which(as.logical(mulP[[i]]))
if(length(markLi) >0) annot[markLi,"SpecType"] <- names(mulP)[i] }
## fix problem with specPref$mainSpecies not found
if(length(specPref$mainSpecies) >0 && !any("mainSpecies" %in% annot[,"SpecType"], na.rm=TRUE)) {
isMain <- which(annot[,"Species"] %in% specPref$mainSpecies)
if(length(isMain) >0) annot[isMain,"SpecType"] <- "mainSpecies"
}
if(debug) {message(fxNa,"eSP4"); eSP4 <- list(specPref=specPref,annot=annot,useColumn=useColumn,suplInp=suplInp,mulP=mulP)}
rm(mulP)
}
annot
}
#' Get Matrix With UniProt Abbreviations For Selected Species As Well As Simple Names
#'
#' This (low-level) function allows accessing matrix with UniProt abbreviations for species frequently used in research.
#' This information may be used to harmonize species descriptions or extract species information out of protein-names.
#'
#' @return This function returns a 2-column matrix with species names
#' @seealso used eg in \code{readProtDiscovererFile}, \code{\link{readMaxQuantFile}}, \code{\link{readProlineFile}}, \code{\link{readFragpipeFile}}
#' @examples
#' .commonSpecies()
#' @export
.commonSpecies <- function() {
## matrix with UniProt abbreviations for common species
cbind(ext=c("_HUMAN","_MOUSE","_RAT","_CAVPO","_PIG","_BOVIN","_RABIT", "_SHEEP",
"_CHICK", "_CAEEL", "_DROME","_DANRE",
"_XENLA","_AMBME", "_ARATH","_SOLTU","_BETVV", "_YEAST", "_ECOLI","_MYCTU"),
name=c("Homo sapiens","Mus muscullus","Rattus norvegicus","Cavia porcellus","Sus scrofa","Bos taurus","Oryctolagus cuniculus","Ovis aries",
"Gallus gallus", "Caenorhabditis elegans","Drosophila melanogaster","Danio rerio",
"Xenopus laevis","Ambystoma mexicanum",
"Arabidopsis thaliana","Solanum tuberosum","Beta vulgaris", "Saccharomyces cerevisiae", "Escherichia coli", "Mycobacterium tuberculosis"),
simple=c("Human","Mouse","Rat","Pig","Guinea pigs", "Cow","Rabit", "Sheep",
"Chicken", "Celegans","Droso","Zebrafish",
"Frog","Axolotl", "Arabidopsis","Potato","Sugar beet", "Yeast","Ecoli","Mtuberculosis") )
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.