#' Pathway analysis via network smoothing (Pants)
#'
#' Pants algorithm to test if scores of features (i.e. analytes such as a gene, protein, or metabolite)
#' in a pathway or those connected to the pathway in an interaction network are greater than randomized ones.
#' Allows for testing group differences (\code{\link[ezlimma]{limma_contrasts}}) with \code{contrast.v} & \code{design};
#' correlation (\code{\link[ezlimma]{limma_cor}}) with \code{design}; or
#' mediation (\code{\link[ezlimma]{hitman}}) with \code{exposure} & \code{covariates}.
#'
#' @param Gmat Binary feature (e.g. gene) by pathway inclusion matrix, indicating which features are in which pathways.
#' @param phenotype Vector of sample characteristics (correlation: numeric; contrasts: character).
#' Should be same length as \code{ncol(object)}.
#' @param type Type of \pkg{ezlimma} analysis per feauture; must be one of"contrasts"
#' (\code{\link[ezlimma]{limma_contrasts}}), "correlation" (\code{\link[ezlimma]{limma_cor}}),
#' or "mediation" (\code{\link[ezlimma]{hitman}}). You can specify just the initial letter.
#' @param exposure Numeric vector or matrix of exposures. Ignored if \code{type!="mediation"}.
#' @param ker Laplacian kernel matrix representing the interaction network.
#' @param score_fcn Function that transforms the t-statistics from the contrasts into a non-negative value.
#' Its input must be a vector of same length as number of elements in \code{contrast.v} (usually one).
#' Its output must be a non-negative scalar. Ignored if \code{hitman} is \code{TRUE}.
#' @param annot.df Table of feature annotations that are appended to feature statistics.
#' @param ntop Number of top features that most impact a pathway to include.
#' @param nperm Number of sample permutations to evaluate significance of pathways.
#' @param ret.pwy.dfs Logical; return list of data frames written out to CSVs?
#' @param ret.null.mats Logical; return matrices with null distributions for features and pathways?
#' @param ncores Integer. If > 1, number of cores to use for parallel computing.
#' You can detect how many are available for your system using \code{\link[parallel]{detectCores}}.
#' @param seed Integer seed to set for reproducility.
#' @inheritParams ezlimma::limma_contrasts
#' @inheritParams ezlimma::limma_cor
#' @inheritParams ezlimma::roast_contrasts
#' @inheritParams ezlimma::hitman
#' @details Without \code{mediation}, \code{phenotype}'s are permuted, since this properly permutes the \code{object} to
#' \code{phenotype} mapping. \code{object} could be equivalently permuted.
#' With \code{mediation}, because \code{object} is tested for its association to both \code{phenotype}
#' and \code{exposure}, \code{colnames(object)} are permuted, which offers more available permutations.
#'
#' Scores for features in the kernel but not in the data are assigned a score of zero by default for sparsity.
#' Scores for features and pathways are compared to null scores, which are generated by permuting the columns of
#' \code{object} and rerunning the algorithm. These are the stats returned in \code{feature.stats}.
#'
#' For \code{\link[parallel]{makeCluster}}, the cluster \code{type} depends on the OS, which is tested in the body
#' of the function using \code{.Platform$OS.type}.
#'
#' If \code{!is.na(name)}, an Excel file with "_pants.xlsx" appended to \code{name} gets written out with links to CSVs
#' containing the statistics and annotation of features most affecting the pathway's score. The annotation (and possibly
#' other statistics) are from \code{annot.df}. Additionally, the CSVs contain whether each
#' feature is in the pathway, and an \code{impact} column describing the impact of each feature on the pathway's
#' score. Since a pathway's score is calculated in \code{pants}, \code{impact} uses the feature statistics calculated
#' in \code{pants} by comparing to permutation. The feature statistics from \pkg{ezlimma} and those from
#' \code{pants} are nearly identical, though; the main difference is that \code{pants} feature significances are limited
#' by the number of permutations, so they flatten near the extreme. The features with the largest magnitude impact score
#' are selected and can be visualized with \code{ezlimmaplot::plot_pwy}. These features may increase or decrease a
#' pathway's score.
#'
#' @return List of at least two data frames:
#' \describe{
#' \item{\code{pwy.stats}}{A data frame with columns
#' \describe{
#' \item{\code{nfeatures}}{number of features in the pathway.}
#' \item{\code{score}}{only returned if \code{ret.null.mats} is \code{TRUE};
#' pathway score (larger is more significant) to compare to \code{null.pwy.mat}}
#' \item{\code{z}}{pathway permutation z-score (larger is more significant)}
#' \item{\code{p}}{pathway permutation p-value}
#' \item{\code{FDR}}{pathway FDR calculated from p-values with \code{p.adjust(p, method="BH")}}
#' }}
#'
#' \item{\code{feature.stats}}{A data frame with columns
#' \describe{
#' \item{\code{score}}{without \code{mediation}, feature's score from applying \code{score_fcn} to moderated t-statistics;
#' or with \code{mediation}, parametric z-score from \code{hitman}'s mediation p-value.}
#' \item{\code{z}}{feature non-parametric z-score (larger is more significant) from comparing \code{score} vs.
#' this feature's scores in permutations (before smoothing)}
#' \item{\code{p}}{feature's non-parametric permutation p-value}
#' \item{\code{FDR}}{feature's non-parametric FDR from permutation \code{p}}
#' }}
#'
#' if \code{ret.pwy.dfs} is \code{TRUE}:
#' \item{\code{pwy.dfs}}{List of data frames written out to CSVs}
#'
#' And if \code{ret.null.mats} is \code{TRUE}:
#' \item{\code{null.feature.mat}}{Matrix with features as rows and permutations as columns, where each element represents
#' the score of that feature in that permutation}
#' \item{\code{null.pwy.mat}}{Matrix with pathways as rows and permutations as columns, where each element represents
#' the score of that pathway in that permutation}
#' \item{\code{sample.perms}}{Matrix with samples as rows and permutations as columns, where each element represents
#' the index of the sample simulated to represent the sample in the row in that permutation}
#' }
#' @examples
#' # A workflow is described in the vignette; instructions to view the vignette are in the README.
#' @importFrom utils methods
#' @importFrom zeallot %<-%
#' @export
pants <- function(object, Gmat, phenotype=NULL, type=c("contrasts", "correlation", "mediation"),
contrast.v=NULL, design=NULL, exposure=NULL, covariates=NULL,
ker=NULL, annot.df=NULL, ntop=25, score_fcn=abs, nperm=10^4-1,
ret.pwy.dfs=FALSE, ret.null.mats=FALSE, min.nfeats=3, ncores=1, name=NA, seed=0){
type <- match.arg(type)
if (is.null(ker)){
ker <- diag_kernel(object.rownames = rownames(object), Gmat.rownames = rownames(Gmat))
}
stopifnot(length(intersect(rownames(ker), rownames(object))) > 0, any(rownames(Gmat) %in% colnames(ker)),
colnames(object)==names(phenotype), is.null(annot.df) || any(rownames(annot.df) %in% rownames(object)),
is.numeric(ncores), is.numeric(min.nfeats), is.numeric(ntop))
c(Gmat, nfeats.per.pwy) %<-% subset_gmat(object=object, Gmat=Gmat, min.nfeats=min.nfeats)
if (type=="mediation"){
if (ncol(as.matrix(exposure))==1){
stopifnot(colnames(object)==names(exposure))
} else {
stopifnot(colnames(object)==rownames(exposure))
}
hm <- ezlimma::hitman(E=exposure, M=object, Y=phenotype, covariates=covariates)
# transform to one-sided z-score
score.v <- stats::qnorm(p=hm[rownames(object), "EMY.p"], lower.tail = FALSE)
} else {
score.v <- score_features(object=object, phenotype=phenotype, type=type, contrast.v=contrast.v,
design=design, score_fcn=score_fcn)
}
if (ncores > 1){
# PSOCK works without parallel::clusterExport
cl.type <- ifelse(.Platform$OS.type=="windows", "PSOCK", "FORK")
cl <- parallel::makeCluster(spec=ncores, type=cl.type)
}
set.seed(seed)
try({
if (type=="mediation"){
sample.perms <- ezpermutations(xx=colnames(object), nperm=nperm)
med.fun <- function(cn.perm){
# permute object
# must set permuted names to NULL st limma_contrasts doesn't complain that they clash with colnames(object)
object.tmp <- object[,cn.perm]
# to avoid names error in stopifnot
colnames(object.tmp) <- colnames(object)
# permuted hm result
hm.tmp <- ezlimma::hitman(E=exposure, M=object.tmp, Y=phenotype, covariates=covariates)
# return z-scores from permuted object
stats::qnorm(p=hm.tmp[rownames(object), "EMY.p"], lower.tail = FALSE)
}
score.mat <- sapply_by_ncores(ncores=ncores, X=sample.perms, FUN=med.fun, cl=cl)
} else {
# perms <- lapply(seq_len(nperm), function(i) sample.int(length(phenotype)))
sample.perms <- ezpermutations(xx=phenotype, nperm=nperm)
contr.fun <- function(ph.perm){
# must set permuted names to NULL st limma_contrasts doesn't complain thay they clash with colnames(object)
# pheno.tmp <- stats::setNames(ph.perm, nm=NULL)
score_features(object=object, phenotype=ph.perm, type=type, contrast.v=contrast.v,
design=design, score_fcn=score_fcn)
}
score.mat <- sapply_by_ncores(ncores=ncores, X=sample.perms, FUN=contr.fun, cl=cl)
}
}) # end try
if (ncores > 1) parallel::stopCluster(cl=cl)
dimnames(score.mat) <- list(rownames(object), paste0("perm", 1:ncol(score.mat)))
# could use zeallot, but would still need temporary score.mat.comb
# match_mats is fast
mm <- match_mats(score.mat = cbind(v=score.v, score.mat), ker=ker, Gmat=Gmat)
score.mat <- mm$score.mat[,-1]; score.v <- mm$score.mat[,1]; ker <- mm$ker; Gmat <- mm$Gmat
rm(mm) #to save memory
if (any(names(score.v) != rownames(ker) | rownames(ker) != rownames(Gmat))){
stop("Internal function match_mats failed.", call. = FALSE)
}
# feature p-values (for plotting)
# features in object & in kernel
feature.stats <- data.frame(score = score.v,
matrix(NA, nrow=length(score.v), ncol=3, dimnames=list(rownames(score.mat), c("z", "p", "FDR"))))
# need to coerce score.mat to matrix to prevent rowSums error
feature.stats[,c("z", "p")] <- p_ecdf(eval.v=score.v, score.mat = as.matrix(score.mat), alternative = "greater")
feature.stats[,"FDR"] <- stats::p.adjust(feature.stats[,"p"], method="BH")
# need to compare to pwys, sometimes runs out of memory
# use scores, not z-scores, since used to calc pwy.v/nfeats.per.pwy
pwy.v <- (score.v %*% ker %*% Gmat)[1,]
# null pwy scores
pwy.mat <- as.matrix(Matrix::t(Matrix::crossprod(score.mat, ker) %*% Gmat))
# feat.score.avg=pwy.v/nfeats.per.pwy is confusing and not helpful enough to include
pwy.stats <- data.frame(nfeatures=nfeats.per.pwy, score=pwy.v, z=NA, p=NA)
rownames(pwy.stats) <- colnames(Gmat)
pwy.stats[,c("z", "p")] <- p_ecdf(eval.v=pwy.v, score.mat=pwy.mat, alternative = "greater")
pwy.stats$FDR <- stats::p.adjust(pwy.stats$p, method="BH")
pwy.stats <- pwy.stats[order(pwy.stats$p, -pwy.stats$nfeatures),]
# return res
res <- list(pwy.stats=pwy.stats, feature.stats=feature.stats)
if (ret.null.mats){
res$null.feature.mat <- as.matrix(score.mat)
res$null.pwy.mat <- as.matrix(pwy.mat)
res$sample.perms <- simplify2array(sample.perms)
dimnames(res$sample.perms) <- list(colnames(object), dimnames(score.mat)[[2]])
} else {
# only include "score" if ret.null.mats
res$pwy.stats <- res$pwy.stats[,setdiff(colnames(res$pwy.stats), "score")]
}
# compute impact & write xlsx file with links
if (is.null(annot.df)){
feat.df <- feature.stats
} else {
feat.df <- data.frame(feature.stats, annot.df[rownames(feature.stats),])
}
if (!is.na(name)){
if (type=="mediation") name <- paste0(name, "_pants_hitman") else name <- paste0(name, "_pants")
}
wpx <- write_pants_xl(zscores=feature.stats[, "z", drop=FALSE], pwy.tab=res$pwy.stats, feat.tab=feat.df,
Gmat=Gmat, ker=ker, name=name, ntop=ntop)
if (ret.pwy.dfs) res <- c(res, pwy.dfs=list(wpx$pwy.csvs))
return(res)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.