#' Perform Network Analysis
#'
#' @description
#' This function works for network analysis, which calculates
#' count and power of interaction pairs among all given clusters.
#'
#' @inheritParams InsideObjectInterCell
#' @param sel.clusters.X Defining one part of interacting clusters. The options can be
#' found from \code{\link{ListAllClusters}}. See details for help.
#' @param sel.clusters.Y Defining the other part of interacting clusters. The options can be
#' found from \code{\link{ListAllClusters}}. See details for help.
#' @param sel.exprs.change Options are 'Xup.Yup', 'Xup.Ydn', 'Xdn.Yup', 'Xdn.Ydn'.
#' It gives the corresponding expression change of every interacting gene pair.
#' @param sel.some.genes.X It gives the genes to be expected to be expressed in clusters given by parameter \code{sel.clusters.X}.
#' It can either be return by \code{FetchGeneOI}, or just \code{character} though losing some gene properties.
#' @param sel.some.genes.Y It gives the genes to be expected to be expressed in clusters given by parameter \code{sel.clusters.Y}.
#' It can be generated as paramter \code{sel.some.genes.X}.
#' @param sel.genes.option Options are 'intersect' or 'union'. 'intersect' strictly restricts gene pair to
#' have one gene partner in \code{sel.some.genes.X} and the other in \code{sel.some.genes.Y}. 'union' restricts
#' gene pair to have at least one gene either in \code{sel.some.genes.X} or \code{sel.some.genes.Y}.
#' @param sel.gene.pairs Directly specify the desired gene pairs. It should be given in standard table that is generated
#' by \code{\link{FormatCustomGenePairs}}. To note, it's strictly aligned to clusters, see details.
#' @param run.permutation It decides whether to permutation test to distinguish significant changed gene pairs with the rest.
#' If set TRUE, parameter \code{perm.expression}, \code{perm.pval.sides} and \code{perm.pval.cutoff} should be provided. If set FALSE,
#' no statistical test is used, which means all gene pairs will be passed to result.
#' @param perm.expression A list of permutations of gene expression. It should be generated by function \code{Tool.GenPermutation()}.
#' @param perm.pval.sides Default is 'two-sides'. All allowed options are 'one-side-large' or 'one-side-small' or 'two-sides'.
#' Explicit explanation are given in details.
#' @param perm.pval.cutoff The P value cutoff for
#' @param force.process It stops the program when no subset of genes are selected either by \code{sel.some.genes.X} and \code{sel.some.genes.Y}
#' or by \code{sel.gene.pairs}, which may take long time to process. To force process, set this to TRUE.
#' @param verbose If set TRUE, the progress bar will be shown.
#'
#' @details
#' This function performs network analysis and calculates:
#' \itemize{
#' \item count: count of interaction.
#' \item power: strength of interaction, which is calculated by formula: SUM(Power(pair[i~j])).
#' }
#'
#' \bold{sel.clusters.X} and \bold{sel.clusters.Y}:
#' Interactions are defined by pairs of interacting clusters, or say cluster groups.
#' For one interaction, e.g. cluster-Myeloid ~ cluster-T_cell, the fromer one will be
#' restricted to clusters given in \code{sel.clusters.X} and the latter one will be
#' retricted to clusters given in \code{sel.clusters.Y}.
#' The clusters given by \code{ListAllClusters} list all available clusters, and users can
#' manually pick some or all of them.
#'
#' \bold{sel.exprs.change}:
#' The *up means the gene is up-regulated, i.e. its LogFC (log fold change) > 0.
#' The *dn means the gene is down-regulated. Take 'Xup.Ydn' for example,
#' it gets to explore the interaction between cluster X and cluster Y by taking up-regulated genes from X and
#' down-regulated genes from Y.
#'
#' \bold{sel.gene.pairs}:
#' The gene pairs given in this parameter are strictly aligned to clusters given by
#' parameter \code{sel.clusters.X} and \code{sel.clusters.Y}. After standardizing by \code{FormatCustomGenePairs},
#' the gene pairs are given in 4 columns, and 2 of them are named 'inter.GeneName.A', 'inter.GeneName.B'.
#' The corresponding relation is that genes listed in 'inter.GeneName.A' are used to compare with genes expressed by clusters given in \code{sel.clusters.X}
#' and genes listed in 'inter.GeneName.B' are used to compare with genes expressed by clusters given in \code{sel.clusters.Y}.
#' For example, if the user gives C3~C3ar1 in \code{sel.gene.pairs}, Myeloid_cell and T_cell in \code{sel.clusters.X}, and
#' fibroblast and B_cell in \code{sel.clusters.Y}, then C3 will be tested in Myeloid_cell and T_cell but not fibroblast and B_cell,
#' and C3ar1 will be only tested in fibroblast and B_cell. If the user need C3~C3ar1 to be tested in the opposite way,
#' one more row C3ar1~C3 should be given in \code{sel.gene.pairs}.
#'
#' \bold{perm.pval.sides} and \bold{perm.pval.cutoff}:
#' The \code{perm.pval.sides} can be 'two-side', 'one-side-large' and 'one-side-small'.
#' Take option 'two-side' as example.
#' The null hypothesis is given as power of one gene pair from actual scRNA-seq is equal to it from cell label permutation.
#' Suppose one gene pair A~B, and cell clusters G1, G2, G3 exist.
#' The average expression of A and B in all cells are denoted as allavg(A) and allavg(B),
#' and the average expression of A and B in each cell clusters are denoted as E(G1, A), E(G1, B), E(G2, A), so on so forth.
#' The permutations of cell labels generate one list of new expression, and will be denoted like Perm(i, G1, A), i goes from 1 to permutation times.
#' To test whether A~B is significant between cluster G1 and G2, we calculate the power of this gene pair,
#' which is the multiply of expression, and thus
#' H0 is made as E(G1, A) * E(G2, B) - allavg(A) * allavg(B) = Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B), i goes from 1 to permutation times.
#' By getting the times when abs(Equation-left) <= abs(Equation-right), we get the P value.
#' By comparing the P value with \code{perm.pval.cutoff}, we judge whether the gene pair A~B between G1 and G2 is significant.
#'
#' For 'one-side-large', the H0 is E(G1, A) * E(G2, B) - allavg(A) * allavg(B) > Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B) .
#' P value is calculated as (Equation-left) <= (Equation-right).
#' For 'one-side-small', the H0 is E(G1, A) * E(G2, B) - allavg(A) * allavg(B) < Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B) .
#' P value is calculated as (Equation-left) >= (Equation-right).
#'
#'
#' @return A \code{InterCell} object.
#'
#' @importFrom dplyr bind_rows left_join
#' @import progress
#' @importFrom pbapply pbapply
#' @importFrom future nbrOfWorkers
#' @importFrom future.apply future_apply
#'
#' @export
#'
AnalyzeInterInFullView <- function(
object,
sel.clusters.X = NULL,
sel.clusters.Y = NULL,
sel.exprs.change = c("Xup.Yup", "Xup.Ydn", "Xdn.Yup", "Xdn.Ydn"),
sel.some.genes.X = NULL,
sel.some.genes.Y = NULL,
sel.genes.option = "intersect",
sel.gene.pairs = NULL,
run.permutation = FALSE,
perm.expression = NULL,
perm.pval.sides = "two-sides",
perm.pval.cutoff = 0.05, # two-sided p val
force.process = FALSE,
verbose = TRUE
) {
kGenesSplit <- getGenePairSplit(object)
# check sel.clusters.*
inside.check.sel.clusters <- function(allowed.clusters, sel.clusters, suffix = c("X", "Y")) {
if (is.null(sel.clusters)) {
sel.clusters <- allowed.clusters
sel.clusters <- sel.clusters[order(sel.clusters)]
} else {
not.valid.clusters <- setdiff(sel.clusters, allowed.clusters)
if (length(not.valid.clusters) > 0) {
warning("Given Undefined clusters in `sel.clusters.", suffix, "`: ", paste0(not.valid.clusters, collapse = ", "), ". ")
}
sel.clusters <- intersect(sel.clusters, allowed.clusters)
}
return(unique(sel.clusters))
}
# check
all.clusters <- unique(object@fgenes$cluster)
sel.clusters.X <- inside.check.sel.clusters(all.clusters, sel.clusters.X, suffix = "X")
sel.clusters.Y <- inside.check.sel.clusters(all.clusters, sel.clusters.Y, suffix = "Y")
# check selection on gene pairs or some genes
if (!is.null(sel.gene.pairs)) {
# check data.frame
if (!is.data.frame(sel.gene.pairs)) {
stop("User-selected gene pairs should be given in table (data.frame format)!")
}
# check if it is standardized data.frame
std.colnames.1t4 <- c("inter.GeneID.A", "inter.GeneID.B", "inter.GeneName.A", "inter.GeneName.B")
if (!identical(colnames(sel.gene.pairs)[1:4], std.colnames.1t4)) {
stop("Non-standardized table of gene pairs are given, please use function `FormatCustomGenePairs` first to get standardized one!")
}
}
sel.property.X <- sel.property.Y <- NULL
if (!is.null(sel.some.genes.X)) {
if (is.list(sel.some.genes.X) && all(c("genes", "property") %in% names(sel.some.genes.X))) {
sel.property.X <- sel.some.genes.X$property
sel.some.genes.X <- sel.some.genes.X$genes
} else {
if (!is.character(sel.some.genes.X)) {
sel.some.genes.X <- as.character(sel.some.genes.X)
}
}
}
if (!is.null(sel.some.genes.Y)) {
if (is.list(sel.some.genes.Y) && all(c("genes", "property") %in% names(sel.some.genes.Y))) {
sel.property.Y <- sel.some.genes.Y$property
sel.some.genes.Y <- sel.some.genes.Y$genes
} else {
if (!is.character(sel.some.genes.Y)) {
sel.some.genes.Y <- as.character(sel.some.genes.Y)
}
}
}
# gene pairs if given, genes given will be overwritten
if (is.null(sel.gene.pairs)) {
if (is.null(sel.some.genes.X) && is.null(sel.some.genes.Y)) {
if (force.process == FALSE) {
stop("No restriction applied on neither genes or gene pairs.",
" Program will get long time to be finished.",
" If insisting, set `force.process` to be TRUE.")
}
} else {
if (is.null(sel.some.genes.X) || is.null(sel.some.genes.Y)) {
print("Processing with partial selection on genes.")
}
}
} else {
print(paste("Processing with given", nrow(sel.gene.pairs), "gene pairs."))
if (!is.null(sel.some.genes.X) || !is.null(sel.some.genes.Y)) {
warning("Given selection on genes is overrided by given selection on gene pairs!")
}
}
# check sel.genes.option
std.sel.genes.option <- c("intersect", "union")
sel.genes.func <- base::intersect # in default using 'intersect'
sel.genes.option <- sel.genes.option[1] # only length 1 allowed
valid.option <- intersect(sel.genes.option, std.sel.genes.option)
sel.genes.func <- switch(valid.option,
"intersect" = base::intersect,
"union" = base::union,
warning("Given invalid user option on selecting genes: ", paste0(sel.genes.option, collapse = ", "), ". ",
"Using default 'intersect' option!")
)
# check exprs change
not.valid.exprs.c <- setdiff(sel.exprs.change, kexprs.change)
if (length(not.valid.exprs.c) > 0) {
warning("Given invalid options for expression change: ", paste0(not.valid.exprs.c, collapse = ", ", ". "))
}
sel.exprs.change <- intersect(sel.exprs.change, kexprs.change)
if (length(sel.exprs.change) == 0) {
stop("No expression change selection is given in parameter `sel.exprs.change`. Options are 'Xup.Yup', 'Xup.Ydn', 'Xdn.Yup', 'Xdn.Ydn'.")
}
# check if permutation test is used
actual.exprs <- NA
if (run.permutation == TRUE) {
# check perm.pval.sides
allowed.sides <- c("two-sides", "one-side-large", "one-side-small")
if (!perm.pval.sides %in% allowed.sides) {
stop("Invalid parameter 'perm.pval.sides', only 'two-sides', 'one-side-large' and 'one-side-small' are allowed.")
}
# check permutation
if (is.null(perm.expression)) {
stop("Attempt to run with permutation but data not given. Please give it in parameter 'perm.expression'.")
}
if (!is.list(perm.expression) || (is.list(perm.expression) && !all(c("actual", "perm") %in% names(perm.expression))) ) {
stop("Given invalid permutation data, please use function 'Tool.GenPermutation' to generate it.")
}
# split perm and actual
actual.exprs <- perm.expression$actual
perm.expression <- perm.expression$perm
# check if all result are unified (just the rows and columns for simiple checking)
tmp.simple.check.p <- t(vapply(perm.expression, function(x) { c(nrow(x), ncol(x)) }, FUN.VALUE = integer(2)))
if (!all(tmp.simple.check.p[, 1] == tmp.simple.check.p[1, 1]) ||
!all(tmp.simple.check.p[, 2] == tmp.simple.check.p[1, 2])) {
stop("Non-unified permutation matrices are given! Please recheck data format.")
}
# check if all involved genes are given. After simple checking, here think every permutation is the same
tmp.gene.p.check <- unique(object@fgenes$gene) %in% rownames(perm.expression[[1]])
if (all(!tmp.gene.p.check)) {
stop("No valid genes are given in permutation matrix! If you give the wrong element?")
}
if (!all(tmp.gene.p.check)) {
warning("Those marker genes are unexpectly not detected in permutation matrix: ",
paste0(setdiff(unique(object@fgenes$gene), rownames(perm.expression[[1]])), collapse = ", "), ".")
}
}
## analyze interaction network
interact.pairs.all <- list(
list.clusters = list(x.axis = sel.clusters.X, y.axis = sel.clusters.Y),
data.allpairs = list(),
anno.allpairs = list(location.A = list(), location.B = list(), type.A = list(), type.B = list()),
name.allpairs = character(),
cnt.allpairs = integer(),
strength.allpairs = single()
)
## perform network analysis
# if prog.bar shown
if (verbose == TRUE) {
prog.bar.fv <- progress::progress_bar$new(total = length(sel.clusters.X) * length(sel.clusters.Y))
prog.bar.fv$tick(0)
}
# the used gene pairs
used.proc.pairs.db <- object@database@pairs.db
# pre-slim for used database
if (!is.null(sel.gene.pairs)) {
tmp.ref.gpairs <- paste0(used.proc.pairs.db$inter.GeneName.A, used.proc.pairs.db$inter.GeneName.B, sep = kGenesSplit)
#
tmp.used.conv <- paste0(sel.gene.pairs$inter.GeneName.A, sel.gene.pairs$inter.GeneName.B, sep = kGenesSplit)
tmp.used.rev <- paste0(sel.gene.pairs$inter.GeneName.B, sel.gene.pairs$inter.GeneName.A, sep = kGenesSplit)
# set the pairs database
used.proc.pairs.db <- sel.gene.pairs[union(which(tmp.used.conv %in% tmp.ref.gpairs), which(tmp.used.rev %in% tmp.ref.gpairs)), ]
} else { # slim for some genes
tmp.used.genes <- c(sel.some.genes.X, sel.some.genes.Y)
if (length(tmp.used.genes) > 0) {
used.proc.pairs.db <- used.proc.pairs.db[union(which(used.proc.pairs.db$inter.GeneName.A %in% tmp.used.genes),
which(used.proc.pairs.db$inter.GeneName.B %in% tmp.used.genes)), ]
}
}
# the process
for (ix in sel.clusters.X) {
for (jy in sel.clusters.Y) {
interact.name <- paste0(ix, object@tool.vars$cluster.split, jy)
fgenes.t.X <- object@fgenes[which(object@fgenes$cluster == ix), ]
fgenes.t.Y <- object@fgenes[which(object@fgenes$cluster == jy), ]
genes.X <- fgenes.t.X$gene
genes.Y <- fgenes.t.Y$gene
# generate all possible gene pairs
ref.GeneA <- used.proc.pairs.db$inter.GeneName.A
ref.GeneB <- used.proc.pairs.db$inter.GeneName.B
# conv
inds.gpairs.conv <- intersect(which(ref.GeneA %in% genes.X), which(ref.GeneB %in% genes.Y))
gpairs.result <- used.proc.pairs.db[inds.gpairs.conv, ]
if (is.null(sel.gene.pairs)) { # using reference database, then rev. ones should be collected
# rev
inds.gpairs.rev <- intersect(which(ref.GeneB %in% genes.X), which(ref.GeneA %in% genes.Y))
cols.rev <- ReverseOddEvenCols(std.index.colname.end.dual)
gpairs.rev <- used.proc.pairs.db[inds.gpairs.rev, c(cols.rev, setdiff(seq_len(ncol(used.proc.pairs.db)), cols.rev))]
colnames(gpairs.rev) <- colnames(used.proc.pairs.db)
# merge result
gpairs.result <- rbind(gpairs.result, gpairs.rev)
}
# prioritized usage of selection to process
if (is.null(sel.gene.pairs)) { # use selected genes
inds.sel.X <- inds.sel.Y <- seq_len(nrow(gpairs.result))
if (!is.null(sel.some.genes.X)) {
inds.sel.X <- which(gpairs.result$inter.GeneName.A %in% sel.some.genes.X)
}
if (!is.null(sel.some.genes.Y)) {
inds.sel.Y <- which(gpairs.result$inter.GeneName.B %in% sel.some.genes.Y)
}
# use select gene merge option and get result
gpairs.result <- gpairs.result[sel.genes.func(inds.sel.X, inds.sel.Y), ]
}
# add logfc, pval data from fgenes, as well as Exprs from fgenes
tmp.sel.cols <- setdiff(object@misc$musthave.colnames, "cluster")
tmp.tomodf.cnt <- length(tmp.sel.cols) - 2 # as substraction below starts from 0, -2 / one for gene, one for index shift to start from 0
if (tmp.tomodf.cnt < 0) {
stop("Unexpected must have columns not matched!")
}
gpairs.result <- left_join(gpairs.result, fgenes.t.X[, tmp.sel.cols], by = c("inter.GeneName.A" = "gene"))
tmp.change.cols <- ncol(gpairs.result) - (tmp.tomodf.cnt):0
colnames(gpairs.result)[tmp.change.cols] <- paste("inter", colnames(gpairs.result)[tmp.change.cols], "A", sep = ".")
gpairs.result <- left_join(gpairs.result, fgenes.t.Y[, tmp.sel.cols], by = c("inter.GeneName.B" = "gene"))
tmp.change.cols <- ncol(gpairs.result) - (tmp.tomodf.cnt):0
colnames(gpairs.result)[tmp.change.cols] <- paste("inter", colnames(gpairs.result)[tmp.change.cols], "B", sep = ".")
# reorder the result table
if ("Exprs" %in% object@misc$musthave.colnames) {
tmp.paired.order.cols <- paste("inter", rep(c("GeneID", "GeneName", "LogFC", "PVal", "Exprs"), each = 2), c("A", "B"), sep = ".")
} else {
tmp.paired.order.cols <- paste("inter", rep(c("GeneID", "GeneName", "LogFC", "PVal"), each = 2), c("A", "B"), sep = ".")
}
gpairs.result <- gpairs.result[, c(tmp.paired.order.cols, setdiff(colnames(gpairs.result), tmp.paired.order.cols))]
# check exprs change
inds.subg.logfc <- integer()
inds.A.up <- which(gpairs.result$inter.LogFC.A > 0)
inds.A.dn <- which(gpairs.result$inter.LogFC.A <= 0)
inds.B.up <- which(gpairs.result$inter.LogFC.B > 0)
inds.B.dn <- which(gpairs.result$inter.LogFC.B <= 0)
if ("Xup.Yup" %in% sel.exprs.change) {
inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.up, inds.B.up))
}
if ("Xup.Ydn" %in% sel.exprs.change) {
inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.up, inds.B.dn))
}
if ("Xdn.Yup" %in% sel.exprs.change) {
inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.dn, inds.B.up))
}
if ("Xdn.Ydn" %in% sel.exprs.change) {
inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.dn, inds.B.dn))
}
gpairs.result <- gpairs.result[inds.subg.logfc, ]
# further unique result
gpairs.result <- DoPartUnique(gpairs.result, c(1,2))
# get result for other attributes
inside.property.apply <- function(property, data.location, data.type) {
if (is.null(property)) {
return(list(location = data.location, type = data.type))
}
if (!is.null(property$location) && length(property$location) > 0) {
data.location <- subset(data.location, GO.Term.target %in% property$location)
}
if (!is.null(property$location.score && length(property$location.score) > 0)) {
data.location <- subset(data.location, score %in% property$location.score)
}
if (!is.null(property$type) && length(property$type) > 0) {
data.type <- subset(data.type, Keyword.Name %in% property$type)
}
return(list(location = data.location, type = data.type))
}
# location & type for A
tmp.property.A <- inside.property.apply(sel.property.X,
object@database@anno.location.db[which(object@database@anno.location.db$GeneID %in% gpairs.result$inter.GeneID.A), ],
object@database@anno.type.db[which(object@database@anno.type.db$GeneID %in% gpairs.result$inter.GeneID.A), ])
# location & type for B
tmp.property.B <- inside.property.apply(sel.property.Y,
object@database@anno.location.db[which(object@database@anno.location.db$GeneID %in% gpairs.result$inter.GeneID.B), ],
object@database@anno.type.db[which(object@database@anno.type.db$GeneID %in% gpairs.result$inter.GeneID.B), ])
# add power and pval in data
if (run.permutation == TRUE) {
tmp.formula.exprs <- object@formulae[[("TG.EXPRS")]]
gpairs.result[, "inter.Power"] <- tmp.formula.exprs(gpairs.result[, "inter.Exprs.A"], gpairs.result[, "inter.Exprs.B"])
gpairs.result[, "inter.Confidence"] <- rep(1, times = nrow(gpairs.result)) # wait for all done and merge add p value
gpairs.result[, "inter.ToCheck.A"] <- rep(ix, times = nrow(gpairs.result))
gpairs.result[, "inter.ToCheck.B"] <- rep(jy, times = nrow(gpairs.result))
} else {
tmp.formula.to.use <- object@formulae[c("TG.LOGFC", "TG.PVAL")]
names(tmp.formula.to.use) <- c("LogFC", "PVal")
gpairs.result[, "inter.Power"] <- tmp.formula.to.use[["LogFC"]](gpairs.result[, "inter.LogFC.A"], gpairs.result[, "inter.LogFC.B"])
gpairs.result[, "inter.Confidence"] <- tmp.formula.to.use[["PVal"]](gpairs.result[, "inter.PVal.A"], gpairs.result[, "inter.PVal.B"])
# save result when not use permutation
interact.pairs.all$cnt.allpairs <- c(interact.pairs.all$cnt.allpairs, nrow(gpairs.result))
interact.pairs.all$strength.allpairs <- c(interact.pairs.all$strength.allpairs, sum(gpairs.result[, "inter.Power"]))
}
# get result saved
interact.pairs.all$data.allpairs[[interact.name]] <- gpairs.result
interact.pairs.all$anno.allpairs$location.A[[interact.name]] <- tmp.property.A$location
interact.pairs.all$anno.allpairs$location.B[[interact.name]] <- tmp.property.B$location
interact.pairs.all$anno.allpairs$type.A[[interact.name]] <- tmp.property.A$type
interact.pairs.all$anno.allpairs$type.B[[interact.name]] <- tmp.property.B$type
interact.pairs.all$name.allpairs <- c(interact.pairs.all$name.allpairs, interact.name)
if (verbose == TRUE) {
prog.bar.fv$tick()
}
}
}
# if use permutation, here add pval
if (run.permutation == TRUE) {
tmp.all.result <- bind_rows(interact.pairs.all$data.allpairs)
tmp.inv.ga <- unique(tmp.all.result[, c("inter.GeneName.A", "inter.ToCheck.A")])
tmp.inv.gb <- unique(tmp.all.result[, c("inter.GeneName.B", "inter.ToCheck.B")])
colnames(tmp.inv.ga) <- colnames(tmp.inv.gb) <- c("gene", "cluster")
tmp.involved.gdf <- unique(rbind(tmp.inv.ga, tmp.inv.gb))
# slim perm result
perm.expression <- vapply(perm.expression, inv.gdf = tmp.involved.gdf,
function(mat, inv.gdf) {
as.numeric(mat[cbind(match(inv.gdf$gene, rownames(mat)), match(inv.gdf$cluster, colnames(mat)))])
}, FUN.VALUE = double(nrow(tmp.involved.gdf)))
# get permutation result
tmp.get.perm.res.func <- function(x, perm.expression, actual.exprs, inv.gdf, perm.pval.sides) {
# actual
tmp.avg.it <- actual.exprs$allavg[which(names(actual.exprs$allavg) == x["inter.GeneName.A"])] *
actual.exprs$allavg[which(names(actual.exprs$allavg) == x["inter.GeneName.B"])]
# perm
inds.ga <- intersect(which(inv.gdf$gene == x["inter.GeneName.A"]), which(inv.gdf$cluster == x["inter.ToCheck.A"]))
inds.gb <- intersect(which(inv.gdf$gene == x["inter.GeneName.B"]), which(inv.gdf$cluster == x["inter.ToCheck.B"]))
nulldist.power <- perm.expression[inds.ga, ] * perm.expression[inds.gb, ]
nulldist.power <- nulldist.power[order(nulldist.power, decreasing = TRUE)]
nulldist.power <- nulldist.power - tmp.avg.it
# return P value
tmp.pval <- 1
if (perm.pval.sides == "two-sides") {
tmp.pval <- sum(abs(as.numeric(x["inter.Power"]) - tmp.avg.it) <= abs(nulldist.power - tmp.avg.it)) / length(nulldist.power)
}
if (perm.pval.sides == "one-side-large") {
tmp.pval <- sum(as.numeric(x["inter.Power"]) - tmp.avg.it <= nulldist.power - tmp.avg.it) / length(nulldist.power)
}
if (perm.pval.sides == "one-side-small") {
tmp.pval <- sum(as.numeric(x["inter.Power"]) - tmp.avg.it >= nulldist.power - tmp.avg.it) / length(nulldist.power)
}
tmp.pval
}
if (nbrOfWorkers() == 1) {
tmp.all.result[, "inter.Confidence"] <- as.numeric(pbapply(tmp.all.result, MARGIN = 1,
perm.expression = perm.expression, actual.exprs = actual.exprs,
inv.gdf = tmp.involved.gdf, perm.pval.sides = perm.pval.sides,
FUN = tmp.get.perm.res.func
))
} else {
tmp.all.result[, "inter.Confidence"] <- as.numeric(future_apply(tmp.all.result, MARGIN = 1,
perm.expression = perm.expression, actual.exprs = actual.exprs,
inv.gdf = tmp.involved.gdf, perm.pval.sides = perm.pval.sides,
FUN = tmp.get.perm.res.func
))
}
# truncate data by pval
tmp.all.result <- subset(tmp.all.result, inter.Confidence < perm.pval.cutoff)
# redist and fetch result
tmp.all.result[, "inter.Distr.interact"] <- paste(tmp.all.result[, "inter.ToCheck.A"], tmp.all.result[, "inter.ToCheck.B"], sep = object@tool.vars$cluster.split)
tmp.allinters.names <- paste(rep(sel.clusters.X, each = length(sel.clusters.Y)),
rep(sel.clusters.Y, times = length(sel.clusters.X)), sep = object@tool.vars$cluster.split)
for (i.inter in tmp.allinters.names) {
tmp.df <- subset(tmp.all.result, inter.Distr.interact == i.inter)
interact.pairs.all$data.allpairs[[i.inter]] <- tmp.df[, setdiff(colnames(tmp.df), c("inter.ToCheck.A", "inter.ToCheck.B", "inter.Distr.interact"))]
# location
interact.pairs.all$anno.allpairs$location.A[[i.inter]] <- interact.pairs.all$anno.allpairs$location.A[[i.inter]][which(interact.pairs.all$anno.allpairs$location.A[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.A), ]
interact.pairs.all$anno.allpairs$location.B[[i.inter]] <- interact.pairs.all$anno.allpairs$location.B[[i.inter]][which(interact.pairs.all$anno.allpairs$location.B[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.B), ]
# type
interact.pairs.all$anno.allpairs$type.A[[i.inter]] <- interact.pairs.all$anno.allpairs$type.A[[i.inter]][which(interact.pairs.all$anno.allpairs$type.A[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.A), ]
interact.pairs.all$anno.allpairs$type.B[[i.inter]] <- interact.pairs.all$anno.allpairs$type.B[[i.inter]][which(interact.pairs.all$anno.allpairs$type.B[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.B), ]
# strength and count
interact.pairs.all$cnt.allpairs <- c(interact.pairs.all$cnt.allpairs, nrow(tmp.df))
interact.pairs.all$strength.allpairs <- c(interact.pairs.all$strength.allpairs, sum(tmp.df[, "inter.Power"]))
}
}
# result
object <- setFullViewResult(object, interact.pairs.all)
return(object)
}
#' Get Result for Network Analysis
#'
#' @description
#' This function summarizes the result of network analysis in graph and table, and gives clues to
#' most active intercellular communications.
#'
#' @inheritParams InsideObjectInterCell
#' @param show.clusters.in.x Clusters chosen to show in x-axis, corresponding to paramter \code{sel.clusters.X} in \code{\link{AnalyzeInterInFullView}}.
#' @param show.clusters.in.y Clusters chosen to show in y-axis, corresponding to paramter \code{sel.clusters.Y} in \code{\link{AnalyzeInterInFullView}}.
#' @param sel.cluster.group.method Options are 'all', 'diagonal' and 'diagonal-2'. It controls
#' 'all' is recommended for most cases, and
#' 'diagonal*' is recommended when same restrictions are applied to clusters in X and Y.
#' @param power.apply.func The function used to modify the power value, usually \code{log1p}, etc.
#' @param cnt.apply.func The function used to modify the count value, usually \code{sqrt} to avoid extremes problem.
#' @param plot.power.range This will extend the plotting range beyond actual value range for power.
#' @param plot.cnt.range This will extend the plotting range beyond actual value range for count.
#' @param nodes.size.range 2 numbers that specifies the range of sizes of the nodes shown in the graph.
#' @param nodes.colour.seq Given colours will be used to generate colour gradient for plotting.
#' @param nodes.colour.value.seq Numeric. If set NULL, colours given in \code{nodes.colour.seq} will be evenly placed.
#' Otherwise, numeric values with the same length of \code{nodes.colour.seq} should be given to specify the positions corresponding to each colour.
#' See parameter \code{values} in \code{ggplot2::scale_colour_gradientn} for details.
#' @param plot.axis.x.name X-axis name when plotting.
#' @param plot.axis.y.name Y-axis name when plotting.
#'
#'
#' @return List. Use \code{Tool.ShowPlot()} to see the \bold{plot}, \code{Tool.WriteTables()} to save the result \bold{table} in .csv files.
#' \itemize{
#' \item plot: the object of \pkg{ggplot2}.
#' \item table: a list of \code{data.frame}.
#' }
#'
#' @import ggplot2
#'
#' @export
#'
GetResultFullView <- function(
object,
show.clusters.in.x = NULL,
show.clusters.in.y = NULL,
sel.cluster.group.method = "all",
power.apply.func = NULL,
cnt.apply.func = NULL,
plot.power.range = NULL,
plot.cnt.range = NULL,
nodes.size.range = c(1, 6),
nodes.colour.seq = c("#00809D", "#EEEEEE", "#C30000"),
nodes.colour.value.seq = c(0.0, 0.5, 1.0),
plot.axis.x.name = "clusters-x",
plot.axis.y.name = "clusters-y"
) {
interact.pairs.acted <- getFullViewResult(object)
kClustersSplit <- getClusterSplit(object)
# pre-check parameter
if (!is.null(power.apply.func) && !is.function(power.apply.func)) {
stop("Power function is not given properly. Please give it one function.")
}
if (!is.null(cnt.apply.func) && !is.function(cnt.apply.func)) {
stop("Count function is not given properly. Please give it one function.")
}
# process
# cluster selection
fac.clusters <- interact.pairs.acted$list.clusters # all clusters related
fac.x.clusters <- fac.clusters$x.axis
fac.y.clusters <- fac.clusters$y.axis
if (!is.null(show.clusters.in.x) && length(show.clusters.in.x) != 0) { # select part of clusters to be shown in x-axis
fac.x.clusters <- intersect(show.clusters.in.x, fac.x.clusters)
print(paste0("Reading clusters shown in X-axis: ", paste0(fac.x.clusters, collapse = ", "), "."))
}
if (!is.null(show.clusters.in.y) && length(show.clusters.in.y) != 0) { # select part of clusters to be shown in y-axis
fac.y.clusters <- intersect(show.clusters.in.y, fac.y.clusters)
print(paste0("Reading clusters shown in Y-axis: ", paste0(fac.y.clusters, collapse = ", "), "."))
}
if (length(fac.x.clusters) < 1 || length(fac.y.clusters) < 1) { # check
stop("Error: X-Y plot needs at least one item in each axis!")
}
### data process for selected part of data
inside.diagonal.selection <- function(vec.a, vec.b, identical.rm = FALSE) {
if (sum(levels(factor(vec.a)) != levels(factor((vec.b)))) != 0) {
stop("Diagonal method only allows clusters in x and y to be of same value range!")
}
vec.b <- rev(vec.b[match(vec.a, vec.b)]) # re-order to be the same order
res.va <- res.vb <- NULL
for (i in seq_along(vec.a)) {
tmp.len <- ifelse(identical.rm == FALSE, length(vec.a) - i + 1, length(vec.a) - i)
res.va <- c(res.va, rep(vec.a[i], times = tmp.len))
res.vb <- c(res.vb, vec.b[seq_len(tmp.len)])
}
return(list(xsel = res.va, ysel = res.vb))
}
select.cluster.groups <- switch(sel.cluster.group.method,
"all" = {
col.x.data <- rep(fac.x.clusters, each = length(fac.y.clusters))
col.y.data <- rep(fac.y.clusters, times = length(fac.x.clusters))
list(xsel = col.x.data, ysel = col.y.data)
},
"diagonal" = inside.diagonal.selection(fac.x.clusters, fac.y.clusters),
"diagonal-2" = inside.diagonal.selection(fac.x.clusters, fac.y.clusters, identical.rm = TRUE),
stop("Undefined method for selecting cluster groups. Please re-check the given parameter: sel.cluster.group.method")
)
# select
col.x.data <- select.cluster.groups$xsel
col.y.data <- select.cluster.groups$ysel
names.part.data <- paste(col.x.data, col.y.data, sep = kClustersSplit)
ind.names.m <- match(names.part.data, interact.pairs.acted$name.allpairs)
# cnt
cnt.xy.data <- as.integer(interact.pairs.acted$cnt.allpairs)
cnt.part.data <- cnt.xy.data[ind.names.m]
if (!is.null(cnt.apply.func)) {
cnt.part.data <- cnt.apply.func(cnt.part.data)
}
# power
power.xy.data <- as.numeric(interact.pairs.acted$strength.allpairs)
power.part.data <- power.xy.data[ind.names.m]
if (!is.null(power.apply.func)) {
power.part.data <- power.apply.func(power.part.data)
}
# plot data preparation
pairs.plot.db <- data.frame(
pair.name = names.part.data,
x = col.x.data,
y = col.y.data,
cnt = cnt.part.data,
power = power.part.data,
stringsAsFactors = FALSE
)
# check if it needs x-axis text rotation
if.need.x.axis.rotate <- FALSE
x.axis.text.len <- sapply(pairs.plot.db$x, function(x) {
nchar(as.character(x))
})
if (max(x.axis.text.len) > 4) {
if.need.x.axis.rotate <- TRUE
}
# limits range modification function
inside.limits.adj.func <- function(user.limits.adj) {
force(user.limits.adj)
function(limits) {
res.limits <- limits
if (!is.null(user.limits.adj)) {
res.limits <- user.limits.adj
}
return(res.limits)
}
}
# plot process
plot.res <- ggplot(pairs.plot.db, aes(x, y))
plot.res <- plot.res + labs(x = plot.axis.x.name, y = plot.axis.y.name)
plot.res <- plot.res + geom_point(aes(size = cnt, colour = power)) +
scale_x_discrete(limits = unique(col.x.data), breaks = unique(col.x.data)) +
scale_y_discrete(limits = unique(col.y.data), breaks = unique(col.y.data)) +
scale_size(name = "Count", range = nodes.size.range,
limits = inside.limits.adj.func(plot.cnt.range)) +
scale_colour_gradientn(name = "Power", colours = nodes.colour.seq, values = nodes.colour.value.seq,
limits = inside.limits.adj.func(plot.power.range), na.value = NA)
plot.res <- plot.res + theme_classic() # change theme
# rotate labels in x-axis when needed
if (if.need.x.axis.rotate) {
plot.res <- plot.res + theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
## construct result tables
# table raw.res
res.raw.table <- data.frame(
pair.name = pairs.plot.db$pair.name,
x = pairs.plot.db$x,
y = pairs.plot.db$y,
cnt = pairs.plot.db$cnt,
power = pairs.plot.db$power,
stringsAsFactors = FALSE
)
# table cnt & power
len.row <- length(fac.y.clusters)
len.col <- length(fac.x.clusters)
res.cnt.table <- matrix(data = c(0), nrow = len.row, ncol = len.col)
res.power.table <- matrix(data = c(0.0), nrow = len.row, ncol = len.col)
for (i in 1:len.col) {
for (j in 1:len.row) {
this.index <- len.row * (i - 1) + j
res.cnt.table[j, i] <- pairs.plot.db$cnt[this.index]
res.power.table[j, i] <- round(pairs.plot.db$power[this.index], 2)
}
}
# set rownames & colnames
rownames(res.cnt.table) <- as.character(fac.y.clusters)
colnames(res.cnt.table) <- as.character(fac.x.clusters)
rownames(res.power.table) <- as.character(fac.y.clusters)
colnames(res.power.table) <- as.character(fac.x.clusters)
#end# return
list(plot = plot.res, table = list(raw.res.table = res.raw.table, cnt.table = res.cnt.table, power.table = res.power.table))
}
#WeightRidges <- function(
# object,
# target.cluster,
# target.is.sender = FALSE
#) {
# allinteracts <- getFullViewResult(object)
# kClustersSplit <- getClusterSplit(object)
# kGenesSplit <- getGenePairSplit(object)
#
# if (target.is.sender) {
# rel.cluster.groups <- grep(paste0("^", target.cluster), allinteracts$name.allpairs, value = TRUE)
# rel.tg.cluster <- unlist(strsplit(rel.cluster.groups, split = paste0(target.cluster, kClustersSplit)))
# } else {
# rel.cluster.groups <- grep(paste0(target.cluster, "$"), allinteracts$name.allpairs, value = TRUE)
# rel.tg.cluster <- unlist(strsplit(rel.cluster.groups, split = paste0(kClustersSplit, target.cluster)))
# }
# rel.tg.cluster <- setdiff(rel.tg.cluster, "")
#
# ridge.plot.data <- bind_rows(lapply(seq_along(rel.cluster.groups),
# ref.data = allinteracts$data.allpairs, cluster.split = kClustersSplit,
# rel.cluster.groups = rel.cluster.groups, rel.tg.cluster = rel.tg.cluster,
# function(x, ref.data, cluster.split, rel.cluster.groups, rel.tg.cluster) {
# tmp.df <- ref.data[[x]]
# tmp.df[, "Plot.tg.cluster"] <- rep(rel.tg.cluster[x], times = nrow(tmp.df))
# tmp.df
# }))
# ridge.plot.data[, "packed.inters"] <- paste(ridge.plot.data$inter.GeneName.A, ridge.plot.data$inter.GeneName.B, sep = kGenesSplit)
# ridge.plot.data <- ridge.plot.data[, c("packed.inters", "Plot.tg.cluster", "inter.Power")]
# ggplot(ridge.plot.data) +
# geom_density_ridges(aes(x = log(inter.Power), y = Plot.tg.cluster)) +
# theme_ridges() +
# theme(axis.text.x = element_text(angle = 90))
#}
#
#
#WeigthFlow <- function(
# object
#) {
#
#}
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Other functions
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#' List Cluster Groups
#'
#' This function is to help on listing all or some of cluster groups (i.e. interacting clusters).
#'
#' @inheritParams InsideObjectInterCell
#' @param use.former It controls whether to list cluster groups relating to one or some clusters(.X) given in
#' parameter \code{sel.clusters.X} of \code{link{AnalyzeInterInFullView}}.
#' @param cluster.former The name of clusters, which is used find all X~? interactions (X is cluters given in these parameter).
#' @param use.latter It controls whether to list cluster groups relating to one or some clusters(.Y) given in
#' parameter \code{sel.clusters.Y} of \code{link{AnalyzeInterInFullView}}.
#' @param cluster.latter The name of clusters, which is used find all ?~Y interactions (Y is cluters given in these parameter).
#'
#' @return Character. Names of cluster groups.
#'
#' @examples
#' \dontrun{
#' ListClusterGroups(object, use.former = TRUE, cluster.former = "T_cell")
#' # the output of this will be "T_cell~Macrophage", "T_cell~B_cell" and things like that, depending on available clusters.
#'
#' ListClusterGroups(object, use.latter = TRUE, cluster.latter = "T_cell")
#' # the output of this will be "Macrophage~T_cell", "B_cell~T_cell" and things like that, depending on available clusters.
#' }
#'
#' @export
#'
ListClusterGroups <- function(
object,
use.former = FALSE,
cluster.former = character(), # to X
use.latter = FALSE,
cluster.latter = character() # to Y
) {
this.fullview <- getFullViewResult(object)
this.allowed.names <- this.fullview$name.allpairs
this.former.allowed <- this.fullview$list.clusters$x.axis
this.latter.allowed <- this.fullview$list.clusters$y.axis
# pre-check
if ((use.former == FALSE && length(cluster.former) > 0) ||
(use.former == TRUE && length(cluster.former) == 0)) {
stop("To fetch the cluster relating to former, please set `use.former` = TRUE, and give valid clusters in `cluster.former`.")
}
if ((use.latter == FALSE && length(cluster.latter) > 0) ||
(use.latter == TRUE && length(cluster.latter) == 0)) {
stop("To fetch the cluster relating to latter, please set `use.latter` = TRUE, and give valid clusters in `cluster.latter`.")
}
res.cg <- character()
if (use.former == TRUE) {
not.valid.cluster.X <- setdiff(cluster.former, this.former.allowed)
if (length(not.valid.cluster.X) > 0) {
warning("Given undefined clusters in X: ", paste0(not.valid.cluster.X, collapse = ", "), ". ")
}
cluster.former <- unique(intersect(cluster.former, this.former.allowed))
former.res.list <- lapply(cluster.former, ref.names = this.allowed.names, function(x, ref.names) {
grep(paste0("^", x), ref.names, value = TRUE)
}
)
res.cg <- c(res.cg, as.character(unlist(former.res.list)))
}
if (use.latter == TRUE) {
not.valid.cluster.Y <- setdiff(cluster.latter, this.latter.allowed)
if (length(not.valid.cluster.Y) > 0) {
warning("Given undefined clusters in Y: ", paste0(not.valid.cluster.Y, collapse = ", "), ". ")
}
cluster.latter <- unique(intersect(cluster.latter, this.latter.allowed))
latter.res.list <- lapply(cluster.latter, ref.names = this.allowed.names, function(x, ref.names) {
grep(paste0(x, "$"), ref.names, value = TRUE)
}
)
res.cg <- c(res.cg, as.character(unlist(latter.res.list)))
}
# if nothing is specified return all cluster groups
if (use.former == FALSE && length(cluster.former) == 0 &&
use.latter == FALSE && length(cluster.latter) == 0) {
res.cg <- this.allowed.names
}
res.cg <- unique(res.cg)
return(res.cg)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.