R/FullView.R

Defines functions ListClusterGroups GetResultFullView AnalyzeInterInFullView

Documented in AnalyzeInterInFullView GetResultFullView ListClusterGroups

#' Perform Network Analysis
#'
#' @description
#' This function works for network analysis, which calculates
#' count and power of interaction pairs among all given clusters.
#'
#' @inheritParams InsideObjectInterCell
#' @param sel.clusters.X Defining one part of interacting clusters. The options can be 
#'  found from \code{\link{ListAllClusters}}. See details for help.
#' @param sel.clusters.Y Defining the other part of interacting clusters. The options can be 
#'  found from \code{\link{ListAllClusters}}. See details for help.
#' @param sel.exprs.change Options are 'Xup.Yup', 'Xup.Ydn', 'Xdn.Yup', 'Xdn.Ydn'.
#'  It gives the corresponding expression change of every interacting gene pair.
#' @param sel.some.genes.X It gives the genes to be expected to be expressed in clusters given by parameter \code{sel.clusters.X}. 
#'  It can either be return by \code{FetchGeneOI}, or just \code{character} though losing some gene properties.
#' @param sel.some.genes.Y It gives the genes to be expected to be expressed in clusters given by parameter \code{sel.clusters.Y}. 
#'  It can be generated as paramter \code{sel.some.genes.X}.
#' @param sel.genes.option Options are 'intersect' or 'union'. 'intersect' strictly restricts gene pair to 
#'  have one gene partner in \code{sel.some.genes.X} and the other in \code{sel.some.genes.Y}. 'union' restricts 
#'  gene pair to have at least one gene either in \code{sel.some.genes.X} or \code{sel.some.genes.Y}.
#' @param sel.gene.pairs Directly specify the desired gene pairs. It should be given in standard table that is generated 
#'  by \code{\link{FormatCustomGenePairs}}. To note, it's strictly aligned to clusters, see details.
#' @param run.permutation It decides whether to permutation test to distinguish significant changed gene pairs with the rest.
#'  If set TRUE, parameter \code{perm.expression}, \code{perm.pval.sides} and \code{perm.pval.cutoff} should be provided. If set FALSE, 
#'  no statistical test is used, which means all gene pairs will be passed to result.
#' @param perm.expression A list of permutations of gene expression. It should be generated by function \code{Tool.GenPermutation()}.
#' @param perm.pval.sides Default is 'two-sides'. All allowed options are 'one-side-large' or 'one-side-small' or 'two-sides'. 
#'  Explicit explanation are given in details.
#' @param perm.pval.cutoff The P value cutoff for 
#' @param force.process It stops the program when no subset of genes are selected either by \code{sel.some.genes.X} and \code{sel.some.genes.Y}
#'  or by \code{sel.gene.pairs}, which may take long time to process. To force process, set this to TRUE.
#' @param verbose If set TRUE, the progress bar will be shown.
#'
#' @details
#' This function performs network analysis and calculates:
#' \itemize{
#'   \item count: count of interaction.
#'   \item power: strength of interaction, which is calculated by formula: SUM(Power(pair[i~j])).
#' }
#'
#' \bold{sel.clusters.X} and \bold{sel.clusters.Y}: 
#' Interactions are defined by pairs of interacting clusters, or say cluster groups. 
#' For one interaction, e.g. cluster-Myeloid ~ cluster-T_cell, the fromer one will be 
#' restricted to clusters given in \code{sel.clusters.X} and the latter one will be 
#' retricted to clusters given in \code{sel.clusters.Y}.
#' The clusters given by \code{ListAllClusters} list all available clusters, and users can 
#' manually pick some or all of them. 
#'
#' \bold{sel.exprs.change}: 
#' The *up means the gene is up-regulated, i.e. its LogFC (log fold change) > 0.
#' The *dn means the gene is down-regulated. Take 'Xup.Ydn' for example,
#' it gets to explore the interaction between cluster X and cluster Y by taking up-regulated genes from X and 
#' down-regulated genes from Y. 
#'
#' \bold{sel.gene.pairs}: 
#' The gene pairs given in this parameter are strictly aligned to clusters given by 
#' parameter \code{sel.clusters.X} and \code{sel.clusters.Y}. After standardizing by \code{FormatCustomGenePairs},
#' the gene pairs are given in 4 columns, and 2 of them are named 'inter.GeneName.A', 'inter.GeneName.B'.
#' The corresponding relation is that genes listed in 'inter.GeneName.A' are used to compare with genes expressed by clusters given in \code{sel.clusters.X} 
#' and genes listed in 'inter.GeneName.B' are used to compare with genes expressed by clusters given in \code{sel.clusters.Y}.
#' For example, if the user gives C3~C3ar1 in \code{sel.gene.pairs}, Myeloid_cell and T_cell in \code{sel.clusters.X}, and 
#' fibroblast and B_cell in \code{sel.clusters.Y}, then C3 will be tested in Myeloid_cell and T_cell but not fibroblast and B_cell,
#' and C3ar1 will be only tested in fibroblast and B_cell. If the user need C3~C3ar1 to be tested in the opposite way, 
#' one more row C3ar1~C3 should be given in \code{sel.gene.pairs}. 
#'
#' \bold{perm.pval.sides} and \bold{perm.pval.cutoff}:
#' The \code{perm.pval.sides} can be 'two-side', 'one-side-large' and 'one-side-small'.
#' Take option 'two-side' as example.
#' The null hypothesis is given as power of one gene pair from actual scRNA-seq is equal to it from cell label permutation.
#' Suppose one gene pair A~B, and cell clusters G1, G2, G3 exist.
#' The average expression of A and B in all cells are denoted as allavg(A) and allavg(B),
#' and the average expression of A and B in each cell clusters are denoted as E(G1, A), E(G1, B), E(G2, A), so on so forth.
#' The permutations of cell labels generate one list of new expression, and will be denoted like Perm(i, G1, A), i goes from 1 to permutation times.
#' To test whether A~B is significant between cluster G1 and G2, we calculate the power of this gene pair,
#' which is the multiply of expression, and thus 
#' H0 is made as E(G1, A) * E(G2, B) - allavg(A) * allavg(B) = Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B), i goes from 1 to permutation times.
#' By getting the times when abs(Equation-left) <= abs(Equation-right), we get the P value. 
#' By comparing the P value with \code{perm.pval.cutoff}, we judge whether the gene pair A~B between G1 and G2 is significant.
#' 
#' For 'one-side-large', the H0 is E(G1, A) * E(G2, B) - allavg(A) * allavg(B) > Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B) .
#' P value is calculated as (Equation-left) <= (Equation-right).
#' For 'one-side-small', the H0 is E(G1, A) * E(G2, B) - allavg(A) * allavg(B) < Perm(i, G1, A) * Perm(i, G2, B) - allavg(A) * allavg(B) .
#' P value is calculated as (Equation-left) >= (Equation-right).
#'
#'
#' @return A \code{InterCell} object.
#'
#' @importFrom dplyr bind_rows left_join
#' @import progress
#' @importFrom pbapply pbapply
#' @importFrom future nbrOfWorkers
#' @importFrom future.apply future_apply
#'
#' @export
#'
AnalyzeInterInFullView <- function(
	object,
	sel.clusters.X = NULL,
	sel.clusters.Y = NULL,
	sel.exprs.change = c("Xup.Yup", "Xup.Ydn", "Xdn.Yup", "Xdn.Ydn"),
	sel.some.genes.X = NULL,
	sel.some.genes.Y = NULL,
	sel.genes.option = "intersect", 
	sel.gene.pairs = NULL,
	run.permutation = FALSE,
	perm.expression = NULL,
	perm.pval.sides = "two-sides",
	perm.pval.cutoff = 0.05,  # two-sided p val
	force.process = FALSE,
	verbose = TRUE
) {
	kGenesSplit <- getGenePairSplit(object)
	# check sel.clusters.*
	inside.check.sel.clusters <- function(allowed.clusters, sel.clusters, suffix = c("X", "Y")) {
		if (is.null(sel.clusters)) {
			sel.clusters <- allowed.clusters
			sel.clusters <- sel.clusters[order(sel.clusters)]
		} else {
			not.valid.clusters <- setdiff(sel.clusters, allowed.clusters)
			if (length(not.valid.clusters) > 0) {
				warning("Given Undefined clusters in `sel.clusters.", suffix, "`: ", paste0(not.valid.clusters, collapse = ", "), ". ")
			}
			sel.clusters <- intersect(sel.clusters, allowed.clusters)
		}
		return(unique(sel.clusters))
	}
	# check
	all.clusters <- unique(object@fgenes$cluster)
	sel.clusters.X <- inside.check.sel.clusters(all.clusters, sel.clusters.X, suffix = "X")
	sel.clusters.Y <- inside.check.sel.clusters(all.clusters, sel.clusters.Y, suffix = "Y")

	# check selection on gene pairs or some genes
	if (!is.null(sel.gene.pairs)) {
		# check data.frame
		if (!is.data.frame(sel.gene.pairs)) {
			stop("User-selected gene pairs should be given in table (data.frame format)!")
		}
		# check if it is standardized data.frame
		std.colnames.1t4 <- c("inter.GeneID.A", "inter.GeneID.B", "inter.GeneName.A", "inter.GeneName.B")
		if (!identical(colnames(sel.gene.pairs)[1:4], std.colnames.1t4)) {
			stop("Non-standardized table of gene pairs are given, please use function `FormatCustomGenePairs` first to get standardized one!")
		}
	}
	sel.property.X <- sel.property.Y <- NULL
	if (!is.null(sel.some.genes.X)) {
		if (is.list(sel.some.genes.X) && all(c("genes", "property") %in% names(sel.some.genes.X))) {
			sel.property.X <- sel.some.genes.X$property
			sel.some.genes.X <- sel.some.genes.X$genes
		} else {
			if (!is.character(sel.some.genes.X)) {
				sel.some.genes.X <- as.character(sel.some.genes.X)
			}
		}
	}
	if (!is.null(sel.some.genes.Y)) {
		if (is.list(sel.some.genes.Y) && all(c("genes", "property") %in% names(sel.some.genes.Y))) {
			sel.property.Y <- sel.some.genes.Y$property
			sel.some.genes.Y <- sel.some.genes.Y$genes
		} else {
			if (!is.character(sel.some.genes.Y)) {
				sel.some.genes.Y <- as.character(sel.some.genes.Y)
			}
		}
	}
	# gene pairs if given, genes given will be overwritten
	if (is.null(sel.gene.pairs)) {
		if (is.null(sel.some.genes.X) && is.null(sel.some.genes.Y)) {
			if (force.process == FALSE) {
				stop("No restriction applied on neither genes or gene pairs.",
					" Program will get long time to be finished.",
					" If insisting, set `force.process` to be TRUE.")
			}
		} else {
			if (is.null(sel.some.genes.X) || is.null(sel.some.genes.Y)) {
				print("Processing with partial selection on genes.")
			}
		}
	} else {
		print(paste("Processing with given", nrow(sel.gene.pairs), "gene pairs."))
		if (!is.null(sel.some.genes.X) || !is.null(sel.some.genes.Y)) {
			warning("Given selection on genes is overrided by given selection on gene pairs!")
		}
	}
	# check sel.genes.option
	std.sel.genes.option <- c("intersect", "union")
	sel.genes.func <- base::intersect  # in default using 'intersect'
	sel.genes.option <- sel.genes.option[1]  # only length 1 allowed
	valid.option <- intersect(sel.genes.option, std.sel.genes.option)
	sel.genes.func <- switch(valid.option, 
		"intersect" = base::intersect,
		"union" = base::union,
		warning("Given invalid user option on selecting genes: ", paste0(sel.genes.option, collapse = ", "), ". ", 
			"Using default 'intersect' option!")
	)

	# check exprs change
	not.valid.exprs.c <- setdiff(sel.exprs.change, kexprs.change)
	if (length(not.valid.exprs.c) > 0) {
		warning("Given invalid options for expression change: ", paste0(not.valid.exprs.c, collapse = ", ", ". "))
	}
	sel.exprs.change <- intersect(sel.exprs.change, kexprs.change)
	if (length(sel.exprs.change) == 0) {
		stop("No expression change selection is given in parameter `sel.exprs.change`. Options are 'Xup.Yup', 'Xup.Ydn', 'Xdn.Yup', 'Xdn.Ydn'.")
	}

	# check if permutation test is used
	actual.exprs <- NA
	if (run.permutation == TRUE) {
		# check perm.pval.sides
		allowed.sides <- c("two-sides", "one-side-large", "one-side-small")
		if (!perm.pval.sides %in% allowed.sides) {
			stop("Invalid parameter 'perm.pval.sides', only 'two-sides', 'one-side-large' and 'one-side-small' are allowed.")
		}
		# check permutation
		if (is.null(perm.expression)) {
			stop("Attempt to run with permutation but data not given. Please give it in parameter 'perm.expression'.")
		}
		if (!is.list(perm.expression) || (is.list(perm.expression) && !all(c("actual", "perm") %in% names(perm.expression))) ) {
			stop("Given invalid permutation data, please use function 'Tool.GenPermutation' to generate it.")
		}
		# split perm and actual
		actual.exprs <- perm.expression$actual
		perm.expression <- perm.expression$perm
		# check if all result are unified (just the rows and columns for simiple checking)
		tmp.simple.check.p <- t(vapply(perm.expression, function(x) { c(nrow(x), ncol(x)) }, FUN.VALUE = integer(2)))
		if (!all(tmp.simple.check.p[, 1] == tmp.simple.check.p[1, 1]) ||
			!all(tmp.simple.check.p[, 2] == tmp.simple.check.p[1, 2])) {
			stop("Non-unified permutation matrices are given! Please recheck data format.")
		}
		# check if all involved genes are given. After simple checking, here think every permutation is the same
		tmp.gene.p.check <- unique(object@fgenes$gene) %in% rownames(perm.expression[[1]])
		if (all(!tmp.gene.p.check)) {
			stop("No valid genes are given in permutation matrix! If you give the wrong element?")
		}
		if (!all(tmp.gene.p.check)) {
			warning("Those marker genes are unexpectly not detected in permutation matrix: ",
				paste0(setdiff(unique(object@fgenes$gene), rownames(perm.expression[[1]])), collapse = ", "), ".")
		}
	}

	## analyze interaction network
	interact.pairs.all <- list(
		list.clusters = list(x.axis = sel.clusters.X, y.axis = sel.clusters.Y),
		data.allpairs = list(),
		anno.allpairs = list(location.A = list(), location.B = list(), type.A = list(), type.B = list()),
		name.allpairs = character(),
		cnt.allpairs = integer(),
		strength.allpairs = single()
	)

	##  perform network analysis
	# if prog.bar shown
	if (verbose == TRUE) {
		prog.bar.fv <- progress::progress_bar$new(total = length(sel.clusters.X) * length(sel.clusters.Y))
		prog.bar.fv$tick(0)
	}
	# the used gene pairs 
	used.proc.pairs.db <- object@database@pairs.db
	# pre-slim for used database
	if (!is.null(sel.gene.pairs)) {
		tmp.ref.gpairs <- paste0(used.proc.pairs.db$inter.GeneName.A, used.proc.pairs.db$inter.GeneName.B, sep = kGenesSplit)
		#
		tmp.used.conv <- paste0(sel.gene.pairs$inter.GeneName.A, sel.gene.pairs$inter.GeneName.B, sep = kGenesSplit)
		tmp.used.rev  <- paste0(sel.gene.pairs$inter.GeneName.B, sel.gene.pairs$inter.GeneName.A, sep = kGenesSplit)
		# set the pairs database
		used.proc.pairs.db <- sel.gene.pairs[union(which(tmp.used.conv %in% tmp.ref.gpairs), which(tmp.used.rev %in% tmp.ref.gpairs)), ]
	} else {  # slim for some genes
		tmp.used.genes <- c(sel.some.genes.X, sel.some.genes.Y)
		if (length(tmp.used.genes) > 0) {
			used.proc.pairs.db <- used.proc.pairs.db[union(which(used.proc.pairs.db$inter.GeneName.A %in% tmp.used.genes), 
				which(used.proc.pairs.db$inter.GeneName.B %in% tmp.used.genes)), ]
		}
	}

	# the process
	for (ix in sel.clusters.X) {
		for (jy in sel.clusters.Y) {
			interact.name <- paste0(ix, object@tool.vars$cluster.split, jy)
			fgenes.t.X <- object@fgenes[which(object@fgenes$cluster == ix), ]
			fgenes.t.Y <- object@fgenes[which(object@fgenes$cluster == jy), ]
			genes.X <- fgenes.t.X$gene
			genes.Y <- fgenes.t.Y$gene

			# generate all possible gene pairs
			ref.GeneA <- used.proc.pairs.db$inter.GeneName.A
			ref.GeneB <- used.proc.pairs.db$inter.GeneName.B
			# conv
			inds.gpairs.conv <- intersect(which(ref.GeneA %in% genes.X), which(ref.GeneB %in% genes.Y))
			gpairs.result <- used.proc.pairs.db[inds.gpairs.conv, ]
			
			if (is.null(sel.gene.pairs)) {  # using reference database, then rev. ones should be collected
				# rev
				inds.gpairs.rev <- intersect(which(ref.GeneB %in% genes.X), which(ref.GeneA %in% genes.Y))
				cols.rev <- ReverseOddEvenCols(std.index.colname.end.dual)
				gpairs.rev <- used.proc.pairs.db[inds.gpairs.rev, c(cols.rev, setdiff(seq_len(ncol(used.proc.pairs.db)), cols.rev))]
				colnames(gpairs.rev) <- colnames(used.proc.pairs.db)
				# merge result
				gpairs.result <- rbind(gpairs.result, gpairs.rev)
			}
			
			# prioritized usage of selection to process
			if (is.null(sel.gene.pairs)) {  # use selected genes
				inds.sel.X <- inds.sel.Y <- seq_len(nrow(gpairs.result))
				if (!is.null(sel.some.genes.X)) {
					inds.sel.X <- which(gpairs.result$inter.GeneName.A %in% sel.some.genes.X)
				}
				if (!is.null(sel.some.genes.Y)) {
					inds.sel.Y <- which(gpairs.result$inter.GeneName.B %in% sel.some.genes.Y)
				}
				# use select gene merge option and get result
				gpairs.result <- gpairs.result[sel.genes.func(inds.sel.X, inds.sel.Y), ]
			}

			# add logfc, pval data from fgenes, as well as Exprs from fgenes
			tmp.sel.cols <- setdiff(object@misc$musthave.colnames, "cluster")
			tmp.tomodf.cnt <- length(tmp.sel.cols) - 2  # as substraction below starts from 0, -2 / one for gene, one for index shift to start from 0
			if (tmp.tomodf.cnt < 0) {
				stop("Unexpected must have columns not matched!")
			}
			gpairs.result <- left_join(gpairs.result, fgenes.t.X[, tmp.sel.cols], by = c("inter.GeneName.A" = "gene"))
			tmp.change.cols <- ncol(gpairs.result) - (tmp.tomodf.cnt):0
			colnames(gpairs.result)[tmp.change.cols] <- paste("inter", colnames(gpairs.result)[tmp.change.cols], "A", sep = ".")
			gpairs.result <- left_join(gpairs.result, fgenes.t.Y[, tmp.sel.cols], by = c("inter.GeneName.B" = "gene"))
			tmp.change.cols <- ncol(gpairs.result) - (tmp.tomodf.cnt):0
			colnames(gpairs.result)[tmp.change.cols] <- paste("inter", colnames(gpairs.result)[tmp.change.cols], "B", sep = ".")
			# reorder the result table
			if ("Exprs" %in% object@misc$musthave.colnames) {
				tmp.paired.order.cols <- paste("inter", rep(c("GeneID", "GeneName", "LogFC", "PVal", "Exprs"), each = 2), c("A", "B"), sep = ".")
			} else {
				tmp.paired.order.cols <- paste("inter", rep(c("GeneID", "GeneName", "LogFC", "PVal"), each = 2), c("A", "B"), sep = ".")
			}
			gpairs.result <- gpairs.result[, c(tmp.paired.order.cols, setdiff(colnames(gpairs.result), tmp.paired.order.cols))]
			
			# check exprs change
			inds.subg.logfc <- integer()
			inds.A.up <- which(gpairs.result$inter.LogFC.A > 0)
			inds.A.dn <- which(gpairs.result$inter.LogFC.A <= 0)
			inds.B.up <- which(gpairs.result$inter.LogFC.B > 0)
			inds.B.dn <- which(gpairs.result$inter.LogFC.B <= 0)
			if ("Xup.Yup" %in% sel.exprs.change) {
				inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.up, inds.B.up))
			}
			if ("Xup.Ydn" %in% sel.exprs.change) {
				inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.up, inds.B.dn))
			}
			if ("Xdn.Yup" %in% sel.exprs.change) {
				inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.dn, inds.B.up))
			}
			if ("Xdn.Ydn" %in% sel.exprs.change) {
				inds.subg.logfc <- c(inds.subg.logfc, intersect(inds.A.dn, inds.B.dn))
			}
			gpairs.result <- gpairs.result[inds.subg.logfc, ]

			# further unique result
			gpairs.result <- DoPartUnique(gpairs.result, c(1,2))


			# get result for other attributes
			inside.property.apply <- function(property, data.location, data.type) {
				if (is.null(property)) {
					return(list(location = data.location, type = data.type))
				}

				if (!is.null(property$location) && length(property$location) > 0) {
					data.location <- subset(data.location, GO.Term.target %in% property$location)
				}
				if (!is.null(property$location.score && length(property$location.score) > 0)) {
					data.location <- subset(data.location, score %in% property$location.score)
				}
				if (!is.null(property$type) && length(property$type) > 0) {
					data.type <- subset(data.type, Keyword.Name %in% property$type)
				}
				return(list(location = data.location, type = data.type))
			}
			# location & type for A
			tmp.property.A <- inside.property.apply(sel.property.X, 
				object@database@anno.location.db[which(object@database@anno.location.db$GeneID %in% gpairs.result$inter.GeneID.A), ],
				object@database@anno.type.db[which(object@database@anno.type.db$GeneID %in% gpairs.result$inter.GeneID.A), ])
			# location & type for B
			tmp.property.B <- inside.property.apply(sel.property.Y,
				object@database@anno.location.db[which(object@database@anno.location.db$GeneID %in% gpairs.result$inter.GeneID.B), ],
				object@database@anno.type.db[which(object@database@anno.type.db$GeneID %in% gpairs.result$inter.GeneID.B), ])

			# add power and pval in data
			if (run.permutation == TRUE) {
				tmp.formula.exprs <- object@formulae[[("TG.EXPRS")]]
				gpairs.result[, "inter.Power"] <- tmp.formula.exprs(gpairs.result[, "inter.Exprs.A"], gpairs.result[, "inter.Exprs.B"])
				gpairs.result[, "inter.Confidence"] <- rep(1, times = nrow(gpairs.result))  # wait for all done and merge add p value
				gpairs.result[, "inter.ToCheck.A"] <- rep(ix, times = nrow(gpairs.result))
				gpairs.result[, "inter.ToCheck.B"] <- rep(jy, times = nrow(gpairs.result))
			} else {
				tmp.formula.to.use <- object@formulae[c("TG.LOGFC", "TG.PVAL")]
				names(tmp.formula.to.use) <- c("LogFC", "PVal")
				gpairs.result[, "inter.Power"] <- tmp.formula.to.use[["LogFC"]](gpairs.result[, "inter.LogFC.A"], gpairs.result[, "inter.LogFC.B"])
				gpairs.result[, "inter.Confidence"] <- tmp.formula.to.use[["PVal"]](gpairs.result[, "inter.PVal.A"], gpairs.result[, "inter.PVal.B"])
				# save result when not use permutation
				interact.pairs.all$cnt.allpairs <- c(interact.pairs.all$cnt.allpairs, nrow(gpairs.result))
				interact.pairs.all$strength.allpairs <- c(interact.pairs.all$strength.allpairs, sum(gpairs.result[, "inter.Power"]))
			}

			# get result saved
			interact.pairs.all$data.allpairs[[interact.name]] <- gpairs.result
			interact.pairs.all$anno.allpairs$location.A[[interact.name]] <- tmp.property.A$location
			interact.pairs.all$anno.allpairs$location.B[[interact.name]] <- tmp.property.B$location
			interact.pairs.all$anno.allpairs$type.A[[interact.name]] <- tmp.property.A$type
			interact.pairs.all$anno.allpairs$type.B[[interact.name]] <- tmp.property.B$type
			interact.pairs.all$name.allpairs <- c(interact.pairs.all$name.allpairs, interact.name)
			
			if (verbose == TRUE) {
				prog.bar.fv$tick()
			}
		}
	}

	# if use permutation, here add pval
	if (run.permutation == TRUE) {
		tmp.all.result <- bind_rows(interact.pairs.all$data.allpairs)
		tmp.inv.ga <- unique(tmp.all.result[, c("inter.GeneName.A", "inter.ToCheck.A")])
		tmp.inv.gb <- unique(tmp.all.result[, c("inter.GeneName.B", "inter.ToCheck.B")])
		colnames(tmp.inv.ga) <- colnames(tmp.inv.gb) <- c("gene", "cluster")
		tmp.involved.gdf <- unique(rbind(tmp.inv.ga, tmp.inv.gb))
		# slim perm result
		perm.expression <- vapply(perm.expression, inv.gdf = tmp.involved.gdf,
			function(mat, inv.gdf) {
				as.numeric(mat[cbind(match(inv.gdf$gene, rownames(mat)), match(inv.gdf$cluster, colnames(mat)))])
			}, FUN.VALUE = double(nrow(tmp.involved.gdf)))

		# get permutation result
		tmp.get.perm.res.func <- function(x, perm.expression, actual.exprs, inv.gdf, perm.pval.sides) {
			# actual
			tmp.avg.it <- actual.exprs$allavg[which(names(actual.exprs$allavg) == x["inter.GeneName.A"])] * 
						  actual.exprs$allavg[which(names(actual.exprs$allavg) == x["inter.GeneName.B"])]				
			# perm
			inds.ga <- intersect(which(inv.gdf$gene == x["inter.GeneName.A"]), which(inv.gdf$cluster == x["inter.ToCheck.A"]))
			inds.gb <- intersect(which(inv.gdf$gene == x["inter.GeneName.B"]), which(inv.gdf$cluster == x["inter.ToCheck.B"]))
			nulldist.power <- perm.expression[inds.ga, ] * perm.expression[inds.gb, ]
			nulldist.power <- nulldist.power[order(nulldist.power, decreasing = TRUE)]
			nulldist.power <- nulldist.power - tmp.avg.it
			# return P value
			tmp.pval <- 1
			if (perm.pval.sides == "two-sides") {
				tmp.pval <- sum(abs(as.numeric(x["inter.Power"]) - tmp.avg.it) <= abs(nulldist.power - tmp.avg.it)) / length(nulldist.power)
			}
			if (perm.pval.sides == "one-side-large") {
				tmp.pval <- sum(as.numeric(x["inter.Power"]) - tmp.avg.it <= nulldist.power - tmp.avg.it) / length(nulldist.power)
			}
			if (perm.pval.sides == "one-side-small") {
				tmp.pval <- sum(as.numeric(x["inter.Power"]) - tmp.avg.it >= nulldist.power - tmp.avg.it) / length(nulldist.power)
			}
			tmp.pval
		}
		if (nbrOfWorkers() == 1) {
			tmp.all.result[, "inter.Confidence"] <- as.numeric(pbapply(tmp.all.result, MARGIN = 1,
				perm.expression = perm.expression, actual.exprs = actual.exprs,
				inv.gdf = tmp.involved.gdf, perm.pval.sides = perm.pval.sides,
				FUN = tmp.get.perm.res.func
			))
		} else {
			tmp.all.result[, "inter.Confidence"] <- as.numeric(future_apply(tmp.all.result, MARGIN = 1,
				perm.expression = perm.expression, actual.exprs = actual.exprs,
				inv.gdf = tmp.involved.gdf, perm.pval.sides = perm.pval.sides,
				FUN = tmp.get.perm.res.func
			))
		}

		# truncate data by pval
		tmp.all.result <- subset(tmp.all.result, inter.Confidence < perm.pval.cutoff)
		# redist and fetch result
		tmp.all.result[, "inter.Distr.interact"] <- paste(tmp.all.result[, "inter.ToCheck.A"], tmp.all.result[, "inter.ToCheck.B"], sep = object@tool.vars$cluster.split)
		tmp.allinters.names <- paste(rep(sel.clusters.X, each = length(sel.clusters.Y)),
			rep(sel.clusters.Y, times = length(sel.clusters.X)), sep = object@tool.vars$cluster.split)
		for (i.inter in tmp.allinters.names) {
			tmp.df <- subset(tmp.all.result, inter.Distr.interact == i.inter)
			interact.pairs.all$data.allpairs[[i.inter]] <- tmp.df[, setdiff(colnames(tmp.df), c("inter.ToCheck.A", "inter.ToCheck.B", "inter.Distr.interact"))]
			# location
			interact.pairs.all$anno.allpairs$location.A[[i.inter]] <- interact.pairs.all$anno.allpairs$location.A[[i.inter]][which(interact.pairs.all$anno.allpairs$location.A[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.A), ]
			interact.pairs.all$anno.allpairs$location.B[[i.inter]] <- interact.pairs.all$anno.allpairs$location.B[[i.inter]][which(interact.pairs.all$anno.allpairs$location.B[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.B), ]
			# type
			interact.pairs.all$anno.allpairs$type.A[[i.inter]] <- interact.pairs.all$anno.allpairs$type.A[[i.inter]][which(interact.pairs.all$anno.allpairs$type.A[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.A), ]
			interact.pairs.all$anno.allpairs$type.B[[i.inter]] <- interact.pairs.all$anno.allpairs$type.B[[i.inter]][which(interact.pairs.all$anno.allpairs$type.B[[i.inter]]$GeneID %in% tmp.df$inter.GeneID.B), ]
			# strength and count
			interact.pairs.all$cnt.allpairs <- c(interact.pairs.all$cnt.allpairs, nrow(tmp.df))
			interact.pairs.all$strength.allpairs <- c(interact.pairs.all$strength.allpairs, sum(tmp.df[, "inter.Power"]))
		}
	}
	

	# result
	object <- setFullViewResult(object, interact.pairs.all)
	return(object)
}





#' Get Result for Network Analysis
#' 
#' @description
#' This function summarizes the result of network analysis in graph and table, and gives clues to 
#' most active intercellular communications.
#'
#' @inheritParams InsideObjectInterCell
#' @param show.clusters.in.x Clusters chosen to show in x-axis, corresponding to paramter \code{sel.clusters.X} in \code{\link{AnalyzeInterInFullView}}.
#' @param show.clusters.in.y Clusters chosen to show in y-axis, corresponding to paramter \code{sel.clusters.Y} in \code{\link{AnalyzeInterInFullView}}.
#' @param sel.cluster.group.method Options are 'all', 'diagonal' and 'diagonal-2'. It controls 
#'  'all' is recommended for most cases, and 
#'  'diagonal*' is recommended when same restrictions are applied to clusters in X and Y.
#' @param power.apply.func The function used to modify the power value, usually \code{log1p}, etc.
#' @param cnt.apply.func   The function used to modify the count value, usually \code{sqrt} to avoid extremes problem.
#' @param plot.power.range This will extend the plotting range beyond actual value range for power.
#' @param plot.cnt.range This will extend the plotting range beyond actual value range for count.
#' @param nodes.size.range 2 numbers that specifies the range of sizes of the nodes shown in the graph.
#' @param nodes.colour.seq Given colours will be used to generate colour gradient for plotting.
#' @param nodes.colour.value.seq Numeric. If set NULL, colours given in \code{nodes.colour.seq} will be evenly placed.
#' Otherwise, numeric values with the same length of \code{nodes.colour.seq} should be given to specify the positions corresponding to each colour.
#' See parameter \code{values} in \code{ggplot2::scale_colour_gradientn} for details.
#' @param plot.axis.x.name X-axis name when plotting.
#' @param plot.axis.y.name Y-axis name when plotting.
#' 
#'
#' @return List. Use \code{Tool.ShowPlot()} to see the \bold{plot}, \code{Tool.WriteTables()} to save the result \bold{table} in .csv files.
#' \itemize{
#'   \item plot: the object of \pkg{ggplot2}.
#'   \item table: a list of \code{data.frame}.
#' }
#'
#' @import ggplot2
#'
#' @export
#'
GetResultFullView <- function(
	object,
	show.clusters.in.x = NULL,
	show.clusters.in.y = NULL,
	sel.cluster.group.method = "all",
	power.apply.func = NULL,
	cnt.apply.func = NULL,
	plot.power.range = NULL, 
	plot.cnt.range = NULL, 
	nodes.size.range = c(1, 6),
	nodes.colour.seq = c("#00809D", "#EEEEEE", "#C30000"),
	nodes.colour.value.seq = c(0.0, 0.5, 1.0),
	plot.axis.x.name = "clusters-x",
	plot.axis.y.name = "clusters-y"
) {
	interact.pairs.acted <- getFullViewResult(object)
	kClustersSplit <- getClusterSplit(object)

	# pre-check parameter
	if (!is.null(power.apply.func) && !is.function(power.apply.func)) {
		stop("Power function is not given properly. Please give it one function.")
	}
	if (!is.null(cnt.apply.func) && !is.function(cnt.apply.func)) {
		stop("Count function is not given properly. Please give it one function.")	
	}

	# process
	# cluster selection
	fac.clusters <- interact.pairs.acted$list.clusters  # all clusters related
	fac.x.clusters <- fac.clusters$x.axis
	fac.y.clusters <- fac.clusters$y.axis
	if (!is.null(show.clusters.in.x) && length(show.clusters.in.x) != 0) {  # select part of clusters to be shown in x-axis
		fac.x.clusters <- intersect(show.clusters.in.x, fac.x.clusters)
		print(paste0("Reading clusters shown in X-axis: ", paste0(fac.x.clusters, collapse = ", "), "."))
	}
	if (!is.null(show.clusters.in.y) && length(show.clusters.in.y) != 0) {  # select part of clusters to be shown in y-axis
		fac.y.clusters <- intersect(show.clusters.in.y, fac.y.clusters)
		print(paste0("Reading clusters shown in Y-axis: ", paste0(fac.y.clusters, collapse = ", "), "."))
	}
	if (length(fac.x.clusters) < 1 || length(fac.y.clusters) < 1) {  # check
		stop("Error: X-Y plot needs at least one item in each axis!")
	}


	### data process for selected part of data
	inside.diagonal.selection <- function(vec.a, vec.b, identical.rm = FALSE) {
		if (sum(levels(factor(vec.a)) != levels(factor((vec.b)))) != 0) {
			stop("Diagonal method only allows clusters in x and y to be of same value range!")
		}
		vec.b <- rev(vec.b[match(vec.a, vec.b)])  # re-order to be the same order
		res.va <- res.vb <- NULL
		for (i in seq_along(vec.a)) {
			tmp.len <- ifelse(identical.rm == FALSE, length(vec.a) - i + 1, length(vec.a) - i)
			res.va <- c(res.va, rep(vec.a[i], times = tmp.len))
			res.vb <- c(res.vb, vec.b[seq_len(tmp.len)])
		}
		return(list(xsel = res.va, ysel = res.vb))
	}
	select.cluster.groups <- switch(sel.cluster.group.method,
		"all" = {
				col.x.data <- rep(fac.x.clusters, each = length(fac.y.clusters))
				col.y.data <- rep(fac.y.clusters, times = length(fac.x.clusters))
				list(xsel = col.x.data, ysel = col.y.data)
			},
		"diagonal" = inside.diagonal.selection(fac.x.clusters, fac.y.clusters),
		"diagonal-2" = inside.diagonal.selection(fac.x.clusters, fac.y.clusters, identical.rm = TRUE),
		stop("Undefined method for selecting cluster groups. Please re-check the given parameter: sel.cluster.group.method")
	)
	# select 
	col.x.data <- select.cluster.groups$xsel
	col.y.data <- select.cluster.groups$ysel
	names.part.data <- paste(col.x.data, col.y.data, sep = kClustersSplit)
	ind.names.m <- match(names.part.data, interact.pairs.acted$name.allpairs)
	# cnt
	cnt.xy.data <- as.integer(interact.pairs.acted$cnt.allpairs)
	cnt.part.data <- cnt.xy.data[ind.names.m]
	if (!is.null(cnt.apply.func)) {
		cnt.part.data <- cnt.apply.func(cnt.part.data)
	}
	# power
	power.xy.data <- as.numeric(interact.pairs.acted$strength.allpairs)
	power.part.data <- power.xy.data[ind.names.m]
	if (!is.null(power.apply.func)) {
		power.part.data <- power.apply.func(power.part.data)
	}

	# plot data preparation
	pairs.plot.db <- data.frame(
		pair.name = names.part.data, 
		x = col.x.data, 
		y = col.y.data,
		cnt = cnt.part.data,
		power = power.part.data,
		stringsAsFactors = FALSE
	)
	# check if it needs x-axis text rotation
	if.need.x.axis.rotate <- FALSE
	x.axis.text.len <- sapply(pairs.plot.db$x, function(x) {
		nchar(as.character(x))
	})
	if (max(x.axis.text.len) > 4) {
		if.need.x.axis.rotate <- TRUE
	}
	# limits range modification function
	inside.limits.adj.func <- function(user.limits.adj) {
		force(user.limits.adj)
		function(limits) {
			res.limits <- limits
			if (!is.null(user.limits.adj)) {
				res.limits <- user.limits.adj
			}
			return(res.limits)
		}
	}

	# plot process
	plot.res <- ggplot(pairs.plot.db, aes(x, y))
	plot.res <- plot.res + labs(x = plot.axis.x.name, y = plot.axis.y.name)
	plot.res <- plot.res + geom_point(aes(size = cnt, colour = power)) + 
					scale_x_discrete(limits = unique(col.x.data), breaks = unique(col.x.data)) + 
					scale_y_discrete(limits = unique(col.y.data), breaks = unique(col.y.data)) + 
					scale_size(name = "Count", range = nodes.size.range,
						limits = inside.limits.adj.func(plot.cnt.range)) + 
					scale_colour_gradientn(name = "Power", colours = nodes.colour.seq, values = nodes.colour.value.seq,
						limits = inside.limits.adj.func(plot.power.range), na.value = NA)
	plot.res <- plot.res + theme_classic()  # change theme
	# rotate labels in x-axis when needed
	if (if.need.x.axis.rotate) {
		plot.res <- plot.res + theme(axis.text.x = element_text(angle = 45, hjust = 1))
	}

	## construct result tables
	# table raw.res
	res.raw.table <- data.frame(
		pair.name = pairs.plot.db$pair.name,
		x = pairs.plot.db$x,
		y = pairs.plot.db$y,
		cnt = pairs.plot.db$cnt,
		power = pairs.plot.db$power,
		stringsAsFactors = FALSE
	)
	# table cnt & power
	len.row <- length(fac.y.clusters)
	len.col <- length(fac.x.clusters)
	res.cnt.table <- matrix(data = c(0), nrow = len.row, ncol = len.col)
	res.power.table <- matrix(data = c(0.0), nrow = len.row, ncol = len.col)
	for (i in 1:len.col) {
		for (j in 1:len.row) {
			this.index <- len.row * (i - 1) + j
			res.cnt.table[j, i] <- pairs.plot.db$cnt[this.index]
			res.power.table[j, i] <- round(pairs.plot.db$power[this.index], 2)
		}
	}
	# set rownames & colnames
	rownames(res.cnt.table) <- as.character(fac.y.clusters)
	colnames(res.cnt.table) <- as.character(fac.x.clusters)
	rownames(res.power.table) <- as.character(fac.y.clusters)
	colnames(res.power.table) <- as.character(fac.x.clusters)

	#end# return 
	list(plot = plot.res, table = list(raw.res.table = res.raw.table, cnt.table = res.cnt.table, power.table = res.power.table))
}




#WeightRidges <- function(
#	object,
#	target.cluster,
#	target.is.sender = FALSE
#) {
#	allinteracts <- getFullViewResult(object)
#	kClustersSplit <- getClusterSplit(object)
#	kGenesSplit <- getGenePairSplit(object)
#
#	if (target.is.sender) {
#		rel.cluster.groups <- grep(paste0("^", target.cluster), allinteracts$name.allpairs, value = TRUE)
#		rel.tg.cluster <- unlist(strsplit(rel.cluster.groups, split = paste0(target.cluster, kClustersSplit)))
#	} else {
#		rel.cluster.groups <- grep(paste0(target.cluster, "$"), allinteracts$name.allpairs, value = TRUE)
#		rel.tg.cluster <- unlist(strsplit(rel.cluster.groups, split = paste0(kClustersSplit, target.cluster)))
#	}
#	rel.tg.cluster <- setdiff(rel.tg.cluster, "")
	#
#	ridge.plot.data <- bind_rows(lapply(seq_along(rel.cluster.groups), 
#		ref.data = allinteracts$data.allpairs, cluster.split = kClustersSplit,
#		rel.cluster.groups = rel.cluster.groups, rel.tg.cluster = rel.tg.cluster,
#		function(x, ref.data, cluster.split, rel.cluster.groups, rel.tg.cluster) {
#			tmp.df <- ref.data[[x]]
#			tmp.df[, "Plot.tg.cluster"] <- rep(rel.tg.cluster[x], times = nrow(tmp.df))
#			tmp.df
#		}))
#	ridge.plot.data[, "packed.inters"] <- paste(ridge.plot.data$inter.GeneName.A, ridge.plot.data$inter.GeneName.B, sep = kGenesSplit)
#	ridge.plot.data <- ridge.plot.data[, c("packed.inters", "Plot.tg.cluster", "inter.Power")]
#	ggplot(ridge.plot.data) +
#		geom_density_ridges(aes(x = log(inter.Power), y = Plot.tg.cluster)) + 
#		theme_ridges() +
#		theme(axis.text.x = element_text(angle = 90))
#}
#
#
#WeigthFlow <- function(
#	object
#) {
#
#}



# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Other functions
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

#' List Cluster Groups
#'
#' This function is to help on listing all or some of cluster groups (i.e. interacting clusters).
#'
#' @inheritParams InsideObjectInterCell
#' @param use.former It controls whether to list cluster groups relating to one or some clusters(.X) given in 
#'  parameter \code{sel.clusters.X} of \code{link{AnalyzeInterInFullView}}.
#' @param cluster.former The name of clusters, which is used find all X~? interactions (X is cluters given in these parameter). 
#' @param use.latter It controls whether to list cluster groups relating to one or some clusters(.Y) given in 
#'  parameter \code{sel.clusters.Y} of \code{link{AnalyzeInterInFullView}}.
#' @param cluster.latter The name of clusters, which is used find all ?~Y interactions (Y is cluters given in these parameter). 
#'
#' @return Character. Names of cluster groups.
#'
#' @examples
#' \dontrun{
#'  ListClusterGroups(object, use.former = TRUE, cluster.former = "T_cell")
#'  # the output of this will be "T_cell~Macrophage", "T_cell~B_cell" and things like that, depending on available clusters.
#' 
#'  ListClusterGroups(object, use.latter = TRUE, cluster.latter = "T_cell")
#'  # the output of this will be "Macrophage~T_cell", "B_cell~T_cell" and things like that, depending on available clusters.
#' }
#'
#' @export
#' 
ListClusterGroups <- function(
	object,
	use.former = FALSE,
	cluster.former = character(),  # to X
	use.latter = FALSE,
	cluster.latter = character()  # to Y
) {
	this.fullview <- getFullViewResult(object)
	this.allowed.names <- this.fullview$name.allpairs
	this.former.allowed <- this.fullview$list.clusters$x.axis
	this.latter.allowed <- this.fullview$list.clusters$y.axis

	# pre-check
	if ((use.former == FALSE && length(cluster.former) > 0) ||
		  (use.former == TRUE && length(cluster.former) == 0)) {
		stop("To fetch the cluster relating to former, please set `use.former` = TRUE, and give valid clusters in `cluster.former`.")
	}
	if ((use.latter == FALSE && length(cluster.latter) > 0) ||
		  (use.latter == TRUE && length(cluster.latter) == 0)) {
		stop("To fetch the cluster relating to latter, please set `use.latter` = TRUE, and give valid clusters in `cluster.latter`.")
	}

	res.cg <- character()
	if (use.former == TRUE) {
		not.valid.cluster.X <- setdiff(cluster.former, this.former.allowed)
		if (length(not.valid.cluster.X) > 0) {
			warning("Given undefined clusters in X: ", paste0(not.valid.cluster.X, collapse = ", "), ". ")
		}
		cluster.former <- unique(intersect(cluster.former, this.former.allowed))
		former.res.list <- lapply(cluster.former, ref.names = this.allowed.names, function(x, ref.names) {
				grep(paste0("^", x), ref.names, value = TRUE)
			}
		)
		res.cg <- c(res.cg, as.character(unlist(former.res.list)))
	}

	if (use.latter == TRUE) {
		not.valid.cluster.Y <- setdiff(cluster.latter, this.latter.allowed)
		if (length(not.valid.cluster.Y) > 0) {
			warning("Given undefined clusters in Y: ", paste0(not.valid.cluster.Y, collapse = ", "), ". ")
		}
		cluster.latter <- unique(intersect(cluster.latter, this.latter.allowed))
		latter.res.list <- lapply(cluster.latter, ref.names = this.allowed.names, function(x, ref.names) {
				grep(paste0(x, "$"), ref.names, value = TRUE)
			}
		)
		res.cg <- c(res.cg, as.character(unlist(latter.res.list)))
	}

	# if nothing is specified return all cluster groups
	if (use.former == FALSE && length(cluster.former) == 0 &&
		use.latter == FALSE && length(cluster.latter) == 0) {
		res.cg <- this.allowed.names
	}

	res.cg <- unique(res.cg)
	return(res.cg)
}
ZJUDBlab/InterCellDB documentation built on March 19, 2023, 4:56 p.m.