cbce: Correlation Bi-community Extraction method
View source: R/cbce.R

cbce	R Documentation

Correlation Bi-community Extraction method

Description

Consider two sets of high-dimensional measurements on the same set of samples. CBCE (Correlation Bi-Community Extraction method) finds sets of variables from the first measurement and sets of variables from the second measurement that are correlated with each other.

Usage

cbce(
  X,
  Y,
  alpha = 0.05,
  alpha.init = alpha,
  cov = NULL,
  cache.size = (utils::object.size(X) + utils::object.size(Y))/2,
  start_frac = 1,
  start_nodes = list(x = sample(1:ncol(X), ceiling(ncol(X) * start_frac)), y =
    sample(1:ncol(Y), ceiling(ncol(Y) * start_frac))),
  max_iterations = 20,
  size_threshold = 0.5 * exp(log(ncol(X))/2 + log(ncol(Y))/2),
  interaction = interaction_none,
  heuristic_search = FALSE,
  filter_low_score = TRUE,
  diagnostic = diagnostics
)

Arguments

`X, Y`	Numeric matrices representing the two groups of variables. Rows represent samples and columns represent variables.
`alpha`	`\in (0,1)`. Controls the type I error for the update step (for the multiple testing procedure).
`alpha.init`	`\in (0,1)`. Controls the type I error for the initialization step. This can be more liberal (i.e., greater) than the `alpha` used for the update step.
`cov`	The covariates to account for. This should be a matrix with the same number of rows as X and Y. Each column represents a covariate whose effect needs to be removed. If this is `NULL`, no covariate will be removed.
`cache.size`	integer The amount of memory to dedicate to caching correlations. This will speed things up. Defaults to the average memory required by the X and Y matrices.
`start_frac`	`\in (0,1]` The proportion of nodes, chosen at random, to start extractions from. This is used to randomly sample `start_nodes`. If `start_nodes` is provided, this parameter is ignored.
`start_nodes`	list The initial set of variables to start with. If this is provided, `start_frac` will be ignored. If `NULL`, extractions are run starting from each variable from X and Y. Otherwise `start_nodes$x` gives the X variables to start from and `start_nodes$y` gives the Y variables to start from.
`max_iterations`	integer The maximum number of iterations per extraction. If a fixed point is not found by this step, the extraction is terminated. This limit guarantees that the program terminates.
`size_threshold`	The maximum size of bimodule we want to search for. The search will be terminated when sets grow beyond this size. The size of a bimodule is defined as the geometric mean of its X and Y sizes.
`interaction`	(internal) This is a function that will be called between extractions to allow interaction with the program. For instance, one can pass the function `interaction_gui` (EXPERIMENTAL) or `interaction_cli`.
`heuristic_search`	Use a fast, but incomplete, version of heuristic search that doesn't start from nodes inside bimodules already found.
`filter_low_score`	Should we remove bimodules with low score? (recommended).
`diagnostic`	(internal) This is an internal function for probing the internal state of the method. It will be called at special hooks and can look into what the method is doing. Pass either `diagnostics` or `diagnostics_none`.
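
As a small illustration of the `size_threshold` default: the expression in Usage is half the geometric mean of `ncol(X)` and `ncol(Y)`. The sketch below uses base R only; `bimodule_size` is a hypothetical helper written for this illustration and is not part of cbce.

```r
# Hypothetical helper (not part of cbce): the size of a bimodule, defined
# as the geometric mean of the number of X and Y variables it contains.
bimodule_size <- function(x_idx, y_idx) {
  sqrt(length(x_idx) * length(y_idx))
}

# Dimensions borrowed from the Examples section.
dx <- 20
dy <- 50

# Default from Usage, written with the dimensions in place of ncol(X), ncol(Y).
default_threshold <- 0.5 * exp(log(dx) / 2 + log(dy) / 2)

# The same quantity written as half the geometric mean of the dimensions.
half_geometric_mean <- 0.5 * sqrt(dx * dy)
all.equal(default_threshold, half_geometric_mean)

# A bimodule with 2 X variables and 4 Y variables has size sqrt(8),
# which is well below the default threshold for these dimensions.
bimodule_size(4:5, 3:6) <= default_threshold
```

With 20 and 50 variables the default threshold is about 15.8, so by the description of `size_threshold` above a search would stop once the geometric mean of the X and Y set sizes grows past that value.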

Details

cbce applies an update function (mapping subsets of variables to subsets of variables) iteratively until a fixed point is found. These fixed points are reported as communities. The update starts from a single variable (the initialization step) and is repeated until either a fixed point is found or some set repeats. Each such run is called an extraction. Since each extraction starts from a single node, there are ncol(X)+ncol(Y) possible extractions.
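
The iteration scheme described above can be sketched generically in base R. This is only an illustration of "iterate until a fixed point, a repeated set, or an iteration cap": `extract_fixed_point`, `grow`, and `flip` are hypothetical names, and the toy update functions stand in for the correlation-based update that cbce actually uses, which is not reproduced here.

```r
# Hypothetical driver (not the cbce implementation): apply `update` to a set
# of integer indices until it stops changing, revisits an earlier set, or
# `max_iterations` is reached.
extract_fixed_point <- function(update, start, max_iterations = 20) {
  normalize <- function(s) as.integer(sort(unique(s)))
  current <- normalize(start)
  seen <- paste(current, collapse = ",")  # string keys of sets visited so far
  for (i in seq_len(max_iterations)) {
    nxt <- normalize(update(current))
    if (identical(nxt, current)) {
      return(list(set = current, status = "fixed_point", iterations = i))
    }
    key <- paste(nxt, collapse = ",")
    if (key %in% seen) {
      return(list(set = nxt, status = "cycle", iterations = i))
    }
    seen <- c(seen, key)
    current <- nxt
  }
  list(set = current, status = "max_iterations", iterations = max_iterations)
}

# Toy update 1: add the successor of every element below 5.
# Starting from 1, the sets grow until they stabilize at 1:5.
grow <- function(s) union(s, s[s < 5L] + 1L)
res_fixed <- extract_fixed_point(grow, 1L)

# Toy update 2: alternate between {1} and {2}, so a set repeats.
flip <- function(s) if (identical(s, 1L)) 2L else 1L
res_cycle <- extract_fixed_point(flip, 1L)

res_fixed$status
res_cycle$status
```

The iteration cap plays the role that `max_iterations` plays in the arguments above: it bounds the work per extraction even when the update neither stabilizes nor repeats within the allowed steps.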

Value

The return value is a list with the results and metadata about the extraction. The most useful field is comms, a list of all the correlation bi-communities that were detected after filtering, while comms.fil consists of all the communities that were found after filtering similar communities.

Examples

library(cbce)
#Sample size
n <- 40
#Dimension of measurement 1
dx <- 20
#Dimension of measurement 2
dy <- 50
#Correlation strength
rho <- 0.5
set.seed(1245)
# Assume the first measurement is Gaussian
X <- matrix(rnorm(dx*n), nrow=n, ncol=dx)
# Measurements 3:6 in set 2 are correlated with 4:5 in set 1
Y <- matrix(rnorm(dy*n), nrow=n, ncol=dy)
Y[, 3:6] <- sqrt(1-rho)*Y[, 3:6] + sqrt(rho)*rowSums(X[, 4:5])
res <- cbce(X, Y)
#Recovers the indices 4:5 for X and 3:6 for Y
#If the strength of the correlation was higher
#all the indices could be recovered.
res$comms
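
To see what the simulation plants before running the method, the cross-correlations can be inspected directly with base R. This sketch repeats the data generation from the example and does not call cbce. Under this construction the population correlation between each of the planted Y columns and each of X columns 4:5 is sqrt(rho/(1+rho)), roughly 0.58 for rho = 0.5. The names `C`, `planted`, and `background` are introduced here for illustration only.

```r
# Regenerate the example data (same seed and construction as above).
set.seed(1245)
n <- 40
dx <- 20
dy <- 50
rho <- 0.5
X <- matrix(rnorm(dx * n), nrow = n, ncol = dx)
Y <- matrix(rnorm(dy * n), nrow = n, ncol = dy)
Y[, 3:6] <- sqrt(1 - rho) * Y[, 3:6] + sqrt(rho) * rowSums(X[, 4:5])

# dx-by-dy matrix of sample correlations between X columns and Y columns.
C <- cor(X, Y)

# Correlations inside the planted block versus everywhere outside it.
planted <- C[4:5, 3:6]
background <- C[-(4:5), -(3:6)]

mean(planted)          # average correlation within the planted bimodule
mean(abs(background))  # typical magnitude of a purely spurious correlation
```

With only 40 samples the background correlations are not negligible, which is why the method relies on a multiple testing procedure rather than a fixed correlation cutoff.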

miheerdew/cbce documentation built on Aug. 28, 2023, 2:18 p.m.