cbce | R Documentation |
Consider two sets of high-dimensional measurements on the same set of samples. CBCE (the Correlation Bi-Community Extraction method) finds sets of variables from the first measurement and sets of variables from the second measurement that are correlated with each other.
cbce(
X,
Y,
alpha = 0.05,
alpha.init = alpha,
cov = NULL,
cache.size = (utils::object.size(X) + utils::object.size(Y))/2,
start_frac = 1,
start_nodes = list(x = sample(1:ncol(X), ceiling(ncol(X) * start_frac)), y =
sample(1:ncol(Y), ceiling(ncol(Y) * start_frac))),
max_iterations = 20,
size_threshold = 0.5 * exp(log(ncol(X))/2 + log(ncol(Y))/2),
interaction = interaction_none,
heuristic_search = FALSE,
filter_low_score = TRUE,
diagnostic = diagnostics
)
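Two of the defaults in the usage above are plain expressions in ncol(X) and ncol(Y), so they can be evaluated by hand in base R. The sketch below assumes hypothetical dimensions (dx = 20, dy = 50) and a hypothetical start_frac of 0.25. It shows that the default size_threshold equals half the geometric mean of the two column counts, and how many starting nodes a given start_frac yields on each side. It only re-evaluates the default expressions and does not call cbce.

```r
# Hypothetical dimensions, chosen only for illustration
dx <- 20   # stands in for ncol(X)
dy <- 50   # stands in for ncol(Y)

# Default size_threshold exactly as written in the signature
size_threshold <- 0.5 * exp(log(dx) / 2 + log(dy) / 2)

# The same quantity written as half the geometric mean of dx and dy
half_geo_mean <- 0.5 * sqrt(dx * dy)

# Number of starting nodes sampled on each side for a hypothetical start_frac
start_frac <- 0.25
n_start_x <- ceiling(dx * start_frac)
n_start_y <- ceiling(dy * start_frac)

print(c(size_threshold = size_threshold, half_geo_mean = half_geo_mean))
print(c(n_start_x = n_start_x, n_start_y = n_start_y))
```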
X, Y |
Numeric matrices representing the two groups of variables. Rows represent samples and columns represent variables. |
alpha |
|
alpha.init |
|
cov |
The covariates to account for. This should be a matrix with the same number of rows as X and Y. Each column represents a covariate whose effect needs to be removed. If this is NULL, no covariates are removed. |
cache.size |
integer The amount of memory to dedicate to caching correlations, which speeds up the computation. Defaults to the average memory required by the X and Y matrices. |
start_frac |
|
start_nodes |
list The initial set of variables to start with.
If this is provided, |
max_iterations |
integer The maximum number of iterations per extraction. If a fixed point is not found by this step, the extraction is terminated. This limit guarantees that the program terminates. |
size_threshold |
The maximum size of bimodule we want to search for. The search will be terminated when sets grow beyond this size. The size of a bimodule is defined as the geometric mean of its X and Y sizes. |
interaction |
(internal) This is a function that will be called
between extractions to allow interaction with the program.
For instance, one can pass the function |
heuristic_search |
Use a fast, but incomplete, version of heuristic search that doesn't start from nodes inside bimodules already found. |
filter_low_score |
Should we remove bimodules with low score? (recommended). |
diagnostic |
(internal) This is an internal function for
probing the internal state of the method. It will be
called at special hooks and can look into what the method is doing.
Pass either |
cbce
applies an update function (mapping subsets of
variables to subsets of variables) iteratively until a fixed point
is found. These fixed points are reported as communities.
The update starts from a single variable (the initialization step)
and is repeated till either a fixed point is found or some set
repeats. Each such run is called an extraction. Since each extraction
starts from a singleton node, there are ncol(X)+ncol(Y)
possible extractions.
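The iteration described above can be sketched generically in base R. The block below is a minimal illustration, not the package's implementation: iterate_to_fixed_point and toy_update are hypothetical names, and the toy update rule (grow to neighbouring indices inside a fixed block) is only a stand-in for the correlation-based update that cbce applies. It shows the three ways a run can end: a fixed point, a repeated set, or the iteration limit.

```r
# Generic fixed-point iteration over index sets, with repeat detection.
# `update` is a hypothetical stand-in, NOT the update used by cbce.
iterate_to_fixed_point <- function(start, update, max_iterations = 20) {
  current <- sort(unique(as.integer(start)))
  seen <- list(current)
  for (i in seq_len(max_iterations)) {
    nxt <- sort(unique(as.integer(update(current))))
    if (identical(nxt, current)) {
      return(list(set = nxt, status = "fixed point", iterations = i))
    }
    # A set seen earlier that is not a fixed point means the run is cycling
    if (any(vapply(seen, identical, logical(1), nxt))) {
      return(list(set = nxt, status = "cycle", iterations = i))
    }
    seen[[length(seen) + 1]] <- nxt
    current <- nxt
  }
  list(set = current, status = "iteration limit", iterations = max_iterations)
}

# Toy update: keep the current indices, add their neighbours,
# and restrict to the block 3:6. Purely illustrative.
toy_update <- function(s) intersect(c(s, s - 1, s + 1), 3:6)

# Start from the singleton {4}, mirroring the initialization step
out <- iterate_to_fixed_point(4L, toy_update)
print(out)
```

Starting from the singleton 4, the toy update grows the set until it fills the block 3:6, at which point applying the update changes nothing and the run stops.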
The return value is a list with the results and
meta-data about the extraction. The most useful field is
comms
- this is a list of all the Correlation Bi-communities
that were detected after filtering, while comms.fil
consists of all the communities that were found after
filtering similar communities.
library(cbce)
#Sample size
n <- 40
#Dimension of measurement 1
dx <- 20
#Dimension of measurement 2
dy <- 50
#Correlation strength
rho <- 0.5
set.seed(1245)
# Assume first measurement is gaussian
X <- matrix(rnorm(dx*n), nrow=n, ncol=dx)
# Measurements 3:6 in set 2 are correlated to 4:5 in set 1
Y <- matrix(rnorm(dy*n), nrow=n, ncol=dy)
Y[, 3:6] <- sqrt(1-rho)*Y[, 3:6] + sqrt(rho)*rowSums(X[, 4:5])
res <- cbce(X, Y)
#Recovers the indices 4:5 for X and 3:6 for Y
#If the strength of the correlation was higher
#all the indices could be recovered.
res$comms
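As a sanity check on the simulated data, the planted signal can be inspected directly with base R before running the method. The sketch below rebuilds the same X and Y as in the example and compares the average absolute sample correlation inside the planted block (X columns 4:5 against Y columns 3:6) with the average over all other pairs. The names C, planted, mean_planted and mean_background are introduced here for illustration only, and cbce itself is not called.

```r
# Rebuild the simulated data from the example (base R only)
n <- 40; dx <- 20; dy <- 50; rho <- 0.5
set.seed(1245)
X <- matrix(rnorm(dx * n), nrow = n, ncol = dx)
Y <- matrix(rnorm(dy * n), nrow = n, ncol = dy)
Y[, 3:6] <- sqrt(1 - rho) * Y[, 3:6] + sqrt(rho) * rowSums(X[, 4:5])

# dx-by-dy matrix of sample cross-correlations between columns of X and Y
C <- cor(X, Y)

# Logical mask marking the planted block: X columns 4:5, Y columns 3:6
planted <- matrix(FALSE, nrow = dx, ncol = dy)
planted[4:5, 3:6] <- TRUE

mean_planted <- mean(abs(C[planted]))
mean_background <- mean(abs(C[!planted]))
print(c(planted = mean_planted, background = mean_background))
```

Under this construction the population correlation between a planted Y column and X column 4 or 5 is sqrt(rho / (1 + rho)), about 0.58 for rho = 0.5, so the planted block should stand out clearly from the background pairs even with n = 40.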