biocorex | R Documentation |
Implements the CorEx algorithm for data with features typical of biomedical datasets, such as continuous variables, missing data and under-sampled data.
biocorex( data, n_hidden = 1, dim_hidden = 2, marginal_description = "gaussian", smooth_marginals = FALSE, eps = 1e-06, verbose = FALSE, repeats = 1, return_all_runs = FALSE, max_iter = 100, logpx_method = "pycorex" )
data |
Data provided by user. For biocorex, data can be either continuous (gaussian) or discrete (consecutive integers 0, 1, 2, 3, etc.). Data types cannot be mixed in this implementation. |
n_hidden |
An integer number of hidden variables to search for. Default = 1. |
dim_hidden |
Each hidden unit can take |
marginal_description |
Character string which determines the marginal distribution of the data. A single marginal description applies to all variables in biocorex. Can be "gaussian" or "discrete". Default is "gaussian". |
smooth_marginals |
Boolean (TRUE/FALSE) which indicates whether Bayesian smoothing of marginal estimates should be used. |
eps |
The maximal change in TC across 10 iterations needed to signal convergence. Default = 1e-06.
verbose |
Default FALSE. If TRUE, biocorex reports the iteration count and the TCs at each iteration. Useful for monitoring progress when fitting a larger dataset. |
repeats |
How many times to run biocorex on the data using random initial values. biocorex will return the run which leads to the maximum TC. Default is 1. For a new dataset, it is recommended to leave this as 1 to see how long biocorex takes; for more trustworthy results, a higher number is recommended (e.g. 25). |
return_all_runs |
Default FALSE. If FALSE biocorex returns a single object of class rcorex. If TRUE biocorex returns all runs of biocorex as a list - the length of which = |
max_iter |
Numeric. Maximum number of iterations before ending. Default = 100.
logpx_method |
EXPERIMENTAL - A character string that controls the method used to calculate log_p_xi. If "pycorex", uses the same method as the Python version of biocorex; if "mean", calculates an estimate of log_p_xi by averaging across n_hidden estimates. Note that "mean" may become the default option after further testing. |
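The requirement that discrete data be coded as consecutive integers starting at 0 can be met with a small recoding step before calling biocorex. The following is a minimal sketch, not part of the package: recode_consecutive is a hypothetical helper, the example data are made up, and the package name "rcorex" and the use of a data frame as input are assumptions, so the call is only attempted if a package of that name is installed.

```r
# Hypothetical helper (not part of the package): recode every column to
# consecutive integers 0, 1, 2, ... in the sort order of its distinct values.
recode_consecutive <- function(df) {
  out <- lapply(df, function(col) as.integer(factor(col)) - 1L)
  as.data.frame(out)
}

# Made-up discrete data whose codes are not consecutive integers.
raw <- data.frame(
  genotype = rep(c(0, 2, 5), length.out = 60),
  stage    = rep(c("I", "III", "II", "II"), length.out = 60)
)

dat <- recode_consecutive(raw)
print(head(dat))

# Assumption: biocorex() is provided by a package installed as "rcorex"
# and accepts a data frame. The argument names are those documented above.
if (requireNamespace("rcorex", quietly = TRUE)) {
  fit <- rcorex::biocorex(
    dat,
    n_hidden = 1,
    dim_hidden = 2,
    marginal_description = "discrete",
    repeats = 5,
    max_iter = 100
  )
  print(fit$tcs)
}
```

Recoding through factor() keeps the original ordering of the levels, so the mapping back to the raw codes can be recovered from sort(unique(raw$genotype)) if needed.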
This function is a port of the original biocorex function in Python by Greg Ver Steeg: https://github.com/gregversteeg/bio_corex. Reference: Greg Ver Steeg and Aram Galstyan. "Discovering Structure in High-Dimensional Data Through Correlation Explanation." NIPS, 2014. arXiv preprint arXiv:1406.1222.
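The eps argument describes a stopping rule based on the change in TC across 10 iterations. One plausible way to express such a rule is sketched below; it is an assumption for illustration and is not taken from the package source. The hypothetical helper has_converged takes a numeric vector of total TC per iteration and compares the mean of the last five values with the mean of the five before them.

```r
# Hypothetical helper (an assumption, not the package's code): report
# convergence when the total TC has changed by less than eps over the
# most recent `window` iterations.
has_converged <- function(tc_history, eps = 1e-06, window = 10) {
  n <- length(tc_history)
  if (n < window) {
    return(FALSE)  # too few iterations to judge
  }
  recent <- tc_history[(n - window + 1):n]
  half <- window %/% 2
  # Difference between the mean of the newer half and the older half.
  change <- mean(recent[(half + 1):window]) - mean(recent[1:half])
  abs(change) < eps
}

# Made-up histories: one still increasing, one that has levelled off.
tc_rising <- cumsum(rep(0.1, 20))
tc_flat <- c(seq(0.1, 1, length.out = 10), rep(1, 10))

print(has_converged(tc_rising))
print(has_converged(tc_flat))
```

Averaging over two half-windows makes the check less sensitive to a single noisy iteration than comparing only the last two values would be.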
Returns either an rcorex object or a list of repeated runs, as determined by the return_all_runs argument. An rcorex object is a list that contains the following components:
data - the user data supplied in call to corex.
call - the call used to run corex.
tcs - a vector of TC for n_hidden variables.
alpha - a 2D adjacency matrix of connections between input variables and hidden variables.
p_y_given_x - a 3D array of numerics in range (0, 1), that represent the probability that each observed x variable belongs to n_hidden latent variables of dimension dim_hidden. p_y_given_x has dimensions (n_hidden, n_samples, dim_hidden).
theta - a list of the estimated parameters
log_p_y - a 2D matrix representing the log of the marginal probability of the latent variables.
log_z - a 2D matrix containing the pointwise estimate of total correlation explained by each latent variable for each sample - this is used to estimate overall total correlation.
dim_visible - only present if discrete marginals were specified. Lists the number of discrete levels that exist in the data.
iterations - the number of iterations for which the algorithm ran.
tc_history - a list that records the TC results for each iteration of the algorithm.
marginal_description - a character string which determines the marginal distribution of the data.
mis - an array that specifies the mutual information between each observed variable and hidden variable.
clusters - a vector that assigns a hidden variable label to each input variable.
labels - a 2D matrix of dimensions (nrow(data), n_hidden) that assigns a dimension label for each hidden variable to each row of data.
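To make the relationship between some of these components concrete, the sketch below builds mock objects with the documented shapes and derives label-like and cluster-like summaries from them in base R. All values are made up. The sketch assumes that mis is laid out as (n_hidden, number of input variables), that labels correspond to the most probable state in p_y_given_x, and that "maximum TC" across repeated runs means the largest sum of tcs. It returns 1-based indices, and the package's own coding may differ.

```r
# Mock p_y_given_x with dimensions (n_hidden, n_samples, dim_hidden).
n_hidden <- 2
n_samples <- 4
dim_hidden <- 2
p_y_given_x <- array(0, dim = c(n_hidden, n_samples, dim_hidden))
p_y_given_x[1, , 1] <- c(0.9, 0.2, 0.7, 0.4)
p_y_given_x[2, , 1] <- c(0.1, 0.6, 0.3, 0.8)
p_y_given_x[, , 2] <- 1 - p_y_given_x[, , 1]

# Most probable state of each hidden variable for each sample,
# arranged as (n_samples, n_hidden).
labels <- apply(p_y_given_x, c(2, 1), which.max)
print(labels)

# Mock mis, assumed layout: rows are hidden variables, columns are inputs.
mis <- matrix(c(0.8, 0.1,
                0.2, 0.7,
                0.5, 0.05), nrow = 2)

# Assign each input variable to the hidden variable it shares most
# mutual information with.
clusters <- apply(mis, 2, which.max)
print(clusters)

# Mock list of runs, as might be returned with return_all_runs = TRUE;
# pick the run with the largest total TC.
runs <- list(
  list(tcs = c(0.5, 0.2)),
  list(tcs = c(0.9, 0.4)),
  list(tcs = c(0.6, 0.1))
)
total_tc <- sapply(runs, function(run) sum(run$tcs))
best <- which.max(total_tc)
print(best)
```

On a fitted object the same expressions could be applied to fit$p_y_given_x and fit$mis, provided the layouts match the assumptions stated above.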