sort_results: sort_results
In jpkrooney/rcorex: An Implementation of Total Correlation Explanation

Description

Internal function that sorts biocorex results for output to the user.

Usage

sort_results(
  data,
  cl,
  n_hidden,
  dim_visible,
  marginal_description,
  smooth_marginals,
  tcs,
  alpha,
  p_y_given_x_3d,
  theta,
  log_p_y,
  log_z,
  tc_history,
  names,
  state,
  logpx_method
)

Arguments

`data`	Data provided by user
`cl`	User call to biocorex
`n_hidden`	An integer number of hidden variables to search for.
`dim_visible`	The dimension of the data provided when a discrete marginal distribution is specified, i.e. the number of discrete levels that exist in the data. Must be a positive integer.
`marginal_description`	Character string which determines the marginal distribution of the data. A single marginal description applies to all variables in biocorex.
`smooth_marginals`	Boolean (TRUE/FALSE) which indicates whether Bayesian smoothing of marginal estimates should be used.
`tcs`	Vector of length n_hidden containing the TC for each hidden factor. This is used to decide the sort order for all the other parameters, so that hidden factors are returned to the user in order of largest TC to smallest TC.
`alpha`	Adjacency matrix between input variables and hidden variables. In range [0,1].
`p_y_given_x_3d`	A 3D array of numerics in the range (0, 1) with dimensions (n_hidden, n_samples, dim_hidden). It represents, for each observed x, the probability of each of the n_hidden latent variables of dimension dim_hidden.
`theta`	List of estimated parameters
`log_p_y`	A 2D matrix representing the log of the marginal probability of the latent variables.
`log_z`	A 2D matrix containing the pointwise estimate of total correlation explained by each latent variable for each sample - this is used to estimate overall total correlation.
`tc_history`	A list that records the TC results for each iteration of the algorithm. Used to calculate if convergence has been reached.
`names`	A vector of the variable names of the supplied data.
`state`	A string that describes the final state of corex (i.e. "Converged", "Negative tcs", "Unconverged").
`logpx_method`	EXPERIMENTAL - A character string that controls the method used to calculate log_p_xi. "pycorex" uses the same method as the Python version of biocorex, "mean" calculates an estimate of log_p_xi by averaging across n_hidden estimates.
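
The reordering described for `tcs` can be illustrated with a minimal sketch. This is not the package's implementation: the toy values are invented, and the assumption that `alpha` has shape (n_hidden, n_visible), with hidden factors along the first dimension, is made here for illustration only.

```r
# Toy inputs; shapes and values are assumptions for illustration only.
set.seed(1)
n_hidden <- 3; n_visible <- 4; n_samples <- 5; dim_hidden <- 2

tcs <- c(0.2, 1.5, 0.7)
# Assumed layout: one row of alpha per hidden factor.
alpha <- matrix(runif(n_hidden * n_visible), nrow = n_hidden)
p_y_given_x_3d <- array(runif(n_hidden * n_samples * dim_hidden),
                        dim = c(n_hidden, n_samples, dim_hidden))

# Permutation that places the hidden factor with the largest TC first.
ord <- order(tcs, decreasing = TRUE)

# Apply the same permutation along the hidden-factor dimension of each object.
sorted <- list(
  tcs         = tcs[ord],
  alpha       = alpha[ord, , drop = FALSE],
  p_y_given_x = p_y_given_x_3d[ord, , , drop = FALSE]
)
```

Using one shared permutation (`ord`) for every object keeps each hidden factor's TC, weights and probabilities aligned after sorting.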

Value

Returns a list of corex algorithm results sorted in descending order by TC of the latent variables. The list includes the following elements:

data - the user data supplied in call to corex.
call - the call used to run corex.
tcs - a vector of TC values, one for each of the n_hidden hidden variables.
alpha - a 2D adjacency matrix of connections between input variables and hidden variables.
p_y_given_x - a 3D array of numerics in the range (0, 1) with dimensions (n_hidden, n_samples, dim_hidden). It represents, for each observed x, the probability of each of the n_hidden latent variables of dimension dim_hidden.
theta - a list of the estimated parameters
log_p_y - a 2D matrix representing the log of the marginal probability of the latent variables.
log_z - a 2D matrix containing the pointwise estimate of total correlation explained by each latent variable for each sample - this is used to estimate overall total correlation.
dim_visible - only present if discrete marginals were specified. Lists the number of discrete levels that exist in the data.
iterations - the number of iterations for which the algorithm ran.
tc_history - a list that records the TC results for each iteration of the algorithm.
marginal_description - a character string which determines the marginal distribution of the data.
mis - an array that specifies the mutual information between each observed variable and hidden variable.
clusters - a vector that assigns a hidden variable label to each input variable.
labels - a 2D matrix of dimensions (nrow(data), n_hidden) that assigns a dimension label for each hidden variable to each row of data.
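
As an illustration of what `clusters` and `labels` contain, the sketch below derives them by taking the largest entry of `alpha` and of `p_y_given_x`. This page does not confirm that the package computes them this way. The argmax rule, the (n_hidden, n_visible) layout of `alpha`, the 1-based labels and the toy values are all assumptions.

```r
# Assumed layout: alpha is (n_hidden = 2, n_visible = 3).
alpha <- rbind(c(0.9, 0.1, 0.8),
               c(0.1, 0.7, 0.2))

# Assign each input variable to the hidden variable with the largest weight.
clusters <- apply(alpha, 2, which.max)

# p is (n_hidden = 2, n_samples = 4, dim_hidden = 2); rows sum to 1 over dim 3.
p <- array(0, dim = c(2, 4, 2))
p[, , 1] <- rbind(c(0.8, 0.3, 0.6, 0.1),
                  c(0.2, 0.9, 0.4, 0.7))
p[, , 2] <- 1 - p[, , 1]

# Most probable state of each hidden variable for each sample,
# transposed to (n_samples, n_hidden) to match the shape documented for labels.
labels <- t(apply(p, c(1, 2), which.max))
```

With these toy values, `clusters` has one entry per input variable and `labels` has one row per sample and one column per hidden variable.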