betaMix: Fit a two-component beta mixture model to the matrix of all...

View source: R/betaMix.R

betaMix  R Documentation

Fit a two-component beta mixture model to the matrix of all pairwise correlations.

Description

From the pairwise correlations, the function calculates the statistics z_j = sin^2(arccos(cor(y_i, y_j))), which equal 1 - cor(y_i, y_j)^2, and fits the two-component beta mixture model to them using the EM algorithm.
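
The statistic in the description can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's internal code; the matrix `M` and the clamping step are assumptions made for the example.

```r
# Simulated data: 50 samples (rows) and 6 variables (columns).
set.seed(1)
M <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# All pairwise correlations; keep each pair once (upper triangle).
R <- cor(M)
r <- R[upper.tri(R)]

# Guard against values slightly outside [-1, 1] caused by rounding.
r <- pmin(pmax(r, -1), 1)

# z = sin^2(arccos(r)); this is algebraically the same as 1 - r^2.
z <- sin(acos(r))^2
```

With 6 variables there are 6*5/2 = 15 pairs, so `z` has 15 entries, all between 0 and 1.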

Usage

betaMix(
  M,
  dbname = NULL,
  tol = 1e-04,
  calcAcc = 1e-09,
  maxalpha = 1e-04,
  ppr = 0.05,
  mxcnt = 200,
  ahat = 8,
  bhat = 3,
  bmax = 0.999,
  subsamplesize = 50000,
  seed = 912469,
  ind = TRUE,
  msg = TRUE
)

Arguments

M

A matrix with N rows (samples) and P columns (variables).

dbname

The name of the SQLite database, if one is used to store the pairwise correlation data instead of calling the cor function and holding the cor(M) matrix in memory (for situations in which P is very large). Default=NULL.

tol

The convergence threshold for the EM algorithm (default=1e-4). The value actually used is the maximum of the user's input and 1/(P(P-1)/2).

calcAcc

The calculation accuracy threshold, used to avoid values greater than 1 when calling asin (default=1e-9).

maxalpha

The probability of Type I error (default=1e-4). For a large P, use a much smaller value.

ppr

The null posterior probability threshold (default=0.05).

mxcnt

The maximum number of EM iterations (default=200).

ahat

The initial value for the first parameter of the nonnull beta distribution (default=8).

bhat

The initial value for the second parameter of the nonnull beta distribution (default=3).

bmax

The right-hand side of the support of the non-null component (default=0.999).

subsamplesize

If greater than 20000, take a random sample of size subsamplesize to fit the model. Otherwise, use all the data (default=50000).

seed

The random seed to use if selecting a subset with the subsamplesize parameter (default=912469).

ind

Whether the N samples should be assumed to be independent (default=TRUE).

msg

Whether to print intermediate output messages (default=TRUE).
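
The rule stated for tol (the larger of the user's value and one over the number of variable pairs) can be sketched as below. `effective_tol` is a hypothetical helper written for this illustration; it is not a function exported by betaMix.

```r
# Hypothetical helper, not part of betaMix: the tolerance rule as described for tol.
effective_tol <- function(tol, P) {
  n_pairs <- P * (P - 1) / 2   # number of distinct variable pairs
  max(tol, 1 / n_pairs)
}

tol_small_P <- effective_tol(1e-4, P = 50)    # 1225 pairs, so 1/1225 exceeds 1e-4
tol_large_P <- effective_tol(1e-4, P = 1000)  # 499500 pairs, so 1e-4 is kept
```

For small P the bound 1/(P(P-1)/2) dominates, so a user-supplied tol below it has no effect.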

Value

A list with the following:

  • angleMat A PxP matrix of the angles between pairs of vectors. If the correlation data is stored in SQLite, the returned value is the database name.

  • z_j The statistics z_j=sin^2(angles).

  • m_0 The posterior null probabilities.

  • p_0 The estimated probability of the null component.

  • ahat The estimated first parameter of the nonnull beta component.

  • bhat The estimated second parameter of the nonnull beta component.

  • N The sample size.

  • etahat If the samples are not assumed to be independent, this corresponds to the effective sample size, ESS = 2*etahat + 1.

  • bmax The user-defined right-hand side of the support of the non-null component.

  • ppthr The estimated posterior probability threshold, under which all the z_j correspond to nonnull edges.

  • P The number of nodes.

  • edges The number of edges found.

  • cnt The number of EM iterations.
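
As a rough sketch of how these components relate, the snippet below uses a mock list with made-up values in place of a real betaMix result. The element names follow the list above; the strict "less than" comparison and the vector layout of m_0 are assumptions based on the description of ppthr.

```r
# Mock result with made-up values; a real object would come from betaMix().
res <- list(
  m_0    = c(0.001, 0.40, 0.93, 0.02, 0.99),  # posterior null probabilities
  ppthr  = 0.05,                              # posterior probability threshold
  etahat = 24.5
)

# Pairs whose posterior null probability falls under the threshold
# are treated as nonnull (assumed strict inequality).
nonnull   <- res$m_0 < res$ppthr
n_flagged <- sum(nonnull)

# Effective sample size, as defined for etahat.
ess <- 2 * res$etahat + 1
```

In this mock example two of the five pairs fall under the threshold, and the effective sample size is 2*24.5 + 1 = 50.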

Examples

## Not run: 
   data(SIM, package = "betaMix") # variables correspond to columns, samples to rows
   res <- betaMix(betaMix::SIM, maxalpha = 1e-6, ppr = 0.01, subsamplesize = 30000, ind = TRUE)
   plotFittedBetaMix(res)

## End(Not run)

haimbar/betaMix documentation built on Jan. 3, 2023, 12:54 p.m.