IMIX: IMIX

View source: R/IMIX.R

IMIX R Documentation

IMIX

Description

Fits the IMIX multivariate mixture model framework, selects the best model, and applies an adaptive procedure for FDR control. The input is summary statistics (z scores or p values) for two or three data types.

Usage

IMIX(
  data_input,
  data_type = c("p", "z"),
  mu_ini = NULL,
  sigma_ini = NULL,
  p_ini = NULL,
  tol = 1e-06,
  maxiter = 1000,
  seed = 10,
  ini.ind = TRUE,
  model = c("all", "IMIX_ind", "IMIX_cor_twostep", "IMIX_cor_restrict", "IMIX_cor"),
  model_selection_method = c("BIC", "AIC"),
  alpha = 0.2,
  verbose = FALSE,
  sort_label = TRUE
)

Arguments

data_input

An n x d data frame or matrix of summary statistics (z scores or p values), where n is the number of genes and d is the number of data types. Each row is a gene, each column is a data type.

data_type

Whether the input data are p values ("p") or z scores ("z"), default is p values

mu_ini

Initial values for the mean of the independent mixture model distribution. A vector of length 2*d, d is number of data types. Needs to be in a special format: for example, if d=3, needs to be in the format of (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).

sigma_ini

Initial values for the standard deviations of the two components in each data type. A vector of length 2*d, d is number of data types. Needs to be in a special format: for example, if d=3, needs to be in the format of (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).

p_ini

Initial values for the mixing proportions of the two components in each data type. A vector of length 2*d, d is number of data types. Needs to be in a special format: for example, if d=3, needs to be in the format of (null_1,alternative_1,null_2,alternative_2,null_3,alternative_3).

tol

The convergence criterion, default is 1e-06. Convergence is declared when the observed-data log-likelihood increases by less than tol between iterations.

maxiter

The maximum number of iterations, default is 1000

seed

The random seed used with set.seed, default is 10

ini.ind

Use the parameters estimated from IMIX-ind for initial values of other IMIX models, default is TRUE

model

Which model to fit to the data, default is all

model_selection_method

Model selection information criteria, based on AIC or BIC, default is BIC

alpha

Prespecified nominal level for global FDR control, default is 0.2

verbose

Whether to print the full log-likelihood for each iteration, default is FALSE

sort_label

Whether to sort the component labels in case they switched after convergence from the initial values, default is TRUE. If users choose not to sort, they may need to check the converged means of the IMIX model of interest to identify the true component labels, and perform the adaptive FDR control separately for an accurate result.
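The interleaved layout required by mu_ini, sigma_ini and p_ini can be built programmatically. The following is a minimal sketch in base R, assuming d = 3; the helper name interleave_ini and the starting values are hypothetical and are not part of the IMIX package.

```r
# Hypothetical helper (not part of IMIX): interleave per-data-type null and
# alternative values into (null_1, alternative_1, ..., null_d, alternative_d).
interleave_ini <- function(null_values, alternative_values) {
  stopifnot(length(null_values) == length(alternative_values))
  # rbind() stacks the two vectors as rows; as.vector() then reads the matrix
  # column by column, which gives the interleaved order.
  as.vector(rbind(null_values, alternative_values))
}

d <- 3  # number of data types (assumed for this illustration)
mu_input    <- interleave_ini(rep(0, d), rep(3, d))
sigma_input <- interleave_ini(rep(1, d), rep(1, d))
p_input     <- interleave_ini(rep(0.5, d), rep(0.5, d))

# Each vector has length 2*d, as the argument descriptions require.
print(mu_input)
```

Vectors built this way could then be passed as mu_ini, sigma_ini and p_ini; keeping the null and alternative values in separate vectors until the last step makes it harder to misplace an entry.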

Value

A list of results of IMIX

IMIX_ind

Results of IMIX_ind, assuming all data types are independent

IMIX_cor_twostep

Results of IMIX_cor_twostep. By default the mean is fixed at the value estimated by IMIX_ind. Users who want to supply a different mean can call the function IMIX_cor_twostep directly and specify the mean.

IMIX_cor

Results of IMIX_cor

IMIX_cor_restrict

Results of IMIX_cor_restrict

AIC/BIC

The AIC and BIC values of all fitted models

Selected Model

The model with the smallest AIC or BIC value. The criterion is set by the input argument "model_selection_method", default is BIC

significant_genes_with_FDRcontrol

The output for each gene, ordered by component based on FDR control and, within each component, by local FDR. "localFDR" is 1 minus the posterior probability of the gene in its component, where the component is chosen by maximum posterior probability. "class_withoutFDRcontrol" is the component assigned by maximum posterior probability. "class_FDRcontrol" is the component assigned by the across-data-type FDR control at level alpha.

estimatedFDR

The estimated marginal FDR value for each component starting from component 2 (component 1 is the global null)

alpha

Prespecified nominal level for the across-data-type FDR control
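The relationship between "localFDR" and "class_withoutFDRcontrol" can be illustrated with a small base R sketch. The posterior matrix post is made up for illustration (4 genes, 4 components) and is not IMIX output; the object names are assumptions, and the FDR-controlled classification ("class_FDRcontrol") is not reproduced here.

```r
# Hypothetical posterior probabilities: rows are genes, columns are components
# (component 1 is the global null). Each row sums to 1.
post <- matrix(
  c(0.90, 0.05, 0.03, 0.02,
    0.10, 0.70, 0.15, 0.05,
    0.05, 0.05, 0.10, 0.80,
    0.40, 0.35, 0.15, 0.10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("component", 1:4))
)

# Classified component: the one with the maximum posterior probability.
class_max <- max.col(post, ties.method = "first")

# Local FDR: 1 minus that maximum posterior probability.
local_fdr <- 1 - apply(post, 1, max)

result <- data.frame(localFDR = local_fdr, class_withoutFDRcontrol = class_max)
print(result)
```

In this toy input, gene4 is assigned to component 1 with a local FDR of 0.6, which shows that a maximum-posterior classification can still be quite uncertain.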

References

Ziqiao Wang and Peng Wei. 2020. “IMIX: a multivariate mixture model approach to association analysis through multi-omics data integration.” Bioinformatics. <doi:10.1093/bioinformatics/btaa1001>.

Tatiana Benaglia, Didier Chauveau, David R. Hunter, and Derek Young. 2009. “mixtools: An R Package for Analyzing Finite Mixture Models.” Journal of Statistical Software 32 (6): 1–29. https://www.jstatsoft.org/v32/i06/.

Examples

# A toy example
data("data_p")
set.seed(10)
data <- data_p[sample(1:1000,200,replace = FALSE),]
mu_input <- c(0,3,0,3)
sigma_input <- rep(1,4)
p_input <- rep(0.5,4)
test <- IMIX(data_input = data,data_type = "p",mu_ini = mu_input,sigma_ini = sigma_input,
             p_ini = p_input,alpha = 0.1,model_selection_method = "BIC",
             sort_label = FALSE,model = "IMIX_ind")


# The details of this example can be found in Github vignette
# First load the data
data("data_p")

# Specify initial values (this step could be omitted)
mu_input <- c(0,3,0,3)
sigma_input <- rep(1,4)
p_input <- rep(0.5,4)

# Fit IMIX model
test1 <- IMIX(data_input = data_p,data_type = "p",mu_ini = mu_input,sigma_ini = sigma_input,
              p_ini = p_input,alpha = 0.1,model_selection_method = "AIC")

# Results
# Print the estimated across-data-type FDR for each component
test1$estimatedFDR

# The AIC and BIC values for each model
test1$`AIC/BIC` 

# The best fitted model selected by AIC
test1$`Selected Model` 

# The output of IMIX_cor_twostep
str(test1$IMIX_cor_twostep) 

# The output of genes with local FDR values and classified components
dim(test1$significant_genes_with_FDRcontrol)
head(test1$significant_genes_with_FDRcontrol)
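The rule behind "Selected Model" (the smallest AIC or BIC value) can be sketched in base R without fitting anything. The table criteria below is hypothetical in both layout and values, the helper select_model is not part of the IMIX package, and the actual structure of the `AIC/BIC` element may differ.

```r
# Hypothetical AIC/BIC table; the numbers and layout are illustrative only.
criteria <- data.frame(
  AIC = c(5120.4, 5098.7, 5101.2, 5095.3),
  BIC = c(5170.9, 5165.0, 5176.8, 5189.6),
  row.names = c("IMIX_ind", "IMIX_cor_twostep", "IMIX_cor_restrict", "IMIX_cor")
)

# Hypothetical helper: return the model with the smallest value of the
# chosen criterion (BIC by default, matching the documented default).
select_model <- function(criteria, method = c("BIC", "AIC")) {
  method <- match.arg(method)
  rownames(criteria)[which.min(criteria[[method]])]
}

best_bic <- select_model(criteria, "BIC")
best_aic <- select_model(criteria, "AIC")
print(c(BIC = best_bic, AIC = best_aic))
```

The two criteria disagree in this made-up table because BIC penalizes additional parameters more heavily than AIC, so the choice of model_selection_method can change the selected model.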


IMIX documentation built on July 14, 2022, 1:05 a.m.
