adapt_gmm: Adaptive P-Value Thresholding for Multiple Hypothesis Testing...


View source: R/adapt.R

Description

Fits a Gaussian Mixture model to the distribution of test statistics and returns rejections and fitted parameters.

Usage

adapt_gmm(
  x = NULL,
  pvals = NULL,
  z = NULL,
  se = NULL,
  testing = "one_sided",
  rendpoint = NULL,
  lendpoint = NULL,
  beta_formulas = NULL,
  custom_beta_model = NULL,
  model_type = "nnet",
  nclasses = c(2, 3, 4),
  niter_fit = 5,
  niter_ms = 10,
  nfits = 20,
  alpha_m = NULL,
  zeta = NULL,
  lambda = NULL,
  masking_shape = "tent",
  alphas = seq(0.01, 1, 0.01),
  target_alpha_level = NULL,
  cr = "AIC",
  randomize_pvals = FALSE,
  tol = 1e-04,
  symmetric_modeling = FALSE,
  intercept_model = TRUE,
  return_all_models = FALSE
)
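
As a hedged illustration of the signature above, the following sketch simulates one-sided p-values with a single covariate and passes them to adapt_gmm. The simulated data, the choice of nclasses and alphas, and the package name AdaPTGMM (taken from the repository name in the page footer) are assumptions for illustration only. The call is guarded so the script still runs when the package is not installed, and the structure of the returned object is only inspected, not assumed.

```r
set.seed(1)
n <- 2000
x <- runif(n)                                  # one covariate
is_alt <- rbinom(n, 1, prob = 0.3 * x) == 1    # alternatives more likely at large x
z <- rnorm(n, mean = ifelse(is_alt, 2.5, 0))   # unit-variance test statistics
pvals <- pnorm(z, lower.tail = FALSE)          # one-sided p-values

# Candidate beta-model formulas, following the beta_formulas example
# in the Arguments section (paste0 avoids stray spaces).
formulas <- paste0("splines::ns(x, df = ", c(2, 4, 6), ")")

# Assumption: the package is installed under the name "AdaPTGMM".
if (requireNamespace("AdaPTGMM", quietly = TRUE)) {
  res <- AdaPTGMM::adapt_gmm(
    x = data.frame(x = x),
    pvals = pvals,
    testing = "one_sided",
    beta_formulas = formulas,
    nclasses = c(2, 3),
    alphas = c(0.05, 0.1)
  )
  str(res, max.level = 1)                      # inspect what was returned
}
```

Restricting nclasses and alphas to a few values keeps the model selection and fitting steps short for a first run.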

Arguments

x

Data frame of covariates

pvals

Vector of p-values (supply either pvals or test statistics)

z

Vector of test statistics, required if testing='interval'.

se

Vector of standard errors. If left blank when test statistics are given, the standard errors are assumed to be 1.

testing

The form of testing procedure, "one_sided", "two_sided", or "interval". Default is "one_sided".

rendpoint

Corresponds to right endpoint of null hypothesis interval. Required if testing='interval'.

lendpoint

Corresponds to left endpoint of null hypothesis interval. If testing='interval' and lendpoint is left blank, lendpoint is assumed to be -rendpoint.

beta_formulas

List of formulas for the beta model, e.g. paste("splines::ns(x, df = ",c(2,4,6)," )")

custom_beta_model

Optional function to use custom beta model instead of one of the defaults. More details in the vignette.

model_type

Type of model used for modeling beta, options include gam, glm, nnet. Default is nnet.

nclasses

Vector of candidate numbers of classes in the Gaussian mixture model; the model selection procedure chooses among these values. The minimum number of classes is 2, and fewer than 5 classes is recommended. Default is c(2,3,4). The greater the number of degrees of freedom, the longer the EM procedure takes to fit, and the longer the list of possible values, the longer the model selection procedure takes.

niter_fit

Number of iterations of EM per model update.

niter_ms

Number of iterations of EM in model selection. This is the most expensive part of the procedure; we recommend a smaller number (<5) of iterations for larger problems. Default is 10.

nfits

Number of model fitting steps.

alpha_m

The maximum possible rejected p-value. We recommend 0.01 ≤ α_m ≤ 0.1; default is 0.1.

lambda

Controls where p-values are mirrored (the boundary of the blue region). We recommend 0.3 ≤ λ ≤ 0.5; default is 0.4.

masking_shape

Controls the shape of the masking function, either "tent" or "comb" masking functions. Default is "tent".

alphas

Vector of FDR levels of interest. Default is seq(0.01, 1, 0.01), i.e. 0.01, 0.02, ..., 0.99, 1.

target_alpha_level

Desired FDR level to optimize the procedure over.

cr

Type of selection criterion in model_selection. Options include "BIC", "AIC", "AICc", "HIC", "cross_validation". Default is "AIC".

randomize_pvals

Boolean for whether to randomize blue p-values; recommended if the p-values violate assumptions. Replaces each blue p-value with a uniform draw from the blue interval. Defaults to FALSE.

tol

Positive scalar tolerance for early stopping: EM stops when mu and tau do not update by more than tol.

symmetric_modeling

Boolean for whether to model the distribution of test statistics with a symmetric model. Only valid for two-sided or interval testing.

intercept_model

Boolean. Include an intercept-only model in the model selection. Default is TRUE.

return_all_models

Boolean for whether to return all models used at the various alpha levels. Default is FALSE. Must be TRUE to use plot_nn_masking. Warning: storing all models can be expensive for large problems.

verbose

Boolean. Include print statements at each stage of the procedure.
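
As a reminder of what the selection criteria named under cr measure, the sketch below computes the standard textbook AIC, BIC, and AICc from a log-likelihood. The helper info_criteria, its inputs, and the two candidate fits are hypothetical; the package's own implementation may differ, and "HIC" and "cross_validation" are not covered here.

```r
# Textbook definitions; loglik = maximized log-likelihood,
# k = number of free parameters, n = number of observations.
info_criteria <- function(loglik, k, n) {
  aic <- 2 * k - 2 * loglik
  bic <- k * log(n) - 2 * loglik
  aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
  c(AIC = aic, BIC = bic, AICc = aicc)
}

# Two hypothetical candidate fits on n = 1000 observations.
candidates <- rbind(
  small = info_criteria(loglik = -1420, k = 4, n = 1000),
  large = info_criteria(loglik = -1410, k = 9, n = 1000)
)

# Lower is better for every criterion; find the winning row per column.
best <- rownames(candidates)[apply(candidates, 2, which.min)]
names(best) <- colnames(candidates)
print(best)
```

In this made-up comparison AIC and AICc prefer the larger model while BIC, which penalizes parameters more heavily at this sample size, prefers the smaller one.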

Details

The masking function parameters alpha_m, lambda, and zeta must satisfy the constraint

0 < α_m ≤ λ < λ + α_m ζ ≤ 1.

Setting alpha_m to 0.5, lambda to 0.5, zeta to 1, and masking_shape to "tent" results in the AdaPT masking function.
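
The constraint above can be checked before calling adapt_gmm. The helper below is a minimal sketch and not part of the package; the name check_masking_params is hypothetical, and the inequality is read with α_m ζ as a product, as written in the Details text.

```r
# Hypothetical helper (not part of the package): TRUE when
# 0 < alpha_m <= lambda < lambda + alpha_m * zeta <= 1 holds.
check_masking_params <- function(alpha_m, lambda, zeta) {
  upper <- lambda + alpha_m * zeta
  0 < alpha_m && alpha_m <= lambda && lambda < upper && upper <= 1
}

adapt_setting <- check_masking_params(alpha_m = 0.5, lambda = 0.5, zeta = 1)
valid_setting <- check_masking_params(alpha_m = 0.1, lambda = 0.4, zeta = 2)
bad_setting   <- check_masking_params(alpha_m = 0.5, lambda = 0.4, zeta = 1)

print(c(adapt = adapt_setting, valid = valid_setting, bad = bad_setting))
```

The first call uses the AdaPT setting from the text and satisfies the constraint with equality at both ends; the last call fails because alpha_m exceeds lambda.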


patrickrchao/AdaPTGMM documentation built on Oct. 22, 2021, 7:49 a.m.