In statdivlab/happi: a hierarchical approach to pangenomics inference

happi

R Documentation

Main function for happi (p = q = 1)

Description

Main function for happi in the case p = q = 1. This script contains the modularized version of happi with a corrected implementation of the log likelihood.
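The log likelihood mentioned above can be illustrated with a short sketch. It assumes the hierarchical presence/absence model that the arguments describe: a latent true presence Z_i with P(Z_i = 1) = expit(x_i' beta), a detection probability f(quality_i) when the gene is present, and a false-positive probability epsilon when it is absent. `sketch_loglik` is a hypothetical helper written for illustration. It omits the Firth penalty and is not happi's internal implementation.

```r
# Hypothetical helper (not part of happi): observed-data log likelihood
# under an assumed two-level presence/absence model.
#   y    : length-n vector of observed presence (1) / absence (0)
#   x    : n x p design matrix
#   beta : length-p coefficient vector for P(Z = 1)
#   f    : length-n detection probabilities P(Y = 1 | Z = 1)
#   eps  : false-positive probability P(Y = 1 | Z = 0), scalar or length n
sketch_loglik <- function(y, x, beta, f, eps = 0) {
  pi_i <- plogis(drop(x %*% beta))        # P(Z_i = 1) via the logistic link
  p_obs <- pi_i * f + (1 - pi_i) * eps    # marginal P(Y_i = 1)
  # Bernoulli log likelihood of the observed outcomes (no Firth penalty)
  sum(dbinom(y, size = 1, prob = p_obs, log = TRUE))
}
```

With f = 1 and eps = 0 the expression reduces to an ordinary logistic-regression log likelihood, which is a quick sanity check on the sketch.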

Usage

happi(
  outcome,
  covariate = NULL,
  h0_param = 2,
  quality_var = NULL,
  covariate_formula = NULL,
  covariate_formula_h0 = NULL,
  quality_var_formula = NULL,
  data = NULL,
  max_iterations = 1000,
  min_iterations = 15,
  change_threshold = 0.05,
  epsilon = 0,
  method = "splines",
  random_starts = FALSE,
  firth = TRUE,
  spline_df = 3,
  nstarts = 1,
  seed = 13,
  norm_sd = 1,
  run_npLRT = FALSE,
  P = NULL,
  verbose = TRUE
)

Arguments

`outcome`	length-n vector giving the presence/absence of a target gene; must be coded as 1 (present) or 0 (absent)
`covariate`	n x p design matrix of covariates, including the primary predictor of interest (for example, the output of `model.matrix()`)
`h0_param`	index of the column of `covariate` whose coefficient (beta) is zero under the null hypothesis
`quality_var`	length-n vector of the quality variable (for example, mean coverage); currently only a single quality variable is supported, and a future version is intended to accept an n x q matrix
`covariate_formula`	alternative to the `covariate` argument: a formula for the covariates of the form `~ covariate1 + covariate2 + ...`; requires the `data` argument
`covariate_formula_h0`	alternative to the `h0_param` argument: a formula for the covariates in the null model, for example `~ 1` for an intercept-only model; requires the `data` argument
`quality_var_formula`	alternative to the `quality_var` argument: a formula for the quality variable of the form `~ quality_var`; requires the `data` argument
`data`	a data frame containing the covariates and the quality variable; required when any of the `*_formula` arguments is used
`max_iterations`	the maximum number of EM steps that the algorithm will run for
`min_iterations`	the minimum number of EM steps that the algorithm will run for
`change_threshold`	the algorithm terminates early if the likelihood changes by this percentage or less for 5 consecutive iterations under both the alternative and the null models
`epsilon`	probability of observing a gene when it is truly absent (a false-positive probability); a value between 0 and 1, given either as a single value or as a length-n vector. Default is 0.
`method`	method for estimating f, the function linking the quality variable to the probability of detecting a gene that is present. Defaults to "splines", which fits a monotone spline with degrees of freedom set by `spline_df`; use "isotone" for an isotonic regression fit.
`random_starts`	logical; whether to draw the starting values of the betas randomly. Defaults to FALSE.
`firth`	logical; whether to use a Firth penalty. Defaults to TRUE.
`spline_df`	degrees of freedom (in addition to the intercept) for the monotone spline fit; default 3
`nstarts`	integer; number of starts for the optimization. Defaults to `1`.
`seed`	numeric seed used for the random multiple starts
`norm_sd`	positive number; the standard deviation of the Normal distribution from which initial parameter values are drawn
`run_npLRT`	logical; if TRUE, a nonparametric permutation likelihood ratio test (LRT) is also run
`P`	number of permutations to run when `run_npLRT` is TRUE
`verbose`	logical; TRUE to return all information generated by happi, FALSE to return only the effect size and p-value
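The early-stopping rule described for `change_threshold` and `min_iterations` can be sketched in base R. This is a minimal illustration under stated assumptions: "percentage" is taken to mean 100 * |L_t - L_(t-1)| / |L_(t-1)|, and the helpers `pct_change` and `should_stop` are hypothetical names, not functions exported by happi. happi's actual check may be computed differently.

```r
# Hypothetical sketch (not happi's code) of the early-stopping rule.
# Assumption: percentage change = 100 * |L_t - L_{t-1}| / |L_{t-1}|.
pct_change <- function(loglik) {
  100 * abs(diff(loglik)) / abs(head(loglik, -1))
}

# loglik_alt, loglik_null: log likelihood after each EM iteration
should_stop <- function(loglik_alt, loglik_null, change_threshold = 0.05,
                        min_iterations = 15, window = 5) {
  n_iter <- min(length(loglik_alt), length(loglik_null))
  # Never stop before min_iterations, and require `window` changes to inspect
  if (n_iter < min_iterations || n_iter <= window) return(FALSE)
  small_alt  <- tail(pct_change(loglik_alt[seq_len(n_iter)]), window) <= change_threshold
  small_null <- tail(pct_change(loglik_null[seq_len(n_iter)]), window) <= change_threshold
  # Stop only when both models have been stable for `window` iterations in a row
  all(small_alt) && all(small_null)
}
```

Requiring both the alternative and the null fits to be stable mirrors the argument description, since the likelihood ratio depends on both.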

Value

An object of class happi.

Examples

library(happi)
data(TM7_data)
# Design matrix with an intercept and the tongue covariate
x_matrix <- model.matrix(~tongue, data = TM7_data)
happi_results <- happi(outcome = TM7_data$`Cellulase/cellobiase CelA1`,
                       covariate = x_matrix,
                       quality_var = TM7_data$mean_coverage,
                       max_iterations = 1000,
                       change_threshold = 0.1,
                       epsilon = 0,
                       nstarts = 1,
                       spline_df = 3)
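In the example, `model.matrix(~tongue, ...)` places the intercept in column 1 and, assuming `tongue` is a single binary indicator, the tongue covariate in column 2. The default `h0_param = 2` therefore sets the tongue coefficient to zero under the null, leaving an intercept-only null model (the null that `covariate_formula_h0 = ~ 1` describes). The base-R sketch below uses a hypothetical toy data frame in place of `TM7_data` to show this column layout. Dropping column `h0_param` is shown only conceptually; happi's internal construction of the null model may differ.

```r
# Hypothetical toy data standing in for TM7_data; tongue is a 0/1 indicator here.
toy <- data.frame(tongue = c(1, 0, 1, 0, 1))
x_toy <- model.matrix(~tongue, data = toy)
print(colnames(x_toy))   # "(Intercept)" "tongue"

h0_param <- 2
# Name of the coefficient constrained to zero under the null
print(colnames(x_toy)[h0_param])
# Conceptual null design matrix: drop that column, leaving the intercept only
x_null <- x_toy[, -h0_param, drop = FALSE]
print(colnames(x_null))  # "(Intercept)"
```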

statdivlab/happi documentation built on April 19, 2024, 2:04 a.m.