penZINB: Penalized zero-inflated negative binomial regression


View source: R/penZINB.R

Description

Perform variable selection for ZINB regression via penalized maximum likelihood.
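For reference, the observed-data log-likelihood that penalized ZINB regression builds on can be sketched in base R. This is a minimal illustration, not scZINB's internal code. The helper `zinb_loglik` is hypothetical, and it assumes a log link for the negative binomial mean `mu`, a logit link for the zero-inflation probability `pi0`, and `theta` as the negative binomial size (dispersion) parameter.

```r
# Hypothetical helper (not part of scZINB): observed-data ZINB log-likelihood.
# y: counts; mu: NB means; pi0: zero-inflation probabilities; theta: NB size.
zinb_loglik <- function(y, mu, pi0, theta) {
  nb0 <- dnbinom(0, size = theta, mu = mu)          # P(Y = 0) under the NB part
  ll_zero <- log(pi0 + (1 - pi0) * nb0)             # structural or sampling zero
  ll_pos <- log(1 - pi0) + dnbinom(y, size = theta, mu = mu, log = TRUE)
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

set.seed(1)
n <- 200
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)       # log link for the NB mean
pi0 <- plogis(-1 + 0.5 * x)    # logit link for the zero-inflation part
y <- ifelse(runif(n) < pi0, 0, rnbinom(n, size = 2, mu = mu))
zinb_loglik(y, mu, pi0, theta = 2)
```

Setting `pi0` to 0 recovers the ordinary negative binomial log-likelihood, which is a quick sanity check on the mixture form.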

Usage

penZINB(y, X, unpenalizedx = NULL, unpenalizedz = NULL, lambdas = NULL,
  taus = NULL, nlambda = 30, ntau = 5, naPercent = 0.4, maxIT = 1000,
  maxIT2 = 25, track = NULL, theta.st = NULL, stepThrough = NULL,
  optimType = "EM", loud = NULL, warmStart = FALSE, bicgamma = NULL,
  irlsConv = FALSE, weightedPen = TRUE, numericalDeriv = FALSE,
  pfactor = 0.01, oneTheta = FALSE, maxOptimIT = 50, eps = 1e-05,
  convType = 1, start = NULL, order = FALSE, penType = 1)

Arguments

y

zero-inflated count response

X

covariate matrix. An intercept is added within the function. Passing 1 as X specifies an intercept-only model.
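For intuition, the intercept handling described above amounts to prepending a column of ones, with X = 1 leaving only that column. The helper `make_design` below is a hypothetical illustration under that assumption, not the package's code.

```r
# Hypothetical helper: build a design matrix with an intercept column.
# A scalar X = 1 is treated as an intercept-only model.
make_design <- function(X, n) {
  if (length(X) == 1 && X == 1) {
    return(matrix(1, nrow = n, ncol = 1, dimnames = list(NULL, "(Intercept)")))
  }
  cbind("(Intercept)" = 1, as.matrix(X))
}

X <- matrix(rnorm(10), nrow = 5, dimnames = list(NULL, c("g1", "g2")))
dim(make_design(X, n = 5))   # 5 3
dim(make_design(1, n = 5))   # 5 1
```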

unpenalizedx, unpenalizedz

Additional unpenalized covariates for the negative binomial and logistic regression parts, respectively. Default is NULL.

lambdas, taus

specific tuning parameter values with which to fit the model. Default is NULL, in which case the function auto-generates a tuning parameter search grid; nlambda and ntau must then be specified.

nlambda, ntau

number of unique lambda and tau values - defaults are 30 and 5.

naPercent

allowable proportion of observations with missing values - default is 0.4.
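The naPercent threshold can be pictured as a limit on the share of incomplete observations. This sketch assumes an observation counts as missing when y or any column of X is NA; what penZINB does with such rows (dropping them or stopping) is not stated here, so the helper `na_share` is hypothetical.

```r
# Hypothetical check: proportion of observations with any missing value.
na_share <- function(y, X) {
  incomplete <- is.na(y) | rowSums(is.na(as.matrix(X))) > 0
  mean(incomplete)
}

y <- c(0, 3, NA, 1, 0)
X <- cbind(a = c(1, NA, 2, 3, 4), b = c(5, 6, 7, 8, 9))
na_share(y, X)   # 0.4: rows 2 and 3 are incomplete
```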

maxIT

maximum number of EM iterations - default is 1000.

maxIT2

maximum number of iterations for updating the coefficients in the regression model - default is 25.

track

default is NULL (deactivated). Otherwise, takes a single integer value that activates tracking mode for the corresponding tuning parameter pair. See Details for how the output changes.

theta.st

default is NULL (deactivated), in which case theta is estimated by maximum likelihood. Otherwise, takes a single value at which theta is held constant throughout estimation.

stepThrough

default is NULL (deactivated). Otherwise, a length-2 vector that activates debugging mode: the first element is the theta iteration and the second is the EM iteration at which the step-through debugger is prompted.

optimType

options are "EM" and "optim". Default is "EM", which runs the EM algorithm before BFGS optimization; "optim" skips the EM algorithm.
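To illustrate direct numerical optimization without an EM step, the sketch below fits an unpenalized, intercept-only ZINB by BFGS with `stats::optim`. It is a toy under stated assumptions (hypothetical `negll`, log/logit/log parameterization of `mu`, `pi0` and `theta`, no covariates or penalty), not how penZINB sets up its own optimization.

```r
# Hypothetical toy: intercept-only ZINB fitted by BFGS (no EM, no penalty).
# par = (log mu, logit pi0, log theta); returns the mean negative log-likelihood.
negll <- function(par, y) {
  mu <- exp(par[1]); pi0 <- plogis(par[2]); theta <- exp(par[3])
  nb0 <- dnbinom(0, size = theta, mu = mu)
  ll <- ifelse(y == 0,
               log(pi0 + (1 - pi0) * nb0),
               log(1 - pi0) + dnbinom(y, size = theta, mu = mu, log = TRUE))
  -mean(ll)   # averaging keeps gradients moderate
}

set.seed(3)
n <- 500
y <- ifelse(runif(n) < 0.3, 0, rnbinom(n, size = 2, mu = 4))
opt <- optim(c(log(mean(y)), 0, 0), negll, y = y, method = "BFGS")
c(mu = exp(opt$par[1]), pi0 = plogis(opt$par[2]), theta = exp(opt$par[3]))
```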

loud

default is NULL (deactivated). Otherwise, takes a positive integer x and reports progress at every x-th iteration of the EM and numerical optimization algorithms.

warmStart

default is FALSE, which uses the same starting point for all tuning parameter pairs. TRUE uses the previous estimates as starting points for the next tuning parameter pair. 'cond' resets the starting point to the original starting point whenever non-convergence occurs.
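The three warmStart options can be summarized as a small decision rule. The helper `next_start` below is hypothetical, and it assumes that 'cond' otherwise carries the previous estimates forward like TRUE, which the description above implies but does not state.

```r
# Hypothetical decision rule for the warmStart options.
# init: original starting values; prev: estimates from the previous pair;
# prev_converged: whether the previous fit converged.
next_start <- function(warmStart, init, prev, prev_converged) {
  if (identical(warmStart, FALSE)) {
    init                                 # same start for every pair
  } else if (identical(warmStart, TRUE)) {
    prev                                 # always carry estimates forward
  } else if (identical(warmStart, "cond")) {
    if (prev_converged) prev else init   # reset after non-convergence
  } else {
    stop("warmStart must be FALSE, TRUE, or 'cond'")
  }
}

init <- list(betas = c(0, 0), gammas = c(0, 0))
prev <- list(betas = c(0.2, 0), gammas = c(-0.1, 0))
next_start("cond", init, prev, prev_converged = FALSE)$betas   # 0 0
```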

bicgamma

the parameter used in the extended BIC. Default is NULL, which sets it to log(dimension)/log(sample size).
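The role of bicgamma can be illustrated with the extended BIC of Chen and Chen (2008), which adds 2 * gamma * log(choose(p, k)) to the ordinary BIC. The helper `ext_bic` is hypothetical, and how penZINB counts the non-zero coefficients k and the dimension p (for example, whether beta and gamma coefficients are counted separately) is an assumption this page does not settle.

```r
# Hypothetical helper: BIC and extended BIC in the Chen and Chen (2008) form.
# loglik: maximized log-likelihood; k: number of non-zero coefficients;
# n: sample size; p: number of candidate covariates; gamma: EBIC parameter,
# defaulting to log(p) / log(n) as described above.
ext_bic <- function(loglik, k, n, p, gamma = log(p) / log(n)) {
  bic <- -2 * loglik + k * log(n)
  c(BIC = bic, extBIC = bic + 2 * gamma * lchoose(p, k))
}

ext_bic(loglik = -350, k = 4, n = 200, p = 50)
```

A larger gamma penalizes model size more heavily, and gamma = 0 reduces the extended BIC to the ordinary BIC.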

irlsConv

forces each estimate of beta and gamma to converge first if set to TRUE. Default is FALSE.

weightedPen

default is TRUE. Weights the penalty using the Hessian.

numericalDeriv

default is FALSE. Calculates the Hessian numerically when set to TRUE. Otherwise, calculates it analytically.

pfactor

default is 1e-2. The multiplier applied to the largest calculated penalty value to obtain the smallest penalty value. Use together with nlambda and ntau to control the granularity of the tuning parameter grid.
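One way to picture how pfactor and nlambda interact is a grid running from the largest penalty value down to pfactor times that value. This sketch assumes log-uniform spacing and a known maximum `lambda_max`; the spacing actually used by penZINB (or by gentp) is not documented here, and `make_grid` is hypothetical.

```r
# Hypothetical grid: nlambda values from lambda_max down to pfactor * lambda_max,
# evenly spaced on the log scale (an assumption, not necessarily gentp's rule).
make_grid <- function(lambda_max, nlambda = 30, pfactor = 0.01) {
  exp(seq(log(lambda_max), log(pfactor * lambda_max), length.out = nlambda))
}

lambdas <- make_grid(lambda_max = 2, nlambda = 5)
lambdas   # decreasing from 2 down to 0.02
```

Lowering pfactor extends the grid toward weaker penalties, while raising nlambda makes it finer over the same range.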

oneTheta

default is FALSE (deactivated). If set to TRUE, only estimates theta once per tuning parameter pair.

maxOptimIT

maximum number of iterations for the numerical optimization (BFGS) after the EM algorithm. Default is 50. Note that convergence can take a long time.

eps

threshold for convergence for the EM algorithm - default is 1e-5.

convType

manages the order of convergence within the EM algorithm. Options are 1 (default) and 2. Type 1 forces convergence of the binomial and negative binomial parts together. Type 2 forces convergence of the binomial part first, then the negative binomial part.

start

default is NULL, which sets all starting coefficient values to 0. If set to 'jumpstart', the starting coefficients are estimated from penalized negative binomial estimation and logistic regression, based on the penalized package. Otherwise, starting values can be supplied directly in the form list(betas = v1, gammas = v2), where v1 and v2 are vectors whose length equals the number of covariates in X.
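Supplying explicit starting values follows the list(betas = v1, gammas = v2) form described above. The sketch only builds and checks such a list; the object names are illustrative, and the final call is commented out because it needs the scZINB package and a response y.

```r
# Illustrative starting values: one entry per covariate in X, as described above.
X <- matrix(rnorm(100 * 3), nrow = 100)
start_vals <- list(betas = rep(0.1, ncol(X)), gammas = rep(0, ncol(X)))

# Basic sanity checks before passing the list on.
stopifnot(is.list(start_vals),
          length(start_vals$betas) == ncol(X),
          length(start_vals$gammas) == ncol(X))

# fit <- penZINB(y, X, start = start_vals)   # requires scZINB and a response y
```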

order

default is FALSE. If TRUE, covariates are estimated in order of their marginal correlation with the response.
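An ordering by marginal correlation can be sketched as ranking the columns of X by their correlation with the response. The helper `order_by_marginal_cor` is hypothetical, and it assumes absolute Pearson correlation with the strongest covariate first; penZINB's exact criterion may differ.

```r
# Hypothetical ordering: rank covariates by absolute marginal correlation with y.
order_by_marginal_cor <- function(y, X) {
  r <- abs(drop(cor(X, y)))    # one correlation per column of X
  order(r, decreasing = TRUE)  # column indices, strongest first
}

set.seed(2)
X <- matrix(rnorm(300), ncol = 3)
y <- 2 * X[, 3] + 0.5 * X[, 1] + rnorm(100)
order_by_marginal_cor(y, X)    # column 3 comes first
```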

penType

options are 1 (default) or 2: 1 is the group log penalty and 2 is the lasso penalty.
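The two penalty types can be contrasted with a small sketch. The forms below are assumptions for illustration: the lasso as lambda * sum(|beta_j| + |gamma_j|), and the group log penalty as lambda * sum(log(|beta_j| + |gamma_j| + tau)), where (beta_j, gamma_j) is the group for covariate j. The exact form used by penZINB, including the Hessian weighting controlled by weightedPen, may differ, and both function names are hypothetical.

```r
# Hypothetical penalty functions (assumed forms, for illustration only).
# beta, gamma: NB and zero-inflation coefficients, one entry per covariate;
# the pair (beta[j], gamma[j]) forms group j.
lasso_pen <- function(beta, gamma, lambda) {
  lambda * sum(abs(beta) + abs(gamma))
}
group_log_pen <- function(beta, gamma, lambda, tau) {
  lambda * sum(log(abs(beta) + abs(gamma) + tau))
}

beta <- c(0.8, 0, 0.1)
gamma <- c(-0.5, 0, 0)
lasso_pen(beta, gamma, lambda = 0.5)   # 0.7
group_log_pen(beta, gamma, lambda = 0.5, tau = 0.1)
```

Under this assumed form, the log penalty grows slowly for large coefficients, so it shrinks large effects less than the lasso, and tying beta_j to gamma_j encourages a covariate to enter or leave both model parts together.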

Details

If tracking is activated, this function returns a nested list of all estimates with the following hierarchy:

with the following values: loglik, loglik.em, loglikZI, loglikNB, pen, betas, gammas.

Value

A list with each element corresponding to each tuning parameter pair. Each element contains the following components:

X

The design matrix used for calculations. Non-empty only for the first tuning parameter pair.

betas

Non-zero beta coefficients corresponding to betas.w.

gammas

Non-zero gamma coefficients corresponding to gammas.w.

loglik.obs

Observed data log likelihood at convergence.

pen

Value of the penalty at convergence.

theta.r

Theta path.

theta

Theta estimate at convergence.

BIC

BIC value at convergence.

extBIC

Extended BIC value at convergence.

extBICGG

Extended BIC GG value at convergence.

lambda, tau

Tuning parameter pair used.
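With tracking off, the returned list can be scanned for the tuning parameter pair with the smallest extended BIC. To keep the sketch self-contained, `fit` below is a mock object shaped like the documented return value; its numbers are made up for illustration, and with a real fit it would be the output of penZINB.

```r
# Mock object shaped like the documented return value (illustrative values only):
# one element per tuning parameter pair.
fit <- list(
  list(lambda = 0.5, tau = 0.1, BIC = 812.4, extBIC = 840.1),
  list(lambda = 0.2, tau = 0.1, BIC = 798.0, extBIC = 831.7),
  list(lambda = 0.1, tau = 0.1, BIC = 795.3, extBIC = 845.9)
)

# Pick the tuning parameter pair with the smallest extended BIC.
ebic <- vapply(fit, function(e) e$extBIC, numeric(1))
best <- fit[[which.min(ebic)]]
c(lambda = best$lambda, tau = best$tau)
```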

See Also

gentp, for generating the lambda and tau tuning parameters.


yliu433/scZINB documentation built on Nov. 30, 2020, 9:07 p.m.