zi_fit_pms: Fits a Hurdle conditional model with pms parametrization of...


View source: R/zero_fit_pms.R

Description

Fits a Hurdle conditional model with pms parametrization of specified degree.

Usage

zi_fit_pms(
  V,
  Y,
  left,
  right,
  extra_regressors = NULL,
  extra_reg_pen_factors = NULL,
  p_V_degree = 1,
  p_Y_degree = 1,
  p_Y_V_degree = 1,
  mu_V_degree = 1,
  mu_Y_degree = 1,
  mu_Y_V_degree = 1,
  value_only = TRUE,
  tol = 1e-08,
  maxit = 1e+05,
  seed = NULL,
  penalize_decider = function(X) { ncol(X) >= nrow(X)/2 },
  nfits = 10,
  runs = 2
)

Arguments

V

A matrix of 0/1s, equal to Y != 0.

Y

A data matrix of the same size as V.

left

An integer between 1 and ncol(Y). The index of the variable to be fit.

right

A vector of integers between 1 and ncol(Y) different from left. Indices of the "regressors".

extra_regressors

A matrix with the same number of rows as V and Y, extra regressors to be included in both regressions (conditional log odds/conditional mean). Defaults to NULL.

extra_reg_pen_factors

A vector of non-negative numbers, defaults to NULL. Penalty factors for extra_regressors. If the main design matrix has d columns, c(rep(1, d), extra_reg_pen_factors) will be passed as the penalty.factor argument to glmnet::glmnet(). If intercept == TRUE, a 0 will also be prepended.

p_V_degree

A non-negative integer, the degree for the Vo in the Hurdle polynomial for the conditional log odds. Defaults to 1.

p_Y_degree

A non-negative integer, the degree for the Yo in the Hurdle polynomial for the conditional log odds. Defaults to 1.

p_Y_V_degree

A non-negative integer, the degree for interaction between Vo and Yo in the Hurdle polynomial for the conditional log odds. Defaults to 1. If equal to 1, no interaction will be included (since it would be either a pure V term or a pure Y term).

mu_V_degree

A non-negative integer, the degree for the Vo in the Hurdle polynomial for the conditional mean. Defaults to 1.

mu_Y_degree

A non-negative integer, the degree for the Yo in the Hurdle polynomial for the conditional mean. Defaults to 1.

mu_Y_V_degree

A non-negative integer, the degree for interaction between Vo and Yo in the Hurdle polynomial for the conditional mean. Defaults to 1. If equal to 1, no interaction will be included (since it would be either a pure V term or a pure Y term).

value_only

If TRUE, returns the minimized negative log likelihood only. Defaults to TRUE.

tol

A number, the convergence tolerance. Defaults to 1e-8. Passed to stats::glm() for unpenalized logistic regressions, or as the thresh argument to glmnet::glmnet() for both logistic and linear regressions if penalized.

maxit

An integer, the maximum number of iterations. Defaults to 100000. Passed to stats::glm() for unpenalized logistic regressions, or to glmnet::glmnet() for both logistic and linear regressions if penalized.

seed

A number, the random seed passed to zi_fit_lm() for both regressions (conditional log odds/conditional mean).

penalize_decider

A logical, or a function that takes a design matrix and returns a logical. Defaults to function(X){ncol(X)>=nrow(X)/2}. Decides whether l2-penalized (ridge) regression is used (if TRUE) when fitting each conditional distribution. Note that for either regression (conditional log odds/conditional mean), if the unpenalized fit is almost perfect, a penalized regression is used automatically.

nfits

A positive integer, defaults to 10. Used for penalized regressions: the number of folds if CV_BIC == TRUE (passed as the nfolds argument to glmnet::cv.glmnet(), with nlambda set to 100), or the number of lambdas if CV_BIC == FALSE (passed as the nlambda argument to glmnet::glmnet()).

runs

A positive integer, the number of reruns. The fit with the maximum likelihood will be returned. Defaults to 2.
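
As a small illustration of two of the arguments above, the sketch below evaluates the default penalize_decider rule on two toy design matrices and assembles the penalty vector in the layout described under extra_reg_pen_factors. It uses base R only; the matrix sizes and the value of d are made-up assumptions, and nothing here calls ZiDAG or glmnet.

```r
# Default rule from the Usage section: penalize when the design matrix
# has at least half as many columns as rows.
penalize_decider <- function(X) ncol(X) >= nrow(X) / 2

X_wide <- matrix(0, nrow = 10, ncol = 5)  # toy matrix, 5 >= 10/2
X_tall <- matrix(0, nrow = 10, ncol = 4)  # toy matrix, 4 <  10/2
use_ridge_wide <- penalize_decider(X_wide)  # TRUE
use_ridge_tall <- penalize_decider(X_tall)  # FALSE

# Penalty vector layout: 1 for each of the d main design columns
# (d = 4 is an assumed value), followed by the user-supplied factors.
d <- 4
extra_reg_pen_factors <- c(1, 2, 3, 4) / sum(c(1, 2, 3, 4))
penalty_factor <- c(rep(1, d), extra_reg_pen_factors)
```

A logical can be supplied instead of a function, for example penalize_decider = TRUE to always use ridge regression.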

Details

A Hurdle conditional model with pms parametrization for the left node given those in right has log density, with respect to the sum of the Lebesgue measure and a point mass at 0, equal to (in terms of y) log(1-p) if y == 0, or log(p)-log(2*pi*sigmasq)/2-(y-mu)^2/2/sigmasq otherwise. That is, a Bernoulli variable with probability of success p decides whether y is nonzero, and a nonzero y follows a Gaussian with conditional mean mu and conditional variance sigmasq. Here sigmasq is assumed constant, and the parameters log(p/(1-p)) and mu are Hurdle polynomials, i.e. polynomials in the values for right and their indicators. This function thus fits such a model using Y[,left], Y[,right] and V[,right] = (Y[,right] != 0), using a logistic regression for the log odds log(p/(1-p)) and a linear regression for mu.
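
The density above can be written out as a short base R sketch. The helper hurdle_nll below is hypothetical and is not part of ZiDAG; it assumes p, mu and sigmasq are already known and uses the full Gaussian log density from stats::dnorm(), including the normalizing constant.

```r
# Hypothetical helper (not a ZiDAG function): negative log likelihood of a
# Hurdle model for observations y, given p, mu and sigmasq.
hurdle_nll <- function(y, p, mu, sigmasq) {
  is_zero <- (y == 0)
  loglik <- ifelse(
    is_zero,
    log(1 - p),                                                   # point mass at 0
    log(p) + dnorm(y, mean = mu, sd = sqrt(sigmasq), log = TRUE)  # Gaussian part
  )
  -sum(loglik)
}

# Toy data with two exact zeros (made-up values for illustration).
y <- c(0, 0, 1.2, -0.5)
nll <- hurdle_nll(y, p = 0.5, mu = 0, sigmasq = 1)
```

In zi_fit_pms() itself, p and mu are not fixed numbers but are fitted through a logistic and a linear regression on the Hurdle polynomial terms.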

Writing Yo <- Y[,right], a Hurdle polynomial in parents Yo is a polynomial in Yo and their 0/1 indicators Vo.

The V_degree of a term that is a product of some columns of Vo only is the number of parents that appear in it. For example, V1 * V2 * V3 has V_degree equal to 3. Note that V1^p is equal to V1 for any p >= 1, so it does not make sense to include a power.

The Y_degree of a term that is a product of powers of some columns of Yo only is the degree of a polynomial in its usual sense. For example, Y1^2 * Y2 * Y3^3 has Y_degree equal to 2+1+3=6.

The Y_V_degree of a term that involves both some columns of Vo and some of Yo is the sum of the V_degree of the V part and the Y_degree of the Y part. For example, Y1^2 * V2 * Y3^3 * V4 * V5 has Y_V_degree equal to 2+1+3+1+1=8.

The design matrix thus includes all possible terms with V_degree, Y_degree, Y_V_degree less than or equal to those specified. For example, if Vo and Yo have two columns and V_degree == 2, Y_degree == 2, Y_V_degree == 2, the design matrix has columns V1, V2, V1*V2, Y1, Y2, Y1*Y2, Y1^2, Y2^2, Y1*V2, Y2*V1. Note that terms like V1*Y1 are not included, as they are equivalent to Y1.

Parameters p_V_degree, p_Y_degree, p_Y_V_degree specify these degrees for the regression for the log odds log(p/(1-p)), and mu_V_degree, mu_Y_degree, mu_Y_V_degree specify them for the regression for the conditional mean mu.
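
The two-parent, degree-2 example can be built by hand as a quick check. This is only a base R sketch on made-up data; ZiDAG constructs its design matrix internally, and the code below is not how the package does it.

```r
# Toy parents: 6 samples, 2 columns, with some exact zeros (made-up data).
set.seed(1)
n <- 6
Yo <- matrix(rnorm(n * 2), nrow = n)
Yo[sample(length(Yo), 4)] <- 0
Vo <- (Yo != 0) * 1                      # 0/1 indicators of nonzero entries

V1 <- Vo[, 1]; V2 <- Vo[, 2]
Y1 <- Yo[, 1]; Y2 <- Yo[, 2]

# All terms with V_degree <= 2, Y_degree <= 2 and Y_V_degree <= 2.
design <- cbind(
  V1 = V1, V2 = V2, "V1*V2" = V1 * V2,   # pure V terms
  Y1 = Y1, Y2 = Y2, "Y1*Y2" = Y1 * Y2,   # pure Y terms
  "Y1^2" = Y1^2, "Y2^2" = Y2^2,
  "Y1*V2" = Y1 * V2, "Y2*V1" = Y2 * V1   # mixed terms
)
colnames(design)

# V1*Y1 adds nothing new: it equals Y1 entry by entry.
redundant <- all(V1 * Y1 == Y1)
```

The same bookkeeping explains why a Y_V_degree of 1 yields no interaction: a mixed term needs at least one V factor and one Y factor, so its degree is at least 2.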

For automatically choosing a uniform degree <= a specified maximum degree, please use zi_fit_pms_choose_degree().

Value

If value_only == TRUE, returns the minimized negative log likelihood only. Otherwise, returns

nll

A number, the minimized negative log likelihood.

par

A vector of length 4*length(right)+3, the fitted parameters, in the order of: the intercept for a (a scalar), linear coefficients on V[,right] for a, linear coefficients on Y[,right] for a, the intercept for b (a scalar), linear coefficients on V[,right] for b, linear coefficients on Y[,right] for b.

n

An integer, the sample size.

effective_df

4*length(right)+3, the effective degrees of freedom.
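
One possible use of these components is an information criterion for comparing fits. The sketch below computes the standard BIC, 2*nll + log(n)*effective_df, from a mock result list. Both the numbers and the assumption that nll is the total (not per-sample) negative log likelihood are illustrative and are not confirmed by this page.

```r
# Mock result with made-up values; a real one would come from
# zi_fit_pms(..., value_only = FALSE).
fit <- list(nll = 1234.5, n = 1000L, effective_df = 4 * 2 + 3)

# Standard BIC, ASSUMING fit$nll is the total negative log likelihood.
# If nll were averaged over samples, it would need rescaling by n first.
bic <- 2 * fit$nll + log(fit$n) * fit$effective_df
```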

Examples

library(ZiDAG)

# Simulate zero-inflated data under the pms parametrization on a complete DAG
m <- 3; n <- 1000
adj_mat <- make_dag(m, "complete")
dat <- gen_zero_dat(1, "pms", adj_mat, n, k_mode=1, min_num=10, gen_uniform_degree=1)

# Four extra regressors with penalty factors normalized to sum to 1
extra_regressors <- matrix(rnorm(n * 4), nrow=n)
extra_reg_pen_factors <- c(1, 2, 3, 4) / sum(c(1, 2, 3, 4))

# Fit node 3 on nodes 1 and 2; return only the minimized negative log likelihood
zi_fit_pms(dat$V, dat$Y, 3, 1:2, extra_regressors=extra_regressors,
    extra_reg_pen_factors=extra_reg_pen_factors, p_V_degree=2, p_Y_degree=2,
    p_Y_V_degree=2, mu_V_degree=2, mu_Y_degree=2, mu_Y_V_degree=2, value_only=TRUE)

# Same fit, returning the full list (nll, par, n, effective_df)
zi_fit_pms(dat$V, dat$Y, 3, 1:2, extra_regressors=extra_regressors,
    extra_reg_pen_factors=extra_reg_pen_factors, p_V_degree=2, p_Y_degree=2,
    p_Y_V_degree=2, mu_V_degree=2, mu_Y_degree=2, mu_Y_V_degree=2, value_only=FALSE)

sqyu/ZiDAG documentation built on Jan. 19, 2021, 4:11 p.m.