zi_fit_pms_choose_degree: Fits and chooses a Hurdle conditional model with pms...


View source: R/zero_fit_pms.R

Description

Fits and chooses a Hurdle conditional model with pms parametrization of degree <= a maximum degree.

Usage

zi_fit_pms_choose_degree(
  V,
  Y,
  left,
  right,
  max_uniform_degree,
  extra_regressors = NULL,
  extra_reg_pen_factors = NULL,
  value_only = TRUE,
  tol = 1e-08,
  maxit = 1e+05,
  seed = NULL,
  penalize_decider = function(X) { ncol(X) >= nrow(X)/2 },
  nfits = 10,
  runs = 2,
  print_best_degree = FALSE
)

Arguments

V

A matrix of 0/1s, equal to Y != 0.

Y

A data matrix of the same size as V.

left

An integer between 1 and ncol(Y). The index of the variable to be fit.

right

A vector of integers between 1 and ncol(Y) different from left. Indices of the "regressors".

max_uniform_degree

A positive integer, the maximum degree for the Hurdle polynomials.

extra_regressors

A matrix with the same number of rows as V and Y, extra regressors to be included in both regressions (conditional log odds/conditional mean). Defaults to NULL.

extra_reg_pen_factors

A vector of non-negative numbers, defaults to NULL. Penalty factors for extra_regressors. If the main design matrix has d columns, c(rep(1, d), extra_reg_pen_factors) will be passed as the penalty.factor argument to glmnet::glmnet(). If intercept == TRUE, a 0 will also be prepended.

value_only

If TRUE, returns the minimized negative log likelihood only. Defaults to TRUE.

tol

A number, the convergence tolerance. Defaults to 1e-8. Passed to stats::glm() for unpenalized logistic regressions, or as the thresh argument to glmnet::glmnet() for both logistic and linear regressions if penalized.

maxit

An integer, the maximum number of iterations. Defaults to 100000. Passed to stats::glm() for unpenalized logistic regressions, or to glmnet::glmnet() for both logistic and linear regressions if penalized.

seed

A number, the random seed passed to zi_fit_lm() for both regressions (conditional log odds/conditional mean).

penalize_decider

A logical or a function that takes a design matrix and returns a logical. Defaults to function(X){ncol(X)>=nrow(X)/2}. Decides whether to use l2-penalized (ridge) regression when fitting each conditional distribution (penalized if TRUE). Note that for either regression (conditional log odds/conditional mean), if the unpenalized fit is almost perfect, a penalized regression is used automatically.

nfits

A positive integer, defaults to 10. Used for penalized regressions, as the number of folds if CV_BIC == TRUE (the nfolds argument to glmnet::cv.glmnet(), with nlambda set to 100), or otherwise as the number of lambdas (the nlambda argument to glmnet::glmnet()).

runs

A positive integer, the number of reruns. The fit with the maximum likelihood will be returned. Defaults to 2.

print_best_degree

A logical, whether to print the degree (1, ..., max_uniform_degree) that minimizes the BIC.
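As a rough illustration of two of the arguments above, the following base R sketch applies the default penalize_decider rule to two dummy design matrices and assembles a penalty factor vector in the way described for extra_reg_pen_factors. The matrices, the intercept flag and the variable names are assumptions made for this illustration; nothing here calls the package or glmnet.

```r
# Default rule from Usage: penalize when the design matrix has at least
# half as many columns as rows.
penalize_decider <- function(X) ncol(X) >= nrow(X) / 2

X_small <- matrix(0, nrow = 100, ncol = 10)  # 10 columns, threshold is 50
X_wide  <- matrix(0, nrow = 100, ncol = 60)  # 60 columns, threshold is 50
use_penalty_small <- penalize_decider(X_small)
use_penalty_wide  <- penalize_decider(X_wide)

# Penalty factors: 1 for each of the d main design columns, followed by
# the user-supplied factors for the extra regressors.
d <- ncol(X_small)
extra_reg_pen_factors <- c(1, 2, 3, 4) / sum(c(1, 2, 3, 4))
intercept <- TRUE  # hypothetical flag, standing in for the package's own
pen <- c(rep(1, d), extra_reg_pen_factors)
if (intercept) pen <- c(0, pen)  # a 0 is prepended for the intercept
```

With these inputs the narrow matrix is left unpenalized, the wide one is penalized, and pen has one entry per column of the combined design (intercept, main columns, extra regressors).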

Details

A Hurdle conditional model with pms parametrization for the left node given those in right has log density, with respect to the sum of the Lebesgue measure and a point mass at 0, equal to (in terms of y) log(1-p) if y == 0, or log(p)-log(2*pi*sigmasq)/2-(y-mu)^2/(2*sigmasq) otherwise. That is, y is nonzero with probability p (a Bernoulli indicator), and given that it is nonzero it is Gaussian with conditional mean mu and conditional variance sigmasq. Here sigmasq is assumed constant, and the parameters log(p/(1-p)) and mu are Hurdle polynomials, i.e. polynomials in the values for right and their indicators. This function thus fits such a model using Y[,left], Y[,right] and V[,right] = (Y[,right] != 0), using a logistic regression for the log odds log(p/(1-p)) and a linear regression for mu.
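The log density described above can be written as a small base R function. This is only a sketch: the helper name hurdle_pms_logdens and the parameter values are made up for illustration, and the Gaussian part uses dnorm(), which includes the normalizing term -log(2*pi*sigmasq)/2. The check at the end confirms that the point mass at 0 and the continuous part together integrate to 1.

```r
# Log density of the Hurdle model at y, for fixed p, mu and sigmasq
# (hypothetical helper, not part of the package).
hurdle_pms_logdens <- function(y, p, mu, sigmasq) {
  ifelse(y == 0,
         log(1 - p),
         log(p) + dnorm(y, mean = mu, sd = sqrt(sigmasq), log = TRUE))
}

p <- 0.7; mu <- 1.5; sigmasq <- 2

# Mass of the point mass at 0.
mass_at_zero <- exp(hurdle_pms_logdens(0, p, mu, sigmasq))

# Mass of the continuous part, integrated on each side of 0.
dens <- function(y) exp(hurdle_pms_logdens(y, p, mu, sigmasq))
cont_mass <- integrate(dens, -Inf, 0)$value + integrate(dens, 0, Inf)$value

total_mass <- mass_at_zero + cont_mass  # should be numerically close to 1
```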

Writing Yo <- Y[,right], a Hurdle polynomial in parents Yo is a polynomial in Yo and their 0/1 indicators Vo. The degree of a term in a Hurdle polynomial is the number of V terms plus the sum of the degrees of the Y terms. For example, Y1^2 * V2 * Y3^3 * V4 * V5 has degree equal to 2+1+3+1+1=8. Given a degree, the design matrix thus includes all possible terms with degree less than or equal to the specified degree. For example, if Vo and Yo have two columns and we choose degree 2, the design matrix has columns V1, V2, V1*V2, Y1, Y2, Y1*Y2, Y1^2, Y2^2, Y1*V2, Y2*V1. Note that terms like V1*Y1 are not included, as they are equivalent to Y1.
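To make the two-column, degree-2 example concrete, the sketch below builds that design matrix by hand in base R on a small simulated Yo. It does not use the package's own construction; the data and the helper term_degree are assumptions for illustration only.

```r
set.seed(1)
n <- 6
Yo <- matrix(rnorm(n * 2), nrow = n)
Yo[sample(length(Yo), 4)] <- 0          # introduce some exact zeros
Vo <- (Yo != 0) * 1                      # 0/1 indicators of Yo != 0

Y1 <- Yo[, 1]; Y2 <- Yo[, 2]
V1 <- Vo[, 1]; V2 <- Vo[, 2]

# All Hurdle polynomial terms of degree <= 2 in two parents.
design <- cbind(
  "V1" = V1, "V2" = V2, "V1*V2" = V1 * V2,
  "Y1" = Y1, "Y2" = Y2, "Y1*Y2" = Y1 * Y2,
  "Y1^2" = Y1^2, "Y2^2" = Y2^2,
  "Y1*V2" = Y1 * V2, "Y2*V1" = Y2 * V1
)

# V1*Y1 carries no new information: Y1 is 0 exactly where V1 is 0.
redundant <- all(V1 * Y1 == Y1)

# Degree of a term: number of V factors plus the sum of the Y powers.
term_degree <- function(y_powers, n_v_terms) sum(y_powers) + n_v_terms
deg_example <- term_degree(c(2, 3), 3)   # Y1^2 * V2 * Y3^3 * V4 * V5
```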

This function fits models using Hurdle polynomials with degrees 1, 2, ..., max_uniform_degree, and automatically chooses the degree that minimizes the BIC. It is equivalent to calling zi_fit_pms() with all degree arguments equal to d, with d in 1, ..., max_uniform_degree, and returning the one with the smallest BIC.
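The selection loop can be sketched in base R for the simplest case of a single parent, where the Hurdle polynomial of degree d reduces to the columns V1, Y1, ..., Y1^d. This is not the package's implementation: it uses unpenalized stats::glm() and stats::lm() only, assumes the usual definition BIC = 2*nll + log(n)*df, and all data and helper names are made up for illustration.

```r
set.seed(42)
n <- 500
# One zero-inflated parent and its indicator.
y1 <- rnorm(n) * rbinom(n, 1, 0.7)
v1 <- as.numeric(y1 != 0)
# Child generated from a degree-1 Hurdle model with sigmasq = 1.
v <- rbinom(n, 1, plogis(0.5 + v1 - 0.5 * y1))
y <- v * rnorm(n, mean = 1 + 2 * y1, sd = 1)

# Design matrix of degree d for one parent: V1, Y1, ..., Y1^d.
design <- function(d) {
  X <- cbind(v1, outer(y1, seq_len(d), `^`))
  colnames(X) <- c("V1", paste0("Y1^", seq_len(d)))
  X
}

fit_one <- function(d) {
  X <- design(d)
  nz <- y != 0
  v_ind <- as.numeric(nz)
  fit_p <- glm(v_ind ~ X, family = binomial())  # log odds of y != 0
  X_nz <- X[nz, , drop = FALSE]
  y_nz <- y[nz]
  fit_mu <- lm(y_nz ~ X_nz)                      # mean of the nonzero part
  ll <- as.numeric(logLik(fit_p)) + as.numeric(logLik(fit_mu))
  df <- attr(logLik(fit_p), "df") + attr(logLik(fit_mu), "df")
  c(degree = d, nll = -ll, df = df, bic = -2 * ll + log(n) * df)
}

max_uniform_degree <- 3
results <- t(sapply(seq_len(max_uniform_degree), fit_one))
best_degree <- results[which.min(results[, "bic"]), "degree"]
```

Because the models are nested, nll can only decrease as the degree grows, so the log(n)*df term is what stops the largest degree from always being chosen. For one parent at degree 1 the parameter count here is 7, which agrees with 4*length(right)+3.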

Value

If value_only == TRUE, returns the minimized negative log likelihood only. Otherwise, returns

nll

A number, the minimized negative log likelihood.

par

A vector of length 4*length(right)+3, the fitted parameters, in the order of: the intercept for a (a scalar), linear coefficients on V[,right] for a, linear coefficients on Y[,right] for a, the intercept for b (a scalar), linear coefficients on V[,right] for b, linear coefficients on Y[,right] for b.

n

An integer, the sample size.

effective_df

4*length(right)+3, the effective degrees of freedom.

Examples

library(ZiDAG)

# Build a complete DAG on m nodes and generate n samples under "pms"
m <- 3; n <- 1000
adj_mat <- make_dag(m, "complete")
dat <- gen_zero_dat(1, "pms", adj_mat, n, k_mode=1, min_num=10, gen_uniform_degree=1)

# Four extra regressors, with penalty factors that sum to 1
extra_regressors <- matrix(rnorm(n * 4), nrow=n)
extra_reg_pen_factors <- c(1, 2, 3, 4) / sum(c(1, 2, 3, 4))

# Fit node 3 on nodes 1 and 2; return only the minimized negative log likelihood
zi_fit_pms_choose_degree(dat$V, dat$Y, 3, 1:2, max_uniform_degree=2L,
    extra_regressors=extra_regressors, extra_reg_pen_factors=extra_reg_pen_factors,
    value_only=TRUE, print_best_degree=TRUE)

# Same fit, returning the full list described under Value
zi_fit_pms_choose_degree(dat$V, dat$Y, 3, 1:2, max_uniform_degree=2L,
    extra_regressors=extra_regressors, extra_reg_pen_factors=extra_reg_pen_factors,
    value_only=FALSE, print_best_degree=TRUE)

sqyu/ZiDAG documentation built on Jan. 19, 2021, 4:11 p.m.