View source: R/source_EM.R

star_EM_wls    R Documentation

EM Algorithm for the STAR linear model with weighted least squares

Description

Compute the maximum likelihood estimates (MLEs) and the log-likelihood for the STAR linear model. The regression coefficients are estimated by weighted least squares within an EM algorithm. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility. In the nonparametric case, the transformation is determined from the empirical CDF, which incorporates the given weights. Standard accessor functions, including coefficients(), fitted(), and residuals(), apply to the returned object.
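The following is a minimal sketch of how an EM algorithm with a weighted least squares M-step can work for a known log transformation. It is not the package's implementation. It assumes the latent model z_i ~ N(x_i' beta, sigma^2 / w_i) with y_i observed when z_i falls in [log(y_i), log(y_i + 1)), uses a simple deterministic initialization, and runs on simulated data. All variable names are illustrative.

```r
set.seed(1)

# Simulated count data (illustrative only)
n <- 200
X <- cbind(1, rnorm(n))                   # intercept plus one predictor
y <- rpois(n, lambda = exp(0.5 + 0.4 * X[, 2]))
w <- rep(1, n)                            # observation weights (assumed known)

# Known transformation g = log: the latent interval for y is
# [log(y), log(y + 1)); log(0) = -Inf gives the lower limit for y = 0
lower <- log(y)
upper <- log(y + 1)

# Simple initialization: weighted fit to log(y + 1)
fit0  <- lm.wfit(X, log(y + 1), w)
beta  <- fit0$coefficients
sigma <- sqrt(sum(w * fit0$residuals^2) / n)

# a * dnorm(a), set to 0 at infinite limits
a_phi <- function(a) ifelse(is.finite(a), a * dnorm(a), 0)

loglik_old <- -Inf
for (iter in 1:1000) {
  mu <- drop(X %*% beta)
  s  <- sigma / sqrt(w)                   # per-observation latent sd
  a  <- (lower - mu) / s
  b  <- (upper - mu) / s
  p  <- pnorm(b) - pnorm(a)               # P(y_i | beta, sigma)

  # Observed-data log-likelihood; stop once it stabilizes
  loglik <- sum(log(p))
  if (abs(loglik - loglik_old) < 1e-10) break
  loglik_old <- loglik

  # E-step: mean and variance of the truncated normal latent data
  ratio <- (dnorm(a) - dnorm(b)) / p
  z_hat <- mu + s * ratio
  v_z   <- s^2 * (1 + (a_phi(a) - a_phi(b)) / p - ratio^2)

  # M-step: weighted least squares for beta, then update sigma
  beta  <- lm.wfit(X, z_hat, w)$coefficients
  mu    <- drop(X %*% beta)
  sigma <- sqrt(sum(w * (v_z + (z_hat - mu)^2)) / n)
}

print(beta)
print(sigma)
print(loglik)
```

The loop stops when the change in the observed-data log-likelihood falls below the tolerance or when the iteration cap is reached, mirroring the roles of tol and max_iters.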

Usage

star_EM_wls(
  y,
  X,
  transformation = "np",
  y_max = Inf,
  weights = NULL,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000
)
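A hedged usage sketch on simulated data follows. It assumes that the rSTAR package is installed and exports star_EM_wls; the data, the weights, and the variable names are hypothetical, and the call is skipped when the package is unavailable.

```r
set.seed(123)

# Simulated inputs (illustrative only)
n <- 100
x <- rnorm(n)
X <- cbind(1, x)                           # n x p matrix of predictors, with intercept
y <- rpois(n, lambda = exp(1 + 0.5 * x))   # vector of observed counts
w <- runif(n, 0.5, 1.5)                    # hypothetical observation weights

# Assumption: rSTAR is installed and exports star_EM_wls
if (requireNamespace("rSTAR", quietly = TRUE)) {
  fit <- rSTAR::star_EM_wls(y = y, X = X,
                            transformation = "np",
                            weights = w)
  print(coef(fit))        # MLEs of the regression coefficients
  print(fit$sigma.hat)    # MLE of the standard deviation
  print(fit$logLik)       # log-likelihood at the MLEs
}
```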

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

transformation

transformation to use for the latent data; must be one of

  • "identity" (identity transformation)

  • "log" (log transformation)

  • "sqrt" (square root transformation)

  • "np" (nonparametric transformation estimated from empirical CDF)

  • "pois" (transformation for moment-matched marginal Poisson CDF)

  • "neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

  • "box-cox" (Box-Cox transformation with learned parameter)
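To illustrate the "np" option, the sketch below builds a transformation from a weighted empirical CDF in base R. It is an assumption-laden illustration, not the package's code: the weighted CDF, the matching of the weighted mean and standard deviation, the n / (n + 1) rescaling, and the monotone spline interpolation are all choices made here for the example.

```r
set.seed(42)

# Simulated counts and hypothetical weights
y <- rpois(50, lambda = 3)
w <- runif(50, 0.5, 2)
n <- length(y)

# Weighted empirical CDF: share of the total weight at or below t
wecdf <- function(t) sapply(t, function(ti) sum(w[y <= ti]) / sum(w))

# Weighted mean and standard deviation, used for location and scale
mu_y <- sum(w * y) / sum(w)
sd_y <- sqrt(sum(w * (y - mu_y)^2) / sum(w))

# Evaluate the transformation at the distinct positive counts;
# n / (n + 1) keeps the CDF below 1 so that qnorm() stays finite above
t0 <- sort(unique(y[y > 0]))
g0 <- mu_y + sd_y * qnorm(n / (n + 1) * wecdf(t0 - 1))

# Drop points where the CDF is exactly 0 (qnorm gives -Inf)
keep <- is.finite(g0)

# Monotone interpolation between the evaluation points
g_hat <- splinefun(t0[keep], g0[keep], method = "monoH.FC")

print(g_hat(t0[keep]))
```

Because the weighted CDF is nondecreasing, the interpolated function is monotone over the observed range, which is what a transformation for ordered counts requires.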

y_max

a fixed and known upper bound for all observations; default is Inf

weights

an optional vector of weights to be used in the fitting process, which produces weighted least squares estimators.

sd_init

scale of the random noise added to initialize the EM algorithm; the noise standard deviation is sd_init times the Gaussian MLE standard deviation; default is 10

tol

tolerance for stopping the EM algorithm; default is 10^-10

max_iters

maximum number of EM iterations before stopping; default is 1000

Value

a list with the following elements:

  • coefficients: the MLEs of the coefficients

  • fitted.values: the fitted values at the MLEs

  • g.hat: a function containing the (known or estimated) transformation

  • sigma.hat: the MLE of the standard deviation

  • mu.hat: the MLE of the conditional mean (on the transformed scale)

  • z.hat: the estimated latent data (on the transformed scale) at the MLEs

  • residuals: the Dunn-Smyth residuals (randomized)

  • residuals_rep: the Dunn-Smyth residuals (randomized) for 10 replicates

  • logLik: the log-likelihood at the MLEs

  • logLik0: the log-likelihood at the MLEs for the *unrounded* initialization

  • lambda: the Box-Cox nonlinear parameter

  • and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications
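The randomized Dunn-Smyth residuals listed above can be sketched as follows for a known log transformation. This is an illustration on simulated data, not the package's code; mu and sigma stand in for mu.hat and sigma.hat, and all names are hypothetical.

```r
set.seed(7)

# Stand-ins for the fitted conditional means and standard deviation
n     <- 100
mu    <- rnorm(n, mean = 1, sd = 0.3)
sigma <- 0.5

# Simulate counts: round down the back-transformed latent data
z <- rnorm(n, mean = mu, sd = sigma)
y <- floor(exp(z))

# Model CDF just below and at each observed count
p_lower <- pnorm((log(y) - mu) / sigma)       # P(Y <  y_i); log(0) = -Inf gives 0
p_upper <- pnorm((log(y + 1) - mu) / sigma)   # P(Y <= y_i)

# Draw uniformly between the two CDF values and map to the normal scale
res <- qnorm(runif(n, min = p_lower, max = p_upper))

# Ten replicate sets of residuals, one per column
res_rep <- replicate(10, qnorm(runif(n, min = p_lower, max = p_upper)))

print(summary(res))
```

If the model is adequate, these residuals should look approximately standard normal, so a normal quantile plot is a natural diagnostic.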

Note

Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deleting these observations and refitting the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.


drkowal/rSTAR documentation built on July 5, 2023, 2:18 p.m.