glm_gp_impl: Internal Function to Fit a Gamma-Poisson GLM
In glmGamPoi: Fit a Gamma-Poisson Generalized Linear Model


Internal Function to Fit a Gamma-Poisson GLM

glm_gp_impl(
  Y,
  model_matrix,
  offset = 0,
  size_factors = c("normed_sum", "deconvolution", "poscounts"),
  overdispersion = TRUE,
  overdispersion_shrinkage = TRUE,
  do_cox_reid_adjustment = TRUE,
  subsample = FALSE,
  verbose = FALSE
)

`Y`	any matrix-like object (e.g. `matrix()`, `DelayedArray()`, `HDF5Matrix()`) with one column per sample and one row per gene.
`model_matrix`	a numeric matrix that specifies the experimental design. It can be produced using `stats::model.matrix()`. Default: `NULL`
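As a minimal sketch of how such a design matrix can be built with `stats::model.matrix()`: the `col_data` data frame and its `condition` column below are hypothetical and only serve as an illustration.

```r
# Hypothetical sample annotation: three control and three treated samples
col_data <- data.frame(
  condition = factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
)

# One row per sample, one column per coefficient (intercept + treatment effect)
model_matrix <- stats::model.matrix(~ condition, data = col_data)

dim(model_matrix)       # 6 rows, 2 columns
colnames(model_matrix)  # "(Intercept)" "conditiontrt"
```

The number of rows of the design matrix has to match the number of columns (samples) of `Y`.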
`offset`	Constant offset in the model in addition to `log(size_factors)`. It can either be a single number, a vector of length `ncol(data)` or a matrix with the same dimensions as `dim(data)`. Note that if data is a DelayedArray or HDF5Matrix, `offset` must be as well. Default: `0`.
`size_factors`	in large-scale experiments, each sample is typically of a different size (for example, different sequencing depths). A size factor is an internal mechanism of GLMs to correct for this effect. `size_factors` is either a numeric vector with positive entries, one per column of the data, that specifies the size factors to use, or a string that specifies the method used to estimate the size factors (one of `c("normed_sum", "deconvolution", "poscounts")`). Note that `"normed_sum"` and `"poscounts"` are fairly simple methods and can lead to suboptimal results. For the best performance, I recommend using `size_factors = "deconvolution"`, which calls `scran::calculateSumFactors()`. However, you need to separately install the `scran` package from Bioconductor for this method to work. Also note that `size_factors = 1` and `size_factors = FALSE` are equivalent. If only a single gene is given, no size factor is estimated (i.e. `size_factors = 1`). Default: `"normed_sum"`.
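To illustrate what a simple sum-based size factor looks like, the sketch below scales the per-sample totals so that their geometric mean is 1. This is one common convention and an assumption made for illustration; it is not necessarily the exact computation performed for `size_factors = "normed_sum"`.

```r
# Toy count matrix: 3 genes (rows) x 3 samples (columns)
Y <- matrix(c(10, 0, 5,
              20, 2, 8,
              40, 4, 16), nrow = 3)

# Total counts per sample: 15, 30, 60
col_totals <- colSums(Y)

# Assumed convention: divide by the geometric mean of the totals,
# so that the size factors multiply to 1
size_factors <- col_totals / exp(mean(log(col_totals)))
size_factors  # approximately 0.5, 1, 2
```

A numeric vector like this one can be passed directly as `size_factors` instead of a method name.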
`overdispersion`	the simplest count model is the Poisson model. However, the Poisson model assumes that variance = mean. For many applications this is too rigid, and the Gamma-Poisson allows a more flexible mean-variance relation (variance = mean + mean^2 * overdispersion). `overdispersion` can be one of: a single boolean that indicates if an overdispersion is estimated for each gene; a numeric vector of length `nrow(data)` fixing the overdispersion to those values; or the string `"global"` to indicate that one dispersion is fit across all genes. Note that `overdispersion = 0` and `overdispersion = FALSE` are equivalent and both reduce the Gamma-Poisson to the classical Poisson model. Default: `TRUE`.
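The mean-variance relation can be checked numerically with base R. The sketch below is independent of the package: `gp_variance()` is a hypothetical helper, and the simulation relies on the fact that `stats::rnbinom()` with `size = 1 / overdispersion` has variance `mu + mu^2 / size`.

```r
# Hypothetical helper: theoretical variance of a Gamma-Poisson count
gp_variance <- function(mean, overdispersion) {
  mean + mean^2 * overdispersion
}

gp_variance(10, 0)    # overdispersion = 0 gives the Poisson case: variance = mean
gp_variance(10, 0.5)  # 10 + 100 * 0.5 = 60

# Empirical check with simulated Gamma-Poisson (negative binomial) counts
set.seed(1)
x <- stats::rnbinom(1e5, mu = 10, size = 1 / 0.5)
c(mean = mean(x), variance = var(x))  # should be close to 10 and 60
```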
`overdispersion_shrinkage`	the overdispersion can be difficult to estimate with few replicates. To improve the overdispersion estimates, we can share information across genes and shrink each individual overdispersion estimate towards a global overdispersion estimate. However, empirical studies show that the overdispersion varies with the mean expression level (lower expression level => higher dispersion). If `overdispersion_shrinkage = TRUE`, a median trend of dispersion and expression level is fit and used to estimate the variances of a quasi Gamma-Poisson model (Lund et al. 2012). Default: `TRUE`.
`do_cox_reid_adjustment`	the classical maximum likelihood estimator of the `overdispersion` is biased towards small values. McCarthy et al. (2012) showed that it is preferable to optimize the Cox-Reid adjusted profile likelihood. `do_cox_reid_adjustment` can be either `TRUE` or `FALSE` to indicate if the adjustment is added during the optimization of the `overdispersion` parameter. Default: `TRUE`.
`subsample`	the estimation of the overdispersion is the slowest step when fitting a Gamma-Poisson GLM. For datasets with many samples, the estimation can be considerably sped up without losing much precision by fitting the overdispersion only on a random subset of the samples. Default: `FALSE`, which means that the data is not subsampled. If set to `TRUE`, at most 1,000 samples are considered. Otherwise, the parameter specifies the number of samples that are considered for each gene to estimate the overdispersion.
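The three accepted forms of `subsample` can be pictured with the following sketch. `resolve_subsample()` is a hypothetical helper written for illustration, not a function of the package, and for simplicity the example draws a single random subset of columns rather than one per gene.

```r
# Hypothetical helper: translate the `subsample` argument into a sample count
resolve_subsample <- function(subsample, n_samples) {
  if (isFALSE(subsample)) {
    n_samples                  # no subsampling: use every sample
  } else if (isTRUE(subsample)) {
    min(1000, n_samples)       # TRUE: at most 1,000 samples
  } else {
    min(subsample, n_samples)  # a number: that many samples
  }
}

# Toy data: 5 genes x 2500 samples
set.seed(1)
Y <- matrix(rpois(5 * 2500, lambda = 3), nrow = 5)

# Pick a random subset of columns on which the overdispersion would be fit
n_used <- resolve_subsample(TRUE, ncol(Y))
Y_sub <- Y[, sample.int(ncol(Y), n_used), drop = FALSE]
dim(Y_sub)  # 5 rows, 1000 columns
```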
`verbose`	a boolean that indicates if information about the individual steps is printed while fitting the GLM. Default: `FALSE`.