```{r}
knitr::opts_chunk$set(echo = TRUE)
```

The **VGAM** package's `vglm()` function, like the **survey** package's `svymle()` function, allows maximum likelihood fitting where linear predictors are added to one or more parameters of a distribution --- but `vglm()` is much faster and has many distributions already built in. This is how we make **svyVGAM** handle complex sampling.

I will write $\beta$ for the regression parameters, $\theta$ for the base parameters of the response distribution, and $\eta$ for the linear predictors. So, for example, in a classical linear model there would be two parameters $\theta$: the mean ($\theta_1$) and variance ($\theta_2$). The mean would have a set of regression parameters and the variance would have a single parameter. Collectively, these would be $\beta$ (maybe $\beta_{11}\dots\beta_{1p}$ and $\beta_{21}$), and the two combinations that are plugged in as $\theta$ would be called $\eta_1$ and $\eta_2$. The big advantage of **VGAM** is that it does a lot of the work for the user: while the user can add new families, there are many pre-prepared ones, and there are built-in ways to constrain parameters to be equal or related in some other way.
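As a concrete illustration of this two-parameter setup (a sketch, not code from the package: toy data, and plain `optim()` in place of `vglm()`), the classical linear model can be fitted by maximum likelihood with $\eta_1$ a linear predictor for the mean and $\eta_2$ a single parameter for the log standard deviation:

```r
# Sketch: the classical linear model in the theta/eta form described above.
# eta_1 = beta_11 + beta_12 * x gives the mean; eta_2 = beta_21 gives log(sd).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

negll <- function(par) {
  eta1 <- par[1] + par[2] * x   # linear predictor for the mean (theta_1)
  eta2 <- par[3]                # single parameter for log(sd) (theta_2)
  -sum(dnorm(y, mean = eta1, sd = exp(eta2), log = TRUE))
}
fit <- optim(c(0, 0, 0), negll, method = "BFGS")

round(fit$par[1:2], 3)   # mean parameters: agree with lm(y ~ x)
exp(fit$par[3])          # ML estimate of the residual sd
```

The mean parameters recover the least-squares coefficients; `vglm()` automates exactly this kind of construction for many families.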

To provide survey versions of `vglm()`, we need to (a) get design-consistent point estimates out of `vglm()`, and (b) construct design-based standard errors for the fit. The first is easy: `vglm()` accepts frequency weights, which are equivalent to sampling weights for point estimation with independent observations.
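The equivalence of frequency and sampling weights for point estimation can be checked directly. A sketch (using `glm()` rather than `vglm()`, since the idea is generic, and toy data with integer weights so that replication is exact): fitting with weight $w_i$ gives the same coefficients as fitting the data with row $i$ repeated $w_i$ times.

```r
# Sketch: weights vs. explicit replication give identical point estimates.
set.seed(2)
d <- data.frame(x = rnorm(20), w = sample(1:5, 20, replace = TRUE))
d$y <- rbinom(20, 1, plogis(d$x))

fit_w <- glm(y ~ x, family = binomial, data = d, weights = w)

d_rep <- d[rep(seq_len(nrow(d)), d$w), ]   # each row repeated w times
fit_r <- glm(y ~ x, family = binomial, data = d_rep)

all.equal(coef(fit_w), coef(fit_r))        # point estimates agree
```

The standard errors from `fit_w` are *not* design-based, of course; that is exactly the part that needs (b).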

The second can be done in two ways: resampling (which is straightforward, if potentially slow), and linearisation. Linearisation requires computing the influence functions of the parameters
$$h_i(\beta) = \widehat{\cal I}^{-1}_w U_i(\beta)$$
where $\widehat{\cal I}_w$ is the weighted estimate of the population Fisher information, $U_i=\partial_\beta \ell_i(\beta)$ is the loglikelihood contribution of the $i$th observation, and $w_i$ is its weight. The influence functions have the property
$$\hat\beta-\beta_0 = \sum_i w_i h_i(\beta_0)+o_p(\|\hat\beta-\beta_0\|)$$
so that the variance of $\hat\beta$ is asymptotically the variance of the population total of the influence functions.
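A minimal sketch of this construction, computed by hand for a weighted logistic regression (toy data; `quasibinomial` is used only to allow non-integer weights, and the score and information formulas are the standard ones for this model, not code from **svyVGAM**):

```r
# Sketch: influence functions h_i = I_w^{-1} U_i for weighted logistic regression.
set.seed(3)
n <- 100
X <- cbind(1, rnorm(n))
w <- runif(n, 1, 3)
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1)))

fit <- glm(y ~ X - 1, family = quasibinomial, weights = w)
mu  <- fitted(fit)

U   <- X * (y - mu)                          # per-observation score d l_i / d beta
I_w <- crossprod(X, w * mu * (1 - mu) * X)   # weighted Fisher information
h   <- t(solve(I_w, t(U)))                   # influence functions h_i

colSums(w * h)   # weighted total of the influence functions: ~ (0, 0) at the MLE
```

The weighted total of the $h_i$ is (numerically) zero at the weighted MLE; their design-based variance as a population total is what `svyrecvar()` estimates.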
The **survey** package provides a function `svyrecvar()` to estimate standard errors given the influence functions, or (a bit less efficiently) you can just call `svytotal()`.

A design object of class `svyrep.design` contains sets of replicate weights analogous to jackknife or bootstrap replicates. We need to call `vglm()` with each set of weights. It should be helpful to specify the full-sample estimates as starting values.

One complication is that sets of replicate weights will typically include some zeroes, which `vglm()` does not allow (eg, a bootstrap or jackknife resample will omit some observations). We set these to $10^{-9}$ times the maximum weight, which has the desired effect that the observations are present in the fit but with (effectively) zero weight.
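The workaround is a one-liner; a sketch (the helper name is hypothetical, not part of the package's exported interface):

```r
# Sketch of the zero-weight workaround described above: replace exact zeros
# with 1e-9 times the largest weight, so every observation stays in the
# vglm() fit but contributes (effectively) nothing to the likelihood.
fix_zero_weights <- function(w) {
  w[w == 0] <- 1e-9 * max(w)
  w
}

w_rep <- c(0, 2, 0, 1.5, 3)   # e.g. a jackknife replicate omitting obs 1 and 3
fix_zero_weights(w_rep)
```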

The `cov.unscaled` slot of a `summary.vglm` object contains the inverse of the estimated population Fisher information, $\widehat{\cal I}^{-1}_w$.

The `vglm` object provides $\partial_\eta\ell_i(\eta)$ for the base parameters $\theta$, and also the model matrices that specify $\partial_\beta\eta$, so we can construct $U_i$. We need to take care with the constraints, which can cause a coefficient $\beta$ to appear in more than one linear predictor.

Suppose $\beta_x$ appears in both $\eta_1$ and $\eta_2$, with $x$ values $x_1$ and $x_2$. The chain rule tells us $$\partial_{\beta_x}\ell_i =\partial_{\eta_1}\ell_i\partial_{\beta_x}\eta_1 + \partial_{\eta_2}\ell_i\partial_{\beta_x}\eta_2 = (\partial_{\eta_1}\ell_i) x_{1i}+ (\partial_{\eta_2}\ell_i) x_{2i} $$ We might have $x_1\equiv x_2\,(=x)$, but that just means $$\partial_{\beta_x}\ell_i = (\partial_{\eta_1}\ell_i) x_{i}+ (\partial_{\eta_2}\ell_i) x_{i} $$

The constraint matrix $C$ consists of 1s and 0s. If there are $p$ parameters in $M$ equations, the matrix is $M\times p$, with $C_{jk}=1$ if parameter $k$ is in linear predictor $j$. In the default, unconstrained setup, the constraint matrix consists of an $M\times M$ identity matrix for each parameter, pasted columnwise to give an $M\times pM$ matrix. In the proportional odds model, as another example, there are separate intercepts for each linear predictor but the other parameters all appear in every linear predictor. The first $M\times M$ block is the identity, and the rest of the matrix is a column of 1s for each predictor variable. Another way to say this is that $C_{jk}=\partial_{ (\beta_kx_k)}\eta_j$.
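For the proportional odds example, the matrix can be written down directly; a sketch with hypothetical sizes ($M = 3$ linear predictors, two shared predictor variables; in **VGAM** itself, `constraints()` on a fitted object returns the per-term blocks):

```r
# Sketch: the constraint matrix for a proportional-odds model with M = 3
# linear predictors, separate intercepts, and 2 predictors shared across
# all linear predictors.
M <- 3
C_intercepts <- diag(M)                    # identity block: one intercept each
C_shared <- matrix(1, nrow = M, ncol = 2)  # a column of 1s per shared predictor
C <- cbind(C_intercepts, C_shared)
C   # 3 x 5: C[j, k] = 1 iff parameter k appears in linear predictor j
```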

So, if we want $\partial_\beta\ell_i$, the chain rule says
\begin{eqnarray*}
\frac{\partial \ell_i}{\partial \beta_j} &=& \sum_k\frac{\partial \ell_i}{\partial\eta_k} \frac{\partial \eta_k}{\partial \beta_j}\\
&=& \sum_k\frac{\partial \ell_i}{\partial \eta_k} \frac{\partial \eta_k}{\partial (x\beta)_j}\frac{\partial (x\beta)_j}{\partial \beta_j}\\
&=&\sum_k \frac{\partial \ell_i}{\partial \eta_k} C_{kj}x_{ij}
\end{eqnarray*}

There is one further complication. The `model.matrix` method for `vglm` objects returns a model matrix of dimension $Mn\times p$ rather than $n\times p$, so we need to sum over the rows for each observation (which can be identified from the row names) and then rescale. The right standardisation appears to come from defining
$$\tilde C_{kj}=\frac{C_{kj}}{\sum_k C_{kj}}$$
and then
$$\frac{\partial \ell_i}{\partial \beta_j}=\sum_k \frac{\partial \ell_i}{\partial \eta_k} \tilde C_{kj}x_{ikj}.$$
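The final formula can be checked numerically; a sketch with made-up inputs (hypothetical sizes $n=4$, $M=2$, $p=3$, and random values standing in for the quantities extracted from a `vglm` fit):

```r
# Sketch: scores d l_i / d beta_j built from d l_i / d eta_k and the
# column-scaled constraint matrix C-tilde, as in the formula above.
n <- 4; M <- 2; p <- 3
set.seed(4)
deta <- matrix(rnorm(n * M), n, M)        # stands in for d l_i / d eta_k
C <- cbind(diag(M), c(1, 1))              # intercepts plus one shared slope
Ctilde <- sweep(C, 2, colSums(C), "/")    # C_kj / sum_k C_kj
x <- array(rnorm(n * M * p), c(n, M, p))  # x_{ikj}, summed from the Mn x p matrix

scores <- matrix(0, n, p)
for (j in 1:p)
  for (k in 1:M)
    scores[, j] <- scores[, j] + deta[, k] * Ctilde[k, j] * x[, k, j]

dim(scores)   # n x p: one score row U_i per observation
```

Each column of $\tilde C$ sums to 1, so a coefficient shared across linear predictors is not double-counted when the $Mn$ rows are collapsed to $n$.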
