The package ngme
provides functions for estimation and prediction using latent non-Gaussian models. The package has functions for both longitudinal data analysis and for applications in spatial statistics. This document presents the framework for longitudinal data analysis. See the separate vignette for ngme.spatial
for an introduction to the functions for spatial data analysis.
The package ngme
provides functions for mixed effect models of the form
$$
\begin{aligned}
Y_{ij} = {\bf x}{ij}^{\top} {\bf \alpha} + {\bf d}{ij}^{\top} {\bf U}i + W_i(t{ij}) + Z_{ij}.
\end{aligned}
$$
Here,
The stochastic process component $W_i(t)$ as well as the random effects ${\bf U}_i$ can be excluded from the model. One can choose to specify the distributions for the different components as
The stochastic process $W_i(t)$ can be specified as an integrated Random-Walk or as a process with a Mat\'ern covariance, specified as the solution to a stochastic ordinary differential equation $$ (\kappa^2 - d/dt)^{\alpha/2}W_i(t) = d L(t). $$ Here $dL$ is Gaussian or non-Gaussian noise.
The ngme
package includes the function ngme
for parameter estimation
based on maximum likelihood implemented with a computationally efficient
stochastic gradient algorithm. The package also contains the function ngme.par
which supports parallel computations and improved diagnostics.
Predictions based on either nowcasting, forecasting or smoothing distributions could be obtained using the generic function predict
. print
, summary
, plot
, fitted
, residuals
, intervals
functions are available as the supplementary.
The data-set used to illustrate the usage of the package ngme
comes
from a primary care cohort on kidney patients in the city of Salford
in Greater Manchester, United Kingdom. The data-set contains 392,870
repeated measurements in total that belong to 22,910 patients. It contains
the following variables:
For more details of the data-set, see Digge, Sousa and Asar (2015, Biostatistics, 16(3), 522-536).
The code chunk given below shows how to load the data-set and exemplifies the data-format by printing the first 10 rows.
suppressMessages(library(ngme)) data(srft_data) knitr::kable(head(srft_data, 10))
Here we use randomly selected 50 patients' data for the sake of computational time.
set.seed(123) rs_id <- sample(unique(srft_data$id), 500, replace = F) srft_data_sub <- srft_data[srft_data$id %in% rs_id, ]
The below code chunk fits the following model: $$ \log(eGFR){ij} = \beta_0 + \beta_1 bage{i} + \beta_2 sex_i + \beta_3 t_{ij} + \beta_4 \max(0, age_{ij} - 56.5) + U_{0i} + W_i(t_{ij}) + Z_{ij}, $$ with
set.seed(123) srft_fit <- ngme(fixed = log(egfr) ~ sex + bage + fu + pwl, random = ~ 1|id, data = srft_data_sub, reffects = "Normal", process = c("Normal", "fd2"), error = "Normal", timeVar = "fu", nIter = 500, use.process = TRUE, silent = TRUE, mesh = list(cutoff = 1/365, max.dist = 1/12, extend = 0.01), controls = list(pSubsample = 0.1, estimate.fisher = 1) )
summary
function provides a more thorough list of results:
summary(srft_fit)
Parameter trajectories within the stochastic gradient algorithm,
for example for the fixed effects,
can be visualised using the generic plot
function as
plot(srft_fit, param = "fixed")
Marginal fitted values, $\hat{Y}{ij} = {\bf x}{ij} \hat{{\bf \beta}}$,
and marginal residuals, $Y_{ij} - \hat{Y}_{ij}$ can be obtained using
the fitted
and residual
functions, respectively:
#just print the first 10 elements for each head(fitted(srft_fit), 10) head(residuals(srft_fit), 10)
95\% confidence intervals for the fixed effects coefficients
could be obtained using the intervals
function:
intervals(srft_fit)
On one-step forecasts could be obtained using the function
predict
:
pred <- predict(srft_fit, id = unique(srft_data_sub$id)[1], type = "Filter" )
A summary of results including accuracy measures can be viewed by
calling the name of the predicted object.
Predictions for a selected subject, e.g. the first subject in the
srft_data_sub
, could be plotted using
plot(pred, id = unique(srft_data_sub$id)[1], pch = 19, cex = 0.5)
In this plot, dots are observed data, black lines is the mean of the predictive distribution, and red lines are the 95\% prediction intervals.
Exceedence probabilities, for example losing kidney function at least at a relative rate of 5\% in a year, $P(\frac{\partial Y_{ik}^*}{\partial t} < -0.05|Y_{i1}, \ldots, Y_{ik})$, could be calculated using the nowcasting distributions. The following script illustrates obtaining exceedence probabilities.
excursion_ids <- unique(srft_data_sub$id)[1] delta <- 0.01 Bf_pred <- Br_pred <- list() for(i in 1:length(excursion_ids)){ data_i <- srft_data[srft_data$id == excursion_ids[i], ] n_i <- nrow(data_i) Br_pred_i <- matrix(1, nrow = n_i, ncol = 1) Bf_pred_i <- cbind(data_i$sex, data_i$bage, data_i$fu + 0.01, pmax(0, data_i$bage + data_i$fu + 0.01 - 56.5) ) Br_pred[[i]] <- Br_pred_i Bf_pred[[i]] <- Bf_pred_i } derivative_list <- list(Bfixed = Bf_pred, Brandom = Br_pred, delta = delta) set.seed(123) excursions_normal <- predict(srft_fit, id = excursion_ids, controls = list(nSim = 100, nBurnin = 100, predict.derivatives = derivative_list, excursions = list(list(type = '<', level = -0.05, process = 'Xderivative')) ), type = "Nowcast" ) plot(excursions_normal, id = excursion_ids[1], plot_excursions = TRUE, type = "o", pch = 19)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.