Introduction

The package ngme provides functions for estimation and prediction using latent non-Gaussian models. The package has functions for both longitudinal data analysis and for applications in spatial statistics. This document presents the framework for longitudinal data analysis. See the separate vignette for ngme.spatial for an introduction to the functions for spatial data analysis.

Modelling framework for longitudinal data

The package ngme provides functions for mixed effect models of the form $$ \begin{aligned} Y_{ij} = {\bf x}{ij}^{\top} {\bf \alpha} + {\bf d}{ij}^{\top} {\bf U}i + W_i(t{ij}) + Z_{ij}. \end{aligned} $$ Here,

The stochastic process component $W_i(t)$ as well as the random effects ${\bf U}_i$ can be excluded from the model. One can choose to specify the distributions for the different components as

The stochastic process $W_i(t)$ can be specified as an integrated Random-Walk or as a process with a Mat\'ern covariance, specified as the solution to a stochastic ordinary differential equation $$ (\kappa^2 - d/dt)^{\alpha/2}W_i(t) = d L(t). $$ Here $dL$ is Gaussian or non-Gaussian noise.

Functions

The ngme package includes the function ngme for parameter estimation based on maximum likelihood implemented with a computationally efficient stochastic gradient algorithm. The package also contains the function ngme.par which supports parallel computations and improved diagnostics.

Predictions based on either nowcasting, forecasting or smoothing distributions could be obtained using the generic function predict. print, summary, plot, fitted, residuals, intervals functions are available as the supplementary.

Data

The data-set used to illustrate the usage of the package ngme comes from a primary care cohort on kidney patients in the city of Salford in Greater Manchester, United Kingdom. The data-set contains 392,870 repeated measurements in total that belong to 22,910 patients. It contains the following variables:

For more details of the data-set, see Digge, Sousa and Asar (2015, Biostatistics, 16(3), 522-536).

The code chunk given below shows how to load the data-set and exemplifies the data-format by printing the first 10 rows.

suppressMessages(library(ngme))
data(srft_data)
knitr::kable(head(srft_data, 10))

Analysis

Here we use randomly selected 50 patients' data for the sake of computational time.

set.seed(123)
rs_id <- sample(unique(srft_data$id), 500, replace = F)
srft_data_sub <- srft_data[srft_data$id %in% rs_id, ]

The below code chunk fits the following model: $$ \log(eGFR){ij} = \beta_0 + \beta_1 bage{i} + \beta_2 sex_i + \beta_3 t_{ij} + \beta_4 \max(0, age_{ij} - 56.5) + U_{0i} + W_i(t_{ij}) + Z_{ij}, $$ with

   set.seed(123)
   srft_fit <- ngme(fixed = log(egfr) ~ sex + bage + fu + pwl,
                    random = ~ 1|id,
                    data = srft_data_sub,
                    reffects = "Normal",
                    process = c("Normal", "fd2"),
                    error = "Normal",
                    timeVar = "fu",
                    nIter = 500,
                    use.process = TRUE,
                    silent = TRUE,
                    mesh = list(cutoff = 1/365,
                                max.dist = 1/12,
                                extend = 0.01),
                    controls = list(pSubsample = 0.1,
                                    estimate.fisher = 1)
                    )

summary function provides a more thorough list of results:

summary(srft_fit)

Parameter trajectories within the stochastic gradient algorithm, for example for the fixed effects, can be visualised using the generic plot function as

plot(srft_fit, param = "fixed")

Marginal fitted values, $\hat{Y}{ij} = {\bf x}{ij} \hat{{\bf \beta}}$, and marginal residuals, $Y_{ij} - \hat{Y}_{ij}$ can be obtained using the fitted and residual functions, respectively:

#just print the first 10 elements for each
head(fitted(srft_fit), 10)
head(residuals(srft_fit), 10)

95\% confidence intervals for the fixed effects coefficients could be obtained using the intervals function:

intervals(srft_fit)

On one-step forecasts could be obtained using the function predict:

pred <- predict(srft_fit, 
                id = unique(srft_data_sub$id)[1], 
                type = "Filter"
                )

A summary of results including accuracy measures can be viewed by calling the name of the predicted object. Predictions for a selected subject, e.g. the first subject in the srft_data_sub, could be plotted using

 plot(pred, id = unique(srft_data_sub$id)[1], pch = 19, cex = 0.5)

In this plot, dots are observed data, black lines is the mean of the predictive distribution, and red lines are the 95\% prediction intervals.

Exceedence probabilities, for example losing kidney function at least at a relative rate of 5\% in a year, $P(\frac{\partial Y_{ik}^*}{\partial t} < -0.05|Y_{i1}, \ldots, Y_{ik})$, could be calculated using the nowcasting distributions. The following script illustrates obtaining exceedence probabilities.

 excursion_ids <- unique(srft_data_sub$id)[1]

 delta <- 0.01

 Bf_pred <- Br_pred <- list()

 for(i in 1:length(excursion_ids)){

   data_i <- srft_data[srft_data$id == excursion_ids[i], ]

   n_i <- nrow(data_i)

   Br_pred_i <- matrix(1, nrow = n_i, ncol = 1)

   Bf_pred_i <- cbind(data_i$sex,
                      data_i$bage,
                      data_i$fu + 0.01, 
                      pmax(0, data_i$bage + data_i$fu + 0.01 - 56.5)
                      )

   Br_pred[[i]] <- Br_pred_i
   Bf_pred[[i]] <- Bf_pred_i

 }

 derivative_list <- list(Bfixed = Bf_pred,
                         Brandom = Br_pred,
                         delta = delta)

 set.seed(123)
 excursions_normal <- 
   predict(srft_fit,
           id = excursion_ids,
           controls = list(nSim = 100,
                           nBurnin = 100,
                           predict.derivatives = derivative_list,
                           excursions = list(list(type = '<', 
                                                  level = -0.05, 
                                                  process = 'Xderivative'))
                           ),
           type = "Nowcast"
           )
plot(excursions_normal, id = excursion_ids[1], plot_excursions = TRUE,
     type = "o", pch = 19) 


davidbolin/ngme documentation built on Dec. 5, 2023, 11:48 p.m.