knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
set.seed(90210) library(ldmppr) library(dplyr)
To illustrate the use of the ldmppr package, we will be using the included small_example_data dataset. This dataset consists of locations and sizes for r nrow(data("small_example_data")) points in a (25m x 25m) square domain.
data("small_example_data") nrow(small_example_data) head(small_example_data)
We also include a collection of example raster files obtained from an ESS-DIVE repository (see the help file for small_example_data for details).
The ldmppr package provides tools for:
As of version 1.1.0, the main workflow functions return S3 objects with standard methods (print(), plot(), coef(), as.data.frame(), etc.). In most cases you should no longer need helper plotting functions like plot_mpp().
We start by estimating the parameters of a self-correcting spatio-temporal point process using the estimate_process_parameters function. This function makes use of user specified optimization algorithms in the nloptr function to perform estimation through log-likelihood optimization. The function can be run with three different optimization strategies ("local", "global_local", "multires_global_local") that allow the user to control the approach. The "local" strategy performs a local optimization, where the "global_local" first runs a global optimization and then uses the solution as a starting point for a local optimization. Finally, the "multires_global_local" strategy performs similarly to the "global_local", but iterates further at the local level by increasing the resolution of the spatio-temporal grid and repeating the local optimization with increasing degrees of polish. The function requires the observed locations and observed arrival times, which are generated from the observed sizes using a power law mapping function. These inputs are accepted as:
(time, x, y), or(x, y, size) plus a mapping hyperparameter delta (or a vector delta for a delta-search).The power law mapping function depends on the hyperparameter delta, which controls the shape of the mapping relationship between sizes and arrival times. The function also requires the grid values for the optimization process, the initial parameter values, the upper bounds for the parameters, and the optimization algorithm (either global and local, or just local depending on strategy) to use. The function returns the optimal parameter estimates for the self-correcting point process.
Below we show the common case where you start from (x, y, size) with delta = 1.
# Use the (x, y, size) form and let estimate_process_parameters() construct time via delta parameter_estimation_data <- small_example_data # Upper bounds for (t, x, y) ub <- c(1, 25, 25) # Define the integration / grid schedule used by the likelihood approximation grids <- ldmppr_grids( upper_bounds = ub, levels = list( c(20, 20, 20) ) ) # Define optimizer budgets/options for the global + local stages budgets <- ldmppr_budgets( global_options = list( maxeval = 150 ), local_budget_first_level = list( maxeval = 300, xtol_rel = 1e-5 ), local_budget_refinement_levels = list( maxeval = 300, xtol_rel = 1e-5 ) ) # Parameter initialization values: (alpha1, beta1, gamma1, alpha2, beta2, alpha3, beta3, gamma3) parameter_inits <- c(1.5, 8.5, 0.015, 1.5, 3.2, 0.75, 3, 0.08) fit_sc <- estimate_process_parameters( data = parameter_estimation_data, grids = grids, budgets = budgets, parameter_inits = parameter_inits, delta = 1, rescore_control = list( enabled = TRUE, top = 5L, objective_tol = 1e-6, param_tol = 0.10 ), parallel = FALSE, strategy = "global_local", global_algorithm = "NLOPT_GN_CRS2_LM", local_algorithm = "NLOPT_LN_BOBYQA", verbose = TRUE ) # Print method for ldmppr_fit objects fit_sc estimated_parameters <- coef(fit_sc) estimated_parameters
Notes
delta = c(...) and set parallel = TRUE to run one optimizer per delta (optionally in parallel).rescore_control, either as a single logical toggle or a named list.Next, we use the train_mark_model function to train a suitably flexible mark model using location-specific covariates (rasters), (optionally) semi-distance dependent competition indices, and the generated arrival times to predict sizes. The function uses either:
model_type = "xgboost"), ormodel_type = "ranger"),and may be run in parallel if desired. The user has control over the choice of a Gradient Boosting Machine (GBM) or Random Forest (RF) model, the bounds of the spatial domain, the inclusion of competition indices, the competition radius, the correction method, the final model selection metric, the number of cross validation folds, and the size of the parameter tuning grid for the model.
# Load raster covariates shipped with the package raster_paths <- list.files(system.file("extdata", package = "ldmppr"), pattern = "\\.tif$", full.names = TRUE ) raster_paths <- raster_paths[!grepl("_med\\.tif$", raster_paths)] rasters <- lapply(raster_paths, terra::rast) # Scale rasters once up front scaled_rasters <- scale_rasters(rasters) # Train the mark model, passing the estimated self-correcting model fit from Step 1 mark_model <- train_mark_model( data = fit_sc, raster_list = scaled_rasters, scaled_rasters = TRUE, model_type = "xgboost", parallel = FALSE, include_comp_inds = FALSE, competition_radius = 10, edge_correction = "none", selection_metric = "rmse", cv_folds = 5, tuning_grid_size = 20, verbose = TRUE ) # Print method for ldmppr_mark_model objects print(mark_model) # Summary method for ldmppr_mark_model objects summary(mark_model)
predict_marks() is retained as a deprecated compatibility wrapper. For new workflows, use S3 prediction directly via predict(mark_model, ...).
If the user wants to save the trained mark model (and reload it at a later time), the package provides the save_mark_model and load_mark_model functions (which handle model-engine specifics appropriately):
save_path <- tempfile(fileext = ".rds") save_mark_model(mark_model, save_path) mark_model2 <- load_mark_model(save_path) mark_model2
Now that we have (i) an estimated process model and (ii) a trained mark model, we can check the model fit using the check_model_fit function. This function provides a variety of non-parametric summaries for point processes and global envelope tests to assess the goodness of fit of the model. The package provides individual envelope tests for the L, F, G, J, E, and V statistics, or a combined envelope test for all statistics simultaneously by making use of the functionality of the spatstat and GET packages. By plotting the combined envelope test, we can visually assess the goodness of fit of the model and obtain a $p$-value measuring how well the estimated model captures the relationships observed in the reference data. Typically a $p$-value less than 0.05 indicates a poor fit of the model to the data, and the authors of the GET package recommend a minimum of 2500 simulations to obtain a valid $p$-value at the .05 level.
check_model_fit() returns an S3 object with a plot() method (by default it plots the combined global envelope test).
# Check the model fit by passing the estimated process and trained mark models from Steps 1 and 2 model_check <- check_model_fit( t_min = 0, t_max = 1, process = "self_correcting", process_fit = fit_sc, mark_model = mark_model, include_comp_inds = FALSE, thinning = TRUE, edge_correction = "none", competition_radius = 10, n_sim = 199, mark_mode = "mark_model", fg_correction = "km", save_sims = FALSE, verbose = TRUE, seed = 90210 ) # Plot method for ldmppr_model_check objects (defaults to combined global envelope test) plot(model_check)
You can still access the underlying GET objects if you want (e.g., model_check$combined_env), but for most workflows plot(model_check) is sufficient.
Finally, we can simulate a new marked point process realization from the fitted model.
simulate_mpp() returns an object of class ldmppr_sim with plot() and as.data.frame() methods.
# Simulate a new marked point pattern by passing the estimated process and trained mark models from Steps 1 and 2 simulated <- simulate_mpp( process = "self_correcting", process_fit = fit_sc, t_min = 0, t_max = 1, mark_model = mark_model, mark_mode = "mark_model", include_comp_inds = TRUE, competition_radius = 10, edge_correction = "none", thinning = TRUE, seed = 90210 ) # Plot method for ldmppr_sim plot(simulated) # Data-frame view of the simulated realization head(as.data.frame(simulated))
This vignette demonstrates the core ldmppr workflow using the updated S3-based interfaces:
estimate_process_parameters() → returns a fitted process object (ldmppr_fit)train_mark_model() → returns a fitted mark model object (ldmppr_mark_model)check_model_fit() → returns a model-check object (ldmppr_model_check)simulate_mpp() → returns a simulation object (ldmppr_sim)These objects are designed to be composable and to behave like standard R model objects through S3 methods.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.