pf_simplify | R Documentation |
pf into movement paths
This function is designed to simplify the pf_archive-class object from pf that defines sampled particle histories into a set of movement paths. The function identifies pairs of cells between which movement may have occurred at each time step (if necessary), (re)calculates distances and probabilities between connected cell pairs and then, if specified, links pairwise movements between cells into a set of possible movement paths.
pf_simplify(
archive,
max_n_particles = NULL,
max_n_particles_sampler = c("random", "weighted", "max"),
bathy = NULL,
calc_distance = NULL,
calc_distance_lcp_fast = NULL,
calc_distance_graph = NULL,
calc_distance_limit = NULL,
calc_distance_barrier = NULL,
calc_distance_barrier_limit = NULL,
calc_distance_barrier_grid = NULL,
calc_distance_restrict = FALSE,
calc_distance_algorithm = "bi",
calc_distance_constant = 1,
mobility = NULL,
mobility_from_origin = mobility,
write_history = NULL,
cl = NULL,
varlist = NULL,
use_all_cores = FALSE,
return = c("path", "archive"),
summarise_pr = FALSE,
max_n_copies = NULL,
max_n_copies_sampler = c("random", "weighted", "max"),
max_n_paths = 100L,
add_origin = TRUE,
verbose = TRUE
)
archive |
A |
max_n_particles |
(optional) An integer that defines the maximum number of particles to select at each time step. If supplied, particle samples are thinned, with |
max_n_particles_sampler |
If |
bathy |
A |
calc_distance |
A character that defines the method used to calculate distances between sequential combinations of particles (see |
calc_distance_lcp_fast |
(optional) If |
calc_distance_graph |
(optional) If |
calc_distance_limit |
(optional) If |
calc_distance_barrier |
(optional) If |
calc_distance_barrier_limit |
(optional) If |
calc_distance_barrier_grid |
(optional) If |
calc_distance_restrict |
(optional) If and |
calc_distance_algorithm , calc_distance_constant |
Additional shortest-distance calculation options if |
mobility , mobility_from_origin |
(optional) The mobility parameters (see |
write_history |
A named list of arguments, passed to |
cl , varlist , use_all_cores |
(optional) Parallelisation options for the first stage of the algorithm, which identifies connected cell pairs, associated distances and movement probabilities. The first parallelisation option is to parallelise the algorithm over time steps via |
return |
A character ( |
summarise_pr |
(optional) For |
max_n_copies |
(optional) For |
max_n_copies_sampler |
(optional) For |
max_n_paths |
(optional) For |
add_origin |
For |
verbose |
A logical input that defines whether or not to print messages to the console to monitor function progress. |
The implementation of this function depends on how pf has been implemented and the return argument. Under the default options in pf, the fast Euclidean distances method is used to sample sequential particle positions, in which case the history of each particle through the landscape is not retained and has to be assembled afterwards. In this case, pf_simplify calculates the distances between all combinations of cells at each time step, using either a Euclidean distances or shortest distances algorithm according to the input to calc_distance. Distances are converted to probabilities using the ‘intrinsic’ probabilities associated with each location and the movement models retained in archive from the call to pf to identify possible movement paths between cells at each time step. If the fast Euclidean distances method has not been used, then pairwise cell movements are retained by pf. In this case, the function simply recalculates distances between sequential cell pairs and the associated cell probabilities, which are then processed according to the return argument.
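The pairwise step described above can be illustrated with a minimal base-R sketch. This is not the code used by pf_simplify: the cell coordinates, the linear movement model, the mobility value and the column names id_previous and pr_current are all assumptions made for illustration, and every cell is given an equal 'intrinsic' probability of 1.

```r
# Hypothetical cell coordinates (in metres) at two consecutive time steps
cells_prev <- data.frame(id = c(1L, 2L), x = c(0, 100), y = c(0, 0))
cells_next <- data.frame(id = c(3L, 4L, 5L), x = c(0, 150, 400), y = c(50, 0, 0))

# Hypothetical movement model: probability declines linearly with distance
# and is zero at or beyond the (assumed) mobility limit
mobility <- 200
calc_movement_pr <- function(distance) pmax(0, 1 - distance / mobility)

# Enumerate all combinations of previous and next cells
index <- expand.grid(
  i = seq_len(nrow(cells_prev)),
  j = seq_len(nrow(cells_next))
)
pairs <- data.frame(
  id_previous = cells_prev$id[index$i],
  id_current = cells_next$id[index$j],
  # Euclidean distance between each pair of cells
  dist = sqrt((cells_prev$x[index$i] - cells_next$x[index$j])^2 +
    (cells_prev$y[index$i] - cells_next$y[index$j])^2)
)

# Convert distances to probabilities
# (the 'intrinsic' probability is assumed to be 1 for every cell)
pairs$pr_current <- 1 * calc_movement_pr(pairs$dist)

# Retain only the pairs between which movement is possible
connected <- pairs[pairs$pr_current > 0, ]
connected
```

In this sketch, the third cell at the later time step lies beyond the assumed mobility limit from both earlier cells, so it is dropped as unconnected.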
Following the identification of pairwise cell movements, if return = "archive", the function selects all of the unique cells at each time step that were connected to cells at the next time step. (For cells that were selected multiple times at a given time step, due to sampling with replacement in pf, if summarise_pr is supplied, only one sample is retained: in maps of the ‘probability of use’ across an area (see pf_plot_map), this ensures that cell scores depend on the number of time steps when the individual could have occupied a given cell, rather than the total number of samples of a location.) Otherwise, if return = "path", pairwise cell movements are assembled into complete movement paths.
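The idea of linking pairwise movements into paths can be sketched in base R as a join across time steps. This is only an illustration under assumed inputs (the cell IDs and the column names id_0, id_1 and id_2 are hypothetical), not the routine implemented by pf_simplify.

```r
# Hypothetical connected cell pairs for two time steps:
# step_1 links cells at t = 0 to cells at t = 1;
# step_2 links cells at t = 1 to cells at t = 2
step_1 <- data.frame(id_0 = c(10L, 10L), id_1 = c(11L, 12L))
step_2 <- data.frame(id_1 = c(11L, 11L, 13L), id_2 = c(21L, 22L, 23L))

# Link pairwise movements: a path continues only where the cell reached
# at t = 1 is also the starting cell of a movement recorded in step_2
paths <- merge(step_1, step_2, by = "id_1")
paths <- paths[, c("id_0", "id_1", "id_2")]
paths$path_id <- seq_len(nrow(paths))
paths
```

Here cell 12 is a 'dead end' and cell 13 has no ancestor, so neither appears in a path. Because every matching pair spawns a new row, the number of candidate paths can grow multiplicatively with the number of time steps, which is the situation that max_n_copies and max_n_paths are intended to control.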
If return = "archive", the function returns a pf_archive-class object, as inputted, but in which only cells that were connected to cells at the next time step are retained (and, if summarise_pr is supplied, only the most likely record of each such cell) and with the method = "pf_simplify" flag. If return = "path", the function returns a pf_path-class object, which is a dataframe that defines the movement paths.
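Retaining only the most likely record of each cell at a time step, as happens when summarise_pr is used, can be sketched as follows. The data frame is hypothetical; id_current follows the column used in the examples for this function, while pr_current is an assumed name for the associated probability.

```r
# Hypothetical particle samples for one time step, with repeated cells
# (as can arise from sampling with replacement)
history_t <- data.frame(
  id_current = c(5L, 5L, 7L, 9L, 9L, 9L),
  pr_current = c(0.2, 0.6, 0.4, 0.1, 0.3, 0.2)
)

# Sort by descending probability so that the most likely record of each
# cell comes first, then drop the later (less likely) duplicates
ord <- history_t[order(history_t$pr_current, decreasing = TRUE), ]
unique_t <- ord[!duplicated(ord$id_current), ]
unique_t <- unique_t[order(unique_t$id_current), ]
unique_t
```

After this step each cell contributes one record per time step, so a map built from the result scores cells by the number of time steps in which they could have been occupied rather than by the number of samples.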
Edward Lavender
#### Example particle histories
# In these examples, we will use the example particle histories included in flapper
summary(dat_dcpf_histories)
#### Example (1): The default implementation
paths_1 <- pf_simplify(dat_dcpf_histories)
## Demonstration that the distance and probabilities calculations are correct
# The simple method below works if three conditions are met:
# ... The 'intrinsic' probability associated with each cell is the same (as for DC algorithm);
# ... Paths have been reconstructed via pf_simplify() using Euclidean distances;
# ... The calc_movement_pr() movement model applies to all time steps;
require(magrittr)
require(rlang)
paths_1 <-
paths_1 %>%
dplyr::group_by(.data$path_id) %>%
dplyr::mutate(
cell_xp = dplyr::lag(.data$cell_x),
cell_yp = dplyr::lag(.data$cell_y),
cell_dist_chk = sqrt((.data$cell_xp - .data$cell_x)^2 +
(.data$cell_yp - .data$cell_y)^2),
cell_pr_chk = dat_dcpf_histories$args$calc_movement_pr(.data$cell_dist_chk),
dist_equal = .data$dist == .data$cell_dist_chk,
pr_equal = .data$cell_pr == .data$cell_pr_chk
) %>%
data.frame()
utils::head(paths_1)
## Demonstration that the depths of sampled cells are correct
paths_1$cell_z_chk <- raster::extract(
dat_dcpf_histories$args$bathy,
paths_1$cell_id
)
all.equal(paths_1$cell_z, paths_1$cell_z_chk)
## Compare depth time series
# There is a relatively large degree of mismatch here, which reflects
# ... the low resolution bathymetry data used for the algorithm.
pf_plot_1d(paths_1, dat_dc$args$archival)
## Examine paths
# Log likelihood
pf_loglik(paths_1)
# 2-d visualisation
pf_plot_2d(paths_1, dat_dcpf_histories$args$bathy,
add_paths = list(length = 0.05)
)
# 3-d visualisation
pf_plot_3d(paths_1, dat_dcpf_histories$args$bathy)
#### Example (2): Re-calculate distances as shortest distances
## Implement flapper::pf()
# For this example, we need to increase the number of particles
# ... for Euclidean-based sampling to generate viable paths
# ... when we consider shortest distances
set.seed(1)
dcpf_args <- dat_dcpf_histories$args
dcpf_args$calc_distance_euclid_fast <- TRUE
dcpf_args$n <- 50
out_dcpf_2 <- do.call(pf, dcpf_args)
## Implement pf_simplify() using shortest distances
paths_2 <- pf_simplify(out_dcpf_2, calc_distance = "lcp")
# ... Duration: ~ 0.655 s
system.time(
invisible(utils::capture.output(
pf_simplify(out_dcpf_2, calc_distance = "lcp")
))
)
## Demonstrate the LCP calculations are correct
paths_2_lcps <- lcp_interp(paths_2,
out_dcpf_2$args$bathy,
calc_distance = TRUE
)
head(cbind(paths_2$dist, paths_2_lcps$dist_lcp$dist))
## Trial options for increasing speed of shortest-distance calculations
# Speed up shortest-distance calculations via (a) the graph:
# ... Duration: ~0.495 s
# ... Note that you may achieve further speed improvements via
# ... a simplified/contracted graph
# ... ... see cppRouting::cpp_simplify()
# ... ... see cppRouting::cpp_contract()
costs <- lcp_costs(out_dcpf_2$args$bathy)
graph <- lcp_graph_surface(out_dcpf_2$args$bathy, costs$dist_total)
system.time(
invisible(utils::capture.output(
pf_simplify(out_dcpf_2,
calc_distance = "lcp",
calc_distance_graph = graph
)
))
)
# Speed up shortest-distance calculations via (b) the lower Euclid dist limit
# ... Duration: ~0.493 s
costs <- lcp_costs(out_dcpf_2$args$bathy)
graph <- lcp_graph_surface(out_dcpf_2$args$bathy, costs$dist_total)
system.time(
invisible(utils::capture.output(
pf_simplify(out_dcpf_2,
calc_distance = "lcp",
calc_distance_graph = graph,
calc_distance_limit = 100
)
))
)
# Speed up shortest-distance calculations via (c) the barrier
# ... Duration: ~1.411 s (much slower in this example)
coastline <- sf::st_as_sf(dat_coast)
sf::st_crs(coastline) <- NA
system.time(
invisible(utils::capture.output(
pf_simplify(out_dcpf_2,
calc_distance = "lcp",
calc_distance_graph = graph,
calc_distance_limit = 100,
calc_distance_barrier = coastline
)
))
)
# Speed up calculations via (d) mobility limits
# ... (In the examples above, the mobility parameters
# ... can be extracted from out_dcpf_2,
# ... so specifying them directly here in this example makes
# ... no material difference, but this is not necessarily the case
# ... if pf() has been implemented without mobility parameters).
system.time(
invisible(utils::capture.output(
pf_simplify(out_dcpf_2,
calc_distance = "lcp",
calc_distance_graph = graph,
calc_distance_limit = 100,
mobility = 200,
mobility_from_origin = 200
)
))
)
# Speed up calculations via (e) parallelisation
# ... see the details in the documentation.
#### Example (3): Restrict the number of routes to each cell at each time step
# Implement approach for different numbers of copies
# Since we have only sampled a small number of particles for this simulation
# ... this does not make any difference here, but it can dramatically reduce
# ... the time taken to assemble paths and prevent vector memory issues.
paths_3a <- pf_simplify(dat_dcpf_histories, max_n_copies = 1)
paths_3b <- pf_simplify(dat_dcpf_histories, max_n_copies = 5)
paths_3c <- pf_simplify(dat_dcpf_histories, max_n_copies = 7)
# Compare the number of paths retained
unique(paths_3a$path_id)
unique(paths_3b$path_id)
unique(paths_3c$path_id)
#### Example (4): Change the sampling method used to retain paths
# Again, this doesn't make a difference here, but it can when there are
# ... more particles.
paths_4a <- pf_simplify(dat_dcpf_histories,
max_n_copies = 5,
max_n_copies_sampler = "random"
)
paths_4b <- pf_simplify(dat_dcpf_histories,
max_n_copies = 5,
max_n_copies_sampler = "weighted"
)
paths_4c <- pf_simplify(dat_dcpf_histories,
max_n_copies = 5,
max_n_copies_sampler = "max"
)
# Compare retained paths
pf_loglik(paths_4a)
pf_loglik(paths_4b)
pf_loglik(paths_4c)
#### Example (5): Set the maximum number of paths for reconstruction (for speed)
# Reconstruct all paths (note you may experience vector memory limitations)
# unique(pf_simplify(dat_dcpf_histories, max_n_paths = NULL)$path_id)
# Reconstruct one path
unique(pf_simplify(dat_dcpf_histories, max_n_paths = 1)$path_id)
# Reconstruct (at most) five paths
unique(pf_simplify(dat_dcpf_histories, max_n_paths = 5)$path_id)
#### Example (6): Retain/drop the origin, if specified
# For the example particle histories, an origin was specified
dat_dcpf_histories$args$origin
# This is included as 'timestep = 0' in the returned dataframe
# ... with the coordinates re-defined on bathy:
paths_5a <- pf_simplify(dat_dcpf_histories)
paths_5a[1, c("cell_x", "cell_y")]
raster::xyFromCell(
dat_dcpf_histories$args$bathy,
raster::cellFromXY(
dat_dcpf_histories$args$bathy,
dat_dcpf_histories$args$origin
)
)
head(paths_5a)
# If specified, the origin is dropped with add_origin = FALSE
paths_5b <- pf_simplify(dat_dcpf_histories, add_origin = FALSE)
head(paths_5b)
#### Example (7): Get particle samples for connected particles
## Implement DCPF with more particles for demonstration purposes
set.seed(1)
dcpf_args <- dat_dcpf_histories$args
dcpf_args$calc_distance_euclid_fast <- TRUE
dcpf_args$n <- 250
out_dcpf_6a <- do.call(pf, dcpf_args)
head(out_dcpf_6a$history[[1]])
## Extract particle samples for connected particles
# There may be multiple records of any given cell at any given time step
# ... due to sampling with replacement.
out_dcpf_6b <- pf_simplify(out_dcpf_6a, return = "archive")
head(out_dcpf_6b$history[[1]])
table(duplicated(out_dcpf_6b$history[[1]]$id_current))
## Extract particle samples for connected particles,
# ... with only the most likely record of each particle returned.
# We can implement the approach using out_dcpf_6b
# ... to skip distance calculations.
# Now, there is only one (the most likely) record of sampled cells
# ... at each time step.
out_dcpf_6c <- pf_simplify(out_dcpf_6b,
summarise_pr = TRUE,
return = "archive"
)
head(out_dcpf_6c$history[[1]])
table(duplicated(out_dcpf_6c$history[[1]]$id_current))
## Make movement paths
# Again, we use out_dcpf_6b to skip distance calculations.
out_dcpf_6d <- pf_simplify(out_dcpf_6b,
max_n_copies = 2L,
return = "path"
)
## Compare resultant maps
# The map for all particles is influenced by particles that were 'dead ends',
# ... which isn't ideal for a map of space use.
# The map for connected samples deals with this problem, but is influenced by
# ... the total number of samples of each cell, rather than the number of time steps
# ... in which the individual could have been located in a given cell.
# The map for unique, connected samples deals with this issue, so that scores
# ... represent the number of time steps in which the individual could have occupied
# ... a given cell, over the length of the time series.
# The map for the paths is sparser because paths have only been reconstructed
# ... for a sample of sampled particles.
pp <- par(mfrow = c(2, 2), oma = c(2, 2, 2, 2), mar = c(2, 4, 2, 4))
paa <- list(side = 1:4, axis = list(labels = FALSE))
transform <- NULL
m_1 <- pf_plot_map(out_dcpf_6a, dcpf_args$bathy,
transform = transform,
pretty_axis_args = paa, main = "all samples"
)
m_2 <- pf_plot_map(out_dcpf_6b, dcpf_args$bathy,
transform = transform,
pretty_axis_args = paa, main = "connected samples"
)
m_3 <- pf_plot_map(out_dcpf_6c, dcpf_args$bathy,
transform = transform,
pretty_axis_args = paa, main = "unique, connected samples"
)
m_4 <- pf_plot_map(out_dcpf_6d, dcpf_args$bathy,
transform = transform,
pretty_axis_args = paa, main = "paths"
)
par(pp)
# Note that all locations in reconstructed paths are derived from PF samples
all(out_dcpf_6d$cell_id[out_dcpf_6d$timestep != 0] %in%
do.call(rbind, out_dcpf_6c$history)$id_current)
# But the paths only contain a subset of sampled particles
h_6a <- lapply(out_dcpf_6a$history, function(elm) elm[, "id_current"])
table(unique(unlist(h_6a)) %in% out_dcpf_6d$cell_id[out_dcpf_6d$timestep != 0])
h_6b <- lapply(out_dcpf_6b$history, function(elm) elm[, "id_current"])
table(unique(unlist(h_6b)) %in% out_dcpf_6d$cell_id[out_dcpf_6d$timestep != 0])