R/doc-parallelism.R

#' Parallel computation in lidR
#'
#' This document explains how to process point clouds taking advantage of parallel processing in the
#' lidR package. The lidR package has two levels of parallelism, which is why it is difficult to
#' understand how it works. This page aims to provide users with a clear overview of how to take advantage
#' of multicore processing even if they are not comfortable with the parallelism concept.
#'
#' @section Algorithm-based parallelism:
#' When processing a point cloud we are applying an algorithm on data. This algorithm may or may not be
#' natively parallel. In lidR some algorithms are fully computed in parallel, but some are not because they are
#' not parallelizable, while some are only partially parallelized. It means that some portions of the code
#' are computed in parallel and some are not. When an algorithm is natively parallel in lidR it is always
#' a C++ based parallelization with OpenMP. The advantage is that the computation is faster without any
#' consequence for memory usage because the memory is shared between the processors. In short,
#' algorithm-based parallelism provides a significant gain without any cost for your R session and
#' your system (but obviously there is a greater workload for the processors). By default lidR uses
#' half of your cores but you can control this with \link{set_lidr_threads}. For example, the \link{lmf}
#' algorithm is natively parallel. The following code is computed in parallel:
#' \preformatted{
#' las  <- readLAS("file.las")
#' tops <- locate_trees(las, lmf(2))
#' }
#' However, as stated above, not all algorithms are parallelized or even parallelizable. For example,
#' \link{li2012} is not parallelized. The following code is computed in serial:
#' \preformatted{
#' las <- readLAS("file.las")
#' dtm <- segment_trees(las, li2012())
#' }
#' To know which algorithms are parallelized users can refer to the documentation or use the
#' function \link{is.parallelised}.
#' \preformatted{
#' is.parallelised(lmf(2))   #> TRUE
#' is.parallelised(li2012()) #> FALSE
#' }
#' @section Chunk-based parallelism:
#' When processing a `LAScatalog`, the internal engine splits the dataset into chunks and each chunk is
#' read and processed sequentially in a loop. This loop can be parallelized with the
#' `future` package. By default the chunks are processed sequentially, but they can be processed
#' in parallel by registering an evaluation strategy. For example, the following code is evaluated
#' sequentially:
#' \preformatted{
#' ctg <- readLAScatalog("folder/")
#' out <- pixel_metrics(ctg, ~mean(Z))
#' }
#' But this one is evaluated in parallel with two cores:
#' \preformatted{
#' library(future)
#' plan(multisession, workers = 2L)
#' ctg <- readLAScatalog("folder/")
#' out <- pixel_metrics(ctg, ~mean(Z))
#' }
#' With chunk-based parallelism any algorithm can be parallelized by processing several subsets of
#' a dataset.  However, there is a strong cost associated with this type of parallelism. When processing several
#' chunks at a time, the computer needs to load the corresponding point clouds. Assuming the user processes
#' one square kilometer chunks in parallel with 4 cores, then 4 chunks are loaded in the computer
#' memory. This may be too much and the speed-up is not guaranteed since there is some overhead involved in
#' reading several files at a time. Once this point is understood, chunk-based parallelism is very
#' powerful since all the algorithms can be parallelized whether or not they are natively parallel.
#' It also allows to parallelize the computation on several machines on the network or to work on a HPC.
#'
#' @section Nested parallelism - part 1:
#' Previous sections stated that some algorithms are natively parallel, such as \link{lmf}, and some are
#' not, such as \link{li2012}. Anyway, users can split the dataset into chunks to process them simultaneously
#' with the LAScatalog processing engine. Let's assume that the user's computer has four cores,
#' what happens in this case:
#' \preformatted{
#' library(future)
#' plan(multisession, workers = 4L)
#' set_lidr_threads(4L)
#' ctg <- readLAScatalog("folder/")
#' out <- locate_trees(ctg, lmf(2))
#' }
#' Here the catalog will be split into chunks that will be processed in parallel. And each computation
#' itself implies a parallelized task. This is a nested parallelism task and it is dangerous! Hopefully
#'  the lidR package handles such cases and chooses by default to give precedence to chunk-based
#'  parallelism. In this case chunks will be processed in parallel and the points will be processed
#'  serially by disabling OpenMP.
#'
#' @section Nested parallelism - part 2:
#' We explained rules of precedence. But actually the user can tune the engine more accurately. Let's
#' define the following function:
#' \preformatted{
#' myfun = function(las, ws, ...)
#' {
#'   las  <- normalize_height(las, tin())
#'   tops <- locate_tree(las, lmf(ws))
#'   return(tops)
#' }
#'
#' out <- catalog_map(ctg, myfun, ws = 5)
#' }
#' This function used two algorithms, one is partially parallelized (`tin`) and one is fully
#' parallelized `lmf`. The user can manually use both OpenMP and future. By default the engine
#' will give precedence to chunk-based parallelism because it works in all cases but the user can
#' impose something else. In the following 2 workers are attributed to future and 2 workers are
#' attributed to OpenMP.
#' \preformatted{
#' plan(multisession, workers = 2L)
#' set_lidr_threads(2L)
#' catalog_map(ctg, myfun, ws = 5)
#' }
#' The rule is simple. If the number of workers needed is greater than the number of
#' available workers then OpenMP is disabled. Let suppose we have a 4 cores:
#' \preformatted{
#' # 2 chunks 2 threads: OK
#' plan(multisession, workers = 2L)
#' set_lidr_threads(2L)
#'
#' # 4 chunks 1 threads: OK
#' plan(multisession, workers = 4L)
#' set_lidr_threads(1L)
#'
#' # 1 chunks 4 threads: OK
#' plan(sequential)
#' set_lidr_threads(4L)
#'
#' # 3 chunks 2 threads: NOT OK
#' # Needs 6 workers, OpenMP threads are set to 1 i.e. sequential processing
#' plan(multisession, workers = 3L)
#' set_lidr_threads(2L)
#' }
#'
#' @section Complex computing architectures:
#' For more complex processing architectures such as multiple computers controlled remotely
#' or HPC a finer tuning might be necessary. Using
#' \preformatted{
#' options(lidR.check.nested.parallelism = FALSE)
#' }
#' lidR will no longer check for nested parallelism and will never automatically disable OpenMP.
#' @md
#' @name lidR-parallelism
NULL
Jean-Romain/lidR documentation built on April 6, 2024, 9:41 p.m.