
Data processing and analysis

The main advantage of high-throughput tracking datasets, their large volume and high resolution, is also the key challenge in processing them: manual data cleaning and analysis become prohibitively time-consuming (110, 111). Fields accustomed to massive “big data” datasets, such as genomics (112) and remote sensing (113), may inspire solutions, among them robust exploratory data analysis (EDA; Fig. 6E) and automated processing steps organized into reproducible computational pipelines (112, 114). EDA (for example, heatmaps of localizations in space and time, plots of individual tracks, and distributions of key movement metrics such as speed) is a crucial first step. It identifies key patterns in the ecological processes observed, as well as location errors such as outliers (Fig. 6E, 6F).
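
The sketch below shows two of these EDA summaries. It assumes projected coordinates (e.g. metres) and timestamps in seconds. It computes step speeds between consecutive localizations and counts localizations per grid cell, which are the quantities behind a speed histogram and a spatial heatmap. The function names and the toy track are hypothetical and for illustration only.

```python
import math
from collections import Counter


def step_speeds(xs, ys, ts):
    """Speed between consecutive localizations (distance units per time unit).

    Assumes projected coordinates and strictly increasing timestamps.
    """
    return [
        math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / (ts[i] - ts[i - 1])
        for i in range(1, len(ts))
    ]


def occupancy_grid(xs, ys, cell_size):
    """Count localizations per square grid cell: the data behind a heatmap."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in zip(xs, ys)
    )


def nearest_rank_quantile(values, q):
    """Nearest-rank quantile, enough for a quick look at a distribution."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


# Hypothetical toy track with one implausible jump at t = 3.
xs = [0, 1, 2, 50, 4, 5]
ys = [0, 0, 0, 0, 0, 0]
ts = [0, 1, 2, 3, 4, 5]

speeds = step_speeds(xs, ys, ts)
print(speeds)                              # [1.0, 1.0, 48.0, 46.0, 1.0]
print(nearest_rank_quantile(speeds, 0.5))  # median speed: 1.0
print(occupancy_grid(xs, ys, cell_size=10))
```

The two large speeds around t = 3 point to the jump to x = 50 as a likely location error. This is the kind of outlier EDA is meant to reveal before any filtering is parameterized.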

HTME tracking data can then be passed through a pre-processing pipeline that prepares them for statistical analyses by filtering out movement that would be unrealistic for the study species (110, 111). The true animal path can then be approximated from the remaining localizations, either by applying a median smoother (110) (Fig. 6F) or by fitting a movement model such as a continuous-time correlated random walk (64, 80, 87, 115) (Fig. 6G). The pipeline's component steps (usually functions in a programming language such as R) (116) should be parameterized on small subsets of the data to ensure realistic and acceptable results before they are applied to the full dataset. Similar pipelines can be built for supplementary data sources such as 3D accelerometry. These data can be summarized as overall or vectorial dynamic body acceleration (ODBA, VeDBA) and possibly subsampled at a biologically relevant scale (117) (Fig. 6H).
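
The two path-level pre-processing steps above can be sketched in Python under the following assumptions. Fixes are `(t, x, y)` tuples in projected coordinates with strictly increasing timestamps. The speed threshold `max_speed` and the smoothing `window` are hypothetical values. The speed filter compares each fix with the last fix retained. This is a simplified illustration, not the filtering or smoothing procedure of the cited studies.

```python
import math
import statistics


def speed_filter(track, max_speed):
    """Drop fixes implying a speed above max_speed from the last retained fix.

    track: list of (t, x, y) tuples sorted by strictly increasing t.
    Assumes the first fix is valid.
    """
    if not track:
        return []
    kept = [track[0]]
    for t, x, y in track[1:]:
        t0, x0, y0 = kept[-1]
        if math.hypot(x - x0, y - y0) / (t - t0) <= max_speed:
            kept.append((t, x, y))
    return kept


def median_smooth(track, window):
    """Centred running median of x and y; the window is truncated at the ends."""
    if window % 2 == 0:
        raise ValueError("window must be an odd number of fixes")
    half = window // 2
    smoothed = []
    for i, (t, _, _) in enumerate(track):
        segment = track[max(0, i - half): i + half + 1]
        smoothed.append(
            (t,
             statistics.median(p[1] for p in segment),
             statistics.median(p[2] for p in segment))
        )
    return smoothed


# Hypothetical track with an implausible jump at t = 3.
track = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 50, 0), (4, 4, 0), (5, 5, 0)]
filtered = speed_filter(track, max_speed=5)  # drops the fix at t = 3
path = median_smooth(filtered, window=3)
print([t for t, _, _ in filtered])  # [0, 1, 2, 4, 5]
print([x for _, x, _ in path])      # [0.5, 1, 2, 4, 4.5]
```

The filter compares each fix with the last retained one, so a bad first fix would cause valid fixes to be discarded. As the text stresses, `max_speed` and `window` would be tuned on small subsets before the full data are processed.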

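The accelerometry summaries mentioned above can be sketched as follows. The sketch assumes tri-axial acceleration in g, sampled at a constant rate. A running mean over a window of samples approximates the static (gravitational) component of each axis, and the dynamic component is the residual. ODBA sums the absolute dynamic components, and VeDBA takes their vector norm. The window length and the block size used to summarize the series are hypothetical parameters that would need to be chosen for the study species and sampling rate.

```python
import math


def running_mean(values, window):
    """Centred running mean; the window is truncated at the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def dynamic_body_acceleration(ax, ay, az, window):
    """Return per-sample (ODBA, VeDBA) lists from tri-axial acceleration.

    Static acceleration per axis is approximated by a running mean over
    `window` samples; dynamic acceleration is raw minus static.
    """
    dyn = []
    for axis in (ax, ay, az):
        static = running_mean(axis, window)
        dyn.append([a - s for a, s in zip(axis, static)])
    odba = [abs(x) + abs(y) + abs(z) for x, y, z in zip(*dyn)]
    vedba = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(*dyn)]
    return odba, vedba


def block_means(values, block):
    """Summarize a per-sample series as means over consecutive blocks of samples."""
    return [
        sum(values[i:i + block]) / len(values[i:i + block])
        for i in range(0, len(values), block)
    ]


# Hypothetical 8-sample burst: gravity on z plus a brief movement on x.
ax = [0.0, 0.0, 0.0, 0.6, -0.6, 0.0, 0.0, 0.0]
ay = [0.0] * 8
az = [1.0] * 8
odba, vedba = dynamic_body_acceleration(ax, ay, az, window=3)
print(block_means(vedba, block=4))  # one summary value per 4 samples
```

Averaging over blocks is one simple way to subsample the per-sample signal. The block length would ideally match the biologically relevant time scale the text refers to.
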
The reproducibility of processing pipelines can be improved by unit testing component steps to check that they handle data correctly (118). Adopting version control, test-driven development, and continuous integration testing can help ensure that changes to the pipeline code do not introduce processing errors (112, 118). Pipeline efficiency can be increased in numerous ways, allowing even very large datasets to be processed on conventional computing hardware. These include processing small subsets of the data at a time, programming carefully to avoid inefficient copying of data during computation, and using a compiled language or parallel computing techniques such as multi-threading or a computing cluster (112, 113).
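
As an illustration of unit testing a pipeline step, the Python below defines a hypothetical cleaning step, `clean_fixes`. The step drops fixes with missing coordinates or duplicate timestamps. The accompanying `unittest` test cases pin down this behaviour, including the edge case of empty input. Under continuous integration, tests like these would run automatically on every change to the pipeline code.

```python
import math
import unittest


def clean_fixes(track):
    """Hypothetical pipeline step: drop fixes with NaN coordinates and keep
    only the first fix per timestamp; returns fixes sorted by time."""
    seen = set()
    cleaned = []
    for t, x, y in sorted(track, key=lambda fix: fix[0]):  # stable sort
        if math.isnan(x) or math.isnan(y) or t in seen:
            continue
        seen.add(t)
        cleaned.append((t, x, y))
    return cleaned


class TestCleanFixes(unittest.TestCase):
    def test_drops_missing_coordinates(self):
        track = [(0, 0.0, 0.0), (1, float("nan"), 2.0)]
        self.assertEqual(clean_fixes(track), [(0, 0.0, 0.0)])

    def test_drops_duplicate_timestamps(self):
        track = [(0, 0.0, 0.0), (0, 9.0, 9.0), (1, 1.0, 1.0)]
        self.assertEqual(clean_fixes(track), [(0, 0.0, 0.0), (1, 1.0, 1.0)])

    def test_sorts_by_time(self):
        track = [(2, 2.0, 2.0), (1, 1.0, 1.0)]
        self.assertEqual([fix[0] for fix in clean_fixes(track)], [1, 2])

    def test_empty_input(self):
        self.assertEqual(clean_fixes([]), [])
```

The tests can be run with Python's built-in test runner (`python -m unittest`) or with pytest. In an R pipeline, the same idea would typically be expressed with a testing package such as testthat.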

Analytical approaches for HTME data include, inter alia, relatively simple home range analyses (17) (Fig. 6H), social network analysis (50), and integrated step-selection functions (119) (Fig. 6J). They also include more complex individual-level or even group-dynamic movement models such as stochastic differential equations or (hierarchical) hidden Markov models (120, 121) (Fig. 6I). HTME data allow researchers to apply these modelling frameworks at essentially any scale of animal decision-making, and thereby to draw increasingly detailed pictures of behavioral processes. Examples include incorporating the exact times of directional turns into step-selection analyses (122), explicitly incorporating physiological processes into the movement model (123), or relating an individual's movement to that of nearby conspecifics (124). On the other hand, HTME data pose specific challenges: very strong temporal and spatial correlations in movement metrics, noise, nonlinearity and non-normality, and statistical artifacts resulting from high-resolution sampling (e.g., an inflation of zero turning angles). Most importantly, they pose challenges of data volume and the associated computational burden (125).
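
Two of the HTME-specific challenges named above can be made concrete with a short Python sketch that assumes projected coordinates. The first function computes turning angles from consecutive headings. The second computes the lag-1 autocorrelation of a movement metric. A zero-length step (a stationary animal) has no defined heading, but `math.atan2(0, 0)` returns 0. Naive code therefore silently assigns such steps a heading of 0 and turning angles of exactly 0 during stationary runs, which is one way such artifacts can creep in. The function names are hypothetical.

```python
import math


def turning_angles(xs, ys):
    """Turning angles (radians, wrapped to [-pi, pi]) between consecutive steps.

    Caution: a zero-length step gets heading atan2(0, 0) == 0, which is not a
    real direction, so stationary runs yield artificial zero angles.
    """
    headings = [
        math.atan2(ys[i + 1] - ys[i], xs[i + 1] - xs[i]) for i in range(len(xs) - 1)
    ]
    return [
        math.atan2(math.sin(h1 - h0), math.cos(h1 - h0))  # wrap the difference
        for h0, h1 in zip(headings, headings[1:])
    ]


def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Three sides of a unit square: two left turns of pi/2.
print(turning_angles([0, 1, 1, 0], [0, 0, 1, 1]))
# A slowly changing speed series is strongly positively autocorrelated.
print(autocorrelation([1, 1, 1, 1, 5, 5, 5, 5]))  # 0.625
```

Strong positive autocorrelation means successive observations carry less independent information than their number suggests. As a result, models that treat steps as independent can understate uncertainty.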

For HTME data in particular, the scalability, robustness and parallelizability of methods are crucial. For example, a multi-stage approach is typically feasible for HTME data. Such an approach uses a machine learning algorithm to identify behavioral modes (19, 126), followed by a regression analysis relating the behaviors to environmental features. By contrast, fitting “complete” mechanistic movement models remains computationally challenging. As a consequence, researchers working with HTME data will often face a trade-off between two kinds of approach. Some models aim to mimic the data-generating process adequately but are challenging to work with. Simpler approaches are easier to implement and computationally more feasible, but may oversimplify the biological process. They may also suffer from statistical shortcomings such as a lack of uncertainty propagation or inadequate modelling of the correlation structure (120).
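
The multi-stage approach described above can be sketched in a few lines of Python. Two stand-ins are assumed here purely for illustration. Two-cluster k-means on speed plays the role of the machine learning classifier. An ordinary least squares fit of the behaviour indicator on a single environmental covariate plays the role of the regression. Both stages are cheap, but, as the text cautions, the sketch ignores autocorrelation and does not propagate the uncertainty of the behavioural classification into the regression. The data are hypothetical.

```python
def kmeans_two_modes(values, max_iter=100):
    """Two-cluster k-means on one feature; returns 0/1 labels (1 = higher centre)."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: nearest centre (ties go to the low cluster).
        labels = [int(abs(v - high) < abs(v - low)) for v in values]
        low_vals = [v for v, lab in zip(values, labels) if lab == 0]
        high_vals = [v for v, lab in zip(values, labels) if lab == 1]
        if not low_vals or not high_vals:
            break  # degenerate case, e.g. all values identical
        # Update step: move each centre to the mean of its cluster.
        new_low = sum(low_vals) / len(low_vals)
        new_high = sum(high_vals) / len(high_vals)
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return labels


def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical per-fix speeds (m/s) and vegetation cover at each fix.
speed = [0.1, 0.2, 0.1, 3.0, 2.8, 3.2]
cover = [0.9, 0.8, 0.85, 0.2, 0.3, 0.1]

moving = kmeans_two_modes(speed)           # stage 1: behavioural modes
intercept, slope = ols_fit(cover, moving)  # stage 2: behaviour vs. environment
print(moving, round(slope, 2))
```

For a binary behaviour indicator, a logistic regression would normally be preferred over this linear fit. Either way, the two-stage structure stays the same, and each stage can be run per individual in parallel.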



pratikunterwegs/htme documentation built on May 19, 2021, 10:04 a.m.