Introduction

The use of quantitative methods in archaeology reaches back to the end of the 19th century, but it was only after the middle of the 20th century that they became computer-based. The first applications of statistical-mathematical methods using computers revolutionized the handling of spatial and quantitative archaeological data (e.g. @goldmannSeriationChronologischerLeitfunde1979) as well as the view on archaeology itself (e.g. @clarkeAnalyticalArchaeology1968 and, almost a decade later, @hodderSpatialAnalysisArchaeology1976 and @ihmStatistikArchaologieProbleme1978, to begin with). This quantitative analytical view prepared archaeologists for the public availability of increasingly high-resolution digital data sets, which revolutionized and broadened Aerial Archaeology, drew upon the expertise of Satellite Remote Sensing and branched into a new discipline: Archaeological Remote Sensing. These new data sources, including geophysical platforms, created the need to handle spatial data that is big in archaeological terms, and thus led to the specialized use of GIS platforms and to the development and application of new methods such as predictive modelling (@vanleusenPredictiveModellingArchaeologcal2005; @kamermansArchaeologicalPredictionRisk2009). The increasing sophistication of airborne platforms, sensors and imaging technologies, such as LiDAR, hyperspectral imagery and drone-derived multispectral and hyperspectral imagery (@agapiouRemoteSensingArchaeology2015; @luoAirborneSpaceborneRemote2019), further diversified the toolset of Archaeological Remote Sensing.

New technical developments continuously push specialists to look for, borrow and adapt new methods to analyse big data in the archaeological sense. This first led to the automation of single tasks such as georeferencing or applying the same function to multiple rasters (@raunSystematicLiteratureReview2018, 70), and soon to more complex automation, that is, the automation of whole workflows and of the analysis itself. In contrast to other disciplines, however, Archaeological Remote Sensing and Archaeological Science as a whole were quite late in adopting and applying automated methods, and Automated Archaeological Remote Sensing is therefore still in its infancy (@opitzRecentTrendsLongstanding2018, 30). Accordingly, the expression ‘automated analysis’ is still controversial; it is nonetheless used throughout this Master’s thesis. It has to be emphasized that ‘automation’ does not stand for completely automated workflows. It was chosen as a short phrase to address workflows with at least partly, or mostly, automated elements, because no one advocates ‘automatic archaeology’ (@cowleyNewOutOld2012 and @cowleyAutoextractionTechniquesCultural2013, 6). Automated analysis, often also called semi-automated analysis, ‘simply’ means that a part, often a major part, of a specific workflow is automated, that is, computer-aided. Critical voices about predictive modelling in GIS, which can be seen as the precursor and actual starting point of automated analysis, were fittingly summarized by @wheatleyMakingSpaceArchaeology2004: predictive modelling is dehumanising, anti-historical and substitutes human actions with mathematical equations.
This distrust is reflected in the criticism of automated analysis methods within the Archaeological and Archaeological Remote Sensing community: archaeological projects are always locally specialized (@parcakSatelliteRemoteSensing2019, 120), and there is no generalized, location-independent automation method that can locate atypical archaeological objects without producing a rather large miss rate (@casanaRegionalScaleArchaeologicalRemote2014; @casanaGlobalScaleArchaeologicalProspection2020, S93). Taking these points into account: why use only one type of automated analysis method (@davisGeographicDisparityMachine2020a, 5)? Large-scale landscapes contain many different landforms and archaeological features which need to be addressed in different ways. Due to the variability of the archaeological record, it is evidently not possible to detect everything with a single analysis method or algorithm. A further consideration is how long Automated Archaeological Remote Sensing will keep detecting round and square shapes over and over, and when it will move on to something more useful that actually resembles archaeological objects (Rog Palmer in personal discussion and several AARGnews editorials and contributions, e.g. AARGnews 62, 61). It is also somewhat self-evident that, although pre-trained and known object classes will be detected, new, unexpected and undetected objects always have to be expected; expecting to detect unexpected or unknown object classes with supervised learning/automation is therefore misleading and completely unrealistic (see a similar discussion about the use of predictive modelling in @wheatleyMakingSpaceArchaeology2004, 3.2.3).

It is a fatal move to expect automation to be a magic trick, because it is not: it should rather be seen as an extension of the archaeologist’s toolbox. Automated analysis is not meant to replace manual data evaluation, which, relying on the personal experience of a human operator, can be prone to gaps in knowledge or to error, let alone to replace archaeologists entirely; it is meant to be complementary. Automated analysis should be followed by field observation and identification for assessment, which in turn can be biased for the same reasons as manual data evaluation (@cowleyNewOutOld2012; @bennettDataExplosionTackling2014a, 899). All in all, automated analysis is meant to ease the workload archaeologists are facing with the increasing amount of data and the different data types to analyse and evaluate. As @davisDefiningWhatWe2020, 3 emphasizes: automated analysis is precise and manual analysis is accurate. Combined, both methods can help archaeologists to penetrate a hidden level of any data set and landscape which is invisible or hardly visible even to the trained eye. Taking this further, reproducible analysis (@marwickComputationalReproducibilityArchaeological2017b; @rokemAssessingReproducibility2017; @marwickPackagingDataAnalytical2018a; @calerovaldezMakingReproducibleResearch2020) can facilitate automated analyses and serve as a control for human individuality: archaeologists see different things in the same data set and unconsciously recognize what they are familiar with and what they know (@cowleyNewOutOld2012, 7), very similarly to an algorithm trained to find a certain class of object. Human detection cannot be reproduced, but automated analysis can: via documented, reproducible workflows.
In order to achieve this goal, open-source scripting languages like R or Python, or open-source platforms like Google Earth Engine, should be used, so that workflows can be repeated, reproduced and even replicated, which facilitates their use by other scholars. When writing code, the semantic syntax has to be clear and consistent. This also ensures that the ontology and the semantic syntax remain the same when the code is used by a different operator on a different dataset, in contrast to manual analysis, where independent, subjective manual operators would define features or objects differently, thus arriving at the same or at least similar biases and errors, depending of course on the dataset. This argument is investigated more thoroughly in chapter 3, where the expressions reproducible and replicable are discussed in more detail. A minimal sketch of what such a scripted, reproducible processing step can look like in R is given below.
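The sketch only illustrates the principle; the folder layout, file names and the choice of hillshade/slope derivatives are hypothetical and not taken from any of the studies cited here. The point is that every decision is written into the script, so any operator on any machine runs exactly the same procedure.

```r
# Minimal sketch of a scripted, reproducible processing step
# (hypothetical paths and parameters).
library(terra)   # open-source raster handling in R

set.seed(42)     # fix randomness, e.g. for later sampling steps

# relative path keeps the project self-contained and portable
dem_files <- list.files("data/lidar_dtm", pattern = "\\.tif$", full.names = TRUE)

# the same function is applied to every raster, by every operator
compute_slope <- function(path) {
  dtm <- rast(path)
  terrain(dtm, v = "slope", unit = "degrees")
}

slopes <- lapply(dem_files, compute_slope)

sessionInfo()    # document the exact software environment used
```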

Since they are entirely different disciplines, there is a semantic gap between the ontologies of Computer Vision, Image Analysis and Remote Sensing on the one hand and Archaeology on the other, especially because the archaeological record itself is fragmented and multifaceted and poses ontological problems of its own for the interoperability of projects or databases conducted by different operators with possibly different metalanguages. Although the aim of this thesis is not the creation of an ontological and semantic framework and/or a metalanguage for Automated Archaeological Remote Sensing, these points have to be discussed because they affect the way automation can be harmonized with Archaeological Remote Sensing.
This fundamental difference is reflected in the fact that in automation sensu stricto a distinction has to be made between ‘object’ and ‘feature’, based on termini technici from Computer Vision: ‘object’ refers to real-world entities in remote sensing images and ‘feature’ to elements of an image and of an object (@travigliaFindingCommonGround2016a, 12; @lambersIntegratingRemoteSensing2019, 2), in contrast to the everyday use of the expression ‘archaeological feature’ at an excavation site or in reports. Reproducible code and workflows make it possible to conduct the same procedure in a controlled environment, so that semantic problems arising from how different operators see archaeological objects and features can be detected, distinguished and solved. Thus the accurate expert knowledge of manual analysis can be integrated into the precision and consistency of computational semantics (@davisDefiningWhatWe2020, Fig. 1). The sketch below illustrates this terminological distinction in code.
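The following sketch, with entirely hypothetical coordinates and values, is only meant to make the distinction tangible: the ‘object’ is the delineated real-world entity, the ‘features’ are measurable properties derived from the image and the object.

```r
# Minimal sketch of the object/feature distinction (hypothetical values).
library(sf)

# the 'object': a candidate burial mound delineated as a polygon,
# i.e. a real-world entity detected in the remote sensing data
mound_object <- st_sfc(
  st_buffer(st_point(c(500000, 5200000)), dist = 15),  # ~15 m radius footprint
  crs = 32633
)

# the 'features': descriptors of the image and of the object,
# e.g. shape and elevation metrics used by a classifier
mound_features <- data.frame(
  area_m2      = as.numeric(st_area(mound_object)),
  perimeter_m  = as.numeric(st_length(st_cast(mound_object, "MULTILINESTRING"))),
  rel_height_m = 1.2   # hypothetical, e.g. derived from a DTM
)
```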

knitr::include_graphics('figures/Figure_1.png')  # relative path (thesis directory) instead of a machine-specific absolute path

On the other hand, manual analysis can learn a lot from the systematic precision of reproducible code: defining variables corresponds to a consistent nomenclature of objects/forms/features, and creating functions corresponds to setting the relations between objects/forms/features. This requires the consistency of formalized, straightforward and clear definitions (also @magniniTheoryPracticeObjectbased2019, 13; @davisDefiningWhatWe2020). Ontology, semantics and metadata are therefore very important tools which enable us to connect and share code, databases and research. With shared ontologies (knowledge representation), codified metalanguage protocols and rule-sets, the transferability of expert knowledge would be substantially eased not only between Archaeological disciplines and Automated Archaeological Remote Sensing, but also between its different sub-disciplines, such as Template Matching-, Geometric Knowledge-, GeOBIA- and Machine Learning-based analyses (this distinction is explained in chapter 2). A sketch of what such a formalized, shareable rule-set can look like follows below.
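As an illustration only, and with entirely hypothetical thresholds, the following sketch shows how such a codified rule-set can be written down once and then reused unchanged by any of the method families mentioned above.

```r
# Minimal sketch of a codified rule-set (all thresholds are hypothetical).
# The definition lives in one place and is passed to whichever method is used.
mound_ruleset <- list(
  class          = "burial mound",
  shape          = "sub-circular",
  min_height_m   = 0.5,
  max_diameter_m = 40
)

# relation between features and the object class, expressed as a function
matches_ruleset <- function(rel_height_m, diameter_m, ruleset = mound_ruleset) {
  rel_height_m >= ruleset$min_height_m &
    diameter_m <= ruleset$max_diameter_m
}

# the same rule-set classifies candidates regardless of how they were extracted
matches_ruleset(rel_height_m = 1.2, diameter_m = 18)   # TRUE
matches_ruleset(rel_height_m = 0.2, diameter_m = 55)   # FALSE
```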

So then, after all, how can the varying, unpredictable, fragmented, multifaceted and diachronic archaeological record (monuments and artefacts) that can be detected in remote sensing data be addressed semantically with methods of Computer Vision and Image Analysis? Remote sensing data offer much more: they are diachronic time capsules, records of palimpsest landscapes, of archaeological, micro- and macro-topographical, geomorphological and also recent, anthropogenic traces (@magniniTheoryPracticeObjectbased2019, 12) in a multidimensional space. Thus a common ground, a shared diachronic formalized ontology, has to be created to fill the semantic gap between the metalanguage of human perception, the archaeological record (which is manifold, imbalanced and changing in time, both conceptually and physically) and Computer Vision/Image Analysis. A way also has to be found to apply this new diachronic formalized ontology to real-world scenarios using Automated Archaeological Remote Sensing. Real-world (archaeological) scenarios are often extremely complex, so robust simplification and formalization are required to break down this complexity and bring the domains in question together. @sevaraArchaeologicalFeatureClassification2014a pioneered a conceptual framework as early as 2014, which found resonance in the formulation of a Diachronic Semantic Model (DSM) by @magniniTheoryPracticeObjectbased2019, successfully applied by the latter to a case study in 2021 (@magniniObjectBasedPredictiveModeling2021a). These fundamental studies are the first steps towards integrating Automated Archaeological Remote Sensing into the nervous system of Archaeological Science.

As a concluding thought, let a quote from @quintusEfficacyAnalyticalImportance2017, 1 state the problem: “many archaeological professionals who might have an interest in lidar-derived products do not have the technical experience to modify or create AFE (automated feature extraction) techniques for particular regions or environments.” This should not dictate the scientific methods one chooses. With open-source, reproducible (regularly updated and version-controlled) code, with clear, replicable workflows, and with published data and study (at least the manuscript in a pre-print repository) (Figure 3), every remote sensing archaeological professional should be able to tap into the possibilities of automated analysis. Thus open-access and open-source software should be used, and training data sets and workflows released, so that researchers can learn from each other, progress by building on each other’s experience, and not have to start from scratch with every new project. A minimal sketch of how the computational environment of such a workflow can be recorded is given below.
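One possible way, sketched below under the assumption that the project is organized as an renv-managed R project, is to record the exact package versions alongside the code; the calls shown are standard renv and base R functions, not part of any workflow published in the studies cited above.

```r
# Minimal sketch: documenting the computational environment of a workflow
# so that others can restore and rerun it (assumes an renv-managed project).
install.packages("renv")   # once per machine

renv::init()        # create a project-local library and lockfile
renv::snapshot()    # record exact package versions in renv.lock

# collaborators restore the identical environment with:
# renv::restore()

sessionInfo()       # additionally report the R version and loaded packages
```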

knitr::include_graphics('figures/Figure_2.png')  # relative path (thesis directory) instead of a machine-specific absolute path

Thus we not only need to compare the accuracy and the true and false positive rates of different automated methods (@trierUsingDeepNeural2019a, Table 1) and the methods themselves (@davisComparisonAutomatedObject2019b), but we also need to compile a compendium of these methods including the key elements mentioned above. In addition, a baseline or best practice should be defined for the different methods that can be utilized by beginners. Thus open-access code, workflows and protocols, published (at least training) data sets and best practices are needed as reference and baseline for other studies, stored in an open-source database or platform which supports open-source computer languages.

The observations made above, and the questions based on them, have led to the objectives of this thesis. Firstly: based on the scientific literature, on what grounds does the dispute about automation in Archaeological Remote Sensing rest, and how can it be resolved? Is the experience and knowledge gained transferable to other studies? How many scientific studies on burial mound detection worked transparently and retraceably (reproducibly, if not replicably) in every step of their workflow? Can experience and knowledge from previous studies be used to create a reproducible and replicable workflow for burial mound detection?
The aim of this Master’s thesis is thus to emphasize the applicability and usefulness of automation in dealing with big data sets in Archaeological Remote Sensing, and the need for reproducible and replicable workflows and studies. The case study of this thesis, the detection of burial mounds in LiDAR data sets in R, substantiates this point of view.

The thesis approaches these questions as follows. Automated Archaeological Remote Sensing and its reception in Archaeological Science have been discussed in this chapter (1), the Introduction. The next chapter (2) reviews all published studies whose goal is to detect burial mounds or mound-like structures, in terms of the methods used, the software, and the accessibility of code and data; it also explains why burial mounds were chosen as Objects of Interest and which methods are applied and why. Chapter 3 provides the toolbox of the thesis to implement the desiderata and requirements discussed in chapters 1 and 2. Chapter 4 presents a reproducible workflow for the detection of burial mounds in LiDAR data. Chapter 5 (Results), Chapter 6 (Discussion) and Chapter 7 (Conclusion) complete the endeavour of this Master’s thesis.


