Data collection and management

Data collection

Since this pilot project did not aim to collect any new data, the profile presented here is a survey of current knowledge that characterizes marine vessel activities and valued components in the study area. The availability and accessibility of data thus guided the work. Data were collected in cooperation with the various collaborators involved in the project, with the support of Transport Canada and of the members of the project led by the St. Lawrence Action Plan.

Data collection began by sending a list to various experts and collaborators involved in the project for annotation. This list consisted of subcategories of environmental stressors and valued components. Participants were asked to identify any known databases that could be used to characterize the stressors and valued components within the study area, and to identify any other subcategories or databases deemed relevant to the project. We then collated the comments and suggestions and began the process of collecting data relevant to the cumulative effects assessment.

Where possible, priority was given to open and accessible data from the federal and provincial data sharing platforms Open Government and Données Québec, as well as to resources available through the St. Lawrence Global Observatory (SLGO). We searched these platforms to identify relevant data that were not included in the previously shared list. Prioritizing open data makes the assessment conducted by this pilot project easier to reproduce and update. Appendix 1 describes all the data that were considered for the assessment and identifies those that were selected. Data that were not selected remain listed there to document everything that was explored; they were left out either because they were covered by other selected databases or because they did not meet a specific assessment objective.

Particular attention was also paid to the knowledge of experts in their respective fields and to that of the various project partners. We held direct discussions with resource persons to revise the sections of the study area profile that characterize the environmental stressors and the valued components. The resource persons who contributed to these revisions are thanked at the beginning of the sections to which they contributed. A list of the resource persons and organizations for this project is available in Appendix 2.

Close cooperation with First Nations was also an important component of the project, so that the assessment benefits from Indigenous knowledge and takes into account First Nations concerns related to the study area. We were in contact with representatives of the participating First Nations to obtain a spatial characterization of areas of cultural, heritage and archeological interest. See the section on areas of cultural, heritage and archeological interest for more details.

Data management and reproducibility

The aim of data management is to ensure that the cumulative effects assessments are transparent and reproducible. Our approach is based on the FAIR principles, whose purpose is to ensure that the data used are Findable, Accessible, Interoperable and Reusable. We therefore use programming tools, in particular the R language[^r]. Programming tools have a number of advantages over software such as ArcGIS[^arc]. They offer great flexibility, enabling us to integrate changes or new considerations very quickly without having to redo a number of steps in a complex process. This flexibility is not limited to analysis: all steps of a project, from the integration of raw data to the production of reports, can be scripted and easily modified. It is then straightforward to integrate comments or new recommendations arising from engagement processes, for example.

We also used GitHub[^git], a version control platform, for the documentation, quality control, and development and change history of the code relevant to the entire project. We created an organization named EffetsCumulatifsNavigation. A listing of all reports written and seminars offered during the pilot project is available here. A public repository named ceanav serves as the research compendium and is the core of the assessment; a research compendium is a collection of all parts of a research project, including text, figures, data, and code, that ensures the reproducibility of the study. A detailed description of the structure of the assessment’s research compendium is available in Appendix 3 and on the webpage of the ceanav GitHub repository.

While not including the data directly, the assessment’s research compendium contains all the resources making it possible to access and transform the raw data. It also contains the code allowing us to characterize the environmental stressors and valued components, and to create the analyses, figures, tables and the final report. Only sensitive data for which confidentiality agreements have been signed remain inaccessible; however, these are catalogued in the report and accompanying documents so that a user or reader can know the type and source of data used in the analyses, as well as the resource persons to contact to obtain more information about the data used. The data incorporated into the study grid enabling us to conduct the cumulative effects assessment are open and available through the pilot project’s main repository ceanav; we are also working with SLGO to make them available on their portal.
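The idea of cataloguing every dataset while withholding confidential data can be illustrated with a minimal sketch. Everything in it is hypothetical and not taken from the ceanav compendium: the `catalogue` table, its column names (`uid`, `source`, `confidential`, `path`), the file path and the `describe_dataset()` helper are assumptions made for illustration only.

```r
# Hypothetical catalogue: one row per dataset, confidential or not.
# Column names, sources and paths are illustrative assumptions.
catalogue <- data.frame(
  uid = c("0001", "0002"),
  source = c("Open data platform", "Partner under confidentiality agreement"),
  confidential = c(FALSE, TRUE),
  path = c("data/0001.csv", NA),
  stringsAsFactors = FALSE
)

# Return what a reader is allowed to know about a dataset: its source is
# always reported, but a path is given only when the data are open.
describe_dataset <- function(catalogue, uid) {
  row <- catalogue[catalogue$uid == uid, ]
  if (nrow(row) != 1) {
    stop("Unknown or duplicated identifier: ", uid)
  }
  list(
    uid = row$uid,
    source = row$source,
    access = if (row$confidential) "contact the data custodian" else row$path
  )
}

open_entry <- describe_dataset(catalogue, "0001")
restricted_entry <- describe_dataset(catalogue, "0002")
```

In this sketch the type and source of a restricted dataset stay visible, and only the path to the data is replaced by a pointer to the person to contact.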

```{r}
# Load the summary table describing every database considered for the assessment
meta <- read.csv("../data/data-metadata/data_summary.csv")
```

Each database considered for the assessment is assigned a unique identifier of the form ####, listed in Appendix 1. For example, database 0001 corresponds to an inventory of eelgrass (Zostera marina) for James Bay, Baie des Chaleurs, the Estuary and the Gulf of St. Lawrence [@mpo2009]. These unique identifiers are used to indicate, directly on the figures, the databases that were used to characterize the categories of environmental stressors and valued components presented in sections \@ref(strportrait) and \@ref(cvportrait); the list of unique identifiers appears in the bottom right margin of each figure. In addition, all data and metadata included in the assessment’s research compendium use these unique identifiers to reference the databases considered. In total, `r nrow(meta)` databases were considered for the assessment and are accessible through the research compendium; of these, `r sum(meta$ana == "X", na.rm = TRUE)` were used to conduct the cumulative effects assessment of marine vessel activities on the selected valued components. See Appendix 1 for more details on these databases. As with the data, the various organizations and experts consulted for this project, or custodians of the open databases used, are identified by a unique identifier of the form #### available in Appendix 2.
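As a minimal sketch of how such identifiers and counts could be derived from a metadata table, the example below uses a small hypothetical table rather than the project's actual `data_summary.csv`. The `ana` column and its `"X"` marker mirror the expression used in the text; the `id` and `name` columns, the placeholder dataset names and the `format_id()` helper are assumptions made for illustration.

```r
# Hypothetical metadata table standing in for data_summary.csv.
# Only the `ana` column and its "X" marker come from the text; the
# other columns and the placeholder names are illustrative.
meta_example <- data.frame(
  id   = c(1, 2, 3, 4),
  name = c("Eelgrass inventory", "Placeholder B", "Placeholder C", "Placeholder D"),
  ana  = c("X", NA, "X", ""),
  stringsAsFactors = FALSE
)

# Format numeric identifiers as zero-padded, four-character strings ("0001").
format_id <- function(id) sprintf("%04d", as.integer(id))
meta_example$uid <- format_id(meta_example$id)

# Number of databases considered, and number retained for the assessment.
# na.rm = TRUE ignores rows where `ana` is missing.
n_considered <- nrow(meta_example)
n_used <- sum(meta_example$ana == "X", na.rm = TRUE)

# Identifiers of the retained databases, as they might be listed on a figure.
# which() drops the NA comparisons before subsetting.
used_ids <- meta_example$uid[which(meta_example$ana == "X")]
```

Storing identifiers as character strings rather than numbers preserves the leading zeros, which matters if they are used as file names or figure labels.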

[^r]: R is a free software environment for statistics, data science and graphics (https://www.r-project.org/).

[^arc]: ArcGIS is a geographic information system (GIS) software suite developed by the US company Esri (https://www.arcgis.com/index.html).

[^git]: GitHub is a web-based software development hosting and management service with over 40 million users worldwide (https://github.com/).



EffetsCumulatifsNavigation/ceanav documentation built on April 17, 2023, 1:02 p.m.