Salmon data integration

Jim Tyhurst, Ph.D. 2019-05-20

I captured a few thoughts in outline form after attending the Second NPAFC-IYS Workshop on Salmon Ocean Ecology in a Changing Climate.

Table of Contents

  1. Independent aspects of a complete solution
  2. Possible solution 1: All data centralized in one database
  3. Possible solution 2: Distributed datasets with centralized metadata
  4. Competitive analysis
  5. Types of data
  6. Project tips
  7. Open Issues

Independent aspects of a complete solution

Desired features for the overall system, which should probably not all be implemented in one monolithic piece of software:

Possible solution 1: All data centralized in one database

This is the solution that has been prototyped so far.

Advantages of a centralized solution

Disadvantages of a centralized solution

Possible solution 2: Distributed datasets with centralized metadata
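
As a rough illustration of this option, the sketch below models a central catalog that stores only metadata records, each pointing at a dataset that stays with its owning organization. Every name here (`DatasetRecord`, `MetadataCatalog`, the fields, the organizations, and the `example.org` URLs) is hypothetical and chosen only for illustration; none of it is taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about one dataset; the data itself stays with its host."""
    identifier: str
    title: str
    host_organization: str
    data_url: str  # where the owning organization serves the data
    keywords: frozenset = field(default_factory=frozenset)


class MetadataCatalog:
    """Central, in-memory index of metadata records (hypothetical)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Add or replace the record stored under its identifier."""
        self._records[record.identifier] = record

    def search(self, keyword):
        """Return records tagged with keyword (case-insensitive), sorted by identifier."""
        wanted = keyword.lower()
        hits = [r for r in self._records.values()
                if wanted in {k.lower() for k in r.keywords}]
        return sorted(hits, key=lambda r: r.identifier)


# Made-up records: two organizations keep their own data,
# and only the metadata is centralized.
catalog = MetadataCatalog()
catalog.register(DatasetRecord(
    "ex-001", "Example trawl survey", "Example Agency A",
    "https://data-a.example.org/ex-001.csv", frozenset({"Salmon", "trawl"})))
catalog.register(DatasetRecord(
    "ex-002", "Example temperature series", "Example Institute B",
    "https://data-b.example.org/ex-002.csv", frozenset({"temperature"})))

for record in catalog.search("salmon"):
    print(record.title, "->", record.data_url)
```

The point of the sketch is the separation of concerns: a search touches only the central index, while the `data_url` field sends the user back to whichever organization hosts the data.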

Advantages of a distributed solution

Disadvantages of a distributed solution

Competitive analysis

To do ...

How is this collection of datasets different from other systems that are already in use?

Organizations that already host open data

Disclaimer: I have not analyzed these sites in depth. However, for most of them, I entered "salmon" in the search bar to confirm that at least some salmon researchers or government agencies are storing some data there.

Software that already exists for hosting a data portal

Tools for implementing a data portal already exist, as an alternative to building your own using Neo4j. It is therefore worth spending a short time evaluating them, even if it turns out that a new solution is needed to meet the specific needs of the salmon community.

Disclaimer: I came across these previously when working on a project for public government (civic) data. I have not investigated these for relevance to scientific data, although I know that some scientific organizations have used CKAN as a portal to their public data.
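
To make the evaluation a little more concrete, here is a minimal sketch of how a "salmon" search could be issued against a CKAN-based portal through CKAN's Action API (`package_search`). The portal address `https://portal.example.org` and the sample response are made up for illustration, and the helper names `build_search_url` and `extract_titles` are hypothetical. The sketch only builds the request URL and parses a response of the documented shape; it makes no network call.

```python
from urllib.parse import urlencode


def build_search_url(portal_base, query, rows=10):
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_base.rstrip('/')}/api/3/action/package_search?{params}"


def extract_titles(response):
    """Return dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [pkg.get("title", pkg.get("name", ""))
            for pkg in response["result"]["results"]]


# Hypothetical portal address, used only to show the URL shape.
url = build_search_url("https://portal.example.org/", "salmon", rows=5)
print(url)

# Made-up response with the structure CKAN returns:
# {"success": ..., "result": {"count": ..., "results": [...]}}
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-escapement", "title": "Example escapement counts"},
            {"name": "example-catch", "title": "Example catch records"},
        ],
    },
}
print(extract_titles(sample_response))
```

Against a real portal, the URL would be fetched with any HTTP client and the decoded JSON body passed to `extract_titles`; whether a given portal has the API enabled would need to be checked case by case.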

Types of data

I heard many features mentioned during the workshop:

Project tips

Open Issues



int-salmon-data-lab/statusAndTrends documentation built on May 29, 2019, 1:23 p.m.