knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Purpose of this package

The natronR R-package has been developed for employees at the University Museum of the Norwegian University of Science and Technology (NTNU), and it serves to ease and standardize the process of uploading datasets to the NaTRON database. NaTRON is the internal database for the NTNU University Museum. It stores natural history data, including biodiversity and ecological datasets. NaTRON is therefore different from the MUSIT system which store mostly data on objects in the museum collections. All natural history data to be preserved should be uploaded to the NaTRON database, and the database have a generalized structure in order to support most datatypes (botanical, zoological, collection objects, non-biological environmental data, GIS-data).

This package was built to import location-event data, and is in its current version not designed for large datasets containing repeated measurements (e.g data from dataloggers, biotelemetric studies etc.).

Content

The process of uploading data to NaTRON using natronR consist of several steps:

Step 1. Restructure

The first step to getting your data on to NaTRON is to restructure the dataset from wide to long format to be compatible with the functions in the natronR R-package. Most datasets are punched in Excel in a wide format (ie lots of columns and few rows; see examples at the bottom of this vignette), but NaTRON and most other databases prefer the long format. This step is not covered by this package and must be done manually. We recoment looking into reshape2 and dplyr for this type of data wrangling. Your dataset should look like our example dataset:

# load data
data("setesdal")

# inspect data
head(setesdal)

# see descriptio of the data
?setesdal

Step 2. Connect

Next you need to establish a connection to the NaTRON database. Using the natron_connect function makes it easy to connect to NaTRON in R. The functions takes two arguments -- your user name and the database (defults to the sandbox). A pop-up window will appear where you are asked for your password. Save your connection to the environment so that it can be used in subsequent functions. The default is to connect to the NaTRON-sandbox (test-environment). Always test uploads to the sandbox and satisfy yourself with the result before running uploads to the production database. For connecting to the production environment use the argument "database='natron'". Example of use:

myConnection <- natron_connect("YOUR-USERNAME-HERE")

# connect to production database
myConnection <- natron_connect("YOUR-USERNAME-HERE", database="natron")

Talk to Marc Daverdin about NaTRON access.

Step 3. Make location table

Make a dataframe with your unique locations (ie unique coordinates) and additional location specific metedata such as elevation, varbatim coordinates and place names. The returned dataset is structure like the NaTRON table 'Locations' so that it can easily be uploaded later.

myLocationTable <- location_table(data = setesdal, 
                                  conn = myConnection,
                                  username = "YOUR-USERNAME-HERE")

This step includes a manual check of correct mapping, ie that all the columns that you want to have in the location table is mapped/matched to the correct NaTRON (DwC) terms. Read the textual output from the function carefully.

Note that all you locations have also been given a UUID (unique machine readable identifier) in the locationID column.

Step 3.1. View the locations on a map

Plotting the coordinates an a map help us check that they are correctly entered.

library(ggmap)
map_locations(data = myLocationTable)

# Or plot it vertically like this:
map_locations(data = myLocationTable, vertical = T)

Step 3.2 (optional). Find existing NaTRON locations

If you know or suspect that the locations in your data already exists in NaTRON then you probably want to reuse these locations and the unique locationIDs so that you don't break up any time series data for example. radius_scan() takes as input a dataframe that contains a minimum of two columns: decimalLatitude and decimalLongitude. It then looks for coordinates in the NaTRON location table within the radius you define and returns them in another dataframe where your original coordinates and locations names are to right. The distance to the closest NaTRON location is given in km.

radius <- 8000 #setting it very high here just to make a point

# this may take a little while:
scan <- radius_scan(locationTable = myLocationTable, conn = myConnection, radius = radius) 

# If you get a return then you may want to investigate it:
View(scan)

Step 3.2.2. Plot NaTRON locations agains original locations

If you have done step 3.2 you can also use map_locations() to visually check the proximity of your locations and the existing NaTRON locations:

# Use the 'compare' argument:
map_locations(data = myLocationTable, compare = scan)

Step 3.2.3. Modify location table

THIS FUNCTIONALITY IS NOT YET AVAILABLE

If some or all of your locations already exist in NaTRON (typicall for time series data) then you should idealy not upload them again to NaTRON. Thats a waste of storage space and it makes it harder to join together data from the same locations. However, the following functions that create event and occurence tables needs to link to the NaTRON locations somehow. One term (column) that could be used to link these tables together is locationID which consists of UUIDs. These are not human friendly values and so we need an automated but safe way of harnessing them from NaTRON . radius_scan() is one way perhaps, but it has proven difficult to make it reliable.

Step 4. Upload locations

THIS FUNCTIONALITY IS NOT YET AVAILABLE

All locations must be present on NaTRON before the 'real data' is uploaded and you will not be able to upload event-data that don't have a corresponding locationID in the database. The link between the 'real data' (ie event data or occurence data) and the location specific metadata is ensured by the relational nature of NaTRON.

The upsert_location function will let you upload new locations to the NaTRON database. The format of the data to be uploaded must exactly the same as the NaTRON database table. Example of use:

upsert_locations(myLocationTable, conn = myConnection)

5. Structure event-data

This step organises your event data table to be compatible and ready to upload to NaTRON. The event data table needs to look just like the NaTRON format, eg the columns in the same order etc. This step includes fetching and attaching the correct locationID from the locations datatable.

myEvents <- str_map_events(data = setesdal, 
                conn = myConnection, 
                location_table = myLocationTable)

Read the output carefully to make sure you have kept all the columns you wanted.

6. Upload structured event data to NaTron

THIS FUNCTIONALITY IS NOT YET AVAILABLE

This step uploads your structured event data table to NaTRON. For occurence datasets you may want to skipp this and the preceeding step.

upsert_events(myEvents, myConnection)

7. Check scientific names

NaTRON has a Taxa table with information for most of the species you can encunter in Norway. The occurence table needs to link to this table via the taxonID field. To accomplish this the species names in your dataset needs to be spelt correctly and to be an accepted name for that species according to Artsdatabanken.

7.1 Compare against Artsdatabanken (navnevask)

Save yourself a lot of time by runing the scientificNames column in your dataset through the _navnevask_function on Artsdatabanken: https://www.artsdatabanken.no/Pages/225532/Navnevask?Key=14 Update your dataset occoringly and import it again.

7.2 Compare against NaTRON

This final check will compare your scientific species names against the NaTron taxa register and perform a fuzzy match to return both perfect and non-perfect matches. Look through the dataframe to find mistakes and change the species names in the raw data if needed. For this example, the scientific names (ie Picea abies) are in a column named scientificName.

myNames <- comp_names(
           names = setesdal$scientificName, 
           conn = myConnection)
View(myNames)

If a good match for your species cannot be found then you need to add it to NaTron. This is for now done manually here: https://natron.vm.ntnu.no/datacollection/, or if your working on the sandbox: https://natron.vm.ntnu.no/sandbox/

7.3 Retrieve taxonIDs from NaTRON

When you are sure the species names are all in order, you can then use the get-taxonID function to retrieve the taxonIDs from NaTron and put these UUIDs into your dataframe, like so:

setesdal$taxonID <- get_taxonID(
                    names = setesdal$scientificName, 
                    conn = myConnection)

Note the warning this produces -- because this example data has not been cleaned properly that means there are some entries in the scientificName column in setesdal which do not have perfect matches in the NaTron taxa register and therefore the function returns NA for taxonID. You do not want to see this warning.

8. Structure occurrence-data

This step organises your occurrence data table to be compatible and ready to upload to NaTRON.

myOccurences <- str_map_occ(setesdal, myConnection, myLocationTable)

This produces a silent warning that scientificName has not been copied to the occurence table. That's because it is not a column name in NaTRON. If you want to keep the written names in this table you can for example use the organismName term.

colnames(setesdal)[colnames(setesdal) == "scientificName"] <- "organimsName"
myOccurences <- str_map_occ(setesdal, myConnection)

But remember, it's the taxonID term that actually documents the species identities in NaTron, so make sure this is included in the occurence atble before proceeding.

9. Upload structured occurence table to NaTRON

THIS FUNCTIONALITY IS NOT YET AVAILABLE

This step uploads your structured occurence table to NaTRON.

upsert_occ(myOccurences, myConnection)

Examples of stylized data tables

Example of wide data

| Sampling date | Sampling locations | Alchemilla alpina | Agrostis cappilaris | Andromeda polifolia | | :-------------: | :-------------: | :-------------: |:-------------: | :-----: | | 2010 | A | 12 | 6 | 2 | | 2011 | A | 3 | 17 | 4 | | 2011 | B | 15 | 2 | 6 |

Example of long data

| Sampling event | Sampling date | Sampling location | Species | | Quantity | | :-------------: | :-------------: |:-------------: | :-------------: |:-------------: | | 1 | 2010 |A |Alchemilla alpina | 12 | | 1 | 2010 |A |Agrostis cappilaris | 6 | | 1 | 2010 |A |Andromeda polifolia | 2 | | 2 | 2011 |A |Alchemilla alpina | 3 | | 2 | 2011 |A |Agrostis cappilaris | 17 | | 2 | 2011 |A |Andromeda polifolia | 4 | | 3 | 2011 |B |Alchemilla alpina | 15 | | 3 | 2011 |B |Agrostis cappilaris | 2 | | 3 | 2011 |B |Andromeda polifolia | 6 |

Example of location table

| locationID | Sampling location | Coordinates |:-------------: |:-------------: |:-------------: |a |A | x.xx |a |A | x.xx |a |A | x.xx |a |A | x.xx |a |A | x.xx |a |A | x.xx |b |B | x.xx |b |B | x.xx |b |B | x.xx

Example of event table

| eventID | Sampling event | Sampling date | locationID |
| :-------------: | :-------------: | :-------------: |:-------------:
| 1 | 1 | 2010 |a
| 1
| 1 | 2010 |a
| 1 | 1 | 2010 |a
| 2
| 2 | 2011 |a
| 2 | 2 | 2011 |a
| 2
| 2 | 2011 |a
| 3 | 3 | 2011 |b
| 3
| 3 | 2011 |b
| 3* | 3 | 2011 |b

Example of occurence table

| eventID | Species | Quantity | | :-------------: | :-------------: |:-------------: | | 1 |Alchemilla alpina | 12 | | 1 |Agrostis cappilaris | 6 | | 1 |Andromeda polifolia | 2 | | 2 |Alchemilla alpina | 3 | | 2 |Agrostis cappilaris | 17 | | 2 |Andromeda polifolia | 4 | | 3 |Alchemilla alpina | 15 | | 3 |Agrostis cappilaris | 2 | | 3* |Andromeda polifolia | 6 |



NTNU-VM/natronbatchupload documentation built on Oct. 12, 2019, 5:49 a.m.