knitr::opts_chunk$set(echo = TRUE)

This vignette demonstrates how to use the seedclim community database. This is useful for analysing the community data, but also for accessing the taxon or site tables used by other data types.

The code in this vignette can be copied and modified to form the data importing section of your own code.

Downloading the database

You only need to do this once, unless you want to check for an updated version of the data.

library("dataDownloader")
library("here")

# get_file will only download the file if it is missing or it needs updating
node <- "npfa9"
path <- "community/data/"
file <- "seedclim.2020.4.15.zip"
get_file(node = node, 
         file = file,
         path = here(path),
         remote_path = "3_Community_data")

#unzip
unzip(here(path, file), exdir = here(path))

Connecting to the database

The seedclim community database is an SQLite database. You can connect to the database with DBI::dbConnect

#load packages
library("tidyverse")
library("DBI")# also needs RSQLite installed

#make database connection
con <- dbConnect(RSQLite::SQLite(), here(path, "seedclim.sqlite"))

Database structure

Database tables

tables <- dbListTables(conn = con)
tables

See the data dictionary for information about the contents of each table.

Data model

#devtools::install_github("cynkra/dm") need dev version
library("dm")

dm_from_src(con, learn_keys = FALSE) %>%
  dm_add_pk(sites, siteID) %>% 
  dm_add_fk(blocks, siteID, sites) %>% 
  dm_add_fk(site_attributes, siteID, sites) %>% 
  dm_add_pk(site_attributes, c(siteID, attribute)) %>% 
  dm_add_pk(blocks, blockID) %>% 
  dm_add_fk(plots, blockID, blocks) %>% 
  dm_add_pk(plots, plotID) %>% 
  dm_add_fk(turfs, originPlotID, plots) %>% 
  dm_add_fk(turfs, destinationPlotID, plots) %>% 

  dm_add_pk(turfs, turfID) %>% 
  dm_add_fk(turf_community, turfID, turfs) %>% 
  dm_add_fk(subturf_community, turfID, turfs) %>% 
  dm_add_fk(turf_environment, turfID, turfs) %>% 
  dm_add_fk(subturf_environment, turfID, turfs) %>%

  dm_add_pk(taxon, species) %>% 
  dm_add_fk(turf_community, species, taxon) %>% 
  dm_add_fk(subturf_community, species, taxon) %>%
  dm_add_pk(turf_community, c("turfID", "species", "year")) %>%  
  dm_add_pk(turf_environment, c("turfID", "year")) %>% 
  dm_add_pk(subturf_community, c("turfID", "subturf", "species", "year")) %>%  
  dm_add_pk(subturf_environment, c("turfID", "subturf", "year")) %>% 

  dm_add_fk(species_attributes, species, taxon) %>% 
  dm_add_pk(species_attributes, c(species, attribute)) %>% 
  dm_add_pk(attributes, attribute) %>% 
  dm_add_fk(site_attributes, attribute, attributes) %>% 
  dm_add_fk(species_attributes, attribute, attributes) %>% 

  dm_draw(view_type = "keys_only")

Importing data

Data can be imported with dplyr::tbl. The first argument is the database connection, the second is the name of the table Then the data can be manipulated with the full range of tidyverse functions. Remember to use dplyr::collect to collect all the data rather than just the first few rows.

taxa <- tbl(src = con, "taxon") %>% 
  collect

You can also use a SQL statement in wrapped in sql(), should you really want to.

tbl(src = con, sql("SELECT * FROM taxon"))

Importing cover data

To import the cover data, we need to import data from several tables and join them. The transplanted turfs have an origin site and a destination site (connected via the plots and blocks), so we need to join site_block_plot to the turf data twice, once with originPlotID and once with destinationPlotID.

sites_blocks_plots <- tbl(con, "sites") %>% 
  inner_join(tbl(con, "blocks"), by = "siteID") %>% 
  inner_join(tbl(con, "plots"), by = "blockID") %>% 
  select(siteID, norwegian_name, annual_precipitation_gridded, summer_temperature_gridded,
         temperature_level, precipitation_level, blockID, plotID)

cover_thin <- sites_blocks_plots %>% 
  inner_join(tbl(con, "turfs"), by = c("plotID" = "originPlotID")) %>% 
  inner_join(sites_blocks_plots, by = c("destinationPlotID" = "plotID"), 
             suffix = c("_origin", "_dest")) %>% 
  inner_join(tbl(con, "turf_community"), by = "turfID") %>% 
  select(-plotID, -destinationPlotID, -cover_raw) 

cover_thin %>%
  collect()

Now we need to filter the data used for each experiment.

The other experiments are weird - will be fixed

Control turfs can be the control for multiple experiments.

cover_thin_ttreat <- cover_thin %>% 
  filter(
    TTtreat != "", # only TTtreat. Also remove NA values from this field
    year != 2010  # no TTtreat data for 2010, just other experiments sharing the turfs
  ) %>%
  collect()

Importing turf environment or subturf data

Subturf community data and environment data at turf or subturf scales can be imported in a either by substituting their table name for turf_community in the above code, or using a join to add them.

cover_thin %>% 
  left_join(tbl(con, "subturf_community"), 
            by = c("turfID", "year", "species")) %>%
  collect()


Between-the-Fjords/seedclim-data documentation built on March 6, 2023, 7:53 a.m.