In Between-the-Fjords/seedclim-data: Code for Cleaning SeedClim data

knitr::opts_chunk$set(echo = TRUE)

This vignette demonstrates how to use the seedclim community database. This is useful for analysing the community data, but also for accessing the taxon or site tables used by other data types.

The code in this vignette can be copied and modified to form the data importing section of your own code.

Downloading the database

You only need to do this once, unless you want to check for an updated version of the data.

library("dataDownloader")
library("here")

# get_file will only download the file if it is missing or it needs updating
node <- "npfa9"
path <- "community/data/"
file <- "seedclim.2020.4.15.zip"
get_file(node = node, 
         file = file,
         path = here(path),
         remote_path = "3_Community_data")

#unzip
unzip(here(path, file), exdir = here(path))

Connecting to the database

The seedclim community database is an SQLite database. You can connect to the database with DBI::dbConnect

#load packages
library("tidyverse")
library("DBI")# also needs RSQLite installed

#make database connection
con <- dbConnect(RSQLite::SQLite(), here(path, "seedclim.sqlite"))

Database structure

Database tables

tables <- dbListTables(conn = con)
tables

See the data dictionary for information about the contents of each table.

Data model

#devtools::install_github("cynkra/dm") need dev version
library("dm")

dm_from_src(con, learn_keys = FALSE) %>%
  dm_add_pk(sites, siteID) %>% 
  dm_add_fk(blocks, siteID, sites) %>% 
  dm_add_fk(site_attributes, siteID, sites) %>% 
  dm_add_pk(site_attributes, c(siteID, attribute)) %>% 
  dm_add_pk(blocks, blockID) %>% 
  dm_add_fk(plots, blockID, blocks) %>% 
  dm_add_pk(plots, plotID) %>% 
  dm_add_fk(turfs, originPlotID, plots) %>% 
  dm_add_fk(turfs, destinationPlotID, plots) %>% 

  dm_add_pk(turfs, turfID) %>% 
  dm_add_fk(turf_community, turfID, turfs) %>% 
  dm_add_fk(subturf_community, turfID, turfs) %>% 
  dm_add_fk(turf_environment, turfID, turfs) %>% 
  dm_add_fk(subturf_environment, turfID, turfs) %>%

  dm_add_pk(taxon, species) %>% 
  dm_add_fk(turf_community, species, taxon) %>% 
  dm_add_fk(subturf_community, species, taxon) %>%
  dm_add_pk(turf_community, c("turfID", "species", "year")) %>%  
  dm_add_pk(turf_environment, c("turfID", "year")) %>% 
  dm_add_pk(subturf_community, c("turfID", "subturf", "species", "year")) %>%  
  dm_add_pk(subturf_environment, c("turfID", "subturf", "year")) %>% 

  dm_add_fk(species_attributes, species, taxon) %>% 
  dm_add_pk(species_attributes, c(species, attribute)) %>% 
  dm_add_pk(attributes, attribute) %>% 
  dm_add_fk(site_attributes, attribute, attributes) %>% 
  dm_add_fk(species_attributes, attribute, attributes) %>% 

  dm_draw(view_type = "keys_only")

Importing data

Data can be imported with dplyr::tbl. The first argument is the database connection, the second is the name of the table Then the data can be manipulated with the full range of tidyverse functions. Remember to use dplyr::collect to collect all the data rather than just the first few rows.

taxa <- tbl(src = con, "taxon") %>% 
  collect

You can also use a SQL statement in wrapped in sql(), should you really want to.

tbl(src = con, sql("SELECT * FROM taxon"))

Importing cover data

To import the cover data, we need to import data from several tables and join them. The transplanted turfs have an origin site and a destination site (connected via the plots and blocks), so we need to join site_block_plot to the turf data twice, once with originPlotID and once with destinationPlotID.

sites_blocks_plots <- tbl(con, "sites") %>% 
  inner_join(tbl(con, "blocks"), by = "siteID") %>% 
  inner_join(tbl(con, "plots"), by = "blockID") %>% 
  select(siteID, norwegian_name, annual_precipitation_gridded, summer_temperature_gridded,
         temperature_level, precipitation_level, blockID, plotID)

cover_thin <- sites_blocks_plots %>% 
  inner_join(tbl(con, "turfs"), by = c("plotID" = "originPlotID")) %>% 
  inner_join(sites_blocks_plots, by = c("destinationPlotID" = "plotID"), 
             suffix = c("_origin", "_dest")) %>% 
  inner_join(tbl(con, "turf_community"), by = "turfID") %>% 
  select(-plotID, -destinationPlotID, -cover_raw) 

cover_thin %>%
  collect()

Now we need to filter the data used for each experiment.

The turf transplant experiment turfs have an entry in the TTtreat field.

The other experiments are weird - will be fixed

Control turfs can be the control for multiple experiments.

cover_thin_ttreat <- cover_thin %>% 
  filter(
    TTtreat != "", # only TTtreat. Also remove NA values from this field
    year != 2010  # no TTtreat data for 2010, just other experiments sharing the turfs
  ) %>%
  collect()

Importing turf environment or subturf data

Subturf community data and environment data at turf or subturf scales can be imported in a either by substituting their table name for turf_community in the above code, or using a join to add them.

cover_thin %>% 
  left_join(tbl(con, "subturf_community"), 
            by = c("turfID", "year", "species")) %>%
  collect()

Between-the-Fjords/seedclim-data documentation built on March 6, 2023, 7:53 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Between-the-Fjords/seedclim-data
Code for Cleaning SeedClim data

In Between-the-Fjords/seedclim-data: Code for Cleaning SeedClim data

Downloading the database

Connecting to the database

Database structure

Database tables

Data model

Importing data

Importing cover data

Importing turf environment or subturf data

R Package Documentation

Browse R Packages

We want your feedback!

Between-the-Fjords/seedclim-data Code for Cleaning SeedClim data

In Between-the-Fjords/seedclim-data: Code for Cleaning SeedClim data

Downloading the database

Connecting to the database

Database structure

Database tables

Data model

Importing data

Importing cover data

Importing turf environment or subturf data

R Package Documentation

Browse R Packages

We want your feedback!

Between-the-Fjords/seedclim-data
Code for Cleaning SeedClim data