```{r, include = FALSE}
library(knitr)
opts_chunk$set(echo = FALSE, include = TRUE)
load('keepTrack.RData')
```
The raw data were received `r format(keepTrack$dataDate, '%d %B %Y')`.
The raw data started with `r keepTrack$nrowInitial` rows. De-duplication removed a total of `r keepTrack$nrowInitial - keepTrack$nrowDeDup` rows. We used the following columns as criteria to check for duplicates (i.e. a record whose values matched another record's in all of these columns was deemed a duplicate):

```{r}
keepTrack$keyCol
```

Note that de-duplication was performed after all species and geographic names had been cleaned, as detailed below.
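
A minimal base-R sketch of key-column de-duplication, assuming the records sit in a data frame and that the key columns are given as a character vector of column names (as `keepTrack$keyCol` appears to be). The helper `dedupByKey` and the toy records are hypothetical; the actual cleaning script may work differently.

```r
# Hypothetical helper: keep the first row of each group of rows that share
# identical values in every key column; later copies are dropped.
dedupByKey <- function(dat, keyCol) {
  # duplicated() on a data frame flags each row that repeats an earlier row
  # across all of the selected columns
  dat[!duplicated(dat[, keyCol, drop = FALSE]), , drop = FALSE]
}

# Toy records (hypothetical): rows 1 and 3 agree on all three key columns,
# even though they differ in 'collector', which is not a key column
raw <- data.frame(
  species   = c('Apis mellifera', 'Apis mellifera', 'Apis mellifera'),
  island    = c('Maui', 'Maui', 'Maui'),
  date      = c('2001-05-02', '2001-06-10', '2001-05-02'),
  collector = c('A', 'B', 'C'),
  stringsAsFactors = FALSE
)

deduped <- dedupByKey(raw, keyCol = c('species', 'island', 'date'))
nrow(raw) - nrow(deduped)  # 1 row removed
```

Because `duplicated()` keeps the first occurrence, which copy of a duplicate survives depends on row order. Cleaning names first matters for the same reason: two spellings of one species would otherwise not match on the key columns.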
The following species names were corrected (i.e. changed from `old_name` to `new_name`):

```{r}
kable(keepTrack$nameFix, row.names = FALSE)
```
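
A minimal sketch of applying such a correction table, assuming it has one row per correction with columns `old_name` and `new_name` (matching the headers described above). The helper `fixNames` and the example names are hypothetical.

```r
# Hypothetical correction table with the same two columns as the table above
nameFix <- data.frame(
  old_name = c('Apis melifera', 'Vespula pensylvanica '),
  new_name = c('Apis mellifera', 'Vespula pensylvanica'),
  stringsAsFactors = FALSE
)

# Replace every name that appears in old_name with the matching new_name;
# names not listed in the table are returned unchanged
fixNames <- function(x, nameFix) {
  i <- match(x, nameFix$old_name)  # NA where no correction applies
  hit <- !is.na(i)
  x[hit] <- nameFix$new_name[i[hit]]
  x
}

spp <- c('Apis melifera', 'Apis mellifera', 'Vespula pensylvanica ')
fixed <- fixNames(spp, nameFix)
# fixed is c('Apis mellifera', 'Apis mellifera', 'Vespula pensylvanica')
```

Using `match()` against the `old_name` column keeps the lookup exact, so a misspelling is only corrected if it appears in the table verbatim (including stray whitespace, as in the second row).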
Some island names were inconsistent. The original island names were:

```{r}
keepTrack$islandNameOld
```

The updated names are:

```{r}
keepTrack$islandNameNew
```
Some records had low spatial accuracy (designated with a `C` in the `ACC` column). Removing these records eliminated a further `r keepTrack$nrowDeDup - keepTrack$nrowBadACC` rows.
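
A minimal sketch of this filter, assuming `ACC` is a character column. How missing `ACC` values should be treated is not stated in the report; keeping them here is an assumption, and the toy data are hypothetical.

```r
# Toy records (hypothetical); ACC = 'C' marks low spatial accuracy
dat <- data.frame(
  id  = 1:4,
  ACC = c('A', 'C', NA, 'B'),
  stringsAsFactors = FALSE
)

# dat$ACC != 'C' would be NA for the missing value, and indexing with NA
# yields an all-NA row; %in% returns FALSE instead, so the row is kept
# (an assumption about how missing accuracy codes should be handled)
keep <- !(dat$ACC %in% 'C')
filtered <- dat[keep, , drop = FALSE]
nrow(dat) - nrow(filtered)  # 1 row removed
```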

```{r}
nIslandOut <- nrow(keepTrack$outsideIsland)
islandsGood <- nIslandOut == 0
rec <- ifelse(nIslandOut == 1, 'record', 'records')
```

Furthermore, we checked that every record falls within the boundary of the island it was reported from (e.g. a record from Hawaiʻi Island does indeed fall within the boundary of Hawaiʻi Island). We found `r nIslandOut` `r rec` falling outside the island polygons.
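
```{r, eval = !islandsGood, results = 'asis'}
cat('These are the records falling outside the island polygons:')
```

```{r, eval = !islandsGood}
kable(keepTrack$outsideIsland, row.names = FALSE)
```

```{r, eval = !islandsGood, results = 'asis'}
cat('These records falling outside the island polygons will be removed unless they can be corrected.')
```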
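
The report does not say how the island check was implemented. To illustrate the geometry behind such a check, here is a base-R even-odd (ray-casting) point-in-polygon test for one simple polygon ring. The function `pointInRing`, the square "island" and the two toy records are hypothetical; real island boundaries (multiple parts, holes, detailed coastlines) are better handled with a spatial package.

```r
# Even-odd rule: cast a horizontal ray from (px, py) towards +x and count
# how many polygon edges it crosses; an odd count means the point is inside.
# (vx, vy) are the vertices of a single ring, listed in order.
pointInRing <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n  # previous vertex, so the ring closes from vertex n back to 1
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the line y = py?
    if ((vy[i] > py) != (vy[j] > py)) {
      # x coordinate where that edge crosses y = py
      xCross <- vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j])
      if (px < xCross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical square "island" and two records: one inside, one outside
islandX <- c(0, 10, 10, 0)
islandY <- c(0, 0, 10, 10)
recs <- data.frame(id = c(1, 2), lon = c(5, 12), lat = c(5, 5))

recs$onIsland <- mapply(pointInRing, recs$lon, recs$lat,
                        MoreArgs = list(vx = islandX, vy = islandY))
outsideIsland <- recs[!recs$onIsland, ]
```

In a full check each record would be tested only against the polygon of the island it was reported from, so a record that lands on a neighbouring island is still flagged.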

```{r}
nNoDate <- nrow(keepTrack$noDate)
anyNoDate <- nNoDate > 0
rec <- ifelse(nNoDate == 1, 'date', 'dates')
```

Dates were recorded in multiple formats, which have been standardized to `YYYY-MM-DD` format. We checked for missing dates and found `r nNoDate` missing `r rec`.
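
```{r, eval = anyNoDate, results = 'asis'}
cat('Records with missing dates are:')
```

```{r, eval = anyNoDate}
kable(keepTrack$noDate, row.names = FALSE)
```

```{r, eval = anyNoDate, results = 'asis'}
cat('These records with no collection date will be removed unless they can be corrected.')
```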
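
The report does not list the original date formats, so the candidate formats below are assumptions, as are the helper `standardizeDates` and the toy strings. The sketch tries each format in turn and keeps the first successful parse; anything left unparsed counts as a missing date.

```r
# Hypothetical input formats, tried in order. An ambiguous string such as
# '03/04/2001' is resolved purely by this ordering (day/month here).
candidateFormats <- c('%Y-%m-%d', '%d/%m/%Y', '%Y%m%d')

standardizeDates <- function(x, formats = candidateFormats) {
  # Start with an all-NA Date vector of the right length
  out <- as.Date(rep(NA_character_, length(x)), format = '%Y-%m-%d')
  for (f in formats) {
    todo <- is.na(out) & !is.na(x)  # strings not yet parsed
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

raw <- c('2001-05-02', '02/05/2001', '20010502', NA, 'unknown')
std <- standardizeDates(raw)
format(std, '%Y-%m-%d')   # standardized strings; unparseable entries stay NA
noDate <- which(is.na(std))
```

Note that `as.Date()` with an explicit format ignores trailing characters, so stricter validation (e.g. a regular expression per format) may be worth adding on real data.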
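
```{r}
# Records left after removing undated records and records outside their
# island; a record that is both would be subtracted twice
nRecFinal <- with(keepTrack, nrowBadACC - nrow(noDate) - nrow(outsideIsland))

# Wrap a value in backticks so that it renders as inline code
codeCode <- function(x) {
  sprintf('`%s`', x)
}
```
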
The final dataset is saved as an R object of class `r codeCode(keepTrack$class)` from the sp package [@sp] and has the geographic coordinate reference system `r codeCode(keepTrack$proj)`.
The final dataset contains `r nRecFinal` records. Below we summarize the changes between the raw data and the filtered data.
The following localities were lost after filtering:

```{r}
kable(keepTrack$geoLost, row.names = FALSE)
```
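
A minimal sketch of how a list of lost localities could be derived, assuming both the raw and filtered data carry a `locality` column; that column name and the toy values are hypothetical.

```r
# Hypothetical raw and filtered records sharing a 'locality' column
raw      <- data.frame(locality = c('Kokee', 'Haleakala', 'Kokee', 'Waimea'),
                       stringsAsFactors = FALSE)
filtered <- data.frame(locality = c('Kokee', 'Kokee'),
                       stringsAsFactors = FALSE)

# Localities present in the raw data but absent after filtering
geoLost <- data.frame(
  locality = setdiff(unique(raw$locality), unique(filtered$locality)),
  stringsAsFactors = FALSE
)
geoLost$locality  # "Haleakala" "Waimea"
```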
The plot below compares the number of records per year in the raw data (black) and after processing (red).

```{r}
# Tight margins and short tick marks for a single-panel plot
par(mar = c(3, 3, 0, 0) + 0.5, mgp = c(2, 0.75, 0), tcl = -0.05)
# Raw counts per year, with post-processing counts overlaid in red
plot(keepTrack$perYr[, c('year', 'nrec_initial')], type = 'l', lwd = 2,
     xlab = 'Year', ylab = 'Number of records')
lines(keepTrack$perYr[, c('year', 'nrec_final')], lwd = 2, col = 'red')
legend('topleft', legend = c('Raw data', 'Post processing'), lty = 1,
       col = c('black', 'red'), lwd = 2, bty = 'n')
```
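
A minimal sketch of how a per-year table with the columns used above (`year`, `nrec_initial`, `nrec_final`) could be built from two vectors of collection years. The year vectors are hypothetical, and filling years without records with 0 is a design choice assumed here so that gaps show up as dips in the plot rather than being skipped.

```r
# Hypothetical collection years for the raw and the filtered records
rawYears   <- c(1990, 1990, 1991, 1993, 1993, 1993)
finalYears <- c(1990, 1993, 1993)

# Every year in the raw range (missing years dropped before taking the range)
allYears <- seq(min(rawYears, na.rm = TRUE), max(rawYears, na.rm = TRUE))

# Fixing the factor levels makes table() report 0 for years with no records
perYr <- data.frame(
  year         = allYears,
  nrec_initial = as.vector(table(factor(rawYears, levels = allYears))),
  nrec_final   = as.vector(table(factor(finalYears, levels = allYears)))
)
perYr
```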
The table below compares the number of records per species in the raw data and after processing. This table should also be checked manually for misspelled names.

```{r}
names(keepTrack$perSpp) <- c('species', 'raw data', 'post processing')
kable(keepTrack$perSpp, row.names = FALSE)
```