knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(avesdemazan)
# Data cleaning library(dplyr) library(tidyr) library(reshape2) library(stringi) library(knitr)
data-raw
folder of this repositoryThe 3 files that are loaded in:
ecuador_5
. Primary analysis data. Contains net hours, total captures by species, net hours, captures per 1000 net hours, etc.ecuador_working
. Cleaned individual-level data which allows determination of e.g. total number of captures including banded. birds, etc.Ecuador_Species_List
: List of species and their traits. Saved as spp_trt
# Load data ## ? data("ecuador_5") # dim: 6860 x 11 ## ? data("ecuador_working")# dim: 4708 x 23 ## ? # ecuador_pc <- read.csv("ecuador_pc.csv") # ? ## Species list data("Ecuador_Species_List") spp_trt <- Ecuador_Species_List
Net hours are in the ecuador_5 dataframe
names(ecuador_5)[grep("net",names(ecuador_5))] names(ecuador_5)[grep("Loc",names(ecuador_5))]
Location: There were two "locations" and two sampling sites within each location for a total of four sites.
here are data from SANA for only 5 out of the 11 years; data from this site were excluded from analysis. The remaining three sites have data from 2006-2016.
There are 11 observations from LLAV in March, 2017 (relative to hundreds of observations from other years); these 11 observations were excluded from analysis.
habitats <- data.frame(Site = c("LLAV","MASE","MAIN"), `Habitat Type` = c("Secondary forest", "Primary forest", "Introduced forest")) colnames(habitats)[2] <- "Habitat Type" kable(habitats, caption = "Habitat type for 3 locations in working dataset")
Session Sampling occurred in three sessions of approximately 6 days each. The metadata associated with the original dataset state that each sampling session was two days; however, the dates associated with bandings and recaptures consistently occur over approximately 6 days (range: 5-8). The 6 days are not necessarily consecutive. The exact timing of the first, second, and third sessions varied by year but occurred in the following windows: the first session was between March 21 and May 6, the second session was between July 19 and September 9, and the third session was between October 30 and December 20. In two cases (segunda 13, primera 16), sampling occurred at a time different enough from other years that data were excluded from analysis to maintain the validity of grouping variables in the regression models. Most of the tercera 13 observations were temporally consistent with the other years; 70 observations from tercera 13 occurred much later and were excluded from analysis. The working dataset is visualized below. Data from within the three black boxes were included in the analysis. **
# knitr::include_graphics("Picture1.png")
The ecuador_working .csv is not fully formatted and used is only for basic data number of captures etc, including the un-tagged birds, etc
ecuador_id <- ecuador_working n_ids <- length(unique(ecuador_working$Band.Number)) n_unb <- length(which(ecuador_id$Band.Size == "UNB")) n_cap <- length(ecuador_id$X) n_spp <- length(unique(ecuador_id$Specie.Code))
There are r n_spp
unique species. There are r n_ids
unique, banded birds; this value was calculated by finding the number of unique band numbers. There are r n_unb
unbanded birds. There are r n_cap
total captures.
yearsite <- ecuador_id %>% group_by(.dots = c("Band.Number","Specie.Code", "year", "Location")) %>% summarise(N = n()) %>% group_by(.dots = c("Specie.Code", "year", "Location")) %>% summarise(N = n()) %>% group_by(.dots = c("year", "Location")) %>% summarise(N = n()) %>% spread(Location, N) colnames(yearsite) <- c("Year", "Secondary Forest", "Primary Forest", "Introduced Forest") kable(yearsite, caption = "Total unique species by year and location") site <- ecuador_id %>% group_by(.dots = c("Band.Number", "Specie.Code", "Location")) %>% summarise(N = n()) %>% group_by(.dots = c("Specie.Code", "Location")) %>% summarise(N = n()) %>% group_by(.dots = c("Location")) %>% summarise(N = n()) kable(site, caption = "Total unique species by location") year <- ecuador_id %>% group_by(.dots = c("Band.Number", "Specie.Code", "year")) %>% summarise(N = n()) %>% group_by(.dots = c("Specie.Code", "year")) %>% summarise(N = n()) %>% group_by(.dots = c("year")) %>% summarise(N = n()) colnames(year)[1] <- "Year" kable(year, caption = "Total unique species by year")
# Reshape data cast.by.id <- dcast(data = ecuador_id, formula = year + Location + Session ~ Band.Number, value.var = "Band.Number", fun.aggregate = length) # Convert to long format cast.by.id.long <- cast.by.id %>% gather("Band.Number", "n", 4:3603) cast.by.id.long$pres.abs <- ifelse(cast.by.id.long$n > 0, 1, 0) # NOTE: this dataframe won't contain data from birds that were not banded (because they do not have a band number) # The total number of unbanded individuals per year per site will have to be added subsequently # Change all positive captures to equal 1 # Want to get accurate count of number of inidividuals, excluding recaptures # If entries > 1 were not modified, we would get an inflated estimate of abundance (1 individual getting counted multiple times) # This function changes data to presence/absence (1 = presence, 0 = absence) fx01 <- function(x) { ifelse(x > 0, 1, 0) } # Apply function to all count data cast.by.id[,-c(1:3)] <- apply(cast.by.id[,-c(1:3)], 2, fx01) # Sum across rows to calculate total number of individuals captured in each location id.tot <- apply(cast.by.id[,-c(1:3)], 1, sum) # Create table to store year, location, session, and total number of unique individuals ann.tot.id <- cast.by.id[,c("year","Location","Session")] ann.tot.id$id.tot <- id.tot ann.tot.id$Session <- gsub("[ ][01][01-9]","",ann.tot.id$Session)
Plot Raw counts of total
ggplot2::ggplot(data = ann.tot.id, aes(y = id.tot, x = year, color = Session)) + geom_point() + geom_line() + facet_grid(Location~Session)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.