This package provides functionality to access, download, and manipulate data from the Coordinated Waterbird Counts (CWAC) Project. It is possible to download these same data using the CWAC API, but being able to pull them straight into R, in a standard format, should make them more accessible and easier to analyse, and should eventually make our analyses more reliable and reproducible.
There is another package, ABAP, that provides similar functionality, but in this case to download count data from the African Bird Atlas Project. In addition, there is a companion package, ABDtools, which adds the functionality necessary to annotate different data formats (points and polygons) with environmental information from the Google Earth Engine data catalog.
To install CWAC from GitHub using the remotes package, run:
install.packages("remotes")
remotes::install_github("AfricaBirdData/CWAC")
There are two possible ways of downloading data from CWAC. We can select a site and download data for all species in that site or, vice versa, we can select a species and download all data for that species across all sites. In this section, we will explore the former and download all data associated with a CWAC site.
Let’s use Barberspan, a CWAC site located in the North West province of South Africa, as an example. The first thing we need to do is find our site’s ID code. We can use the function listCwacSites to list all the sites in the North West and find this code.
library(CWAC)
# List all sites in the North West province
nw_sites <- listCwacSites(.region_type = "province", .region = "North West")
# Find the code for Barberspan
site_id <- nw_sites[nw_sites$LocationName == "Barberspan", "LocationCode", drop = TRUE]
# We can find more info about this site with getCwacSiteInfo()
getCwacSiteInfo(site_id)
Once we have the code for the CWAC site, we can download count data collected there using the function getCwacSiteCounts. This will download all the CWAC cards submitted for Barberspan.
bp_counts <- getCwacSiteCounts(site_id)
We can do all of this using a dplyr approach.
library(dplyr, warn.conflicts = FALSE)
bp_counts <- listCwacSites(.region_type = "province",
                           .region = "North West") %>%
  filter(LocationName == "Barberspan") %>%
  pull("LocationCode") %>%
  getCwacSiteCounts()
We might not be interested in any particular site, but in all records of certain species. We can follow a similar procedure: 1) find the species code, and 2) download the data.
To find the species code we can use the function listCwacSpp, which lists all CWAC species, and from this list we can find the one we want. Say we are interested in the African Black Duck.
# List all species
spp_all <- listCwacSpp()
# Find the code for Black Duck
sp_id <- spp_all[spp_all$Common_species == "African Black" &
                   spp_all$Common_group == "Duck",
                 "SppRef", drop = TRUE]
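Because the code is looked up by matching names, it may be worth confirming that the filter returned exactly one species before downloading anything. The following is a minimal sketch using a made-up species table: the column names follow the example above, but the object names (spp_toy, sp_id_toy), the rows, and the SppRef values are invented for illustration.

```r
# Made-up stand-in for the table returned by listCwacSpp()
spp_toy <- data.frame(
  SppRef = c(1, 2, 3),
  Common_group = c("Duck", "Duck", "Teal"),
  Common_species = c("African Black", "Yellow-billed", "Cape")
)

# Same lookup as above, applied to the toy table
sp_id_toy <- spp_toy[spp_toy$Common_species == "African Black" &
                       spp_toy$Common_group == "Duck",
                     "SppRef", drop = TRUE]

# Stop early if the names matched no species, or more than one
stopifnot(length(sp_id_toy) == 1)
```

A failed check usually points to a spelling difference between the names used in the filter and those in the species list.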
Once we have the code, we can download the data for this species across all CWAC sites with the function getCwacSppCounts():
bd_counts <- getCwacSppCounts(sp_id)
You may get some warnings from the parser because some data were not entered in the expected standard format. We chose to print these messages so that you can check that your analysis is not affected by them.
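If you would rather collect these warnings and inspect them afterwards, base R's withCallingHandlers() can record each message while letting the call finish. This is only a sketch: noisy_download() is a hypothetical stand-in that emits two made-up warnings, and in practice you would replace it with the real download call.

```r
# Hypothetical stand-in for a download that emits parser warnings;
# the warning text and returned values are made up
noisy_download <- function() {
  warning("parsing failure in row 1")
  warning("parsing failure in row 2")
  data.frame(LocationCode = c("A", "B"), Count = c(10, 3))
}

caught <- character(0)

result <- withCallingHandlers(
  noisy_download(),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))  # keep the message for later
    invokeRestart("muffleWarning")             # continue without printing it
  }
)

# Inspect what was collected
caught
```

The data are still returned in result, and caught holds one entry per warning, which makes it easier to decide whether the affected records matter for your analysis.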
Again, we can also do all this using dplyr.
library(dplyr, warn.conflicts = FALSE)
bd_counts <- listCwacSpp() %>%
  filter(Common_species == "African Black",
         Common_group == "Duck") %>%
  pull("SppRef") %>%
  getCwacSppCounts()
With the function getCwacSiteBoundary we can download the polygons enclosing CWAC sites. Please note that these polygons are not always up to date, or even present, for all sites in the database; we should therefore always check that they are accurate after downloading.
getCwacSiteBoundary can retrieve boundaries for multiple sites at once, in a single API call, which is much more efficient than requesting them one at a time. It is therefore a good idea to retrieve all sites in a single call.
# Download our counts for the Black Duck just in case they got lost
counts <- listCwacSpp() %>%
  filter(Common_species == "African Black",
         Common_group == "Duck") %>%
  pull("SppRef") %>%
  getCwacSppCounts()
# Then let's extract the boundaries of the CWAC sites in our data
# At the moment getCwacSiteBoundary() can only retrieve boundaries from sites of
# one province/country at a time
# First identify the countries in our data
unique(counts$Country)
# It's only South Africa at this time, so we can pull all the sites at once. Otherwise,
# we would repeat the process for each country.
sites <- unique(counts$LocationCode)
boundaries <- getCwacSiteBoundary(loc_code = sites,
                                  region_type = "country",
                                  region = "South Africa")
# Add boundaries to the count data
counts_bd <- counts %>%
  dplyr::left_join(boundaries, by = "LocationCode")
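Since boundaries may be missing for some sites, it can help to check which sites in the count data have no matching polygon before relying on the join. The following is a minimal sketch with toy data frames standing in for the counts and boundaries: only the LocationCode column is taken from the example above, while the object names, the Count column, and all values are assumptions for illustration.

```r
# Toy stand-ins for the counts and boundaries objects (values are made up)
counts_toy <- data.frame(LocationCode = c("S1", "S1", "S2", "S3"),
                         Count = c(5, 8, 2, 11))
boundaries_toy <- data.frame(LocationCode = c("S1", "S3"))

# Sites present in the counts but absent from the boundaries
missing_sites <- setdiff(unique(counts_toy$LocationCode),
                         boundaries_toy$LocationCode)

# Number of count records that would have no polygon after the join
n_unmatched <- sum(counts_toy$LocationCode %in% missing_sites)

missing_sites
n_unmatched
```

A site that shows up in missing_sites keeps its count records after a left join, but without any boundary attached, so it would need to be handled separately in any spatial analysis.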
First clone the repository to your local machine:
For site owners:
There is the danger of multiple people working simultaneously on the project code. If you make changes locally on your computer and, before you push your changes, others push theirs, there might be conflicts. This is because the HEAD pointer in the main branch has moved since you started working.
To deal with these lurking issues, I would suggest opening and working on a topic branch. This is just a regular branch that has a short lifespan. In steps:
Opening branches is quick and easy, so there is no harm in opening multiple branches a day. However, it is important to merge and delete them often to keep things tidy. Git provides functionality to deal with conflicting branches. More about branches here:
https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Another idea is to use the ‘issues’ tab that you find in the project header. There, we can identify issues with the package, assign tasks and warn other contributors that we will be working on the code.