This package provides functionality to access, download, and manipulate data from the Coordinated Waterbird Counts project. It is possible to download these same data using the CWAC API, but being able to pull these data straight into R, in a standard format, should make them more accessible, easier to analyse, and eventually make our analyses more reliable and reproducible.
There is another package, ABAP, that provides similar functionality, but in that case for downloading count data from the African Bird Atlas Project. In addition, there is a companion package, ABDtools, which adds the functionality needed to annotate different data formats (points and polygons) with environmental information from the Google Earth Engine data catalog.
To install CWAC from GitHub using the remotes package, run:
install.packages("remotes") remotes::install_github("AfricaBirdData/CWAC")
There are two possible ways of downloading data from CWAC. We can select a site and download data for all species in that site or, vice versa, we can select a species and download all data for that species across all sites. In this section, we will explore the former and download all data associated with a CWAC site.
Let's use Barberspan, a CWAC site located in the North West province of South Africa, as an example. The first thing we need to do is find our site's ID code. We can use the function `listCwacSites()` to list all the sites in the North West province and find this code.
```r
library(CWAC)

# List all sites in the North West province
nw_sites <- listCwacSites(.region_type = "province", .region = "North West")

# Find the code for Barberspan
site_id <- nw_sites[nw_sites$LocationName == "Barberspan", "LocationCode", drop = TRUE]

# We can find more info about this site with getCwacSiteInfo()
getCwacSiteInfo(site_id)
```
Once we have the code for the CWAC site, we can download the count data collected there using the function `getCwacSiteCounts()`. This will download all the CWAC cards submitted for Barberspan.
```r
bp_counts <- getCwacSiteCounts(site_id)
```
We can do all of this using a dplyr approach.
```r
library(dplyr, warn.conflicts = FALSE)

bp_counts <- listCwacSites(.region_type = "province", .region = "North West") %>%
  filter(LocationName == "Barberspan") %>%
  pull("LocationCode") %>%
  getCwacSiteCounts()
```
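Whichever route we take, it is worth having a quick look at what came back before analysing it. A minimal sanity check, using only generic functions so that no assumptions are made about the exact columns returned:

```r
# A quick, generic look at the downloaded cards (no assumptions about column names)
nrow(bp_counts)            # number of records downloaded
dplyr::glimpse(bp_counts)  # column types and a preview of the values
```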
We might not be interested in any particular site, but in all records of certain species. We can follow a similar procedure: 1) find the species code, and 2) download the data.
To find the species code we can use the function `listCwacSpp()`, which lists all CWAC species, and from this list we can find the one we want. Say we are interested in the African Black Duck.
```r
# List all species
spp_all <- listCwacSpp()

# Find the code for the African Black Duck
sp_id <- spp_all[spp_all$Common_species == "African Black" &
                   spp_all$Common_group == "Duck",
                 "SppRef", drop = TRUE]
```
Once we have the code, we can download the data for this species across all CWAC sites with the function `getCwacSppCounts()`.
```r
bd_counts <- getCwacSppCounts(sp_id)
```
You may get some warnings from the parser because some data were not entered in the expected standard format. We chose to print these messages so that you can check whether your analysis will be affected.
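If you do see such warnings, one simple, package-agnostic way to gauge their impact is to look at how much missing data ended up in the table, for example:

```r
# Count missing values per column; parsing problems usually show up as NAs
colSums(is.na(bd_counts))
```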
Again, we can also do all this using dplyr.
```r
library(dplyr, warn.conflicts = FALSE)

bd_counts <- listCwacSpp() %>%
  filter(Common_species == "African Black",
         Common_group == "Duck") %>%
  pull("SppRef") %>%
  getCwacSppCounts()
```
With the function `getCwacSiteBoundary()` we can download the polygons enclosing CWAC sites. Please note that these polygons are not always up to date, or even present for all sites in the database, and therefore we should always check that they are accurate after downloading.
`getCwacSiteBoundary()` can retrieve boundaries for multiple sites at once, in a single API call, making the process much more efficient. It is therefore always a good idea to retrieve all the sites you need in a single call rather than one at a time.
```r
# Download the counts for the African Black Duck again, in case we no longer have them
counts <- listCwacSpp() %>%
  filter(Common_species == "African Black",
         Common_group == "Duck") %>%
  pull("SppRef") %>%
  getCwacSppCounts()

# Then let's extract the boundaries of the CWAC sites in our data.
# At the moment getCwacSiteBoundary() can only retrieve boundaries from sites of
# one province/country at a time.

# First identify the countries in our data
unique(counts$Country)

# It's only South Africa at the moment, so we can pull all the sites at once.
# Otherwise, we would just repeat the process for the different countries.
sites <- unique(counts$LocationCode)

boundaries <- getCwacSiteBoundary(loc_code = sites,
                                  region_type = "country",
                                  region = "South Africa")

# Add boundaries to the count data
counts_bd <- counts %>%
  dplyr::left_join(boundaries, by = "LocationCode")
```
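Because, as noted above, not every site has a boundary in the database, it is worth checking which of our sites came back without one before relying on `counts_bd`. A minimal check, using only the `LocationCode` column we already joined on:

```r
# Sites in our count data for which getCwacSiteBoundary() returned no boundary
setdiff(sites, boundaries$LocationCode)
```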
First clone the repository to your local machine:
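If you prefer to stay in R, one option is the usethis package (this is our own suggestion, not a project requirement; cloning with Git on the command line or through RStudio's version-control menu works just as well, and `fork = TRUE` needs a GitHub personal access token):

```r
# install.packages("usethis")  # if not installed yet
# Clone the AfricaBirdData/CWAC repository into a new local project.
# Set fork = TRUE if you do not have push access and want to work from your own fork.
usethis::create_from_github("AfricaBirdData/CWAC", fork = FALSE)
```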
For site owners:
There is a danger of multiple people working simultaneously on the project code. If you make changes locally and others push their changes before you push yours, there may be conflicts, because the HEAD of the main branch will have moved since you started working.
To deal with these lurking issues, I would suggest opening and working on a topic branch. This is just a regular branch with a short lifespan: you open it, work on it, merge it back into the main branch, and delete it once the task is done.
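As a rough illustration of those steps, here is a sketch using the gert package so the whole workflow stays in R (this is only an example on our part; the equivalent git commands on the command line work just the same, and the branch name is made up):

```r
library(gert)

# 1. Open a short-lived topic branch and switch to it
git_branch_create("fix-readme-typo")

# 2. Work on the code, then stage, commit and push the changes
git_add(".")
git_commit("Fix typo in README")
git_push()

# 3. When the task is done, merge the branch back into main...
git_branch_checkout("main")
git_pull()                       # pick up anything others pushed in the meantime
git_merge("fix-readme-typo")
git_push()

# 4. ...and delete the topic branch to keep things tidy
git_branch_delete("fix-readme-typo")
```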
Opening branches is quick and easy, so there is no harm in opening multiple branches a day. However, it is important to merge and delete them often to keep things tidy. Git provides functionality to deal with conflicting branches. More about branches here:
https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Another idea is to use the 'Issues' tab that you find in the project header on GitHub. There we can report problems with the package, assign tasks, and let other contributors know what we will be working on.