library(gblincoln)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Goal of this vignette

This vignette aims to give an overview of how to use the gblincoln package by:

Package requirements

This package depends on the tidyverse and ggplot2 libraries.

Basic concepts regarding Lincoln estimations

Lincoln estimates are calculated by using the number of banded birds and the number of recovered bands from hunting for a given year and species. Coupled with the probability that a band is reported in that year, this allows us to estimate a harvest rate for that year. This rate is then applied to the harvest numbers for that year and species to give an estimate of population abundance. The principles and equations used in this package are described in Alisauskas et al. (2014).
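As a rough illustration of the reasoning above, the arithmetic for one species and one year can be sketched in base R. All numbers below are made up, and the variable names are only for illustration; the package's own computation (including any measure of uncertainty) may differ in its details.

```r
# Hypothetical numbers for one species and one year (not real data)
n_banded    <- 1000    # birds banded
n_recovered <- 30      # bands recovered from hunting
rho         <- 0.75    # probability that a recovered band is reported
harvest     <- 20000   # estimated number of birds harvested

# Share of banded birds whose band was recovered and reported
recovery_rate <- n_recovered / n_banded

# Harvest rate: recovery rate adjusted for bands that were never reported
harvest_rate <- recovery_rate / rho

# Lincoln estimate: harvest divided by the harvest rate
abundance <- harvest / harvest_rate
abundance
```

With these numbers the recovery rate is 0.03, the harvest rate 0.04 and the abundance estimate 500000 birds.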

Data requirements

Data from four (4) sources are required to perform the estimations:

All datasets must have a column labeled either b.year or B.Year to be able to be linked together.

Actions performed

The main workflow for calculating Lincoln estimates for a given species is as follows:

Quick start

This section provides a quick example of how to use the gblincoln package. More information on how each task works can be found in the corresponding sections below.

Note that all relevant code portions of this example can be found in the ATBR_example.R file found at the root of the package.

For this example, we will be using datasets provided with the package. For more detailed information about these datasets please see this section.

Therefore to load the example data, all that is required is to load the package. To load your own data, please read this section.

# Clean environment and load the package
rm(list=ls())
library(gblincoln)

We need to create filters to filter our database. All filter names can be either column names from the Gamebirds database, or the corresponding renamed columns as found in the gb_colnames object. Please see the Gamebirds documentation to know what each column contains.

For our example, we will look at data for the ATBR species, banded in Nunavut and shot in the United States in flyway 1 (Atlantic Flyway) between 2000 and 2019. (Note: if you need more information about the locations and flyways available for a given species, please see this section.) Also keep in mind that default filters are automatically applied; they can be seen in the DEFAULT_LINCOLN_FILTERS object.

Only the SPEC filter is absolutely required and must take a unique value, otherwise the estimation will not work.

filters_ATBR <-
  list(
    SPEC = "ATBR",
    b.state_name = "Nunavut",
    e.country_code = 'US',
    r.flyway_code = 1,
    b.year= 2000:2019,
    r.corrected_year=2000:2019
  )

Then, the quickest way to calculate the estimates is by calling the get_lincoln_estimates() function:

lincoln_estimates <-
  get_lincoln_estimates(
    filters = filters_ATBR,
    banding_df = gb_ATBR_banding,
    recoveries_df = gb_ATBR_recoveries,
    harvest_df = gb_ATBR_harvest,
    rho_df = gb_reporting_probas,
    harvest_correction_factor = 0.61
  )
lincoln_estimates

Note that the harvest_correction_factor is based on recommendations from Padding and Royle (2012), who report that harvest surveys overestimate harvest and should be corrected. The value of 0.61 (the package default) is the one recommended for goose species, for data collected after 1999.
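To make the effect of the correction concrete, here is a minimal sketch that assumes the factor is simply multiplied with the survey estimate of harvest. This is an assumption made for illustration, and the harvest value is made up; see the help page of get_lincoln_estimates() for how the package actually applies it.

```r
harvest_survey            <- 20000  # hypothetical harvest survey estimate
harvest_correction_factor <- 0.61   # value quoted above

# Assumed form of the correction: scale the survey estimate down
harvest_corrected <- harvest_survey * harvest_correction_factor
harvest_corrected
```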

If you prefer to perform every step manually you can do the following:

Get direct recoveries (this function actually filters the databases)

dr_df <-
  get_direct_recoveries(banding_df = gb_ATBR_banding,
                        recoveries_df = gb_ATBR_recoveries,
                        filters = filters_ATBR)

Get the harvest_rate

hr_df <- get_harvest_rate(df = dr_df, rho_df = gb_reporting_probas)

Calculate the Lincoln estimates

get_lincoln_estimates(df = hr_df, harvest_df = gb_ATBR_harvest)

Package description

The gblincoln package aims to calculate Lincoln estimates for population abundance using banding data extracted from the Gamebirds database provided by the US Banding service. The gblincoln package offers functions to perform all steps required to calculate Lincoln estimates as described in the previous section. In addition, the package offers convenience functions for:

Since deciding which harvest data to use can depend heavily on the populations and species studied, this package only performs estimations for one species at a time. To perform multiple estimations, multiple filters should be defined and executed one at a time.

Loading data

All datasets should be loaded using the load_dataset() function. This function performs the following:

If the data to be loaded is not in a csv format, it is possible to load the data using the appropriate method and then call the clean_dataset() function on the resulting dataframe. The type of action taken during cleaning is determined automatically by the columns present in the dataset. Please see the help page of the clean_dataset() function for further explanations.

To load a csv file from an external path, we would write something like this:

dataset <- load_dataset("path_to_file")

Creating filters

By default, the data extracted from Gamebirds is unfiltered and can contain unwanted entries. Therefore, we need to define filters to select only the relevant data. In gblincoln, filtering is done by creating a list of filters that associates the name of a column from the dataset with the desired values.

For example, let's say we want to keep only birds banded between 2010 and 2020. Banding year in the datasets is stored in the column b.year. Therefore to create this filter, we would write:

filter <- list(b.year=2010:2020) 

You might notice that b.year is not a column found in the original datasets extracted from Gamebirds and was renamed by the package. If you prefer to use the original names given by Gamebirds, it is also possible as long as the column is present in the list of columns accepted by the package. See the section on column names for more information. For instance, the original column name is actually B.Year so you could define your filter like this and achieve the same results:

filter_old_name <- list(B.Year=2010:2020)

If you want to reuse an existing filter list and only change one of its values or add another filter, you can update it with the list_update(old, new) function, which replaces the filters of old that are also present in new and adds those that are not.

# Replace the b.year filter and add the SPEC filter
filters <- list_update(filter, list(b.year=2009:2019, SPEC="ATBR"))
filters

Since filters will be applied in the order in which they are found, you can use the new_first argument to specify if the new filters will be added at the beginning or the end of the list. Note that all filters that are already present will remain at the same place.

# Same update, but the new SPEC filter is placed at the beginning of the list
filters2 <- list_update(filter, list(b.year=2009:2019, SPEC="ATBR"),
                            new_first=TRUE)
filters2
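The behaviour described above (existing filters keep their position, new ones go at the end or at the beginning) can be mimicked with a few lines of base R. The function list_update_sketch() below is a hypothetical stand-in written for this illustration; it is not the package's implementation of list_update().

```r
# Hypothetical re-implementation, for illustration only
list_update_sketch <- function(old, new, new_first = FALSE) {
  common <- intersect(names(new), names(old))
  added  <- setdiff(names(new), names(old))
  # Filters already present keep their position but take the new value
  old[common] <- new[common]
  # Filters that did not exist are placed before or after the existing ones
  if (new_first) c(new[added], old) else c(old, new[added])
}

base_filters <- list(b.year = 2010:2020, b.state_name = "Nunavut")
new_filters  <- list(b.year = 2009:2019, SPEC = "ATBR")

at_end   <- list_update_sketch(base_filters, new_filters)
at_start <- list_update_sketch(base_filters, new_filters, new_first = TRUE)

names(at_end)    # b.year, b.state_name, SPEC
names(at_start)  # SPEC, b.year, b.state_name
```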

Note that to be able to calculate a Lincoln estimation, it is necessary that your filter list contains the SPEC filter with the short alpha code of the species for which the estimation is performed. This filter must take only one value, otherwise the estimation will not work.

Filter the databases

To apply the filters, you can then call the filter_database() function. Here we use the gb_ATBR_banding and gb_ATBR_recoveries databases provided with the package as an example (see here for more information).

ATBR_banding <- filter_database(gb_ATBR_banding, filters=filters)
ATBR_recoveries <- filter_database(gb_ATBR_recoveries, filters=filters)

Filtering will be done in the order in which the filters appear in the list.
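Conceptually, applying a list of filters one after the other amounts to repeatedly keeping the rows whose column value is among the accepted values. The sketch below shows this idea on a toy dataframe; apply_filters_sketch() is a hypothetical helper and leaves out everything else filter_database() does (default filters, column renaming, and so on).

```r
# Hypothetical helper: apply each filter in list order
apply_filters_sketch <- function(df, filters) {
  for (col in names(filters)) {
    df <- df[df[[col]] %in% filters[[col]], , drop = FALSE]
  }
  df
}

toy <- data.frame(
  SPEC   = c("ATBR", "ATBR", "LSGO", "ATBR"),
  b.year = c(2008, 2012, 2012, 2019)
)

kept <- apply_filters_sketch(toy, list(SPEC = "ATBR", b.year = 2009:2019))
kept  # only the ATBR rows banded in 2012 and 2019 remain
```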

Default filters

Note that by default, the filter_database() function automatically adds some filters. These include keeping only relevant columns, types of bands, types of birds, how the birds were recovered, etc. The complete list of default values can be found in the following object:

DEFAULT_LINCOLN_FILTERS

Please type ?DEFAULT_LINCOLN_FILTERS for a description of each filter.

If you want to change the value of a specific filter, just add it to your filter list; it will automatically override the default value. If you do not want to use the default values, you can set use_default_filters=FALSE when calling the filter_database() function.

Also, if you want to apply your filters before the default filters, you can set the filters_first argument to TRUE.
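The way a user filter takes precedence over a default with the same name can be pictured with utils::modifyList() from base R. The default values and column names below are hypothetical placeholders (the real ones are in DEFAULT_LINCOLN_FILTERS), and the package may combine the lists differently internally.

```r
# Hypothetical defaults; see DEFAULT_LINCOLN_FILTERS for the real ones
default_filters <- list(band_type = c(1, 2), how_obtained = 1)

# User filters: one overrides a default, one is new
user_filters <- list(how_obtained = c(1, 2), SPEC = "ATBR")

# Values from user_filters replace defaults with the same name
effective <- utils::modifyList(default_filters, user_filters)
str(effective)
```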

For creating database-specific filters, please see this section.

Get direct recoveries

Once our filters are defined, the next step is to get all direct recoveries. In practice, this is done by calling the get_direct_recoveries() function. By default, the function assumes the databases are not filtered and will automatically call the filter_database() function described above. If you want to provide filtered databases, set the filtered argument to TRUE.

Then the function will summarize the banding and recovery data to count how many of each event happened in one year and then merge the results and compute a recovery rate.
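The summarize-and-merge step can be sketched with base R on toy data. The column names n_banded, n_recovered and recovery_rate are chosen for this illustration and are not necessarily those returned by get_direct_recoveries().

```r
# Toy banding data: number of birds banded per banding event
banding <- data.frame(
  b.year   = c(2018, 2018, 2019),
  n_banded = c(100, 150, 200)
)

# Toy recoveries: one row per recovered band, keyed by banding year
recoveries <- data.frame(
  b.year      = c(2018, 2018, 2019, 2019, 2019),
  n_recovered = 1
)

# Count events per year in each dataset
banded_per_year    <- aggregate(n_banded ~ b.year, data = banding, FUN = sum)
recovered_per_year <- aggregate(n_recovered ~ b.year, data = recoveries, FUN = sum)

# Merge on the shared year column and compute the recovery rate
dr_sketch <- merge(banded_per_year, recovered_per_year, by = "b.year")
dr_sketch$recovery_rate <- dr_sketch$n_recovered / dr_sketch$n_banded
dr_sketch
```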

The get_direct_recoveries() function takes the banding and recoveries databases as arguments and either a list of filters with database-specific filters in it, or a list of filters for the banding data and a list of filters for the recoveries data (see here for more information on database-specific filters).

# This will actually filter the databases
dr_df <- get_direct_recoveries(gb_ATBR_banding, gb_ATBR_recoveries, filters=filters)

# If we provide already filtered databases
dr_df2 <- get_direct_recoveries(ATBR_banding, ATBR_recoveries, filtered=TRUE)

# The two methods are similar
all.equal(dr_df, dr_df2)

Calculate harvest rate

The next step is to calculate the harvest rate for the species, based on the recovery rate calculated in the previous step and the probability of reporting a band after harvesting a bird in a given year. By default, this function therefore takes as arguments a dataframe created by the get_direct_recoveries() function as well as a dataframe with the reporting probabilities. If no direct recoveries dataframe is provided, this function will call get_direct_recoveries() using the arguments passed through the ... (dots) argument. This means that you can provide all the arguments needed for get_direct_recoveries() directly inside get_harvest_rate(). Note that in this case, it is mandatory to provide argument names.

# We can call the function like this
hr_df <- get_harvest_rate(dr_df, gb_reporting_probas)

# Or if we did not create the direct recoveries dataframe, we can pass
# all arguments directly to the function that will create it.

hr_df2 <-
  get_harvest_rate(
    rho_df = gb_reporting_probas,
    banding_df = gb_ATBR_banding,
    recoveries_df = gb_ATBR_recoveries,
    filters = filters
  )

# The two methods are similar
all.equal(hr_df, hr_df2)
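The forwarding pattern used here is a general R idiom and can be shown with two toy functions. The names get_dr_sketch() and get_hr_sketch() are hypothetical and work on plain numbers rather than dataframes; they only illustrate why argument names matter when the first argument is omitted.

```r
# Toy "direct recoveries" step: returns a recovery rate
get_dr_sketch <- function(banded, recovered) recovered / banded

# Toy "harvest rate" step: uses df if given, otherwise builds it from ...
get_hr_sketch <- function(df = NULL, rho, ...) {
  if (is.null(df)) df <- get_dr_sketch(...)
  df / rho
}

# Two-step call: the recovery rate is computed first
hr_a <- get_hr_sketch(get_dr_sketch(banded = 1000, recovered = 30), rho = 0.75)

# One-step call: named arguments are forwarded to get_dr_sketch()
hr_b <- get_hr_sketch(rho = 0.75, banded = 1000, recovered = 30)

all.equal(hr_a, hr_b)
```

Without names, the first unnamed value would be matched to df instead of being forwarded, which is why naming the arguments is required in the one-step form.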

Get Lincoln estimates

Finally, we can calculate the Lincoln estimates by calling get_lincoln_estimates(). This function takes a harvest rate dataframe created with get_harvest_rate() and a harvest dataframe with the number of birds harvested each year. Just like get_harvest_rate(), if a harvest rate dataframe has not been created, it is possible to pass all arguments necessary to create it automatically.

# We can call the function like this
lincoln1 <-
  get_lincoln_estimates(df = hr_df,
                        harvest_df = gb_ATBR_harvest,
                        plot_estimates = FALSE,
                        save_estimates = FALSE)

# Or if we did not create the harvest rate dataframe, we can pass
# all arguments directly to the function that will create it.

lincoln2 <- get_lincoln_estimates(
  banding_df = gb_ATBR_banding,
  recoveries_df = gb_ATBR_recoveries,
  rho_df = gb_reporting_probas,
  filters = filters,
  harvest_df = gb_ATBR_harvest,
  plot_estimates = FALSE,
  save_estimates = FALSE
)


# The two methods are similar
all.equal(lincoln1, lincoln2)

In addition, it is possible to plot the estimates using the plot_estimates argument and to save them using the save_estimates flag. The save location can be set with the save_path argument.

get_lincoln_estimates(df = hr_df,
                      harvest_df = gb_ATBR_harvest,
                      plot_estimates = TRUE,
                      save_estimates = TRUE,
                      save_path= ".")

Additional information

Extracting data from Gamebirds

Here are the steps for extracting data from Gamebirds. We will show how to extract data for Atlantic brants. Note that these steps describe how to export data based on a single criterion, species. We assume that all additional filtering will be done by the package. Other filtering can be done directly in Gamebirds; however, this is not covered here, and we recommend that all filtering done at export be documented and kept in mind at all times during processing.

Banding data

When launching Gamebirds, select Summarized bandings.




This leads to a page that displays banding information. To search for bands related to a species, click on the Find icon.




A new form with blank entries appears.




On the species line (1), in the middle entry, select ATBR (2). Once this is done, click on Perform Find (3).




OPTIONAL (this step is not executed as part of this example): If we want to extract more than one species at the same time, before clicking on Perform Find, click on New request (1). We can see the number is now 2 (2). We can then select another species (for example, Lesser Snow Goose, LSGO). Note that the gblincoln package works with databases containing more than one species, so it might be easier to extract all target species at the same time (unless the size of the database grows too fast).




Once the search is performed, the results are displayed. We can see that when selecting only ATBR records, 2072 entries are returned (1). To save them, click on Export records (2).




When exporting the data, we recommend saving as an Excel (.xlsx) file, because .csv exports do not seem to include the column names.




Leave the next box empty and click on Continue...




Then, when selecting the columns to export, click on Move all (1) and then Export (2).




The last step consists of converting the Excel file to csv. To do so, open the file in Excel (or other software such as LibreOffice), click on Save as and then select .csv.



Recoveries data

Exporting recoveries data is very similar to exporting banding data. The first major difference is the database selection on the main menu. Now, we select Encounters.




The encounters screen displays both the banding data and the encounter data. To select the records of interest, click on Find as before.




In the filter form, we select the species in the middle input (1), choose ATBR (2) and then click on Perform Find (3).




We can see that 14521 records have been found. Now we can export them just like banding data.




The rest of the export process is the same as for the banding data. The only difference is that when you select the columns, new columns specific to the recoveries will be present.



Provided datasets

The gblincoln package comes with ready-to-use datasets used in the examples of this vignette. The package provides an example of the four (4) datasets required for calculating Lincoln estimates for a North American population of Atlantic brants.

Here are the names of the variables containing them:

All these datasets come from raw data loaded using the load_dataset() function as described in this section.

# Banding data
str(gb_ATBR_banding)
# Reporting probabilities
head(gb_reporting_probas)
# Harvest data
head(gb_ATBR_harvest)

Column names

In addition to the datasets, the gblincoln package also comes with a dataframe mapping column names from Gamebirds to the ones used in the package. Columns from Gamebirds were renamed for ease of use and consistency.

This object contains only 2 columns:

As described in the filtering section, it is possible to create filters based on their Gamebirds name. Keep in mind though that when loading dataframes, R can change the name of the columns, for instance by replacing all spaces with dots by default.
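The renaming mentioned above is standard base R behaviour and can be reproduced without the package. The small inline csv below is made up for the illustration; only the column name B Year matters.

```r
# make.names() shows what R does to names that are not syntactically valid
make.names(c("B Year", "SPEC"))

# read.csv() applies this by default (check.names = TRUE)
df_checked <- read.csv(text = "B Year,SPEC\n2015,ATBR")
names(df_checked)

# With check.names = FALSE the original names are kept as-is
df_raw <- read.csv(text = "B Year,SPEC\n2015,ATBR", check.names = FALSE)
names(df_raw)
```

In the first case the column is called B.Year, in the second it keeps its space, so the name used in a filter has to match how the file was read.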

Add your own column names

If you want to provide a custom list of columns, you can do so when calling the filter_database() function by passing it as the columns argument. Note that this argument still must be a dataframe with two columns named old_colnames and new_colnames.
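A custom mapping with the expected structure could look like the sketch below. Only the two column names old_colnames and new_colnames and the B.Year to b.year pair come from this vignette; the SPEC row and the renaming code are assumptions added to show what such a mapping expresses, not how filter_database() uses it internally.

```r
# Mapping dataframe with the two required columns
custom_columns <- data.frame(
  old_colnames = c("B.Year", "SPEC"),
  new_colnames = c("b.year", "SPEC"),
  stringsAsFactors = FALSE
)

# Illustration only: rename the columns of a toy dataframe with this mapping
toy <- data.frame(B.Year = 2015, SPEC = "ATBR")
idx <- match(names(toy), custom_columns$old_colnames)
names(toy)[!is.na(idx)] <- custom_columns$new_colnames[idx[!is.na(idx)]]
names(toy)
```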

IMPORTANT: By creating your custom column names, we cannot guarantee the package will continue to work as some columns might be called by name. These include, but are not limited to:

Recovery year correction

Please note that by default recovery years are corrected when calculating Lincoln estimates. If the recovery happens before month 4 (April), then the recovery is considered to happen the year before. If you do not want to correct the recovery years, when loading recovery data, set the correct_recovery_years argument of the load_dataset() function to FALSE.
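The rule described above can be written in one line of base R. The column names r.year and r.month are hypothetical and used only for this sketch; r.corrected_year is the name used in the filters of this vignette.

```r
# Toy recoveries: year and month of recovery (hypothetical column names)
r.year  <- c(2015, 2015, 2016)
r.month <- c(1, 9, 4)

# Recoveries before April are assigned to the previous year
r.corrected_year <- ifelse(r.month < 4, r.year - 1, r.year)
r.corrected_year
```

Here the January 2015 recovery is moved to 2014, while the September 2015 and April 2016 recoveries keep their year.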

Species locations

gblincoln offers convenience functions to find out where a target species can be found. In particular, they allow the user to find information about countries, states and flyways. If present, these functions will return information about both banding and recovery data.

Four (4) functions are offered, with 3 being just convenience shortcuts of the main one:

get_species_locations(gb_ATBR_banding, "ATBR", sort_by = "b.country_name")

get_species_countries(gb_ATBR_banding, "ATBR", return_codes=FALSE)
get_species_flyways(gb_ATBR_recoveries, "ATBR")
get_species_states(gb_ATBR_banding, "ATBR", return_names = FALSE)

Database specific filters

If columns are present in both datasets and you want a different set of filters for the banding dataset and the recoveries dataset, we propose two methods:

default_filter <- list(SPEC="ATBR") # Common filters that can be applied to all datasets
banding_filter <- list_update(default_filter, list(b.year=2010:2019))
recoveries_filter <- list_update(default_filter, list(r.corrected_year=2010:2019))

The only problem with this approach is that you need to manipulate several objects.

filters <- list(SPEC="ATBR", banding_filters=list(b.year=2010:2019),
                recoveries_filters = list(r.corrected_year=2010:2019))

If you choose to use the second method, you MUST specify the database type using the db_type option when calling the filter_database() function, otherwise these filters will not be applied. Please refer to the help page of the function for accepted values of db_type.

filtered_db <- filter_database(gb_ATBR_banding, filters=filters, db_type="b")

Also note that filters defined using the special database keywords will override any other existing filter on the same column. Therefore, special care must be taken when updating a filter list containing these filters.

For instance, let's say we want to create a filter based on the filters object we defined earlier where we want to filter the banding years for all datasets.

The method below will not work for the banding dataset. This is because a b.year filter exists in the banding_filters option. Therefore, this value will be used for the banding dataset.

list_update(filters, list(b.year=2015:2019))

This is the way to do it:

list_update(filters, list(
  b.year = 2015:2019,
  banding_filters = list(b.year = 2015:2019)
))

In this case, it might be better to create separate filters for each database.
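The precedence rule discussed above can be pictured with a small base R sketch. resolve_filters_sketch() is a hypothetical helper, not a package function, and it assumes that any db_type other than "b" refers to the recoveries database; the accepted values are documented in the help page of filter_database().

```r
# Hypothetical helper: merge common filters with the database-specific ones
resolve_filters_sketch <- function(filters, db_type) {
  keywords <- c("banding_filters", "recoveries_filters")
  key      <- if (db_type == "b") "banding_filters" else "recoveries_filters"
  common   <- filters[setdiff(names(filters), keywords)]
  specific <- filters[[key]]
  # Database-specific values win over common ones with the same name
  if (is.null(specific)) common else utils::modifyList(common, specific)
}

nested <- list(SPEC = "ATBR",
               banding_filters = list(b.year = 2010:2019),
               recoveries_filters = list(r.corrected_year = 2010:2019))

# Adding a top-level b.year does not change what the banding data sees
updated <- c(nested, list(b.year = 2015:2019))

banding_years    <- resolve_filters_sketch(updated, "b")$b.year
recoveries_years <- resolve_filters_sketch(updated, "r")$b.year
range(banding_years)     # still 2010 to 2019
range(recoveries_years)  # 2015 to 2019
```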

Comparing two filters

Sometimes it can be useful to compare two filters to see how they influence the estimations. For example, we could check whether including bands with geolocators influences the estimations of harvest rates.

Defining the filters

We will define two filters: one with all band types, and one without geolocators.

# Filters with all band types
filters_ATBR_all <- filters_ATBR <-
  list(
    SPEC = "ATBR",
    b.state_name = "Nunavut",
    e.country_code = 'US',
    r.flyway_code = 1,
    b.year = 2000:2019,
    r.corrected_year = 2000:2019
  )

# Geolocators were only deployed in 2018 and 2019, so we exclude them only
# for these years in the comparison
filters_ATBR_nogeo <-
  list_update(filters_ATBR, list(add_info = c(00, 01, 07), b.year = 2018:2019))

Compare and plot

We can then compare how the two filters influence the estimation of the harvest rate using the compare_harvest_rates() function. This function takes as input either 2 dataframes calculated by get_harvest_rate() or 2 sets of filters and the required databases.

By default, this function plots the harvest rates and checks whether there is an overlap.

res <-
  compare_harvest_rates(
    filters1 = filters_ATBR_all,
    filters2 = filters_ATBR_nogeo,
    banding_df = gb_ATBR_banding,
    recoveries_df = gb_ATBR_recoveries,
    rho_df = gb_reporting_probas
  )
# res checks if the confidence levels of the harvest rates overlap for all values
res # All confidence levels overlap, so there should not be a problem to use geolocators
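The kind of overlap check mentioned in the comments above can be sketched on toy intervals. The helper intervals_overlap() and the lower/upper column names are hypothetical; the structure returned by compare_harvest_rates() may be different.

```r
# Two intervals overlap when each one starts before the other ends
intervals_overlap <- function(lower1, upper1, lower2, upper2) {
  lower1 <= upper2 & lower2 <= upper1
}

# Toy harvest rate intervals for two years under each filter set
hr1 <- data.frame(lower = c(0.02, 0.03), upper = c(0.05, 0.06))
hr2 <- data.frame(lower = c(0.04, 0.07), upper = c(0.08, 0.09))

overlap <- intervals_overlap(hr1$lower, hr1$upper, hr2$lower, hr2$upper)
overlap       # TRUE for the first year, FALSE for the second
all(overlap)  # FALSE: the intervals do not overlap in every year
```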


Vin985/gblincoln documentation built on April 21, 2022, 1:49 a.m.