```r
library(gblincoln)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
This vignette gives an overview of how to use the `gblincoln` package.
This package depends on the tidyverse
and ggplot2
libraries.
Lincoln estimates are calculated using the number of banded birds and the number of bands recovered from hunting for a given year and species. Coupled with the probability that a band is reported that year, this allows us to estimate a harvest rate for that year. This rate is then applied to the harvest numbers for that year and species to give an estimate of population abundance. The principles and equations used in this package are described in Alisauskas et al. (2014).
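In equation form (using our own notation; Alisauskas et al. 2014 give the full treatment, including variance estimation), the estimate for year $t$ is:

$$\hat{N}_t = \frac{\hat{H}_t}{\hat{h}_t}, \qquad \hat{h}_t = \frac{\hat{f}_t}{\hat{\rho}_t} = \frac{R_t / B_t}{\hat{\rho}_t}$$

where $B_t$ is the number of birds banded, $R_t$ the number of direct band recoveries from hunting, $\hat{f}_t$ the direct recovery rate, $\hat{\rho}_t$ the band reporting probability, $\hat{h}_t$ the harvest rate, $\hat{H}_t$ the (corrected) harvest estimate and $\hat{N}_t$ the Lincoln estimate of abundance.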
Data from four (4) sources are required to perform the estimations:

- banding data (number of birds banded per year)
- band recovery data (direct recoveries from hunting)
- band reporting probabilities
- harvest estimates
All datasets must have a column labeled either `b.year` or `B.Year` so that they can be linked together.
The main workflow for calculating Lincoln estimates for a given species is as follows:

1. load the required datasets
2. define filters for the species and region of interest
3. compute the direct recoveries with `get_direct_recoveries()`
4. compute the harvest rates with `get_harvest_rate()`
5. compute the Lincoln estimates with `get_lincoln_estimates()`
This section provides a quick example of how to use the `gblincoln` package. More information on how each task works can be found by clicking on each section. Note that all relevant code portions of this example can be found in the ATBR_example.R file at the root of the package.
For this example, we will be using datasets provided with the package. For more detailed information about these datasets please see this section.
Therefore to load the example data, all that is required is to load the package. To load your own data, please read this section.
```r
# Clean environment and load the package
rm(list = ls())
library(gblincoln)
```
We need to create filters to select the relevant records from our database. Filter names can be either column names from the Gamebirds database, or the corresponding renamed columns as found in the `gb_colnames` object. Please see the Gamebirds documentation to know what each column contains.
For our example, we will look at data for the ATBR species, banded in Nunavut and shot in the United States in flyway 1 (the Atlantic flyway) between 2000 and 2019. (Note: if you need more information about the locations and flyways available for a given species, please see this section.)
Also keep in mind that default filters are automatically applied; they can be seen in the `DEFAULT_LINCOLN_FILTERS` object.
Only the `SPEC` filter is absolutely required; it must take a single value, otherwise the estimation will not work.
```r
filters_ATBR <- list(
  SPEC = "ATBR",
  b.state_name = "Nunavut",
  e.country_code = "US",
  r.flyway_code = 1,
  b.year = 2000:2019,
  r.corrected_year = 2000:2019
)
```
Then, the quickest way to calculate the estimates is by calling the
get_lincoln_estimates()
function:
```r
lincoln_estimates <- get_lincoln_estimates(
  filters = filters_ATBR,
  banding_df = gb_ATBR_banding,
  recoveries_df = gb_ATBR_recoveries,
  harvest_df = gb_ATBR_harvest,
  rho_df = gb_reporting_probas,
  harvest_correction_factor = 0.61
)
lincoln_estimates
```
Note that the `harvest_correction_factor` is based on recommendations from Padding and Royle (2012), who found that harvest surveys overestimate harvest and should therefore be corrected. The value of 0.61 (the package default) is the one recommended for goose species collected after 1999.
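As a simple illustration of what the correction does (hypothetical numbers; the factor scales the survey-based harvest estimate before it enters the Lincoln calculation):

```r
# Hypothetical survey-based harvest estimate
reported_harvest <- 120000

# Apply the 0.61 correction factor recommended for geese after 1999
corrected_harvest <- reported_harvest * 0.61
corrected_harvest
#> [1] 73200
```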
If you prefer to perform every step manually you can do the following:
```r
# Get direct recoveries (this function actually filters the databases)
dr_df <- get_direct_recoveries(banding_df = gb_ATBR_banding,
                               recoveries_df = gb_ATBR_recoveries,
                               filters = filters_ATBR)

# Calculate the harvest rates
hr_df <- get_harvest_rate(df = dr_df, rho_df = gb_reporting_probas)

# Calculate the Lincoln estimates
get_lincoln_estimates(df = hr_df, harvest_df = gb_ATBR_harvest)
```
The gblincoln
package aims to calculate Lincoln estimates for population abundance
using banding data extracted from the Gamebirds database provided by the
US Banding service.
The gblincoln package offers functions to perform all the steps required to calculate Lincoln estimates, as described in the previous section. In addition, the package offers convenience functions for:

- loading and cleaning raw Gamebirds exports
- filtering the data
- finding the countries, states and flyways where a species was banded or recovered
- comparing how different sets of filters influence the estimations
Since deciding which harvest data to use can be very dependent on the populations and species studied, this package only performs estimations for one species at a time. To perform multiple estimations, define multiple filter lists and run them one at a time, as sketched below.
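A minimal sketch of this pattern (the objects `my_banding`, `my_recoveries`, `my_harvest` and `my_rho` are hypothetical datasets that would need to contain appropriate data for every species listed):

```r
# Hypothetical multi-species run: one filter list and one estimation per species
species <- c("ATBR", "LSGO")
estimates <- lapply(species, function(sp) {
  get_lincoln_estimates(filters = list(SPEC = sp, b.year = 2000:2019),
                        banding_df = my_banding,
                        recoveries_df = my_recoveries,
                        harvest_df = my_harvest,
                        rho_df = my_rho)
})
names(estimates) <- species
```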
All datasets should be loaded using the `load_dataset()` function. This function reads the file and then cleans the resulting dataframe (for instance, renaming columns to the names used by the package and, for recovery data, correcting the recovery years).
If the data to be loaded is not in a csv format, it is possible to load the data
using the appropriate method and then call the clean_dataset()
function on the
resulting dataframe.
The type of action taken during cleaning is determined automatically by the columns present
in the dataset. Please see the help page of the clean_dataset()
function for
further explanations.
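For instance, if the raw export is still an Excel file, a possible approach (readxl is shown here purely as an example reader; any method that returns a dataframe works) is:

```r
# Read the Excel export with an appropriate reader, then clean it
raw <- readxl::read_excel("path_to_file.xlsx")
dataset <- clean_dataset(raw)
```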
To load a csv file from an external path, we would write something like this:
```r
dataset <- load_dataset("path_to_file")
```
By default, the data extracted from gamebirds is unfiltered and can contain
unwanted entries. Therefore, we need to define filters to select only the
relevant data.
In `gblincoln`, filtering is done by creating a named list that associates column names from the dataset with the desired values.
For example, let's say we want to keep only birds banded between 2010 and 2020.
Banding year is stored in the `b.year` column of the datasets. Therefore, to create this filter, we would write:
```r
filter <- list(b.year = 2010:2020)
```
You might notice that b.year
is not a column found in the original datasets
extracted from Gamebirds and was renamed by the package. If you prefer
to use the original names given by Gamebirds, it is also possible as long as the
column is present in the list of columns accepted by the package. See the
section on column names for more information.
For instance, the original column name is actually B.Year
so you
could define your filter like this and achieve the same results:
```r
filter_old_name <- list(B.Year = 2010:2020)
```
If you want to reuse an old filter and only change one of its values or add
another filtering value, you can do so by updating the list using the
list_update(old, new)
function that will add or replace any filter present
in new
to the old
list.
```r
# Replace the b.year filter and add the SPEC filter
filters <- list_update(filter, list(b.year = 2009:2019, SPEC = "ATBR"))
filters
```
Since filters are applied in the order in which they appear, you can use the `new_first` argument to specify whether the new filters are added at the beginning or at the end of the list. Note that filters that are already present keep their position.
```r
# Replace the b.year filter and add the SPEC filter
filters2 <- list_update(filter, list(b.year = 2009:2019, SPEC = "ATBR"), new_first = TRUE)
filters2
```
Note that to be able to calculate a Lincoln estimation, it is necessary
that your filter list contains the SPEC
filter with the short alpha code of the
species for which the estimation is performed. This filter must take only one
value, otherwise the estimation will not work.
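For example (hypothetical filter lists shown only to illustrate the constraint):

```r
# Valid: SPEC is present and takes a single value
valid_filters <- list(SPEC = "ATBR", b.year = 2010:2020)

# Not valid for a Lincoln estimation: SPEC takes more than one value
invalid_filters <- list(SPEC = c("ATBR", "LSGO"), b.year = 2010:2020)
```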
To apply the filters, you can then call the filter_database()
function. Here
we use the gb_ATBR_banding
and gb_ATBR_recoveries
databases provided with
the package as an example (see here for more information)
```r
ATBR_banding <- filter_database(gb_ATBR_banding, filters = filters)
ATBR_recoveries <- filter_database(gb_ATBR_recoveries, filters = filters)
```
Filtering is done in the order in which the filters appear in the list.
Note that by default, the `filter_database()` function automatically adds some filters. These include keeping only the relevant columns and restricting the types of bands, the types of birds, how the birds were recovered, etc.
The complete list of default values can be found in the following object:

```r
DEFAULT_LINCOLN_FILTERS
```

Please type `?DEFAULT_LINCOLN_FILTERS` for a description of each filter.
If you want to change the value of a specific filter, just add it to your filter list; it will automatically override the default value. If you do not want to use the default values, you can set `use_default_filters = FALSE` when calling the `filter_database()` function.
Also, if you want to apply your filters before the default filters, you can set the `filters_first` argument to `TRUE`.
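For example (illustrative calls reusing the `filters` list defined above):

```r
# Apply only our own filters, without the package defaults
ATBR_banding_no_defaults <- filter_database(gb_ATBR_banding, filters = filters,
                                            use_default_filters = FALSE)

# Apply our filters before the default filters
ATBR_banding_mine_first <- filter_database(gb_ATBR_banding, filters = filters,
                                           filters_first = TRUE)
```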
For creating database-specific filters, please see this section.
Once our filters are defined, the next step is to get all direct recoveries. In
practice, this is done by calling the get_direct_recoveries()
function.
By default, the function assumes the databases are not filtered and will
automatically call the filter_database()
function described above. If you
want to provide filtered databases, set the filtered
argument to TRUE.
The function then summarizes the banding and recovery data to count how many of each event happened in a given year, merges the results, and computes a recovery rate.
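Conceptually (a sketch of the idea, not the package's internal code, and assuming the recoveries have already been restricted to direct recoveries), this step amounts to something like:

```r
library(dplyr)

# Count bandings and direct recoveries per banding year, then merge and
# compute the recovery rate f = recoveries / bandings
bandings_per_year   <- count(ATBR_banding, b.year, name = "n_banded")
recoveries_per_year <- count(ATBR_recoveries, b.year, name = "n_recovered")

recovery_rates <- bandings_per_year %>%
  left_join(recoveries_per_year, by = "b.year") %>%
  mutate(recovery_rate = n_recovered / n_banded)
```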
The `get_direct_recoveries()` function takes the banding and recoveries databases as arguments, and either a single list of filters containing database-specific filters, or one list of filters for the banding data and one for the recoveries data (see here for more information on database-specific filters).
```r
# This will actually filter the databases
dr_df <- get_direct_recoveries(gb_ATBR_banding, gb_ATBR_recoveries, filters = filters)

# If we provide already filtered databases
dr_df2 <- get_direct_recoveries(ATBR_banding, ATBR_recoveries, filtered = TRUE)

# The two methods are similar
all.equal(dr_df, dr_df2)
```
The next step is to calculate the harvest rate for the species, based on the recovery rate calculated in the previous step and the probability of reporting a band after harvesting a bird in a given year.
Therefore, by default this function takes as arguments a dataframe created by
the get_direct_recoveries()
function as well as a dataframe with the reporting
probabilities.
If no direct recoveries dataframe is present, this function will call
get_direct_recoveries()
using the arguments passed as ...
. This means that
you can provide all the arguments needed for get_direct_recoveries()
directly
inside get_harvest_rate()
. Note that in this case, it is mandatory to provide
argument names.
```r
# We can call the function like this
hr_df <- get_harvest_rate(dr_df, gb_reporting_probas)

# Or if we did not create the direct recoveries dataframe, we can pass
# all arguments directly to the function that will create it.
hr_df2 <- get_harvest_rate(
  rho_df = gb_reporting_probas,
  banding_df = gb_ATBR_banding,
  recoveries_df = gb_ATBR_recoveries,
  filters = filters
)

# The two methods are similar
all.equal(hr_df, hr_df2)
```
Finally, we can calculate the Lincoln estimates by calling
get_lincoln_estimates()
. This function takes a harvest rate dataframe created
with get_harvest_rate()
and a harvest dataframe with the number of birds
harvested each year. Just like get_harvest_rate()
, if a harvest rate dataframe
has not been created, it is possible to pass all arguments necessary to create
it automatically.
```r
# We can call the function like this
lincoln1 <- get_lincoln_estimates(df = hr_df,
                                  harvest_df = gb_ATBR_harvest,
                                  plot_estimates = FALSE,
                                  save_estimates = FALSE)

# Or if we did not create the harvest rate dataframe, we can pass
# all arguments directly to the function that will create it.
lincoln2 <- get_lincoln_estimates(
  banding_df = gb_ATBR_banding,
  recoveries_df = gb_ATBR_recoveries,
  rho_df = gb_reporting_probas,
  filters = filters,
  harvest_df = gb_ATBR_harvest,
  plot_estimates = FALSE,
  save_estimates = FALSE
)

# The two methods are similar
all.equal(lincoln1, lincoln2)
```
In addition, it is possible to plot the estimates using the `plot_estimates` argument and to save them using the `save_estimates` flag. The save path can be chosen with the `save_path` argument:
```r
get_lincoln_estimates(df = hr_df,
                      harvest_df = gb_ATBR_harvest,
                      plot_estimates = TRUE,
                      save_estimates = TRUE,
                      save_path = ".")
```
Here are the steps for extracting data from Gamebirds. We will show how to extract data for Atlantic brants. Note that these steps describe how to export data based on a single criterion, the species. We assume that all additional filtering will be done by the package. Other filtering can be done directly in Gamebirds; however, this will not be covered here, and we recommend that any filtering done at export be documented and kept in mind at all times during processing.
When launching Gamebirds, select Summarized bandings
This leads to a page that displays banding information. To search for bands related to a species, click on the Find icon.
A new form with blank entries appears
On the species line (1), in the middle entry, select ATBR (2). Once this is done, click on Perform Find (3).
OPTIONAL (this step is not executed as part of this example): If we want to
extract more than one species at the same time, before clicking on
Perform find, click on New request (1). We can see
the number is now 2 (2). We can then select
another species (for example, Lesser Snow Goose LSGO). Note that the gblincoln
package works with databases containing more than one species, so it might be
easier to extract all target species at the same time (unless the size of
the database grows too fast).
Once the search is performed, the results are displayed. We can see that when selecting only ATBR records, 2072 entries are returned (1). To save them, click on Export records (2)
When exporting the data, we recommend saving as an Excel (.xlsx) file. This is because it seems .csv files do not save the column names.
Leave the next box empty and click on Continue...
Then, when selecting the columns to export, click on Move all (1) and then Export (2).
The last step consists of converting the Excel file to csv. To do so, open the file in Excel (or other software such as LibreOffice), click on Save as and select .csv.
Exporting recoveries data is very similar to exporting banding data. The first major difference is the database selection on the main menu. Now, we select Encounters
The encounters screen displays both the banding data and the encounter data. To select the records of interest, click on Find as before.
In the filter form, we select the species in the middle input (1), choose ATBR (2) and then click on Perform Find (3)
We can see that 14521 records have been found. Now we can export them just like banding data.
The rest of the export process is the same as for the banding data. The only thing different will be that when you select the columns, new columns specific to the recoveries will be present.
The `gblincoln` package comes with ready-to-use datasets that are used in the examples of this vignette. The package provides examples of the four (4) datasets required for calculating Lincoln estimates for a North American population of Atlantic brants.
Here are the names of the variables containing them:

- `gb_ATBR_banding`: the banding data for Atlantic brants extracted from Gamebirds.
- `gb_ATBR_recoveries`: the recovery data for Atlantic brants extracted from Gamebirds.
- `gb_reporting_probas`: the reporting probabilities and their standard deviations between 1976 and 2019, estimated using the model created for mallard ducks in Arnold et al. (2020).
- `gb_ATBR_harvest`: estimates of the United States harvest of Atlantic brants between 2000 and 2019 in the Atlantic flyway.

All these datasets come from raw data loaded using the `load_dataset()` function as described in this section.
```r
# Banding data
str(gb_ATBR_banding)

# Reporting probabilities
head(gb_reporting_probas)

# Harvest data
head(gb_ATBR_harvest)
```
In addition to the datasets, the `gblincoln` package also comes with a dataframe, `gb_colnames`, mapping column names from Gamebirds to the ones used in the package. Columns from Gamebirds were renamed for ease of use and consistency. This object contains only two columns:
- `old_colnames`: columns as extracted by Gamebirds
- `new_colnames`: columns used in the gblincoln package

As described in the filtering section, it is possible to create filters based on the Gamebirds column names. Keep in mind, though, that when loading dataframes, R can change the column names, for instance by replacing all spaces with dots by default.
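For example, to inspect the mapping:

```r
head(gb_colnames)
```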
#### Add your own column names
If you want to provide a custom list of columns, you can do so when calling
the filter_database()
function by passing it as the columns
argument.
Note that this argument still must be a dataframe with two columns named
old_colnames
and new_colnames
.
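A minimal sketch of such a mapping (the "B.Year" to "b.year" row is taken from the package's own mapping; the second row is a hypothetical placeholder):

```r
# Custom column mapping: must be a dataframe with exactly these two column names
my_colnames <- data.frame(
  old_colnames = c("B.Year", "Some Gamebirds Column"),
  new_colnames = c("b.year", "my_new_name")
)

filtered_db <- filter_database(gb_ATBR_banding, filters = filters,
                               columns = my_colnames)
```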
IMPORTANT: if you create your own custom column names, we cannot guarantee the package will continue to work, as some columns might be called by name. These include, but are not limited to:

- `b.year`
- `SPEC`
Please note that by default recovery years are corrected when calculating Lincoln
estimates. If the recovery happens before month 4 (April), then the recovery
is considered to happen the year before.
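As an illustration of this rule (a sketch of the logic, not the package's internal code):

```r
# Recoveries reported in January-March (month < 4) are assigned to the previous year
recovery_year  <- c(2018, 2019, 2019)
recovery_month <- c(10, 2, 5)

corrected_year <- ifelse(recovery_month < 4, recovery_year - 1, recovery_year)
corrected_year
#> [1] 2018 2018 2019
```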
If you do not want recovery years to be corrected, set the `correct_recovery_years` argument of the `load_dataset()` function to FALSE when loading the recovery data.
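For example (the file path is a placeholder):

```r
# Load recovery data without correcting the recovery years
recoveries <- load_dataset("path_to_recoveries.csv", correct_recovery_years = FALSE)
```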
`gblincoln` offers convenience functions to find out where a target species can be found. In particular, they provide information about countries, states and flyways. When available, these functions will return information about both banding and recovery data.
Four (4) functions are offered, three of them being convenience shortcuts of the main one:

- `get_species_locations()`: the main function. It takes a dataframe, a species code and a type of data to return ("country", "state" or "flyway"). By default this function returns both names and codes (as recorded in Gamebirds). The results can be sorted by a given column.
- `get_species_countries()`: shortcut to return only information about the countries
- `get_species_states()`: shortcut to return only information about the states
- `get_species_flyways()`: shortcut to return only information about the flyways

```r
get_species_locations(gb_ATBR_banding, "ATBR", sort_by = "b.country_name")
get_species_countries(gb_ATBR_banding, "ATBR", return_codes = FALSE)
get_species_flyways(gb_ATBR_recoveries, "ATBR")
get_species_states(gb_ATBR_banding, "ATBR", return_names = FALSE)
```
If columns are present in both datasets and you want a different set of filters for the banding dataset and the recoveries dataset, we propose two methods. The first method is to create a separate filter list for each dataset:
```r
# Common filters that can be applied to all datasets
default_filter <- list(SPEC = "ATBR")

banding_filter <- list_update(default_filter, list(b.year = 2010:2019))
recoveries_filter <- list_update(default_filter, list(r.corrected_year = 2010:2019))
```
The only problem with this approach is that you need to manipulate several objects.

The second method is to define database-specific filters inside a single filter list, using the special keywords `banding_filters`, `recoveries_filters`, `rho_filters` and `harvest_filters`. The following code would give the same result as the code described above:

```r
filters <- list(SPEC = "ATBR",
                banding_filters = list(b.year = 2010:2019),
                recoveries_filters = list(r.corrected_year = 2010:2019))
```
If you choose to use the second method, you MUST specify the database type
using the db_type
option when calling the filter_database()
function,
otherwise these filters will not be applied. Please refer to the help page of
the function for accepted values of db_type
.
```r
filtered_db <- filter_database(gb_ATBR_banding, filters = filters, db_type = "b")
```
Also note that filters defined using the special database keywords will override any other existing filter. Therefore, special care must be taken when updating a filter list that contains them.
For instance, let's say we want to create a filter based on the `filters` object we defined earlier, in which we want to filter the banding years for all datasets. The method below will not work for the banding dataset, because a `b.year` filter already exists in the `banding_filters` entry, and that value will be used for the banding dataset.
```r
list_update(filters, list(b.year = 2015:2019))
```
This is the way to do it:
```r
list_update(filters, list(
  b.year = 2015:2019,
  banding_filters = list(b.year = 2015:2019)
))
```
In this case, it might be better to create separate filters for each database.
Sometimes it can be interesting to compare two filters to see how they influence the estimations. For example, we could check whether including bands with geolocators influences the estimation of harvest rates.
We will define two filters: one with all band types, and one without geolocators.
```r
# Filters with all band types
filters_ATBR_all <- filters_ATBR <- list(
  SPEC = "ATBR",
  b.state_name = "Nunavut",
  e.country_code = "US",
  r.flyway_code = 1,
  b.year = 2000:2019,
  r.corrected_year = 2000:2019
)

# Geolocators were only deployed in 2018:2019, so we remove them only for these
# years in the comparison
filters_ATBR_nogeo <- list_update(filters_ATBR,
                                  list(add_info = c(00, 01, 07),
                                       b.year = 2018:2019))
```
We can then compare how the two filters influence the estimation of the harvest rate using the `compare_harvest_rates()` function. This function takes as input either two dataframes calculated by `get_harvest_rate()`, or two sets of filters along with the required databases.
By default, this function plots the harvest rates and checks if there is an overlap.
```r
res <- compare_harvest_rates(
  filters1 = filters_ATBR_all,
  filters2 = filters_ATBR_nogeo,
  banding_df = gb_ATBR_banding,
  recoveries_df = gb_ATBR_recoveries,
  rho_df = gb_reporting_probas
)

# res checks if the confidence levels of the harvest rates overlap for all values
res

# All confidence levels overlap, so there should not be a problem with using geolocators
```