Introduction {#introduction}

The migbirdHIP package was created by the U.S. Fish and Wildlife Service (USFWS) to process, clean, and visualize Harvest Information Program (HIP) registration data.

Every year, migratory game bird hunters must register for HIP in the state(s) they plan to hunt by providing their name, address, date of birth, and hunting success from the previous year. The 49 participating states (every state except Hawaii) are responsible for collecting this information and submitting hunter registrations to the USFWS. This package processes the raw data submitted by the states.

The USFWS has been using HIP registrations since 1999 to select hunters for the Migratory Bird Harvest Survey (Diary Survey) and the Parts Collection Survey (Wing Survey). Only a small proportion of HIP registrants are chosen to complete these surveys each year. The USFWS ultimately uses these data to generate annual harvest estimates. To read more about HIP and the surveys, visit: https://www.fws.gov/harvestsurvey/ and https://www.fws.gov/library/collections/migratory-bird-hunting-activity-and-harvest-reports

Installation {#installation}

The package can be installed from the USFWS GitHub repository using:

devtools::install_github(
  "USFWS/migbirdHIP",
  quiet = TRUE,
  upgrade = FALSE,
  build_vignettes = TRUE
)

Releases {#releases}

To install a past release, use the example below and substitute the appropriate version number. Releases are documented in the README.

devtools::install_github("USFWS/migbirdHIP@v1.2.8", build_vignettes = T)

Load {#load}

Load migbirdHIP after installation. The package will return a startup message with the version you have installed and what season of HIP data the package version was intended for.

library(migbirdHIP)

Functions overview {#functions-overview}

The flowchart below is a visual guide to the order in which functions are used. Some functions are only used situationally, and some issues with the data cannot be solved with a function at all. The flowchart demonstrates the general process of handling HIP data; every exported function in the migbirdHIP package appears in both the flowchart and this vignette.

knitr::include_graphics(
  paste0(here::here(), "/man/figures/migbirdHIP_flowchart.svg"))

Part A: Data Import and Cleaning {#part-a-data-import-and-cleaning}

fileRename {#filerename}

The fileRename() function renames non-standard file names to the standard format. We expect a file name to be a 2-letter capitalized state abbreviation followed by YYYYMMDD (indicating the date data were submitted).

Some states submit files using a 5-character file name format: the 2-letter state abbreviation followed by a 3-digit Julian date. To convert these file names to the standard format (a requirement for read_hip() to read the data properly), supply fileRename() with the directory containing the HIP data. File names are automatically overwritten with the YYYYMMDD date corresponding to the submitted Julian date.
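As an illustration of the conversion (a standalone sketch, not the package's internal code), a hypothetical Julian-date file name from the 2024 season could be translated like this:

# Illustrative sketch only: convert a hypothetical file name "mo245.txt"
# (state abbreviation + 3-digit Julian date) to the standard format.
julian_name <- "mo245.txt"
state <- toupper(substr(julian_name, 1, 2))          # "MO"
julian_day <- as.integer(substr(julian_name, 3, 5))  # day 245 of the year
submit_date <- as.Date(julian_day - 1, origin = "2024-01-01")
paste0(state, format(submit_date, "%Y%m%d"), ".txt")
#> [1] "MO20240901.txt"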

This function also converts lowercase state abbreviations to uppercase.

The current hunting season year must be supplied to the year parameter to accurately convert dates.

fileRename(path = "C:/HIP/raw_data/DL0901", year = 2024)

fileCheck {#filecheck}

Check if any files in the input folder have already been written to the processed folder using fileCheck().

fileCheck(
  raw_path = "C:/HIP/raw_data/DL0901/",
  processed_path = "C:/HIP/corrected_data/"
)

read_hip {#read_hip}

Read HIP data from fixed-width .txt files using read_hip(). Files must adhere to the 10-character naming convention (2-letter capitalized state abbreviation followed by YYYYMMDD) to be read in successfully; if files were submitted with the 5-character format or a lowercase state abbreviation, run fileRename() first.
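Before reading, you can also check the file names yourself; the base-R sketch below (not a package function) lists any .txt files that do not match the expected pattern:

# Base-R sketch: list any .txt files whose names do not follow the
# 2-letter state abbreviation + YYYYMMDD convention.
hip_files <- list.files("C:/HIP/raw_data/DL0901/", pattern = "\\.txt$")
hip_files[!grepl("^[A-Z]{2}[0-9]{8}\\.txt$", hip_files)]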

Note: In the read_hip() example below, we use an internal directory to read in fake HIP data files, which allows the function to demonstrate its errors and messages. (The testing data were generated with data-raw/create_fake_HIP_data.R and are stored under inst/extdata/.) All other examples in this vignette use C:/ paths for clarity.

raw_data <- read_hip(paste0(here::here(), "/inst/extdata/DL0901/"))

The read_hip() function provides several options that control how data are read in.

Use unique = TRUE to read in a frame without exact duplicates, or unique = FALSE to read in all registrations including exact duplicates.
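For example, to keep exact duplicates when reading the same fake-data directory used above:

# Read all registrations, including exact duplicates.
raw_data_with_dupes <- read_hip(
  paste0(here::here(), "/inst/extdata/DL0901/"),
  unique = FALSE
)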

Data are expected in a fixed format beginning with the title field and ending with the email field. Additional fields are created by read_hip(): dl_state, dl_date, source_file, dl_cycle, dl_key, and record_key.
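A quick way to confirm the added fields are present (assuming the dplyr package is available):

# Peek at the fields created by read_hip().
raw_data |>
  dplyr::select(dl_state, dl_date, source_file, dl_cycle, dl_key, record_key) |>
  head()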

The read_hip() function also returns messages when it detects problems in the incoming files.

As an example, we will use the default read_hip() settings to read in all of the states from download 0901, using fake HIP data files located in the extdata/DL0901/ directory of the R package.

glyphCheck {#glyphcheck}

During pre-processing, R may throw an error that says something like "invalid UTF-8 byte sequence detected". The error usually includes a field name but no other helpful information. The glyphCheck() function identifies values containing non-UTF-8 glyphs/characters and prints them with the source file in the console so they can be edited.

glyphCheck(raw_data)
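For intuition, the same kind of check can be sketched in base R (this is not how glyphCheck() is implemented; it uses the firstname field as an example):

# Base-R sketch: show firstname values that are not valid UTF-8, along with
# the file they came from.
raw_data[which(!validUTF8(raw_data$firstname)), c("source_file", "firstname")]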

shiftCheck {#shiftcheck}

Find and print any rows that have a line shift error with shiftCheck().

shiftCheck(raw_data)

clean {#clean}

The clean() function performs data cleaning and filters out bad registrations.

clean_data <- clean(raw_data)

Registrations that fail the function's validity checks are dropped.

The remaining records also receive a number of standardizing changes.

In addition to these changes, the internal zipCheck() function returns a message from clean() if zip codes that do not correspond to the provided state of residence are detected for more than 10% of a file's registrations and/or more than 100 registrations.
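A hedged sketch of those thresholds, using a hypothetical zip_mismatch logical column purely for illustration (the package performs this check internally):

# Illustration only: zip_mismatch is a hypothetical logical column marking
# registrations whose zip code does not match the state of residence.
clean_data |>
  dplyr::group_by(source_file) |>
  dplyr::summarize(
    n_mismatch = sum(zip_mismatch),
    prop_mismatch = mean(zip_mismatch)
  ) |>
  dplyr::filter(prop_mismatch > 0.10 | n_mismatch > 100)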

issueCheck {#issuecheck}

The issueCheck() function assesses the validity of the issue_date field based on the current hunting season's HIP issuance start and end dates (not season open and close dates) for every state except Mississippi. A plot is automatically returned for past and future registrations; to skip plotting, specify plot = FALSE.

current_data <- issueCheck(clean_data, year = 2024, plot = FALSE)
# For this vignette's fake data, carry the cleaned data forward unchanged.
current_data <- clean_data

Registrations are evaluated against these issuance dates, and past or future registrations are handled accordingly.

duplicateFinder {#duplicatefinder}

The duplicateFinder() function finds hunters that have more than one registration. Registrations are grouped by firstname, lastname, state, birth_date, dl_state, and registration_yr to identify unique hunters. If the same hunter has 2 or more registrations, the fields that are not identical are counted and summarized.

Plot the duplicates with duplicatePlot() (covered in Part B).

duplicateFinder(current_data)
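For intuition, the grouping step can be sketched with dplyr (this is not the package's internal code):

# dplyr sketch: count registrations per hunter using the same grouping fields
# as duplicateFinder(), then keep only hunters with more than one.
current_data |>
  dplyr::count(firstname, lastname, state, birth_date, dl_state, registration_yr) |>
  dplyr::filter(n > 1)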

duplicateFix {#duplicatefix}

We sometimes receive multiple HIP registrations per person, which must be resolved with duplicateFix(). Duplicates are identified when more than one registration has the same firstname, lastname, state, birth_date, dl_state, and registration_yr. Only one HIP registration per hunter can be kept. For the in-line permit states (WA and OR), permit records are submitted separately from HIP registrations, and multiple permits are allowed. To differentiate HIP and permit records from in-line permit states, we check the values in the non-permit species fields (ducks_bag, geese_bag, dove_bag, woodcock_bag, coots_snipe, rails_gallinules): HIP registrations contain non-zero values in those columns, but permit records always have 0 values. A minimal sketch of this check follows the example call below.

deduplicated_data <- duplicateFix(current_data)
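The sketch referenced above (illustration only; it assumes the bag fields hold numeric or numeric-like values):

# Permit (PMT) records from the in-line permit states carry 0 in every
# non-permit species field; HIP registrations have at least one non-zero value.
bag_fields <- c("ducks_bag", "geese_bag", "dove_bag",
                "woodcock_bag", "coots_snipe", "rails_gallinules")
looks_like_pmt <- current_data$dl_state %in% c("WA", "OR") &
  rowSums(sapply(current_data[bag_fields], as.numeric)) == 0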

To decide which HIP registration to keep from a group, we follow a series of logical steps.

For sea duck and brant states:

These states are listed in migbirdHIP:::REF_STATES_SD_BR.

  1. Keep registration(s) with the most recent issue date.
  2. Exclude registrations with all 1s or all 0s in bag columns from consideration.
  3. Keep any registrations that have a 2 for either brant or sea duck (for sea duck and brant states), or a 2 for sea duck (Maine only).
  4. If more than one registration remains, choose to keep one randomly.

For HIP registrations from in-line permit states and all other states:

These are the remaining states in migbirdHIP:::REF_ABBR_49_STATES, i.e., those not included in migbirdHIP:::REF_STATES_SD_BR or migbirdHIP:::REF_STATES_SD_ONLY.

  1. Keep registration(s) with the most recent issue date.
  2. Exclude registrations with all 1s or all 0s in bag columns from consideration, if possible.
  3. If more than one registration remains, choose to keep one randomly.

A new field called record_type is added to the data after the deduplication process described above. Every HIP registration is labeled HIP. The in-line permit states WA and OR send HIP and permit records separately; these are labeled HIP and PMT, respectively.

bagCheck {#bagcheck}

Running bagCheck() verifies that species "bag" values fall within expected ranges. This function searches the species group columns for values that are not typical or expected by the USFWS. If a value outside the normal range is detected, an output tibble is created; each row contains the state, species, unusual stratum value, and a list of the normal values we would expect.

If a value for a species group is given in the HIP data that doesn't match anything in our records, the species reported in the output will have NA values in the normal_strata column. These species are not hunted in the reported states.

bagCheck(deduplicated_data)

proof {#proof}

After data are cleaned and checked in the steps above, we run proof() to flag errors. Note that no actual corrections or data changes take place as a result of this function. The year of the Harvest Information Program must be supplied to the year parameter, which aids in checking registration_yr, issue_date, and birth_date.

Values that are considered irregular are flagged in a new field called errors. For each existing field (those listed in migbirdHIP:::REF_FIELDS_ALL), values are compared to standard expected formats; if they do not conform, the field name is pasted as a string into the errors column. Each row can have anywhere from zero errors (NA) to all field names listed. Multiple errors per registration are hyphen delimited, so a typical errors value looks like "suffix-address-zip", "zip", or "middle-email".

proofed_data <- proof(deduplicated_data, year = 2024)
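Because the errors field is hyphen delimited, individual flags can be tallied with base R if a quick summary is needed (errorTable(), described in Part B, gives a fuller breakdown):

# Tally individual error flags from the hyphen-delimited errors field.
table(unlist(strsplit(na.omit(proofed_data$errors), "-")))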

The specific conditions that cause a value to be flagged vary by field; each field is checked against its own expected format.

correct {#correct}

Data can be corrected by running the correct() function with the year of the Harvest Information Program supplied to the year parameter.

corrected_data <- correct(proofed_data, year = 2024)

The correct() function applies a set of standard changes and corrections to the data.

Part B: Data Visualization and Tabulation {#part-b-data-visualization-and-tabulation}

duplicatePlot {#duplicateplot}

Plot duplicates with duplicatePlot().

duplicatePlot(current_data)

errorPlotFields {#errorplotfields}

The errorPlotFields() function can be run on all states...

errorPlotFields(proofed_data, loc = "all", year = 2024)

... or it can be limited to just one.

errorPlotFields(proofed_data, loc = "LA", year = 2024)

It is possible to add ggplot2 components to these plots, and a plot can be faceted with facet_wrap() using either dl_cycle or dl_date. The example below demonstrates how this package's functions interact with the tidyverse, applying facet_wrap() to an errorPlotFields() plot (using a subset of 4 download cycles).

library(dplyr)
library(stringr)
library(ggplot2)

errorPlotFields(
  hipdata2024 |>
    filter(str_detect(dl_cycle, "0800|0901|0902|1001")),
  year = 2024) +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0, hjust = 1),
    legend.position = "bottom") +
  facet_wrap(~dl_cycle, ncol = 2)

errorPlotStates {#errorplotstates}

The errorPlotStates() function plots error proportions per state. A threshold value must be set to only view states above a certain proportion of error (in the example below, threshold = 0.05 indicates an error tolerance of 5%). Bar labels are error counts.

errorPlotStates(proofed_data, threshold = 0.05)

errorPlotDL {#errorplotdl}

This function is intended for visualizing an entire season of data. The errorPlotDL() function plots the proportion of error per download cycle over the course of the hunting season. Location may be specified with the loc parameter to see a particular state over time.

errorPlotDL(hipdata2024, loc = "MI")
knitr::include_graphics(paste0(here::here(), "/man/figures/errorPlot_dl.png"))

pullErrors {#pullerrors}

The pullErrors() function can be used to view all of the actual values that were flagged as errors in a particular field. In this example, we find that the suffix field contains several values that are not accepted.

pullErrors(proofed_data, field = "suffix")

Running pullErrors() on a field that has no errors will return a message.

pullErrors(proofed_data, field = "dove_bag")

errorTable {#errortable}

The errorTable() function returns error data as a tibble, which can be assessed as needed or exported to create records of download cycle errors. The basic function reports errors by both location and field.

errorTable(proofed_data)

Errors can be reported by location only by setting the field parameter to "none".

errorTable(proofed_data, field = "none")

Errors can be reported by field only by setting the loc parameter to "none".

errorTable(proofed_data, loc = "none")

Location can be specified using any of the 49 participating state abbreviations.

errorTable(proofed_data, loc = "CA")

Field can be specified as "all", "none", or any field name listed in migbirdHIP:::REF_FIELDS_ALL.

errorTable(proofed_data, field = "suffix")

Total errors for a location can be pulled.

errorTable(proofed_data, loc = "CA", field = "none")

Total errors for a field in a particular location can be pulled.

errorTable(proofed_data, loc = "CA", field = "dove_bag")

Part C: Writing Data and Reports {#part-c-writing-data-and-reports}

write_hip {#write_hip}

After the data have been corrected, they are ready to be written out. Use write_hip() to do final processing, which includes 1) adding in FWS strata and 2) setting NA values to blank strings. If split = FALSE, the final table is saved as a single .csv to the specified path. If split = TRUE (the default), one .csv file per input .txt source file is written to the specified directory.

write_hip(corrected_data, path = "C:/HIP/processed_data/")
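To write a single combined file instead, set split = FALSE:

write_hip(corrected_data, path = "C:/HIP/processed_data/", split = FALSE)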

writeReport {#writereport}

The writeReport() function can be used to automatically generate an R Markdown report with figures, tables, and summary statistics. This is typically done at the end of a download cycle.

writeReport(
  raw_path = "C:/HIP/DL0901/",
  temp_path = "C:/HIP/corrected_data",
  year = 2024,
  dl = "0901",
  dir = "C:/HIP/dl_reports",
  file = "DL0901_report")

