R/sysdata.rda
sysdata.rda (reduced from 14 to 4). This enhances transparency and reduces the number of objects that must be generated outside of the R package itself.
constants.R:
ref_bagfields, abbr_usa, abbr_canada, pmt_inline, pmt_files, states_twoseason, states_sdbr, and states_seaducks were moved and renamed REF_BAG_FIELDS, REF_ABBR_USA, REF_ABBR_CANADA, REF_PMT_INLINE, REF_PMT_FILES, REF_STATES_SD_BR, and REF_STATES_SD_ONLY, respectively.
MS_firstday and MS_lastday are no longer needed by issueCheck(), and states_twoseason is only used by the download report, so the code to generate that vector was moved to the download report template.
REF_BAGS (previously hip_bags_ref), REF_DATES (previously licenses_ref), REF_ZIP_CODE (previously zip_code_ref), and SF_HEXMAP (previously hexmap).
data-raw/
.txt files under inst/extdata/DL0901/, to be used in testing or simulating read_hip()
.rda files under data/, to make it easier to demonstrate functions and run unit tests
DF_TEST_MINI contains 1,606 rows from 7 states (OR records to represent solo permit state, ME records to represent SD-only state, DE records to represent SD and BR state, ND records to represent CR state, UT records to represent BT state, CO records to represent CR and BT state, and IA records to represent non-BT, CR, SD, or BR state) and is formatted as though the data were just read in.
DF_TEST_TINI_READ is a subset of DF_TEST_MINI, and contains 3 rows formatted as though the data were just read in
DF_TEST_TINI_CLEANED is the result of running clean() on DF_TEST_TINI_READ
DF_TEST_TINI_CURRENT is the result of running issueCheck() on DF_TEST_TINI_CLEANED
DF_TEST_TINI_DEDUPED is the result of running duplicateFix() on DF_TEST_TINI_CURRENT
DF_TEST_TINI_PROOFED is the result of running proof() on DF_TEST_TINI_DEDUPED
DF_TEST_TINI_CORRECTED is the result of running correct() on DF_TEST_TINI_PROOFED
variables.R to define seasonally changing variables in a central place.
REF_CURRENT_SEASON for current HIP season.
REF_RELEASES is a named vector of all migbirdHIP package releases and the corresponding season of HIP data that the version was intended for.
constants.R to define variables in a central place and thus evaluate data consistently.
(inLinePermitDNHMessage() and inLinePermitDNHFix() both use LOGIC_INLINE_PMT_DNH) and are shared between functions and testthat files.
(REF_, LOGIC_, REGEX_, and SF_).
testRecordMessage() added to read_hip() and testRecordFilter() added to clean() to find and filter out any testing records mistakenly sent to us by the states.
duplicatePlot() function added; duplicateFinder() (previously named findDuplicates()) function no longer outputs a plot.
zeroBagsMessage() internal function is a new feature of read_hip() that checks for records with all-zero bag values and returns a message to the console if they are detected.
proof()
constants.R file, which are used by proof() and test-proof.R
duplicateFinder()
duplicateFields() uses purrr to significantly reduce redundancy in duplicateFinder(); overall, refactoring reduced the function's length from 151 lines to 50 lines and improved processing speed.
duplicateFix()
(duplicateID(), duplicateNewest(), duplicateAllOnes(), duplicateAllOnesGroupSize(), duplicateDecide(), duplicateRecordType(), and duplicateSample()).
clean().
read_hip()
(listFiles(), ignorePermits(), ignoreHolds(), idBlankFiles(), dropBlankFiles(), checkFileNameDateFormat(), checkFileNameStateAbbr(), readMessages(), missingPIIMessage(), missingEmailsMessage(), testRecordMessage(), zeroBagsMessage(), naBagsMessage(), nonDigitBagsMessage(), inLinePermitDNHMessage(), dlStateNAMessage(), and dlDateNAMessage()).
clean()
(strataFix() split into cranePermitBagFix() and btpiPermitBagFix(); and 6 new functions: namesToUppercase(), missingPIIFilter(), moveSuffixes(), formatZip(), zipCheck(), and inLinePermitDNHFix())
correct() via correctMiddleInitial())
correct()
correctEmail(), correctTitle(), correctSuffix(), correctMiddleInitial()))
clean() via naAndZeroBagsFilter())
proof() via correctMiddleInitial() (this step previously happened in clean())
write_hip()
type param conditionally checks record_type field and cranes, band_tailed_pigeon, and dove_bag fields depending on the user input.
.xlsx and .xls file extensions are converted to .csv
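The extension conversion noted above can be pictured with a short base R sketch. The helper name to_csv_name() and the file names are hypothetical; this is only an assumption about one way to do it, not the code used inside write_hip().

```r
# Hypothetical helper (not part of migbirdHIP): swap a trailing .xlsx or .xls
# extension for .csv and leave every other file name untouched.
to_csv_name <- function(file_name) {
  sub("\\.xlsx?$", ".csv", file_name, ignore.case = TRUE)
}

# Made-up file names for illustration only
example_names <- c("DL20230901.xlsx", "ME20230915.xls", "OR20231001.csv")
converted_names <- to_csv_name(example_names)
print(converted_names)
```

Anchoring the pattern with $ means only the final extension is rewritten, so a name that merely contains ".xls" elsewhere is not changed.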
Renamed strataCheck() to bagCheck() and renamed strata.R to bags.R.
bagCheck() split and now uses an internal function, summarizeBadBags().
Renamed renameFiles() to fileRename() and moved it from renameFiles.R to files.R to be grouped with fileCheck() (previously in fileCheck.R).
findDuplicates() and fixDuplicates() renamed to duplicateFinder() and duplicateFix() to mirror naming conventions of other functions with the subject of the verb coming first (e.g. glyphFinder(), glyphCheck()).
duplicates.R (previously separated into findDuplicates.R and fixDuplicates.R)
read_hip() and write_hip()
errorPlot_fields(), errorPlot_states(), and errorPlot_dl() now named errorPlotFields(), errorPlotStates(), and errorPlotDL()
errorLevel_errors_state() and errorLevel_errors_field() renamed to errorLevelErrorsByState() and errorLevelErrorsByField()
redFlags() no longer exported (used only in the download report), moved to the errorPlots.R script instead of being in its own file
validate(), investigate(), and identicalBags().
outOfStateHunters() and youthHunters() functions because they were not being used.
recordLevel_errors_state() function since it was not being used.
sumLines() deleted and read_hip() param sumlines eliminated; no longer used and not considered useful moving forward.
dl_report.qmd:
zzz.R
migbirdHIP package version and which season of HIP data the package version is compatible with.
x now cleaned_data, proofed_data, etc).
{tibble} no longer a required import
read_hip() now catches file names with incorrect MMDDYYYY or DDMMYYYY date format.
issueCheck() now returns an error for NA values in record_key field.
~ .x anonymous function notation with \(x).
shiftCheck() to return a summary of shift errors rather than just a table of record id values.
issueCheck(), issueAssign(), and issuePlot() to accommodate new rules in evaluating if a record is current. All records are now current unless their issue_date falls before issue_start or after the last day of migratory bird hunting in the record's state.
proof() and errorPlot_fields() to no longer flag and/or plot youth hunters (hunters born fewer than 16 years ago).
dl_report.qmd
NULL
identicalBags() function to exclude matching coots_snipe and rails_gallinules from MI in output; this state uses the response from one question to populate both fields.
read_hip() function to exclude "hold" subdirectories when reading season HIP data.
stopifnot() to all functions to safeguard against running with incorrect/invalid parameters.
distinct changed to unique for pullErrors()
output changed to return for outOfStateHunters()
assigned_data changed to x for issuePlot()
data changed to x for glyphCheck(), glyphFinder(), issueAssign(), issueCheck(), and shiftCheck()
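As an illustration of the stopifnot() safeguards mentioned above, here is a minimal base R sketch. The function count_rows_by_state() and its arguments are hypothetical and do not come from the package; the named-condition form of stopifnot() used here assumes R 4.0.0 or later.

```r
# Hypothetical function (not from migbirdHIP) showing argument checks that
# stop with an error before any work is done on invalid input.
count_rows_by_state <- function(x, state_field = "dl_state") {
  stopifnot("x must be a data frame" = is.data.frame(x))
  stopifnot(
    "state_field must be a single string" =
      is.character(state_field) && length(state_field) == 1
  )
  stopifnot("state_field must be a column of x" = state_field %in% names(x))

  table(x[[state_field]])
}

# Made-up data for illustration
hunters <- data.frame(dl_state = c("ME", "ME", "DE"))
state_counts <- count_rows_by_state(hunters)

# An invalid argument is caught by the first check; capture the message
bad_call <- tryCatch(
  count_rows_by_state("not a data frame"),
  error = function(e) conditionMessage(e)
)
```

Naming each condition gives the caller a readable error message instead of the default text that echoes the failing expression.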
NEWS.md file to track changes to the package.
man/migbirdHIP-package.Rd
fileCheck() function: checks if any files in the input folder have already been written to processed folder.
shiftCheck() function: find and print any rows that have a line shift error with number of positions shifted.
identicalBags() function: returns output if any columns are exactly the same in a file; does not return "no season" matches.
glyphCheck() function: pull and view any non-UTF-8 characters in the raw data; helps guide manual fixes to read in the HIP files without line shifts.
glyphFinder() no longer exported, now used internally inside of glyphCheck()
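To make the idea of locating non-UTF-8 characters concrete, the sketch below uses base R's validUTF8(). It is an assumption about one possible approach, not the implementation of glyphCheck() or glyphFinder(); the raw_lines vector is made-up data.

```r
# Made-up raw lines; the second contains a Latin-1 byte (0xE9) that is not
# valid UTF-8 on its own.
raw_lines <- c("SMITH     JOHN", "JOS\xe9     MARIA", "DOE       JANE")

# validUTF8() is base R and returns FALSE for strings that are not valid UTF-8
is_valid <- validUTF8(raw_lines)

# Row positions that would need a manual look before reading the file
bad_rows <- which(!is_valid)
print(bad_rows)
```

Reporting row positions rather than altering the text keeps the fix manual, in line with the note above that the output guides manual fixes.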
(errorLevel_errors_field(), errorLevel_errors_state(), and recordLevel_errors_state()), which are used inside redFlags(), errorPlot_fields(), and errorPlot_states(). They reduce code redundancy and ensure updates happen universally.
(issueAssign() and issuePlot()), which are used inside of issueCheck() and by the download report (dl_report.qmd).
strataFix() to be used inside of clean() to resolve false permit labels. This function edits strata values for band_tailed_pigeon and crane from states that submit permit files for crane and band-tailed pigeons; values changed from "2" to "0".
writeReport() to render quarto documents.
issueCheck() to place more emphasis on issue_date to determine relevancy of a record. The function no longer exports future and past data as .csv files. Past data are still filtered out from the returned tibble. Output messages indicate if future data exist.
clean() function:
clean() from proof(); now checks on entire zip code, not just prefix. Remove ending 0 when zip value is 10 digits long.
hunt_mig_birds field when it equals "0" to "2". For context, a solo permit contains a "2" in at least one of the band_tailed_pigeon, brant, or seaduck fields and contains "0" in all other bag fields.
correct() to remove any records with value of "0" or NA value in every bag field; improved email field cleaning and repair.
strataCheck() to return two additional fields in output; 1) number of bad strata and 2) proportion of bad strata. The function now checks for permit species coming during regular HIP and returns them as erroneous (e.g. NM band_tailed_pigeon = "2").
write_hip() to set any state/species combinations without a season to have strata of "0"; bad bag values remain NA.
sumLines() to improve speed and efficiency. In addition, the function now returns a data table with the sum of lines per file instead of a single number. No longer exported; set as internal function.
read_hip() to eliminate encoding check and optionally use sumLines() function to ensure all lines were read in. Returns a message if any records contain a bag value other than a single digit. In addition, now converts blank strings to NA.
validate() to return source_file field and filter out states and species with no season from function output.
investigate() to no longer be exported; it works inside of validate() to return a more detailed output. This replaces the previous workflow of running investigate() separately.
manualFix() function because it is no longer relevant to the package.
shiftFix() because line shift errors cannot be fixed programmatically on a reliable basis.
dl_report.qmd replaced RMarkdown dl_report.Rmd.
catch_messages() function created only for use in the dl_report.qmd and is not exported or contained within the migbirdHIP package internally. The catch_messages() function wraps around pre-processing functions (such as read_hip(), clean(), issueCheck(), etc) and captures messages in a list so that they can be returned as readable bullet points.
season_report.Rmd template
magrittr and rmarkdown
quarto and sf
spelling
sysdata.rda)
devtools::check()
write_hip() to eliminate redundancy; replaced repeated left_join() with for loop
findDuplicates() by throwing an error message for a bad string supplied to the return parameter at the start, which reduces wait time for failure.
findDuplicates() redundancy of searching for duplicate fields using a for loop or purrr::map(), but this change added 20+ seconds of processing time so left the redundancy as-is.
Replaced tidyr::separate() with tidyr::separate_wider_delim() or tidyr::separate_wider_position()
Replaced dplyr::summarize() with dplyr::reframe(), since returning more than 1 row per group was deprecated in dplyr 1.1.0
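For readers unfamiliar with this change, the sketch below shows the kind of call that moves from summarize() to reframe(). It assumes dplyr 1.1.0 or later is installed; the data frame and column values are made up and are not taken from the package.

```r
library(dplyr)

# Made-up data for illustration only
bags <- data.frame(
  dl_state = c("ME", "ME", "DE"),
  ducks_bag = c(1, 3, 2)
)

# range() returns two values per group, so the result has more than one row
# per group; reframe() allows this, where summarize() now warns.
bag_range <-
  bags |>
  group_by(dl_state) |>
  reframe(stat = c("min", "max"), value = range(ducks_bag))

print(bag_range)
```

Unlike summarize(), reframe() always returns an ungrouped data frame, so no trailing ungroup() is needed.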
Replaced ggplot2::stat() with ggplot2::after_stat(), since the former was deprecated in ggplot2 3.4.0
Replaced %>% and %<>% with base R pipe |> for increased speed and reduced dependency on tidyverse packages.
DESCRIPTION file:
en-US
usethis::use_spell_check() to package checking workflow, which added an inst/WORDLIST file (whitelisted words) to the package.
glyphFinder() function
issueCheck() and proof()
dl_report.Rmd template
fixDuplicates() and validate()
migbirdHIP in function documentation