Package panthr was created to streamline common but tedious data cleaning and manipulation tasks involving field and value formats in the Georgia State University data warehouse.
Package panthr has no package dependencies and is coded entirely in base R.
Package panthr contains over 35 functions capable of formatting:
CamelCaps, snake_case, kebab-case, etc.
DD-MMM-YY format into: YYYY-MM-DD, YYYYMM
YYYYMM format into: YYYY-MM-DD
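The conversions listed above can be sketched in base R. The helpers below are hypothetical illustrations, not panthr's own functions. They assume that two-digit years fall in the 2000s and that a YYYYMM value maps to the first day of its month.

```r
# Hypothetical helpers for illustration; these are not panthr functions.

# "FirstName" -> "first_name": insert "_" at lower-to-upper boundaries.
camel_to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))
}

# "20-DEC-19" -> "2019-12-20". Assumes 21st-century years and English
# month abbreviations; month.abb avoids locale-dependent date parsing.
dmy_to_iso <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  vapply(parts, function(p) {
    month <- match(toupper(p[2]), toupper(month.abb))
    sprintf("20%s-%02d-%s", p[3], month, p[1])
  }, character(1))
}

# "201908" -> "2019-08-01". Assumes the first day of the month.
yyyymm_to_iso <- function(x) {
  paste0(substr(x, 1, 4), "-", substr(x, 5, 6), "-01")
}

camel_to_snake(c("FirstName", "TermCode"))
dmy_to_iso("20-DEC-19")
yyyymm_to_iso("201908")
```

Consult the package documentation for the names and arguments of the functions panthr actually exports.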
In addition, panthr uses warehouse validation tables to convert codes into human-readable descriptions.
Several of these decoding functions can also abbreviate full-length descriptions to provide more concise labeling in tables and visualizations.
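Decoding of this kind amounts to a lookup against a validation table. A minimal base R sketch follows. The table, its codes, descriptions, and abbreviations are invented for illustration and are not actual warehouse values, and decode_level() is a hypothetical name rather than a panthr function.

```r
# Invented validation table; real warehouse codes will differ.
level_lookup <- data.frame(
  code = c("UG", "GR"),
  desc = c("Undergraduate", "Graduate"),
  abbr = c("Undergrad", "Grad"),
  stringsAsFactors = FALSE
)

# Hypothetical decoder: returns full descriptions, or abbreviations
# when short = TRUE.
decode_level <- function(x, short = FALSE) {
  column <- if (short) "abbr" else "desc"
  # match() returns NA for unknown codes, so they decode to NA.
  level_lookup[[column]][match(x, level_lookup$code)]
}

decode_level(c("GR", "UG", "XX"))
decode_level(c("GR", "UG"), short = TRUE)
```

Returning NA for unrecognized codes keeps bad values visible instead of silently passing them through.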
Lastly, panthr not only decodes ethnicity and race but also formats them per conventions established by the Office of Institutional Effectiveness (OIE).
You can install the panthr package directly from GitHub using the devtools package.
Install devtools by running install.packages("devtools")
Load devtools by running library(devtools)
Install panthr with install_github("jamisoncrawford/panthr")
Load panthr by running library(panthr)
You can use RStudio's autocomplete feature to quickly scroll through available functions.
Simply type panthr:: and peruse the scrollable tooltips and function descriptions.
You can also read the full documentation for each function by passing the bare function name to help() or ?, like so:
help(function_name)
?function_name
Documentation will then appear in the "Help" pane.
Package panthr includes a sample dataset of 10,000 anonymized and "shuffled" student records: students.
Invoke this dataset by running data(students).
Further details on how these data were shuffled may be accessed by running help(students).
This dataset features variable names and formatting conventions exactly as they appear when exported from Oracle SQL Developer and demonstrates how panthr was created specifically to treat those conventions.
R CMD check results currently clock at 20 seconds with 0 errors, 0 warnings, and 1 note regarding compression optimization for internal sysdata.rda datasets.
2019-12-20: Prototype package panthr documentation and functions first published.
2020-01-06: Additional decode functions added with "clean" description variables.
2020-01-07: Date functions added; visible sysdata.rda variable bindings added.
2020-01-08: Internal sysdata.rda files debugged; field functions added. README.md created.
Jamison R. Crawford, MPA is an Institutional Research Associate at The Graduate School and Center for the Advancement of Students and Alumni (CASA) at Georgia State University.
He is an Associate Faculty member at Arizona State University, where he teaches Fundamentals of Data Science I to Master of Science candidates in Program Evaluation & Data Analytics (PEDA) at the Watts College of Public Service and Community Solutions, and is a coauthor of "Foundations of Data Science".
The Office of Institutional Effectiveness (OIE) at Georgia State has been instrumental in understanding warehouse data, validation tables, and institutional conventions.
Very special thanks to: