
Package panthr: Streamlining GSU Data Processing in R

Package panthr was created to streamline common but tedious data cleaning and manipulation tasks involving field and value formats in the Georgia State University (GSU) data warehouse.

Package panthr has no package dependencies and is coded entirely in base R.

Functions

Package panthr contains over 35 functions capable of formatting:

In addition, panthr uses warehouse validation tables to convert codes into human-readable descriptions, including:

Several of these decoding functions can also abbreviate full-length descriptions, producing more concise labels for tables and visualizations.

Lastly, panthr not only decodes ethnicity and race but also formats them according to conventions established by the Office of Institutional Effectiveness (OIE), allowing:

Installing panthr & Exploring Functions

You can install the panthr package directly from GitHub using the devtools package.

  1. In the R console, run install.packages("devtools")
  2. Load devtools by running library(devtools)
  3. Install panthr with install_github("jamisoncrawford/panthr")
  4. Load panthr by running library(panthr)
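The four steps above can also be run as one short script. This sketch adds one thing the original steps do not: it skips reinstalling devtools if it is already available. It assumes an internet connection and that the GitHub repository jamisoncrawford/panthr is reachable.

```r
# Install devtools only if it is not already available (an addition to the steps above)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install panthr from its GitHub repository, then attach it to the session
devtools::install_github("jamisoncrawford/panthr")
library(panthr)
```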

You can use RStudio's autocomplete feature to browse the available functions quickly: type panthr:: and scroll through the tooltips and function descriptions.


You can also read the full documentation for each function by typing ? followed by the bare function name, for example ?function_name, where function_name stands in for any panthr function.

Documentation will then appear in the "Help" pane.
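Outside RStudio, or to see everything at once, base R can list what the package provides. This is a minimal sketch using only base and utils functions; it assumes panthr is already installed. Depending on how the package is built, the listing may include datasets alongside functions.

```r
library(panthr)

# List the objects panthr places on the search path (functions, possibly datasets)
panthr_exports <- ls("package:panthr")
print(panthr_exports)

# Open the package's documentation index (the Help pane in RStudio)
help(package = "panthr")
```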

Practice Datasets

Package panthr includes students, a sample dataset of 10,000 anonymized and "shuffled" student records. Load it by running data(students).

Further details on how these data were shuffled may be accessed by running help(students).

This dataset preserves variable names and formatting conventions exactly as they appear when exported from Oracle SQL Developer, and it demonstrates how panthr was built specifically to handle those conventions.
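A quick first look at the practice data needs only base R. This sketch assumes panthr is installed; it does not name any columns, since those follow the Oracle SQL Developer export and are best read from the output itself.

```r
library(panthr)

# Load the bundled practice dataset into the global environment
data(students)

# Check its size (the README describes 10,000 records) and column types
dim(students)
str(students)
head(students)

# Read how the records were anonymized and shuffled
help(students)
```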

Development Notes

R CMD check currently completes in about 20 seconds with 0 errors, 0 warnings, and 1 note concerning compression of the internal sysdata.rda data.
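One common way to act on a compression note like this is to re-save the internal data with a better compression method. This is a minimal sketch, not a step documented by panthr. It assumes it is run from the root of the package source and that the internal data live at R/sysdata.rda, the conventional location. Alternatively, R CMD build --resave-data applies similar recompression at build time.

```r
# Assumed path: internal package data conventionally lives in R/sysdata.rda
rda_path <- file.path("R", "sysdata.rda")

# Report the file's current size and compression method
tools::checkRdaFiles(rda_path)

# Re-save using whichever of gzip, bzip2, or xz gives the smallest file
tools::resaveRdaFiles(rda_path, compress = "auto")

# Confirm the new size and compression method
tools::checkRdaFiles(rda_path)
```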

2019-12-20: Prototype package panthr documentation and functions first published.

2020-01-06: Additional decode functions added with "clean" description variables.

2020-01-07: Date functions added; visible sysdata.rda variable bindings added.

2020-01-08: Internal sysdata.rda files debugged; field functions added. README.md created.

About the Author

Jamison R. Crawford, MPA, is an Institutional Research Associate at The Graduate School and Center for the Advancement of Students and Alumni (CASA) at Georgia State University.

He is also an Associate Faculty member at Arizona State University, where he teaches Fundamentals of Data Science I to Master of Science candidates in Program Evaluation & Data Analytics (PEDA) at the Watts College of Public Service and Community Solutions. He is a coauthor of "Foundations of Data Science".

Contributors

The Office of Institutional Effectiveness (OIE) at Georgia State has been instrumental in understanding warehouse data, validation tables, and institutional conventions.

Very special thanks to:



jamisoncrawford/panthr documentation built on March 9, 2020, 6:18 p.m.