knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This package is based entirely on a system outlined in the following talks and presentations.
University of Miami Department of Public Health Sciences. Distinguished Lecture Series. Machine Learning and EMS Data for Opioid Overdose Surveillance. February 2020. Patton, A.
Development of Text-Based Algorithm for Opioid Overdose Identification in EMS Data. International Society for Disease Surveillance Annual Meeting 2019. Oral Presentation. Patton, A.
Preventing the Next Overdose: An Emergency Medical Services-Based Non-Fatal Opioid Overdose Surveillance and Telephone Outreach Pilot Program. Council of State and Territorial Epidemiologists Annual Meeting 2019. Breakout Presentation. Arambula, K., Hannah, H., Patton, A., Willis, M., Ereman, R.
Cost-effectiveness of Offering Treatment to Non-Fatal Opioid Overdoses Encountered by Emergency Medical Services (EMS) in Marin County, California. Council of State and Territorial Epidemiologists Annual Meeting 2019. Breakout Presentation. Hannah, H., Arambula, K., Patton, A., Hansen, R. Willis, M., Ereman, R.
Briefly, it is designed to serve as the predictive modeling component for an in-house surveillance system for opioid related events in EMS (911) data. The intended audience is local government health departments or academic researchers. The model takes in NEMSIS 3.0 EMS data and predicts whether an event was or was not an opioid related event. This methodology was proven to be over 99% specific and over 95% sensitive in the program that ran in Marin County, California over 2017-2019. This model can work retrospectively, identifying a historical baseline, or prospectively, predicting records as they come in.
In general, overdoseR works through a combination of several factors built primarily on the contents of the care narrative, medication administered, and primary impression fields.
Using information learned during system development in Marin County, a group of several hundred words, bigrams (pairs of words), and trigrams (triplets of words) were identified to be broadly more common in opioid related events than non-opioid related events. These terms serve as the main features in the models based on their presence or absence in the care narratives.
Information such as medication administered and the response to the medication is also included in the model. Specifically naloxone, morhpine, fentanyl, and midazolam administrations are encoded as features and naloxone response is coded as a feature.
Based on manual review of tens of thousands of EMS records, certain rule based clinical criteria were found to be high effectively in increasing model accuracy.
If an event had a cardiac related primary impression and did not have a positive naloxone response, it was coded as not an opioid related event
If an event occurred in a patient under the age of 18 or over the age of 70 without a positive naloxone response, it was coded as not an opioid related event
If an event had a positive naloxone response, it was coded as an opioid related event regardless of other information
In order to use overdoseR, you will need EMS data (ideally NEMSIS 3.0) that contains at a minimum the following columns.
You will also need a column that indicates whether or not an event was an opioid related event/overdose based on your own internal clinical criteria. For example, when this system was deployed in Marin County, your package author read and manually classified approximately 30,000 individual records to create a pool of training and testing data.
The first step is to use the following function. As written this accepts all the defaults for the column names (see above), but you can provide your own as well.
formatted_result <- format_multirow_ems_data(data_in = your_raw_EMS_data)
This converts multiple row per event data into single row per event data while at the same time creating and adjusting fields to prepare for modeling. The output is a dataframe, so make sure to assign it to something when you call the function. The next step before modeling is to take all the high value terms and turn the data into a one-hot matrix of their presence or abscence.
one_hot_result <- one_hot_single_row_ems_data(data_in = formatted_result)
After that step, the data is ready for modeling. There are currently three options for classification built into overdoseR. There is logistic regression, a support vector machine (SVM), and XGBoost. Each function is used in the same manner with the same syntax more or less. It is important to note that each of the following functions has slightly different arugments. You can read the documentation if you want to be able to understand more about the specifics of each parameter.
You are encouraged to split the data into training and test sets in whatever manner you choose following best practices for randomization and reproducibility.
log_res <- tune_logistic(training_data = one_hot_result_training, testing_data = one_hot_result_testing) svm_res <- tune_svm(training_data = one_hot_result_training, testing_data = one_hot_result_testing) xgb_res <- tune_xgboost(training_data = one_hot_result_training, testing_data = one_hot_result_testing)
The logistic regression result is a list with the model object and the predictions on the test data. The SVM and XGBoost results are a list with the model object, the tuning object, and the predictions on the test data.
Very important note: the predictions are in terms of probability. It is paramount that you, the user, optimize the threshold for class prediction based on your clinical goals. It is extremely strongly recommended not to use a naive 0.5 threshold.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.