extractMed: Extract medication information from clinical notes


extractMed R Documentation


This function is an interface to the medExtractR function within the medExtractR package, and allows drug dosing information to be extracted from free-text sources, e.g., clinical notes.


extractMed(note_fn, drugnames, drgunit, windowlength, max_edit_dist = 0, ...)



note_fn: File name(s) for the text file(s) containing the clinical notes. Can be a character string for an individual note, or a vector or list of file names for multiple notes.


drugnames: Vector of drug names for which dosing information should be extracted. Can include various forms (e.g., generic, brand name) as well as abbreviations.


drgunit: Unit of the drug being extracted, e.g., 'mg'.


windowlength: Length of the search window (in characters) around the drug name in which to search for dosing entities.


max_edit_dist: Maximum edit distance allowed when attempting to extract drugnames. Allows misspelled drug names to be captured.


...: Additional arguments to medExtractR, for example lastdose = TRUE to extract time of last dose (see medExtractR package documentation for details).
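
To make windowlength and max_edit_dist more concrete, here is a minimal base-R sketch. It is not medExtractR's matching code: the note text and variable names are invented for illustration, "edit distance" is assumed to mean the Levenshtein distance computed by utils::adist, and the window is simply taken as the characters starting at the drug name, which is also an assumption.

```r
# Hypothetical note text, invented for this illustration
note <- "Pt to decrease Progarf 2 mg 1 cap bid. Last dose at 2100."

# Split the note into words and compute each word's edit distance to "prograf"
words <- strsplit(note, "[^A-Za-z0-9]+")[[1]]
dists <- drop(utils::adist(tolower(words), "prograf"))

# With a maximum edit distance of 0 only exact matches qualify;
# allowing 2 edits also picks up the misspelling "Progarf"
exact_hits <- words[dists == 0]
fuzzy_hits <- words[dists <= 2]

# Take `windowlength` characters starting at the matched drug name
windowlength <- 22
start <- as.integer(regexpr(fuzzy_hits[1], note, fixed = TRUE))
window_text <- substr(note, start, start + windowlength - 1)
```

In this toy note the exact search finds nothing, while the fuzzy search finds the misspelled name, and the window then covers the strength, dose amount, and frequency that follow it. A larger max_edit_dist can also match unrelated short words, so small values are usually the safer starting point.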


Medication information, including dosing data, is often stored in free-text sources such as clinical notes. The extractMed function serves as a convenient wrapper for the medExtractR package, a natural language processing system written in R for extracting medication data. Within extractMed, the medExtractR function identifies dosing data for the drug(s) of interest, specified by the drugnames argument, using rule-based and dictionary-based approaches. Relevant dosing entities include medication strength (identified using the drgunit argument), dose amount, dose given intake, intake time or frequency of dose, dose change keywords (e.g., 'increase' or 'decrease'), and time of last dose. After applying medExtractR to extract drug dosing information, extractMed appends the file name to the results so that each row is labeled with the note it came from.
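
The wrapper pattern described above (run an extractor on each note, label the rows with the file name, stack the results) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package source: fake_extract and label_results are hypothetical names, the extractor returns fixed rows instead of reading a file, and using basename() for the label is a guess based on the sample output.

```r
# Stand-in for the real extractor: returns the same rows for every note.
# `fake_extract` is hypothetical and is NOT medExtractR's interface.
fake_extract <- function(path) {
  data.frame(entity = c("DrugName", "Strength"),
             expr   = c("Prograf", "2 mg"),
             pos    = c("78:85", "86:90"),
             stringsAsFactors = FALSE)
}

# Apply the extractor to each note, prepend the file name, stack the results
label_results <- function(note_fn, extractor) {
  per_note <- lapply(note_fn, function(fn) {
    res <- extractor(fn)
    cbind(filename = basename(fn), res, stringsAsFactors = FALSE)
  })
  do.call(rbind, per_note)
}

# Hypothetical paths; nothing is read from disk in this sketch
out <- label_results(list("notes/note_file1.txt", "notes/note_file2.txt"),
                     fake_extract)
```

Because every row carries its file name, results from many notes can live in a single data.frame and still be traced back to their source note.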

See the EHR vignette for Extract-Med and Pro-Med-NLP. For more details, see Weeks et al. 2020.


A data.frame with the extracted dosing information, labeled with file name as an identifier
Sample output:

filename        entity      expr      pos
note_file1.txt  DoseChange  decrease  66:74
note_file1.txt  DrugName    Prograf   78:85
note_file1.txt  Strength    2 mg      86:90
note_file1.txt  DoseAmt     1         91:92
note_file1.txt  Frequency   bid       101:104
note_file1.txt  LastDose    2100      121:125
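
For downstream work it is often convenient to split the pos column into numeric start and stop positions. The sketch below rebuilds the sample rows by hand and uses only base R. The column names start, stop, and width are introduced here for illustration and are not part of the function's output.

```r
# Rows copied from the sample output above
res <- data.frame(
  filename = "note_file1.txt",
  entity = c("DoseChange", "DrugName", "Strength", "DoseAmt", "Frequency", "LastDose"),
  expr = c("decrease", "Prograf", "2 mg", "1", "bid", "2100"),
  pos = c("66:74", "78:85", "86:90", "91:92", "101:104", "121:125"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns
bounds <- do.call(rbind, strsplit(res$pos, ":", fixed = TRUE))
res$start <- as.integer(bounds[, 1])
res$stop  <- as.integer(bounds[, 2])

# Width of each matched span
res$width <- res$stop - res$start
```

In these sample rows the width equals the number of characters in expr, which suggests the stop position is exclusive. Treat that as an observation about this sample rather than a documented guarantee.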


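library(EHR)

# Example notes shipped with the EHR package
tac_fn <- list(system.file("examples", "tacpid1_2008-06-26_note1_1.txt", package = "EHR"),
               system.file("examples", "tacpid1_2008-06-26_note2_1.txt", package = "EHR"),
               system.file("examples", "tacpid1_2008-12-16_note3_1.txt", package = "EHR"))

# Extract tacrolimus dosing information from the three notes
extractMed(tac_fn,
           drugnames = c("tacrolimus", "prograf", "tac", "tacro", "fk", "fk506"),
           drgunit = "mg",
           windowlength = 60,
           max_edit_dist = 2,
           lastdose = TRUE)  # passed through `...` to medExtractR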

EHR documentation built on Dec. 28, 2022, 1:31 a.m.