parseMedXN: Parse MedXN NLP Output


View source: R/parseMedxn.R


Takes a file containing the raw medication extraction output generated by the MedXN natural language processing system and converts it into a standardized format.


parseMedXN(filename, begText = "^[R0-9]+_[0-9-]+_[0-9]+_")



filename: File name for a single file containing MedXN output.


begText: A regular expression that indicates the beginning of a new observation (i.e., extracted clinical note).
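
Before parsing, a candidate begText can be checked against a few line beginnings with base R's grepl. The sketch below is only an illustration: the strings in line_starts are hypothetical (the first is modeled on the file name shown in the example output on this page, the other two are invented), and it does not call parseMedXN itself.

```r
# Hypothetical line beginnings, used only to exercise the patterns.
line_starts <- c(
  "ID1_2012-11-22_Note1.txt",
  "R123_2015-03-04_1_note.txt",
  "continued text from the same note"
)

# Default pattern from the usage line and the custom pattern from the example.
default_pat <- "^[R0-9]+_[0-9-]+_[0-9]+_"
custom_pat  <- "^ID[0-9]+_[0-9-]+_"

# TRUE marks a line that the pattern would treat as the start of a new note.
default_hits <- grepl(default_pat, line_starts)
custom_hits  <- grepl(custom_pat, line_starts)
```

Here the default pattern matches only the second string, while the custom pattern matches only the first, which is why the example on this page overrides begText for file names that start with "ID".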


Output from different medication extraction systems is formatted in different ways. To process the extracted information, we first need to convert the output from each system into a standardized format. Extracted expressions for the various drug entities (e.g., drug name, strength, frequency) each receive their own column, formatted as "extracted expression::start position::stop position". If multiple expressions are extracted for the same entity, they are separated by backticks.
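
As an illustration of the cell format described above, the following base R sketch splits one entity cell into its expressions and positions. The cell value is copied from the dose column of the example output on this page; the object names (cell, exprs, fields, parsed) are chosen here for illustration and are not part of the EHR package. It assumes the extracted expression itself contains neither "::" nor a backtick.

```r
# Cell value taken from the dose column of the example output.
cell <- "2::288::289`1.5::310::313"

# Split on backticks to get one string per extracted expression.
exprs <- strsplit(cell, "`", fixed = TRUE)[[1]]

# Split each expression into its three "::"-delimited fields
# and stack them into a character matrix, one row per expression.
fields <- do.call(rbind, strsplit(exprs, "::", fixed = TRUE))

parsed <- data.frame(
  expression = fields[, 1],
  start = as.integer(fields[, 2]),
  stop = as.integer(fields[, 3]),
  stringsAsFactors = FALSE
)
```

The result has one row per extracted expression, with the start and stop positions stored as integers.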

In MedXN output files, each set of extractions is anchored to a specific drug name extraction.

In MedXN output files, the results from multiple clinical notes can be combined into a single output file. The beginning of some lines of the output file can indicate when output for a new observation (or new clinical note) begins. The user should specify the argument begText to be a regular expression used to identify the lines where output for a new clinical note begins.
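
One way to picture the role of begText is to group the lines of a combined file by the most recent line that matches the pattern. The sketch below shows that idea in base R under stated assumptions: the contents of raw_lines are hypothetical and do not reproduce the real MedXN line layout, and this is not necessarily how parseMedXN is implemented internally.

```r
# Hypothetical lines from a combined output file covering three notes.
raw_lines <- c(
  "ID1_2012-11-22_Note1.txt first extraction",
  "second extraction for the same note",
  "ID2_2013-01-05_Note1.txt first extraction",
  "ID2_2013-01-05_Note2.txt first extraction",
  "second extraction for the same note"
)

begText <- "^ID[0-9]+_[0-9-]+_"

# TRUE where a line begins the output for a new clinical note.
starts_note <- grepl(begText, raw_lines)

# A running count of note starts assigns every line to a note.
note_index <- cumsum(starts_note)

# One character vector of lines per note.
notes <- split(raw_lines, note_index)
```

If begText matches too few lines, output from separate notes is merged into one group; if it matches too many, a single note is split apart, so the pattern should be checked against the actual file.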

See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.


A data.table object with columns for filename, drugname, strength, dose, route, freq, and duration. The filename column contains the file name corresponding to the clinical note. Each of the entity columns is of the format "extracted expression::start position::stop position".


library(EHR)
mxn_output <- system.file("examples", "lam_medxn.csv", package = "EHR")
mxn_parsed <- parseMedXN(mxn_output, begText = "^ID[0-9]+_[0-9-]+_")
mxn_parsed

Example output

                   filename                drugname           strength
1: ID1_2012-11-22_Note1.txt   lamotrigine::255::266   100 mg::278::284
2: ID1_2012-11-22_Note1.txt      Vimpat::1086::1092  200mg::1093::1098
3: ID1_2012-11-22_Note1.txt lamotrigine::1172::1183 100 mg::1184::1190
4: ID1_2012-11-22_Note1.txt    Lamictal::1213::1221                   
                        dose             route
1: 2::288::289`1.5::310::313                  
2:                           mouth::1106::1111
3:                                        <NA>
4:           1.5::1223::1226 mouth::1238::1243
                                  freq          duration
1: morning::298::305`evening::322::329 2 weeks::334::341
2:             twice daily::1112::1123              <NA>
3:                                <NA>              <NA>
4:             twice a day::1244::1255              <NA>

EHR documentation built on June 9, 2021, 9:07 a.m.