medExtractR: Extract Medication Entities From Clinical Note

medExtractR R Documentation

Extract Medication Entities From Clinical Note

Description

This function identifies medication entities of interest and returns found expressions with start and stop positions.

Usage

medExtractR(
  note,
  drug_names,
  window_length,
  unit,
  max_dist = 0,
  drug_list = "rxnorm",
  lastdose = FALSE,
  lastdose_window_ext = 1.5,
  strength_sep = NULL,
  flag_window = 30,
  dosechange_dict = "default",
  ...
)

Arguments

note

Text to search.

drug_names

Vector of drug names of interest to locate.

window_length

Length (in number of characters) of window after drug in which to look.

unit

Strength unit to look for (e.g., ‘mg’).

max_dist

Numeric - edit distance to use when searching for drug_names.

drug_list

Vector of known drugs that may end search window. By default calls rxnorm_druglist.

lastdose

Logical - whether or not last dose time entity should be extracted.

lastdose_window_ext

Numeric - multiplicative factor by which window_length should be extended when identifying last dose time.

strength_sep

Delimiter for contiguous medication strengths (e.g., ‘-’ for “LTG 200-300”).

flag_window

How far around drug (in number of characters) to look for dose change keyword - default fixed to 30. See ‘Details’ section below for further explanation.

dosechange_dict

List of keywords used to determine if a dose change entity is present.

...

Parameter settings used in extracting frequency, intake time, route, and duration. Potentially useful parameters include freq_dict, intaketime_dict, route_dict, and duration_dict (see ... argument in extract_entities) to specify the dictionary for each of these entities, as well as ‘freq_fun’, ‘intaketime_fun’, ‘route_fun’, and ‘duration_fun’ for user-specified extraction functions. If no additional arguments are provided, medExtractR will use extract_generic and the default dictionary for each entity. See extract_entities documentation for details.

Details

This function uses a combination of regular expressions, rule-based approaches, and dictionaries to identify various drug entities of interest. Specific medications to be found are specified with drug_names; matching is neither case-sensitive nor space-sensitive (e.g., ‘lamotrigine XR’ is treated the same as ‘lamotrigineXR’). Entities to be extracted include drug name, strength, dose amount, dose strength, frequency, intake time, route, duration, dose change, and time of last dose. See extract_entities and extract_lastdose for more details.
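To illustrate what case- and space-insensitive matching means in practice, here is a minimal base R sketch. The helper normalize_drug_name is hypothetical and is not part of medExtractR; how the package handles this internally is not described here.

```r
# Hypothetical helper (not a medExtractR function): lower-case a drug name
# and drop all whitespace so that spelling variants compare as equal.
normalize_drug_name <- function(x) {
  tolower(gsub("\\s+", "", x))
}

a <- normalize_drug_name("lamotrigine XR")
b <- normalize_drug_name("LamotrigineXR")
same_name <- identical(a, b)
```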

When searching for medication names of interest, fuzzy matching may be used. The max_dist argument determines the maximum edit distance allowed for such matches. If using fuzzy matching, any drug name with fewer than 5 characters will only allow an edit distance of 1, regardless of the value of max_dist.
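The short-name rule can be sketched in base R with utils::adist, which computes a generalized edit distance. Both helpers below are hypothetical and are not medExtractR functions. Capping the distance with min(max_dist, 1) is an assumption about how the rule behaves when max_dist is 0.

```r
# Hypothetical helper: names shorter than 5 characters are capped at an
# edit distance of 1 (assumption: max_dist = 0 still means exact matching).
effective_max_dist <- function(drug_name, max_dist) {
  if (nchar(drug_name) < 5) min(max_dist, 1) else max_dist
}

# Hypothetical helper: does a single word match a drug name within the
# allowed edit distance? Comparison ignores case.
fuzzy_match_word <- function(word, drug_name, max_dist) {
  d <- utils::adist(word, drug_name, ignore.case = TRUE)[1, 1]
  d <= effective_max_dist(drug_name, max_dist)
}

long_name_hit   <- fuzzy_match_word("Progrf", "prograf", max_dist = 2)
short_name_miss <- fuzzy_match_word("lab", "ltg", max_dist = 2)
```

Here "Progrf" is one insertion away from "prograf", so it matches. "lab" is two substitutions away from the three-character name "ltg", so it is rejected even though max_dist is 2.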

The purpose of the drug_list argument is to reduce false positives by removing information that is likely to be related to a competing drug rather than the drug of interest. By default, this is “rxnorm”, which calls data(rxnorm_druglist). A custom drug list in the form of a character vector can be supplied instead, or can be appended to rxnorm_druglist by specifying drug_list = c("rxnorm", custom_drug_list). medExtractR then uses this list to truncate the search window at the first appearance of an unrelated drug name. This uses publicly available data courtesy of the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the product and does not endorse or recommend this or any other product. See rxnorm_druglist documentation for details.
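As a rough illustration of the truncation step described above, the following base R sketch cuts a search window at the first competing drug name. The helper truncate_window and the example drug names are hypothetical and are not part of medExtractR. The package's actual matching rules may differ.

```r
# Hypothetical helper: cut the search window just before the first
# occurrence of any competing drug name (case-insensitive, literal match).
truncate_window <- function(window, competing_drugs) {
  lower_window <- tolower(window)
  starts <- vapply(competing_drugs, function(d) {
    as.integer(regexpr(tolower(d), lower_window, fixed = TRUE))
  }, integer(1))
  starts <- starts[starts > 0]          # regexpr returns -1 when not found
  if (length(starts) == 0) return(window)
  substr(window, 1, min(starts) - 1)
}

window    <- " 200 mg bid, keppra 500 mg daily"
truncated <- truncate_window(window, c("keppra", "depakote"))
untouched <- truncate_window(window, c("depakote"))
```

In this sketch the strength and frequency after "keppra" are dropped from the window, so they cannot be attributed to the drug of interest.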

Most medication entities are searched for in a window after the drug. The dose change entity, a keyword indicating a non-current drug regimen, may occur before the drug name. The flag_window argument adjusts the width of the pre-drug window. Neither flag_window nor dosechange_dict is a default argument to the extended function medExtractR_tapering, since that extension uses a more flexible search window and extraction procedure in which any entity can be extracted either before or after the drug mention. In the tapering extension, dose change identification therefore works the same way as for all other dictionary-based entities.
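The idea of a character window around the drug mention can be sketched as follows. The helper find_dose_change and its two keywords are hypothetical stand-ins. They are not the package's implementation or its default dosechange_dict.

```r
# Hypothetical helper: look for dose change keywords within flag_window
# characters on either side of a drug mention.
find_dose_change <- function(note, drug_start, drug_stop, flag_window = 30,
                             keywords = c("increase", "decrease")) {
  win_start <- max(1, drug_start - flag_window)
  win_stop  <- min(nchar(note), drug_stop + flag_window)
  window    <- tolower(substr(note, win_start, win_stop))
  found <- vapply(keywords, function(k) grepl(k, window, fixed = TRUE),
                  logical(1))
  keywords[found]
}

note <- "Plan to decrease Prograf 2 mg by mouth bid"
# "Prograf" occupies characters 18 through 24 of the note.
wide_hits   <- find_dose_change(note, 18, 24, flag_window = 30)
narrow_hits <- find_dose_change(note, 18, 24, flag_window = 5)
```

With the default width the keyword "decrease" falls inside the window. With a window of 5 characters it is cut off and nothing is found.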

The strength_sep argument is NULL by default, but can be used to identify shorthand for morning and evening doses. For example, consider the phrase ‘Lamotrigine 300-200’ (meaning 300 mg in the morning and 200 mg in the evening). The argument strength_sep = '-' identifies the full expression 300-200 as dose strength in this phrase.
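A regular expression sketch shows the effect of a strength separator. The helper extract_strength is hypothetical and assumes a single literal separator. It is only meant to show why ‘300-200’ comes out as one expression rather than two, and does not reproduce the package's extraction rules.

```r
# Hypothetical helper: pull numeric strength expressions out of a phrase.
# With a separator, runs such as "300-200" are kept together.
extract_strength <- function(text, strength_sep = NULL) {
  num <- "\\d+(?:\\.\\d+)?"
  pattern <- if (is.null(strength_sep)) {
    num
  } else {
    # \Q...\E quotes the separator literally (PCRE, hence perl = TRUE)
    paste0(num, "(?:\\Q", strength_sep, "\\E", num, ")*")
  }
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

without_sep <- extract_strength("Lamotrigine 300-200")
with_sep    <- extract_strength("Lamotrigine 300-200", strength_sep = "-")
```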

Value

data.frame with entity information. Only extractions from found entities are returned. If no dosing information for the drug of interest is found, the following output will be returned:

entity  expr  pos
NA      NA    NA

The “entity” column of the output contains the formatted label for that entity, according to the following mapping.
drug name: “DrugName”
strength: “Strength”
dose amount: “DoseAmt”
dose strength: “DoseStrength”
frequency: “Frequency”
intake time: “IntakeTime”
duration: “Duration”
route: “Route”
dose change: “DoseChange”
time of last dose: “LastDose”

Sample output:

entity      expr      pos
DoseChange  decrease  66:74
DrugName    Prograf   78:85
Strength    2 mg      86:90
DoseAmt     1         91:92
Route       by mouth  100:108
Frequency   bid       109:112
LastDose    2100      129:133
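
For downstream processing it can help to split the pos column into numeric start and stop columns. The sketch below rebuilds the sample output by hand as a data.frame rather than calling medExtractR. The column names start and stop are chosen for illustration.

```r
# Sample output reconstructed by hand for illustration.
out <- data.frame(
  entity = c("DoseChange", "DrugName", "Strength", "DoseAmt",
             "Route", "Frequency", "LastDose"),
  expr   = c("decrease", "Prograf", "2 mg", "1", "by mouth", "bid", "2100"),
  pos    = c("66:74", "78:85", "86:90", "91:92",
             "100:108", "109:112", "129:133"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns.
parts     <- strsplit(out$pos, ":", fixed = TRUE)
out$start <- as.integer(vapply(parts, `[`, character(1), 1))
out$stop  <- as.integer(vapply(parts, `[`, character(1), 2))

# In this sample, stop minus start equals the length of each expression.
widths_match <- all(out$stop - out$start == nchar(out$expr))
```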

References

Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc. 2011 Jul-Aug;18(4):441-8. doi: 10.1136/amiajnl-2011-000116. Epub 2011 Apr 21. PubMed PMID: 21515544; PubMed Central PMCID: PMC3128404.

Examples


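library(medExtractR)

# "Progrf" is a misspelling; max_dist = 2 allows fuzzy matching on drug names
note1 <- "Progrf Oral Capsule 1 mg 3 capsules by mouth twice a day - last
dose at 10pm"
medExtractR(note1, c("prograf", "tacrolimus"), 60, "mg", 2, lastdose = TRUE)

# strength_sep = "-" treats "150-200" as a single strength expression
note2 <- "Currently on lamotrigine 150-200, but will increase to lamotrigine 200mg bid"
medExtractR(note2, c("lamotrigine", "ltg"), 130, "mg", 1, strength_sep = "-")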


medExtractR documentation built on June 7, 2022, 1:08 a.m.