buildDose | R Documentation |
Takes the output of a parse process and converts it into a wide format, grouping drug entity information together according to the steps and rules described below.
buildDose( dat, dn = NULL, preserve = NULL, dist_method, na_penalty, neg_penalty, greedy_threshold, checkForRare = FALSE )
dat |
data.table object from the output of parseMedExtractR, parseMedXN, parseMedEx, or parseCLAMP. |
dn |
Regular expression specifying drug name(s) of interest. |
preserve |
Column names to include in output, whose values should not be combined with other rows. If present, dosechange is always preserved. |
dist_method |
Distance method to use for calculating distance of various paths. Alternatively set the ‘ehr.dist_method’ option, which defaults to ‘minEntEnd’. |
na_penalty |
Penalty for matching extracted entities with NA. Alternatively set the ‘ehr.na_penalty’ option. |
neg_penalty |
Penalty for negative distances between frequency/intake time and dose amounts. Alternatively set the ‘ehr.neg_penalty’ option, which defaults to 0.5. |
greedy_threshold |
Threshold to use greedy matching; setting this value too high could lead to the algorithm taking a long time to finish. Alternatively set the ‘ehr.greedy_threshold’ option. |
checkForRare |
Indicate if rare values for each entity should be found and displayed. |
The buildDose function takes as its main input (dat), a data.table object that is the output of a parse process function (parseMedExtractR, parseMedXN, parseMedEx, or parseCLAMP). Broadly, the parsed extractions are grouped together to form wide, more complete drug regimen information. This reformatting facilitates calculation of dose given intake and daily dose in the collapseDose process.
The process of creating this output is broken down into multiple steps:
Removing rows for any drugs not of interest. Drugs of interest are specified with the dn argument.
Determining whether extractions are "simple" (only one drug mention and at most one extraction per entity) or complex. Complex cases can be more straightforward if they contain at most one extraction per entity, or require a pairing algorithm to determine the best pairing if there are multiple extractions for one or more entities.
Drug entities are anchored by drug name mention within the parse process. For complex cases, drug entities are further grouped together anchored at each strength (and dose with medExtractR) extraction.
For strength groups with multiple extractions for at least one entity, these groups go through a path searching algorithm, which computes the cost for each path (based on a chosen distance method) and chooses the path with the lowest cost.
The chosen paths for each strength group are returned as the final pairings. If route is unique within a strength group, it is standardized and added to all entries for that strength group.
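The path search in the steps above can be pictured with a small, self-contained sketch. It is not the EHR implementation: the character positions, the candidate paths, and the absolute-distance cost in `path_cost` are all assumptions made for illustration. The sketch enumerates both ways of pairing two dose extractions with two frequency extractions and keeps the path with the lowest total cost.

```r
# Hypothetical character positions of extractions within one strength group.
dose_pos <- c(doseA = 10, doseB = 40)
freq_pos <- c(freqA = 18, freqB = 47)

# Each candidate path is an ordering of the frequencies:
# dose i is paired with the i-th frequency in the ordering.
paths <- list(c("freqA", "freqB"), c("freqB", "freqA"))

# Assumed cost: total absolute distance between paired positions.
path_cost <- function(path) {
  sum(abs(freq_pos[path] - dose_pos))
}

# Score every path and keep the cheapest one.
costs <- vapply(paths, path_cost, numeric(1))
best <- paths[[which.min(costs)]]

# Report the chosen pairing as a small wide table.
pairing <- data.frame(dose = names(dose_pos), freq = best,
                      stringsAsFactors = FALSE)
print(pairing)
```

Here the first path pairs each dose with the frequency that directly follows it, so it has the smaller total distance and is selected.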
The user can specify additional arguments including:
dist_method: The distance method is the metric used to determine which entity path is the most likely to be correct based on minimum cost.
na_penalty: NA penalties are incurred when extractions are paired with nothing (i.e., an NA), requiring that entities be sufficiently far apart from one another before being left unpaired.
neg_penalty: When working with dose amount (DA) and frequency/intake time (FIT), it is much more common for the ordering to be DA followed by FIT. Thus, when we observe FIT followed by DA, we apply a negative penalty to make such pairings less likely.
greedy_threshold: When there are many extractions from a clinical note, the number of possible combinations for paths can get exponentially large, particularly when the medication extraction natural language processing system is incorrect. The greedy threshold puts an upper bound on the number of entity pairings to prevent the function from stalling in such cases.
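To make the two penalties concrete, the following sketch scores a single DA/FIT pairing. The function `pair_cost`, the way the negative penalty is added to the distance, and the value chosen for `na_penalty` are assumptions for illustration only; the cost used inside buildDose may be computed differently.

```r
# Assumed, simplified cost of pairing one dose amount (DA) with one
# frequency/intake time (FIT). A fit_pos of NA means the DA is left unpaired.
pair_cost <- function(da_pos, fit_pos, na_penalty, neg_penalty) {
  if (is.na(fit_pos)) return(na_penalty)  # unpaired: flat NA penalty
  d <- fit_pos - da_pos                   # negative when FIT precedes DA
  if (d < 0) abs(d) + neg_penalty else d  # assumed additive negative penalty
}

na_penalty <- 20    # illustrative value, not the package default
neg_penalty <- 0.5  # documented default of the 'ehr.neg_penalty' option

cost_after  <- pair_cost(10, 18, na_penalty, neg_penalty)  # FIT follows DA
cost_before <- pair_cost(10, 2,  na_penalty, neg_penalty)  # FIT precedes DA
cost_far    <- pair_cost(10, 40, na_penalty, neg_penalty)  # distant FIT
cost_none   <- pair_cost(10, NA, na_penalty, neg_penalty)  # left unpaired

print(c(after = cost_after, before = cost_before,
        far = cost_far, none = cost_none))
```

Under these assumptions, a FIT at the same distance costs slightly more when it precedes the DA, and leaving the DA unpaired becomes the cheaper choice only once the nearest FIT is farther away than `na_penalty`.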
If none of the optional arguments are specified, then the buildDose process uses the default option values specified in the EHR package documentation.
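The argument-or-option pattern can be sketched with base R alone. The option names ‘ehr.dist_method’ and ‘ehr.neg_penalty’ and their defaults come from the argument descriptions above; the helper `resolve_setting` is hypothetical and only illustrates the idea of preferring an explicit argument over a session option. How buildDose resolves these values internally is not shown here.

```r
# Hypothetical helper (not part of EHR): use the explicit value when given,
# otherwise fall back to the named option, then to a supplied default.
resolve_setting <- function(value, option_name, default) {
  if (!is.null(value)) return(value)
  getOption(option_name, default = default)
}

# Set options for the session, keeping the old values so they can be restored.
old <- options(ehr.dist_method = "minEntEnd", ehr.neg_penalty = 0.5)

from_option   <- resolve_setting(NULL, "ehr.neg_penalty", default = NA)
from_argument <- resolve_setting(1, "ehr.neg_penalty", default = NA)
method        <- resolve_setting(NULL, "ehr.dist_method", default = NA)

options(old)  # restore the previous settings

print(list(from_option = from_option,
           from_argument = from_argument,
           method = method))
```

Setting an option once is convenient when many notes are processed with the same settings, while passing the argument directly keeps a single call self-describing.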
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details. For additional details, see McNeer, et al. 2020.
A data.frame object that contains columns for filename (of the clinical note, inherited from the parse output object dat), drugname, strength, dose, route, freq, duration, and drugname_start.
library(EHR)
data(lam_mxr_parsed)
buildDose(lam_mxr_parsed)