interpret | R Documentation
Extracts structured information, such as quantity, frequency and units, from a dosage text. This function is intended to be called directly by advanced users. For general use, the function doseconvert automatically loads the dictionaries and analyses a single text or multiple texts. The function testdoseconvert assists in testing the algorithm by analysing a list of texts and comparing the output with a gold-standard interpretation.
interpret(instring, singlewords, multiwords, patterns,
noisy = FALSE, simplify = FALSE, maxifchoice = TRUE, id = NULL)
instring | Dosage string to interpret.
singlewords | A singlewords dictionary, for example as returned by as.drugdose_singlewords.
multiwords | A multiwords dictionary, for example as returned by as.drugdose_multiwords.
patterns | A patterns dictionary, for example as returned by as.drugdose_patterns.
noisy | Whether to print debug information to the console.
simplify | Whether multiple periods with different doses should be combined into a single dose (TRUE) or returned as separate doses (FALSE).
maxifchoice | Whether to return the maximum dose if there is a choice of dose (e.g. 1 or 2 tablets daily). If FALSE, the average is returned.
id | Unique identifier for the text being interpreted.
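The maxifchoice argument decides how a choice of dose is resolved. The sketch below is a hypothetical helper, resolve_choice, written only to illustrate the two behaviours; it is not the package's numbers_replace, and it assumes a simple numeric "X or Y" pattern rather than the full range of phrasings the package handles.

```r
# Hypothetical helper for illustration; not part of the package.
# Resolves a simple numeric "X or Y" choice of dose into one number.
resolve_choice <- function(text, maxifchoice = TRUE) {
  # Capture the two numbers on either side of "or"
  m <- regmatches(text, regexec("([0-9.]+) or ([0-9.]+)", text))[[1]]
  if (length(m) == 0) {
    return(NA_real_)  # no choice of dose found
  }
  doses <- as.numeric(m[2:3])
  # TRUE: take the maximum dose; FALSE: take the average
  if (maxifchoice) max(doses) else mean(doses)
}

resolve_choice("1 or 2 tablets daily")                       # 2
resolve_choice("1 or 2 tablets daily", maxifchoice = FALSE)  # 1.5
```

Returning the maximum gives a conservative upper estimate of exposure, whereas the average assumes both options are taken equally often.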
The algorithm works as follows:
Any text occurring after "Notes for patient" or "repeat details" is ignored, because the text after these words usually contains long instructions to the patient. Although this section may contain some useful dosage information, it contains so much superfluous text that attempting to interpret it is more likely to introduce errors than to improve the result.
The remaining text is split into separate words.
Each word is checked against the "singlewords" dictionary. If it is not in the dictionary, an attempt is made to split it into two or more words that are in the dictionary (e.g. "everyday" changes to "every day"). The dictionary also corrects some common spelling mistakes (e.g. "wekkly" is converted to "weekly").
All words that are not in the singlewords dictionary are deleted. In addition, any word lying between two words of two or more letters that are not in the dictionary is deleted. This removes superfluous words from the end of long dosage instructions.
An attempt is made to calculate the average of numbers wherever there is a choice of dose, using the function "numbers_replace". For example, "1 times or 2 times" is replaced by "1.5 times" and the choice flag is set to "average".
Phrases are checked against the "first" dictionary, which is derived from the sheets "1abbrev", "2numbers", "3units", "4times" and "5uncertainty". This replaces certain phrases with more standardised equivalents; for example, "for a week" is converted to "for 7 days".
The function "numbers_replace" is applied again.
The function "analyse_dose" is called. This splits the dosage text into up to 10 parts at certain link words, such as "changeto" or "and". Each partial dose is then analysed using the "second" dictionary to derive the dosage information.
The function "combine_parts" then combines the parts of the dose according to the link words. For example, two doses separated by "and", such as "1 every morning and 2 at night", are combined by adding them, giving a total daily dose of 3. If two parts represent different dosage regimens, an algorithm is applied to choose one of them, or to combine them if both have a duration. For example, contraceptive pill instructions such as "1 daily for 21 days then 7 days break" are converted to 0.75 per (1 days) for 28 days, because 21 tablets are spread over a 28-day cycle.
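The arithmetic of the two combination examples in the algorithm can be sketched as follows. The function combine_two_parts is hypothetical and is not the package's combine_parts; it assumes each part has already been reduced to a daily dose and, for sequential regimens, a duration in days. It does not attempt the rule for choosing between regimens that lack durations.

```r
# Hypothetical helper for illustration; not the package's combine_parts.
# Combines two partial daily doses according to the link word between them.
combine_two_parts <- function(dose1, dose2, link = c("and", "then"),
                              days1 = NA, days2 = NA) {
  link <- match.arg(link)
  if (link == "and") {
    # Concurrent doses (e.g. morning and night) are added together
    return(list(daily_dose = dose1 + dose2, days = NA))
  }
  # Sequential regimens: this sketch only handles the case where both
  # parts have a duration (choosing between regimens is not covered)
  if (is.na(days1) || is.na(days2)) {
    stop("both parts need a duration to be combined")
  }
  total_days <- days1 + days2
  # Average the daily dose over the whole cycle, weighted by duration
  list(daily_dose = (dose1 * days1 + dose2 * days2) / total_days,
       days = total_days)
}

# "1 every morning and 2 at night": 1 + 2 = 3 per day
combine_two_parts(1, 2, "and")$daily_dose
# "1 daily for 21 days then 7 days break": 21 / 28 = 0.75 per day
res <- combine_two_parts(1, 0, "then", days1 = 21, days2 = 7)
res$daily_dose
res$days
```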
A data frame with the following columns:
qty | dose quantity (numeric)
units | dose units (character)
freq | dose frequency per time interval (numeric)
tot | total dose per time period (numeric)
max | factor with 3 levels: max, average, exact
time | time period in days (numeric)
change | factor with 4 levels: first, second, nochange, combined. If doses for different time periods are combined using
choice | factor with 3 levels: nochoice, choice, asneeded
daily_dose | interpreted daily dose (numeric; 0 = missing)
If simplify = TRUE, the row names are the id values if supplied, or the row numbers otherwise.
If simplify = FALSE, the row names are of the form id.X, where X (1, 2, 3, etc.) is the order of the partial dose, if id is supplied, or the row numbers otherwise.
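The row-name convention above can be illustrated with a small sketch. The function make_row_names is hypothetical and is not taken from the package source; it assumes n_parts is the number of partial doses returned for a single text.

```r
# Hypothetical illustration of the row-name convention; not package code.
# n_parts: number of partial doses returned for one text.
make_row_names <- function(n_parts, id = NULL, simplify = FALSE) {
  if (is.null(id)) {
    return(as.character(seq_len(n_parts)))  # row numbers
  }
  if (simplify) {
    return(as.character(id))  # one combined row per text
  }
  paste0(id, ".", seq_len(n_parts))  # id.X for each partial dose
}

make_row_names(2, id = 2)                   # "2.1" "2.2"
make_row_names(1, id = 2, simplify = TRUE)  # "2"
make_row_names(2)                           # "1" "2"
```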
Anoop Shah
Shah AD, Martinez C. An algorithm to derive a numerical daily dose from unstructured text dosage instructions. Pharmacoepidemiol Drug Saf 2006; 15(3): 161-166. doi: 10.1002/pds.1151 http://onlinelibrary.wiley.com/doi/10.1002/pds.1151/
doseconvert, testdoseconvert
# Load the dictionaries supplied with the package
data(singlewords)
data(multiwords)
data(patterns)
# Interpret a two-part dose; noisy = TRUE prints debug information
interpret('one daily for 1 week then two daily', id = 2,
  singlewords = as.drugdose_singlewords(singlewords),
  multiwords = as.drugdose_multiwords(multiwords),
  patterns = as.drugdose_patterns(patterns),
  noisy = TRUE)