medfind: Complete, Partial, and Fuzzy Matches for Medication (and...


Description

This is a convenience function for identifying drug names and other strings in columns of data frames. Eventually support for RxNorm/UMLS searching will be added, but for now there is no cross-referencing between search strings and generic or drug trade names.

Usage

medfind(data, field, string, fuzzy=FALSE, distance=0.1)

Arguments

data

A data frame, containing medication information in one or more columns.

field

A character string identifying the column in which medication information is stored.

string

A character string, partial or complete, to match in field.

fuzzy

Logical. If TRUE, use approximate string matching with the generalized Levenshtein edit distance, as provided by agrep.

distance

Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (the fraction is replaced by the smallest integer not less than it, i.e. it is rounded up).
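
As a rough illustration of the rounding rule for a fractional distance, the arithmetic below works out the edit budget for the search string used in the Examples. It assumes that every transformation (insertion, deletion, substitution) costs 1, which is agrep's default, and that medfind passes distance to agrep unchanged; neither assumption is confirmed by this page.

```r
# Worked arithmetic only; assumes unit costs and that `distance`
# reaches agrep() unchanged (not confirmed by the medfind source).
pattern <- "cholecalciferol"
n <- nchar(pattern)

# A fraction is scaled by the pattern length and rounded up.
max_edits_default <- ceiling(0.1 * n)   # distance = 0.1
max_edits_wider   <- ceiling(0.25 * n)  # distance = 0.25

c(default = max_edits_default, wider = max_edits_wider)
```

Under these assumptions, raising distance from 0.1 to 0.25 widens the budget from 2 edits to 4 for this 15-character pattern.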

Details

For searches of recorded medication data, the function is designed to cope with inconsistent data entry: accents and diacritic marks are removed using stri_trans_general from the stringi package, case is ignored, and unusual or incorrect spellings are handled to a degree by fuzzy matching with agrep.
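
The steps described above can be sketched roughly as follows. This is not the medfind source: medfind_sketch, the meds data frame, the "Latin-ASCII" transliteration rule, and the choice to return the original (unmodified) strings are all assumptions made for illustration. It requires the stringi package.

```r
# Hypothetical sketch of the normalise-then-match idea; NOT the
# actual medfind() implementation.
library(stringi)

medfind_sketch <- function(data, field, string, fuzzy = FALSE, distance = 0.1) {
  original <- as.character(data[[field]])

  # Strip accents and diacritic marks from the column and the search string.
  cleaned <- stri_trans_general(original, "Latin-ASCII")
  pattern <- stri_trans_general(string, "Latin-ASCII")

  if (fuzzy) {
    # Approximate, case-insensitive matching via agrepl().
    keep <- agrepl(pattern, cleaned, max.distance = distance,
                   ignore.case = TRUE)
  } else {
    # Literal, case-insensitive partial matching.
    keep <- grepl(tolower(pattern), tolower(cleaned), fixed = TRUE)
  }

  # Return each matching entry once, as it was recorded.
  unique(original[keep])
}

# Made-up records, for illustration only.
meds <- data.frame(
  medication = c("Cholecalciferol", "chol\u00e9calcif\u00e9rol",
                 "colecalciferol", "aspirin"),
  stringsAsFactors = FALSE
)

exact_hits <- medfind_sketch(meds, "medication", "cholecalciferol")
fuzzy_hits <- medfind_sketch(meds, "medication", "cholecalciferol", fuzzy = TRUE)
```

In this sketch the accented and capitalised entries are found without fuzzy matching, because normalisation happens before the comparison; the misspelled "colecalciferol" is only picked up when fuzzy = TRUE.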

Value

A vector of unique character strings corresponding to the identified matches of string in field.

Author(s)

Ryan Kyle, ryan.kyle@mail.mcgill.ca

Examples

# The mednames table is a simple, completely simulated record of participant IDs, treatment dates, and medication names.
# This example results in a single hit, for the exact match.
medfind(mednames, field = "medication", string = "cholecalciferol")

# Using fuzzy matching, we identify 8 matches -- several of which are spelled improperly or contain diacritic marks.
medfind(mednames, field = "medication", string = "cholecalciferol", fuzzy=TRUE)

# By increasing the distance value, we find two more matches, but both are for ergocalciferol -- a slightly different compound.
medfind(mednames, field = "medication", string = "cholecalciferol", fuzzy=TRUE, distance=0.25)

rpkyle/cscmisc documentation built on May 13, 2019, 12:06 p.m.