epimatch: epimatch.


Description

This package provides an interactive way to visualize potentially duplicated records across tabular data sets by calculating dissimilarity scores on user-specified columns in the data.

Details

Find matching patient records across tabular data sets.

Running the User Interface

The user interface can be invoked with the function launch, which opens the app in your browser.

Backend

The backend to the user interface is a modular set of functions that can calculate dissimilarity scores on any column(s) of the data. Once dissimilarity scores are calculated, they are weighted by importance, summed, and scaled from zero to one. The resulting matrix is then traversed, and the indices of the entries that fall below the given threshold are returned.
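The final traversal step can be sketched in base R. This is only an illustration, not the package's own code: the matrix values and the threshold are made up, and it assumes the scores have already been weighted, summed, and scaled.

```r
# Hypothetical scaled score matrix (made-up values): entry [i, j] compares
# record i of the first data set with record j of the second.
scaled <- matrix(c(0.05, 0.90, 0.40,
                   0.75, 0.10, 0.95),
                 nrow = 2, byrow = TRUE)

threshold <- 0.25  # assumed cut-off, chosen only for illustration

# which(..., arr.ind = TRUE) returns one row per qualifying cell, giving
# the row and column index of each candidate match.
matches <- which(scaled < threshold, arr.ind = TRUE)
print(matches)
```

Here the pairs (1, 1) and (2, 2) fall below the threshold and would be flagged as potential duplicates.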

Wrapper Functions

The wrapper functions provide a way to programmatically execute the distance functions on the data. They return a list of matrices and a list of matching indices, respectively.

Dissimilarity Functions

Each dissimilarity function returns a distance matrix scaled from 0 to 1 where 0 indicates a perfect match and 1 indicates no match. The following distances are available:
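As an illustration of the 0 to 1 scaling described here, the sketch below builds a string dissimilarity from base R's adist. The helper string_dissimilarity is hypothetical and is not one of the package's functions; dividing the edit distance by the length of the longer string is an assumed scaling choice.

```r
# Hypothetical helper, not part of epimatch: edit distance between two
# character vectors, scaled so 0 is a perfect match and 1 is no match.
string_dissimilarity <- function(x, y) {
  raw <- utils::adist(x, y)                   # generalized Levenshtein distance
  longest <- outer(nchar(x), nchar(y), pmax)  # largest possible distance per pair
  out <- raw / longest
  out[which(longest == 0)] <- 0               # two empty strings match perfectly
  out
}

a <- c("Smith", "Jones")
b <- c("Smith", "Smyth", "Brown")
d <- string_dissimilarity(a, b)
print(d)
```

The result has one row per element of a and one column per element of b, so d[1, 2] is the dissimilarity between "Smith" and "Smyth".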

Matrix Summary Functions

Once the matrices are computed and stored in a list, weights are applied to them and they are summed. When summing, missing values are given a user-defined weight (default 0.5). The following functions work with the matrices:
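The weighting and summing step might look roughly like the following base R sketch. The helper combine_matrices and its arguments are hypothetical, not the package's functions. It interprets the missing-value rule as replacing NA with 0.5 before summing, and it assumes the sum is scaled by dividing by the total weight; the package may do either differently.

```r
# Hypothetical helper, not part of epimatch: weighted sum of a list of
# dissimilarity matrices, with missing entries replaced by na_value.
combine_matrices <- function(mats, weights, na_value = 0.5) {
  stopifnot(length(mats) == length(weights))
  total <- matrix(0, nrow = nrow(mats[[1]]), ncol = ncol(mats[[1]]))
  for (i in seq_along(mats)) {
    m <- mats[[i]]
    m[is.na(m)] <- na_value          # a missing comparison gets the neutral value
    total <- total + weights[i] * m
  }
  total / sum(weights)               # assumed scaling: divide by the total weight
}

# Made-up dissimilarity matrices for two columns; one comparison is missing.
name_dist <- matrix(c(0.0, 0.2, 1.0, 1.0), nrow = 2)
age_dist  <- matrix(c(0.1, NA, 0.9, 0.3), nrow = 2)

# The name column counts twice as much as the age column.
combined <- combine_matrices(list(name_dist, age_dist), weights = c(2, 1))
print(combined)
```

Because each input matrix lies between 0 and 1, dividing by the total weight keeps the combined scores in the same range.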


Hackout3/epimatch documentation built on May 6, 2019, 9:48 p.m.