#' Parse MedXN NLP Output
#'
#' Takes files with the raw medication extraction output generated by the MedXN
#' natural language processing system and converts it into a standardized format.
#'
#' Output from different medication extraction systems is formatted in different ways.
#' In order to be able to process the extracted information, we first need to convert
#' the output from different systems into a standardized format. Extracted expressions
#' for various drug entities (e.g., drug name, strength, frequency, etc.) each receive
#' their own column formatted as "extracted expression::start position::stop position".
#' If multiple expressions are extracted for the same entity, they will be separated by
#' backticks.
#'
#' MedXN output files anchor extractions to a specific drug name extraction.
#'
#' In MedXN output files, the results from multiple clinical notes can be combined into
#' a single output file. The start of a line can indicate where output for a new
#' observation (i.e., a new clinical note) begins. The user should set the argument
#' \code{begText} to a regular expression that identifies the lines where output
#' for a new clinical note begins.
#'
#' See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
#'
#' @param filename File name for single file containing MedXN output.
#' @param begText A regular expression that would indicate the beginning of a new
#' observation (i.e., extracted clinical note).
#'
#' @return A data.table object with columns for filename, drugname, strength, dose, route,
#' freq, and duration. The filename column contains the file name corresponding to the
#' clinical note. Each of the entity columns is of the format
#' "extracted expression::start position::stop position".
#'
#' @examples
#' mxn_output <- system.file("examples", "lam_medxn.csv", package = "EHR")
#' mxn_parsed <- parseMedXN(mxn_output, begText = "^ID[0-9]+_[0-9-]+_")
#' mxn_parsed
#' @export
parseMedXN <- function(filename, begText = "^[R0-9]+_[0-9-]+_[0-9]+_") {
  con <- file(filename, 'r', blocking = TRUE)
  cnt <- 1
  bld <- list()
  # read the file in chunks of 10000 lines
  while(TRUE) {
    l <- readLines(con, n = 10000)
    # lines should start with GRID_date_note
    lineStart <- grepl(begText, l)
    # collapse continuation lines into the record that precedes them
    ix <- cumsum(lineStart)
    ll <- sapply(split(l, ix), paste, collapse = ' ', USE.NAMES = FALSE)
    bld[[cnt]] <- data.table::tstrsplit(ll, "|", fixed = TRUE)
    if(length(l) < 10000) break
    cnt <- cnt + 1
  }
  close(con)
  rdf <- vector('list', cnt)
  for(i in seq_along(bld)) {
    # keep the first nine pipe-delimited fields of each record
    df <- as.data.frame(bld[[i]][1:9], stringsAsFactors = FALSE)
    names(df) <- paste0('V', 1:9)
    rdf[[i]] <- df
  }
  alldf <- do.call(rbind, rdf)
  x <- data.table::as.data.table(alldf[, c(1,2,4,5,7,8,9)])
  data.table::setnames(x, c("filename", "drugname", "strength", "dose", "route", "freq", "duration"))
  # setnames() returns its result invisibly, so return x explicitly
  x
}
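
# Illustrative sketch, not part of the exported EHR API: how a `begText`
# pattern can group raw lines into one record per clinical note before the
# fields are split on "|". The helper name `groupNoteLines` and the sample
# lines in the usage comments are hypothetical; they only assume
# pipe-delimited output in which a record may wrap onto following lines.
# vapply() is used here in place of sapply() so the result is always a
# character vector.
groupNoteLines <- function(l, begText) {
  # TRUE wherever a line opens the output for a new note
  lineStart <- grepl(begText, l)
  # the running count of note starts gives every line a group id
  ix <- cumsum(lineStart)
  # join each group's lines with a single space
  vapply(split(l, ix), paste, character(1), collapse = ' ', USE.NAMES = FALSE)
}
# Usage with made-up lines (the second line continues the first record):
# l <- c("ID1_2020-01-01_Note1|aspirin::10::17", "|oral|daily",
#        "ID2_2020-01-02_Note1|metformin::5::14")
# rec <- groupNoteLines(l, "^ID[0-9]+_[0-9-]+_")
# strsplit(rec, "|", fixed = TRUE)  # fields of each collapsed record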