Controlled substances"
In DOPE: Drug Ontology Parsing Engine

c_cs_alpha.pdf downloaded https://www.deadiversion.usdoj.gov/schedules/orangebook/c_cs_alpha.pdf 2020-09-08

The text file, c_cs_alpha.txt, was opened with UltraEdit and edited to remove headers and the data were aligned to match the drug number column. That manual editing to remove the headers should be done programmatically (perhaps with a package that emulates PERL). Aligning of the drug number column is a major coding task given the structure of the data.

That file was opened in Excel to spit into columns. This should be redone in R.

| What | Column | | ------------ | ------ | | Substance | 1-59 | | DEA Number | 60-62 | | CSA Schedule | 63-71 | | Narcotic | 72 | | Other Names | 73+ |

library(readxl)
controlled <- read_excel("./publicData/c_cs_alpha.xlsx")

library(readr)
write_rds(controlled, "./publicData/controlled.rds")

library(conflicted)
suppressMessages(conflict_prefer("filter", "dplyr"))
#suppressPackageStartupMessages(library(tidyverse))

Mark names that will be diffcult to parse

The Names column can have many different synonyms for a drug (street and/or brand). I am marking names records that have an open parenthisis ( or a comma followed by something other than a space as being difficult to parse.

new <- controlled %>% 
  mutate(difficult = str_count(Names, "[(]") > 0 | 
           str_detect(controlled$Names, ",(?=\\S)"))

Difficult synonyms

Here the difficult names are exported to a text file and then parsed by to separate names with semi-colons. Check the git history for a partial solution that automated this for specific types complexities.

# filtered rows where synonyms are "difficult"
difficult <- new %>% 
  filter(difficult == TRUE)

# created an text file (CSV) for all rows with "difficult" synonyms
difficult %>% 
  select(- difficult) %>% 
  write_csv("./publicData/Difficult.csv")

# for the 'difficult' synonyms, I plan to split by semicolon
# prepped csv file for splitting synonyms via semicolon
synonyms_edited <- read_csv("./publicData/Difficult_Edited.csv")
synonyms_difficult_prep <- synonyms_edited

# data set of difficult synonyms, all split by semicolon
synonyms_difficult <- 
  synonyms_difficult_prep %>% 
  separate(
    Names, 
    into = c("n_1", "n_2", "n_3", "n_4", "n_5", "n_6", "n_7", "n_8", "n_9"), 
    extra = "drop", 
    fill = "right", 
    sep = "[;]", 
    remove = FALSE
    ) %>%
  select(everything()) %>% 
  mutate(across(starts_with("n_"), ~str_trim(.x))) %>% 
  pivot_longer(
    cols = starts_with("n_"), 
    values_to = "synonym", 
    values_drop_na = TRUE) %>% 
  select(-c(name, Names)) %>%
  filter(synonym != '')

Easy synonyms

A few of the "easy" synonyms had conjunctions (and/or) or commas without spaces.

For the 'easy' synonyms, I plan to split by comma. Below are the records which need modification:

row #6: change semicolon to comma
row #58: remove "has been sold as Ecstasy, i.e."
row #64: change semicolon to comma
row #247: replace "and" with a comma
row #274: replace "and" with a comma
row #328: replace "or" with a comma
row #376: remove comma after synonym name
row #378: change semicolon to comma
row #422: replace "and" with a comma
row #423: replace "and" with a comma

# filtered rows where synonyms are NOT "difficult"
easy <- new %>%
  filter(difficult %in% c(FALSE, NA))

# made the comma replacements and created a dataset for each type of 
# transformation, with the final result being a comma

# change semicolon to comma
semi_is_gone <- 
  easy %>% 
  slice(6, 64, 80, 378) %>%
  mutate(Names = str_replace_all(Names, ";", ","))

# replace "and" with comma
and_is_gone <-
  easy %>% 
  slice(79, 120, 247, 274, 422, 423) %>%
  mutate(Names = str_replace_all(Names, " and", ","))

# remove the phrase involving ecstasy
ecstasy_is_gone <-
  easy %>% 
  slice(58) %>%
  mutate(Names = str_remove_all(Names, " has been sold as Ecstasy, i.e."))

# remove comma after synonym
extra_comma_is_gone <-
  easy %>% 
  slice(376) %>%
  mutate(Names = str_remove_all(Names, ","))

# replace "or" with comma
or_is_gone <-
  easy %>% 
  slice(328) %>%
  mutate(Names = str_replace_all(Names, " or", ","))

# dataset of rows that did NOT require a comma change 
#  (i.e. I left them the way they are)  
easy_nochanges <-
  easy %>%
  slice(-6, -58, -64, -79, -80, -120, -247, -274, -328, -376, -378, -422, -423)

# bind rows that required a comma change and rows that didn't 
# now the data is ready to be split by comma 
synonyms_easy_prep <-
  bind_rows(
    semi_is_gone,
    and_is_gone,
    ecstasy_is_gone,
    extra_comma_is_gone,
    or_is_gone,
    easy_nochanges
  )

# dataset of easy synonyms, all split by comma
synonyms_easy <-
  synonyms_easy_prep %>%
  # move the comma separated names into their own columns. 
  #   mine new columns are enough to hold the drugs with MANY synonyms.
  separate(
    Names, 
    into = c("n_1", "n_2", "n_3", "n_4", "n_5", "n_6", "n_7", "n_8", "n_9"), 
    extra = "drop", 
    fill = "right", 
    sep = "[,]", 
    remove = FALSE
    ) %>%
  # remove extra spaces for all the newly created variables
  mutate(across(starts_with("n_"), ~str_trim(.x))) %>% 
  # make the dataset long
  pivot_longer(
    cols = starts_with("n_"), 
    values_to = "synonym", 
    values_drop_na = TRUE) %>% 
  select(-c(name, Names, difficult)) %>%
  # get of any blank name columns
  filter(synonym != '')

bind the 'easy' and 'difficult' synonym datasets to create the final dataset

synonyms <- bind_rows(synonyms_difficult, synonyms_easy)

write_rds(synonyms, "./publicData/synonyms_controlled.rds")

Any scripts or data that you put into this service are public.

DOPE documentation built on June 19, 2021, 1:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DOPE
Drug Ontology Parsing Engine

Controlled substances"
In DOPE: Drug Ontology Parsing Engine

Mark names that will be diffcult to parse

Difficult synonyms

Easy synonyms

bind the 'easy' and 'difficult' synonym datasets to create the final dataset

Try the DOPE package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DOPE Drug Ontology Parsing Engine

Controlled substances" In DOPE: Drug Ontology Parsing Engine

Mark names that will be diffcult to parse

Difficult synonyms

Easy synonyms

bind the 'easy' and 'difficult' synonym datasets to create the final dataset

Try the DOPE package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DOPE
Drug Ontology Parsing Engine

Controlled substances"
In DOPE: Drug Ontology Parsing Engine