addWordnet: Use WordNet to assist concept database creation

View source: R/wordnet.R

addWordnet R Documentation

Use WordNet to assist concept database creation

Description

Adds terms from a WordNet thesaurus to a concept database (CDB), matching on term. Restricting the WordNet categories is recommended, so that words with multiple meanings are not linked to the wrong synonym. This function also removes some known errors in WordNet so that they are not passed on to the CDB. The default errors_to_remove list covers, for example, 'allergy = allergic reaction' and 'cuneiform bone = triquetral bone'; further corrections can be supplied through that argument if needed.

Usage

addWordnet(
  CDB_TABLE,
  wn_categories,
  WN,
  CHECK_TABLE = NULL,
  errors_to_remove = list(c("allergy", "allergic reaction"), c("allergic",
    "allergic reaction"), c("cuneiform bone", "triquetral bone"), c("upset", "disorder"),
    c("disorderliness", "disorder"))
)

Arguments

CDB_TABLE

data.frame or data.table with columns conceptId (integer64) and term (character, with space before and after) containing existing descriptions in the CDB
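
As a rough illustration of this input format, the sketch below builds a small table by hand. It assumes the data.table and bit64 packages are installed; the concept IDs and terms are illustrative values only, and the lower-casing step follows the Examples section rather than a documented requirement.

```r
library(data.table)
library(bit64)

# Illustrative concept IDs and terms (not taken from a real CDB)
CDB_TABLE <- data.table(
  conceptId = as.integer64(c("80891009", "40733004")),
  term = c("Heart structure", "Infectious disease")
)

# Lower-case each term and pad it with one space before and after
CDB_TABLE[, term := paste0(" ", tolower(trimws(term)), " ")]
print(CDB_TABLE)
```

Storing conceptId as integer64 rather than numeric avoids loss of precision for long identifiers.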

wn_categories

character vector of WordNet categories to use (for example 'noun.body' or 'noun.state', as in the Examples)

WN

WordNet data.table as returned by downloadWordnet
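
One possible way to choose wn_categories is to list the categories present in the WordNet table first. This is only a sketch: the toy WN table below stands in for the output of downloadWordnet and assumes it has a cat column, as in the Examples section; the wanted vector is a hypothetical selection.

```r
library(data.table)

# Toy stand-in for a WordNet table (assumed shape, see Examples)
WN <- data.table(
  cat = c("noun.body", "noun.state"),
  synonyms = list(c("heart", "pump", "ticker"),
                  c("infection", "infectious"))
)

# Categories actually present in the table
available <- sort(unique(WN$cat))

# Hypothetical selection: keep only the requested categories that exist
wanted <- c("noun.body", "noun.animal")
wn_categories <- intersect(wanted, available)
print(wn_categories)
```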

CHECK_TABLE

optional table in the same format as CDB_TABLE (default NULL), used to check whether a WordNet synonym also links to another, unrelated concept; any such synonym is excluded because of the risk of errors

errors_to_remove

list of character vectors of length two containing synonym pairs to be removed. The first entry of the pair will be removed from the WordNet file before it is used for adding to CDB
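
Because supplying errors_to_remove replaces the default list rather than adding to it, one way to extend the defaults is to copy them from the Usage section and append new pairs. The extra pair below is hypothetical and shown only for illustration.

```r
# Default pairs, copied from the Usage section
default_errors <- list(
  c("allergy", "allergic reaction"),
  c("allergic", "allergic reaction"),
  c("cuneiform bone", "triquetral bone"),
  c("upset", "disorder"),
  c("disorderliness", "disorder")
)

# Hypothetical additional pair (illustration only)
extra_errors <- list(c("ticker", "heart"))

errors_to_remove <- c(default_errors, extra_errors)

# Each entry should be a character vector of length two
is_pair <- vapply(errors_to_remove,
                  function(p) is.character(p) && length(p) == 2L,
                  logical(1))
stopifnot(all(is_pair))

# The list could then be passed on, for example:
# addWordnet(CDB_TABLE, "noun.body", WN, errors_to_remove = errors_to_remove)
```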

Value

CDB_TABLE with extra rows for WordNet synonyms

References

https://wordnet.princeton.edu/

See Also

downloadWordnet

Examples

library(Rdiagnosislist)
library(data.table)

# Toy WordNet table standing in for the output of downloadWordnet
WORDNET <- data.table(
  cat = c('noun.body', 'noun.state'),
  wordnetId = bit64::as.integer64(c('1', '2')),
  synonyms = list(c('heart', 'pump', 'ticker'),
    c('infection', 'infectious')),
  parents = list('cardiovascular system', 'pathologic process'),
  adj = list('cardiac', 'infectious'))

# Add WordNet synonyms to a concept database table
SNOMED <- sampleSNOMED()
CDB_TABLE <- description(c('Heart', 'Infection'),
  include_synonyms = TRUE)[type == 'Synonym',
  .(conceptId, term = paste0(' ', tolower(term), ' '))]
addWordnet(CDB_TABLE, 'noun.state', WORDNET)

anoopshah/Rdiagnosislist documentation built on Oct. 18, 2024, 9:48 a.m.