```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```
In this example we will create a candidate codelist for osteoarthritis, exploring how different search strategies may impact our final codelist. First, let's load the necessary packages and create a cdm reference using mock data.
```r
library(dplyr)
library(CodelistGenerator)

cdm <- mockVocabRef()
```
The mock data has the following hypothetical concepts and relationships:
```r
knitr::include_graphics("Figures/1.png")
```
We will start by creating a codelist with a keyword match. Let's say that we want to find the codes that contain "Musculoskeletal disorder" in their concept_name:
```r
knitr::include_graphics("Figures/2.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
Note that we could also identify it based on a partial match, or based on a match of all word combinations (the same words in a different order):
```r
# Partial match
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal",
  domains = "Condition",
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)

# Same words in a different order
getCandidateCodes(
  cdm = cdm,
  keywords = "Disorder musculoskeletal",
  domains = "Condition",
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)
```
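As a rough illustration of what these two kinds of match mean, the base R sketch below treats a concept name as matching when it contains every word of the search term, in any order and ignoring case. `matchesAllWords()` is a hypothetical helper, not part of CodelistGenerator, and the package may tidy and compare the text differently.

```r
# Hypothetical helper, not part of CodelistGenerator: TRUE when every word of
# `term` occurs somewhere in `name` (as a substring), in any order, ignoring case.
matchesAllWords <- function(name, term) {
  words <- strsplit(tolower(term), "\\s+")[[1]]
  Reduce(`&`, lapply(words, function(w) grepl(w, tolower(name), fixed = TRUE)))
}

# Illustrative concept names
conceptNames <- c("Musculoskeletal disorder", "Osteoarthritis of knee", "Arthritis")

# Partial match: the term only needs to cover part of the concept name
partial <- matchesAllWords(conceptNames, "Musculoskeletal")
# Same words as the concept name, in a different order
reordered <- matchesAllWords(conceptNames, "Disorder musculoskeletal")

partial
reordered
```

Both calls flag only "Musculoskeletal disorder".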
Notice that so far we have only searched for concepts with domain "Condition". However, we can expand the search to all domains by setting `domains = NULL`.
The `getCandidateCodes()` function generates a table with class "candidate_codes", which contains an attribute with the details of the search strategy. We can view it with `searchStrategy()`:
```r
candidate_codes <- getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal",
  domains = "Condition",
  standardConcept = "Standard",
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)

searchStrategy(candidate_codes)
```
Now we will include both standard and non-standard concepts in our initial search. By setting standardConcept = c("Non-standard", "Standard"), we allow the final candidate codelist to contain both the non-standard and the standard codes that have been found.
```r
knitr::include_graphics("Figures/3.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = c("Non-standard", "Standard"),
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  includeAncestor = FALSE
)
```
We can also search for multiple keywords simultaneously; the following search captures the concepts matching any of them:
```r
knitr::include_graphics("Figures/4.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = c(
    "Musculoskeletal disorder",
    "arthritis"
  ),
  domains = "Condition",
  standardConcept = c("Standard"),
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
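A minimal base R sketch of combining several keywords, assuming that a concept becomes a candidate when it matches at least one of them. The helpers `matchesAllWords()` and `matchesAnyKeyword()` and the concept names are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: TRUE when every word of `term` occurs in `name`,
# in any order, ignoring case
matchesAllWords <- function(name, term) {
  words <- strsplit(tolower(term), "\\s+")[[1]]
  Reduce(`&`, lapply(words, function(w) grepl(w, tolower(name), fixed = TRUE)))
}

# Assumed rule: a concept is kept if it matches any of the keywords
matchesAnyKeyword <- function(name, keywords) {
  Reduce(`|`, lapply(keywords, function(k) matchesAllWords(name, k)))
}

# Illustrative concept names
conceptNames <- c("Musculoskeletal disorder", "Osteoarthritis of knee", "Osteonecrosis")

found <- matchesAnyKeyword(conceptNames, c("Musculoskeletal disorder", "arthritis"))
conceptNames[found]
```

Here "Osteoarthritis of knee" is picked up by the second keyword, because it contains "arthritis".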
Now we will include the descendants of an identified code using the includeDescendants argument:
```r
knitr::include_graphics("Figures/5.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
Notice that now, in the column found_from, we can see that we obtained concept_id 1 from the initial search, and concept_ids 2, 3, 4 and 5 when searching for descendants of concept_id 1.
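In the OMOP CDM the concept hierarchy is stored in the `concept_ancestor` table, with `ancestor_concept_id` and `descendant_concept_id` columns. The base R sketch below shows the descendant step on a hypothetical excerpt of that table with illustrative ids; the `found_from` labels it produces are made up for the example and need not match the ones CodelistGenerator prints, and the package may query the table differently.

```r
# Hypothetical excerpt of an OMOP concept_ancestor table (ids are illustrative)
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(1, 1, 1, 1, 3, 3),
  descendant_concept_id = c(2, 3, 4, 5, 4, 5)
)

initialIds <- 1  # concepts matched by the keyword search

# Every concept listed below a matched concept is a descendant
descendantIds <- unique(
  conceptAncestor$descendant_concept_id[conceptAncestor$ancestor_concept_id %in% initialIds]
)

candidates <- data.frame(
  concept_id = c(initialIds, descendantIds),
  found_from = c(
    rep("initial search", length(initialIds)),
    rep("descendant", length(descendantIds))  # illustrative labels
  )
)
candidates
```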
We can also exclude concepts containing specific keywords using the exclude argument:
```r
knitr::include_graphics("Figures/6.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  exclude = c("Osteoarthrosis", "knee"),
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
When a term contains multiple words (e.g., "knee osteoarthritis"), each word is searched for independently, so that, for example, "osteoarthritis of knee" is also excluded:
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Musculoskeletal disorder",
  domains = "Condition",
  exclude = c("knee osteoarthritis"),
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
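Assuming that a concept is excluded when its name contains every word of the exclusion term, in any order (our reading of "each word is searched for independently"), the exclusion step can be sketched in base R as a filter on the candidate names. The `matchesAllWords()` helper and the concept names are hypothetical.

```r
# Hypothetical helper: TRUE when every word of `term` occurs in `name`,
# in any order, ignoring case
matchesAllWords <- function(name, term) {
  words <- strsplit(tolower(term), "\\s+")[[1]]
  Reduce(`&`, lapply(words, function(w) grepl(w, tolower(name), fixed = TRUE)))
}

# Candidates after the keyword search and descendant expansion (illustrative)
candidateNames <- c("Musculoskeletal disorder", "Osteoarthritis of knee",
                    "Osteoarthritis of hip")

# Assumed rule: drop a concept if its name contains every word of the term
excluded <- matchesAllWords(candidateNames, "knee osteoarthritis")
kept <- candidateNames[!excluded]
kept
```

"Osteoarthritis of knee" is dropped even though the words appear in the opposite order.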
If we only want to exclude exact matching terms (without accounting for word boundaries), we need to add "/" at the beginning and at the end of the term. Hence, using "/knee osteoarthritis/", "osteoarthritis of knee" won't be excluded. However, if we had "rightknee osteoarthritis", it would be excluded.
```r
# No exclusion:
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = NULL,
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)

# Exclusion where each word of the term is searched independently:
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = c("knee osteoarthritis"),
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)

# Exclusion of the exact phrase (without word boundaries):
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = c("/knee osteoarthritis/"),
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)

# Exclusion of a partial phrase (without word boundaries):
# "e osteoarthritis" also occurs inside "knee osteoarthritis"
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = c("/e osteoarthritis/"),
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
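To see why these exclusion forms behave differently, the base R sketch below contrasts word-by-word matching with matching the exact phrase between the slashes. `excludeMatches()` is a hypothetical helper that follows the behaviour described in the text; case-insensitive substring matching is an assumption, and this is not CodelistGenerator's own code.

```r
# Hypothetical helper: does an exclusion term hit each concept name?
excludeMatches <- function(name, term) {
  if (startsWith(term, "/") && endsWith(term, "/")) {
    # "/.../": the exact phrase, wherever it occurs (no word boundaries)
    phrase <- substr(term, 2, nchar(term) - 1)
    grepl(tolower(phrase), tolower(name), fixed = TRUE)
  } else {
    # Plain term: every word must occur, each searched independently
    words <- strsplit(tolower(term), "\\s+")[[1]]
    Reduce(`&`, lapply(words, function(w) grepl(w, tolower(name), fixed = TRUE)))
  }
}

# Illustrative concept names
conceptNames <- c("Knee osteoarthritis", "Osteoarthritis of knee",
                  "Rightknee osteoarthritis")

excludeMatches(conceptNames, "knee osteoarthritis")    # all three are hit
excludeMatches(conceptNames, "/knee osteoarthritis/")  # "Osteoarthritis of knee" is not hit
excludeMatches(conceptNames, "/e osteoarthritis/")     # same result via a partial phrase
```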
If we want to do exact matching that respects word boundaries (that is, to find "knee osteoarthritis" as two whole words in the concept name), we need to use "/\b" at the beginning and at the end of the expression.
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = c("/\bKnee osteoarthritis/\b"),
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)

# We will now only search for "ee osteoarthritis" to show that
# "knee osteoarthritis" won't be excluded:
getCandidateCodes(
  cdm = cdm,
  keywords = "Knee",
  domains = "Condition",
  exclude = c("/\bee osteoarthritis/\b"),
  standardConcept = c("Standard", "Non-standard"),
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
For example, with `keywords = "depression"` and `exclude = "ST depression"`, concepts like "poSTpartum depression" would be excluded, since both "ST" and "depression" appear in the name. To avoid this, we could use `exclude = "/ST depression/"`. Note, however, that "poST depression" would still be excluded with this option.
Hence, there is another option to exclude exact matching terms that accounts for word boundaries: adding "/\b" at the beginning and at the end of the term. For example, if we look for "/\bp osteoarthritis/\b", concepts like "hip osteoarthritis" **won't** be excluded.
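For reference, the difference between plain phrase matching and word-boundary matching can be reproduced with base R regular expressions, where `"\\b"` in an R string becomes the regex token `\b`. This sketch does not parse the "/\b" syntax that `exclude` accepts; `containsPhrase()` and `matchesWholePhrase()` are hypothetical helpers, and the phrase is inserted into the regex without escaping special characters.

```r
# Plain phrase matching, ignoring word boundaries (like the "/.../" form)
containsPhrase <- function(name, phrase) {
  grepl(tolower(phrase), tolower(name), fixed = TRUE)
}

# Hypothetical helper: the phrase must start and end at word boundaries.
# The phrase is used as a regex as is (special characters are not escaped).
matchesWholePhrase <- function(name, phrase) {
  grepl(paste0("\\b", phrase, "\\b"), name, ignore.case = TRUE, perl = TRUE)
}

# Illustrative concept names
conceptNames <- c("Hip osteoarthritis", "ST depression",
                  "Postpartum depression", "Post depression")

containsPhrase(conceptNames, "ST depression")         # also hits "Post depression"
matchesWholePhrase(conceptNames, "ST depression")     # only "ST depression"
matchesWholePhrase(conceptNames, "p osteoarthritis")  # "Hip osteoarthritis" is not hit
```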
## Add ancestor
To include the ancestors one level above the identified concepts, we can use the `includeAncestor` argument:
```r
knitr::include_graphics("Figures/7.png")
```

```r
codes <- getCandidateCodes(
  cdm = cdm,
  keywords = "Osteoarthritis of knee",
  includeAncestor = TRUE,
  domains = "Condition",
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchInSynonyms = FALSE,
  searchNonStandard = FALSE
)
codes
```
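In the OMOP CDM, the `concept_ancestor` table also records `min_levels_of_separation`, the number of steps between two concepts, so "one level above" corresponds to rows where it equals 1. The base R sketch below picks out the direct parents of a found concept from a hypothetical excerpt with illustrative ids; it illustrates the idea and is not necessarily how CodelistGenerator implements `includeAncestor`.

```r
# Hypothetical excerpt of an OMOP concept_ancestor table (ids are illustrative)
conceptAncestor <- data.frame(
  ancestor_concept_id      = c(1, 1, 3, 3, 1),
  descendant_concept_id    = c(3, 4, 4, 5, 5),
  min_levels_of_separation = c(1, 2, 1, 1, 2)
)

foundIds <- 4  # concepts found by the keyword search

# Direct parents only: ancestors exactly one level above a found concept
parentIds <- unique(conceptAncestor$ancestor_concept_id[
  conceptAncestor$descendant_concept_id %in% foundIds &
    conceptAncestor$min_levels_of_separation == 1
])
parentIds
```

Concept 1 is also an ancestor of concept 4, but two levels up, so it is not returned.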
We can also pick up codes based on their synonyms. For example, Osteoarthrosis is a synonym of Arthritis.
```r
knitr::include_graphics("Figures/8.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "osteoarthrosis",
  domains = "Condition",
  searchInSynonyms = TRUE,
  standardConcept = "Standard",
  includeDescendants = FALSE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
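In the OMOP CDM, alternative names live in the `concept_synonym` table (`concept_id`, `concept_synonym_name`). The base R sketch below shows the idea of a synonym search: the keyword is matched against both concept names and synonym names, and the results are combined. The ids and rows are hypothetical, not the actual contents of `mockVocabRef()`.

```r
# Hypothetical concept and concept_synonym excerpts (ids are illustrative)
concept <- data.frame(
  concept_id   = c(2, 3),
  concept_name = c("Osteoarthrosis", "Arthritis")
)
conceptSynonym <- data.frame(
  concept_id           = 3,
  concept_synonym_name = "Osteoarthrosis"
)

keyword <- "osteoarthrosis"

# Matches on the concept name itself
byName <- concept$concept_id[grepl(keyword, concept$concept_name, ignore.case = TRUE)]
# Matches on a synonym: the concept owning the synonym is returned
bySynonym <- conceptSynonym$concept_id[
  grepl(keyword, conceptSynonym$concept_synonym_name, ignore.case = TRUE)
]

found <- sort(unique(c(byName, bySynonym)))
found
```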
Notice that if includeDescendants = TRUE, Arthritis descendants will also be included:
```r
knitr::include_graphics("Figures/9.png")
```
```r
getCandidateCodes(
  cdm = cdm,
  keywords = "osteoarthrosis",
  domains = "Condition",
  searchInSynonyms = TRUE,
  standardConcept = "Standard",
  includeDescendants = TRUE,
  searchNonStandard = FALSE,
  includeAncestor = FALSE
)
```
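We can also pick up standard concepts associated with our keyword via a non-standard search, that is, by searching for the keyword among non-standard concepts and keeping the standard concepts they map to.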
```r
knitr::include_graphics("Figures/10.png")
```
```r
codes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "Degenerative",
  domains = "Condition",
  standardConcept = "Standard",
  searchNonStandard = TRUE,
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  includeAncestor = FALSE
)
codes1
```
Let's take a moment to focus on the standardConcept and searchNonStandard arguments to clarify the difference between them. standardConcept specifies whether the final candidate codelist should contain only standard concepts or non-standard concepts as well. searchNonStandard determines whether the keywords are also searched for among non-standard concepts.
In the previous example, since we set standardConcept = "Standard", we retrieved the code for Osteoarthrosis from the non-standard search. However, we did not obtain the non-standard code for degenerative arthropathy from the initial search. If we allow non-standard concepts in the final candidate codelist, we retrieve both codes:
```r
knitr::include_graphics("Figures/11.png")
```
```r
codes2 <- getCandidateCodes(
  cdm = cdm,
  keywords = "Degenerative",
  domains = "Condition",
  standardConcept = c("Non-standard", "Standard"),
  searchNonStandard = FALSE,
  includeDescendants = FALSE,
  searchInSynonyms = FALSE,
  includeAncestor = FALSE
)
codes2
```
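The two arguments act at different points, and a small base R sketch can make that concrete. Following the description above, the search step finds non-standard hits and follows them to standard concepts (in the OMOP CDM, via "Maps to" rows in `concept_relationship`), while the filtering step decides which concepts may appear in the final list. All ids and rows are hypothetical, and this illustrates the description rather than CodelistGenerator's implementation.

```r
# Hypothetical vocabulary excerpts (ids are illustrative; NA = non-standard)
concept <- data.frame(
  concept_id       = c(2, 7),
  concept_name     = c("Osteoarthrosis", "Degenerative arthropathy"),
  standard_concept = c("S", NA)
)
conceptRelationship <- data.frame(
  concept_id_1    = 7,
  concept_id_2    = 2,
  relationship_id = "Maps to"
)

keyword <- "degenerative"
hits <- concept[grepl(keyword, concept$concept_name, ignore.case = TRUE), ]

# Non-standard search: follow "Maps to" from non-standard hits to standard concepts
nonStandardHits <- hits$concept_id[is.na(hits$standard_concept)]
mapped <- conceptRelationship$concept_id_2[
  conceptRelationship$concept_id_1 %in% nonStandardHits &
    conceptRelationship$relationship_id == "Maps to"
]

# Final list when only standard concepts are allowed
standardOnly <- union(hits$concept_id[hits$standard_concept %in% "S"], mapped)
# Final list when non-standard concepts are allowed too (mappings still followed)
bothAllowed <- union(hits$concept_id, mapped)

standardOnly
bothAllowed
```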