| orm_extract | R Documentation |
orm_extract() scans the title, abstract, and keywords of each
record against the active risk dictionary and builds a binary presence
matrix (record x risk category). It also detects whether each study
contains direct worker exposure data - the key signal for computing the
WRDI indicator.
Matching is case-insensitive and uses whole-word boundary detection to avoid false positives (e.g. the term "laser" does not match inside "eyelaser").
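The matching behaviour described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper match_whole_word(), the toy dictionary, and the sample records are all hypothetical, and the sketch assumes that terms contain no regular-expression metacharacters.

```r
# Hypothetical helper (not part of orisma): TRUE where `term` occurs in
# `text` as a whole word, ignoring case. Assumes `term` contains no
# regex metacharacters.
match_whole_word <- function(term, text) {
  pattern <- paste0("\\b", term, "\\b")
  grepl(pattern, text, ignore.case = TRUE, perl = TRUE)
}

# Toy records and a toy two-category dictionary (illustrative only)
texts <- c(
  "Laser exposure among welders",
  "Eyelaser clinic noise survey",
  "Occupational NOISE and vibration"
)
toy_dict <- list(
  optical  = c("laser"),
  physical = c("noise", "vibration")
)

# Binary presence matrix: one row per record, one column per category.
# A category is marked 1 if any of its terms matches the record.
presence <- sapply(toy_dict, function(terms) {
  hits <- sapply(terms, match_whole_word, text = texts)
  as.integer(rowSums(matrix(hits, nrow = length(texts))) > 0)
})
```

In this sketch the second record does not count towards the optical category, because "laser" only appears inside "Eyelaser", while "NOISE" in the third record still matches "noise" regardless of case.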
orm_extract(
refs,
dict = orm_dict(),
fields = c("title", "abstract", "keywords"),
lang = getOption("orisma.lang", "en"),
verbose = getOption("orisma.verbose", TRUE)
)
| refs | An |
| dict | An |
| fields | Character vector. Which text fields to search. Default c("title", "abstract", "keywords"). |
| lang | Character. |
| verbose | Logical. Print progress? |
A list (class orisma_matrix) containing:
| refs | Original orisma_refs tibble with added columns: one binary column per risk category (cat_*), n_categories (total categories matched), and has_worker_data (logical). |
| matrix | Pure binary matrix (records x categories) for downstream analysis. |
| dict | The dictionary used. |
| categories | Category metadata tibble. |
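As a rough idea of the downstream analysis that a binary records x categories matrix allows, the sketch below uses a hand-built toy matrix in place of the matrix element. The column names and values are hypothetical, and only base R functions are used.

```r
# Toy stand-in for the `matrix` element (3 records x 3 categories).
# Column names and values are hypothetical.
m <- matrix(
  c(1L, 0L, 1L,
    0L, 0L, 1L,
    1L, 1L, 0L),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("cat_noise", "cat_laser", "cat_dust"))
)

n_categories    <- rowSums(m)    # categories matched per record
records_per_cat <- colSums(m)    # records matching each category
cooccurrence    <- crossprod(m)  # category x category co-occurrence counts
```

The diagonal of cooccurrence repeats the per-category record counts, and each off-diagonal cell counts the records in which both categories were matched.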
## Not run:
refs <- orm_load("my_references/")
deduped <- orm_dedup(refs)
# Use default dictionary
mx <- orm_extract(deduped)
# Use a customised dictionary
dict <- orm_dict()
dict <- orm_dict_add_terms(dict, "nanoparticles", c("nano-dust", "UFP"))
mx <- orm_extract(deduped, dict = dict)
# Restrict to title + abstract only
mx <- orm_extract(deduped, fields = c("title", "abstract"))
## End(Not run)