orm_extract: Extract risk categories from bibliographic records
In orisma: Occupational Risk Integrated Systematic Mapping and Analysis

orm_extract

R Documentation

Description

orm_extract() scans the selected text fields of each record (by default the title, abstract, and keywords) against the active risk dictionary and builds a binary presence matrix (records x risk categories). It also flags whether each study contains direct worker exposure data, which is the key signal for computing the WRDI indicator.

Matching is case-insensitive and uses whole-word boundary detection to avoid false positives (e.g. the term "laser" does not match inside the word "eyelaser").
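
The behaviour described above can be illustrated with a minimal base R sketch. This is not orisma's implementation: `records`, `toy_dict`, and `has_term()` are hypothetical stand-ins invented for the example, and the real dictionary and reference objects may be structured differently. The sketch only assumes that a whole-word, case-insensitive match can be expressed with a `\b`-delimited regular expression.

```r
# Toy stand-ins: neither object mirrors orisma's real data structures.
records <- data.frame(
  title = c("Class 4 LASER exposure in welders",
            "Eyelaser marketing survey",
            "Noise and nano-dust in foundries"),
  abstract = c("Workers were monitored for noise.",
               "",
               "UFP levels were measured."),
  stringsAsFactors = FALSE
)
toy_dict <- list(
  laser         = c("laser"),
  noise         = c("noise"),
  nanoparticles = c("nano-dust", "UFP")
)

# TRUE where `term` occurs in `text` as a whole word, ignoring case.
# \\Q...\\E quotes the term literally (PCRE); \\b marks word boundaries.
has_term <- function(term, text) {
  grepl(paste0("\\b\\Q", term, "\\E\\b"), text,
        ignore.case = TRUE, perl = TRUE)
}

# Concatenate the searched fields into one string per record.
text <- paste(records$title, records$abstract)

# One column per category: 1 if any of its terms matches the record.
presence <- vapply(
  toy_dict,
  function(terms) {
    hits <- vapply(terms, has_term, logical(length(text)), text = text)
    as.integer(rowSums(matrix(hits, nrow = length(text))) > 0)
  },
  integer(length(text))
)
rownames(presence) <- paste0("rec", seq_along(text))
presence
```

In this toy data the second record contains "Eyelaser" but gets a 0 in the `laser` column, because the term is not delimited by word boundaries there.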

Usage

orm_extract(
  refs,
  dict = orm_dict(),
  fields = c("title", "abstract", "keywords"),
  lang = getOption("orisma.lang", "en"),
  verbose = getOption("orisma.verbose", TRUE)
)

Arguments

`refs`	An `orisma_refs` object (output of `orm_load()` or `orm_dedup()`).
`dict`	An `orisma_dict` object. Default: `orm_dict()` (ISO 45001 / INSST / NIOSH).
`fields`	Character vector. Which text fields to search. Default `c("title", "abstract", "keywords")`.
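`lang`	Character. Language code, `"en"` or `"es"`. Default `getOption("orisma.lang", "en")`.
`verbose`	Logical. Whether to print progress messages. Default `getOption("orisma.verbose", TRUE)`.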
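
Because the defaults for `lang` and `verbose` are read with `getOption()`, they can presumably be changed once per session instead of on every call. The sketch below uses only base R's `options()` and `getOption()`; the variable names are illustrative, and whether orisma consults these options anywhere else is not stated here.

```r
# Set session-wide defaults; options() returns the previous values.
old <- options(orisma.lang = "es", orisma.verbose = FALSE)

# The same lookups that the Usage signature performs for its defaults.
lang_default    <- getOption("orisma.lang", "en")
verbose_default <- getOption("orisma.verbose", TRUE)

# Restore the previous state (options that were unset are cleared again).
options(old)
```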

Value

A list (class orisma_matrix) containing:

refs: Original orisma_refs tibble with added columns: one binary column per risk category (cat_*), n_categories (total categories matched), and has_worker_data (logical).
matrix: Pure binary matrix (records x categories) for downstream analysis.
dict: The dictionary used.
categories: Category metadata tibble.
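
As a rough illustration of the downstream analysis the `matrix` component is meant for, the following sketch computes a few summaries with base R. The `mx` object here is a hand-built stand-in, not real orm_extract() output: its column names and values are invented, and only the `matrix` and `has_worker_data` pieces described above are mimicked.

```r
# Hand-built stand-in for an orisma_matrix; values and names are invented.
mx <- list(
  matrix = matrix(
    c(1L, 0L, 0L,   # laser
      1L, 0L, 1L,   # noise
      0L, 0L, 1L),  # nanoparticles
    nrow = 3,
    dimnames = list(NULL, c("cat_laser", "cat_noise", "cat_nanoparticles"))
  ),
  refs = data.frame(has_worker_data = c(TRUE, FALSE, TRUE))
)

# Number of records matched by each category.
records_per_category <- colSums(mx$matrix)

# Number of categories matched by each record.
categories_per_record <- rowSums(mx$matrix)

# Category co-occurrence counts: crossprod(M) is t(M) %*% M.
cooccurrence <- crossprod(mx$matrix)

# Share of records flagged as containing direct worker exposure data.
# This is a plain proportion, not the WRDI itself.
worker_share <- mean(mx$refs$has_worker_data)
```

The diagonal of `cooccurrence` equals the per-category record counts, and each off-diagonal cell counts the records in which both categories appear.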

Examples

## Not run: 
refs   <- orm_load("my_references/")
deduped <- orm_dedup(refs)

# Use default dictionary
mx <- orm_extract(deduped)

# Use a customised dictionary
dict <- orm_dict()
dict <- orm_dict_add_terms(dict, "nanoparticles", c("nano-dust", "UFP"))
mx   <- orm_extract(deduped, dict = dict)

# Restrict to title + abstract only
mx <- orm_extract(deduped, fields = c("title", "abstract"))

## End(Not run)

orisma documentation built on May 19, 2026, 1:07 a.m.