batch_processor_db: Batch NLP Annotations for a Cohort


View source: R/text_processing.R

Description

Runs NLP annotation on the documents of a cohort of patients, in parallel. Each record is locked before its NLP annotations are generated.

Usage

batch_processor_db(
  patient_vect,
  text_format,
  nlp_engine,
  URL,
  negex_simp,
  umls_selected,
  uri_fun,
  user,
  password,
  host,
  port,
  database,
  max_n_grams_length,
  negex_depth,
  select_cores
)

Arguments

patient_vect

Vector of patient IDs.

text_format

Text format.

nlp_engine

NLP engine, UDPipe only for now.

URL

UDPipe model URL.

negex_simp

Simplified negex.

umls_selected

Processed UMLS table.

uri_fun

Uniform resource identifier (URI) string generating function for MongoDB credentials.
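As a rough illustration of what such a function might look like, the sketch below builds a standard MongoDB connection string. The name mongo_uri_sketch and its four-argument signature are assumptions made for this example; they are not taken from the CEDARS source, and the function actually expected by batch_processor_db may differ.

```r
# Hypothetical helper, not part of CEDARS: builds a standard MongoDB URI
# of the form mongodb://user:password@host:port
mongo_uri_sketch <- function(user, password, host, port) {
  # Percent-encode credentials so characters such as "@" or ":" stay unambiguous
  user_enc <- utils::URLencode(user, reserved = TRUE)
  pass_enc <- utils::URLencode(password, reserved = TRUE)
  paste0("mongodb://", user_enc, ":", pass_enc, "@", host, ":", port)
}

# Placeholder credentials, for illustration only
uri <- mongo_uri_sketch("annotator", "p@ss", "localhost", 27017)
print(uri)
```

Passing a function rather than a fixed string presumably lets the caller switch between URI schemes without changing the rest of the call.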

user

MongoDB user name.

password

MongoDB user password.

host

MongoDB host server.

port

MongoDB port.

database

MongoDB database name.

max_n_grams_length

Maximum length, in tokens, of the n-grams matched against UMLS concept unique identifiers (CUIs). Smaller values will result in faster processing. If 0 is chosen, UMLS CUI tags will not be provided.
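To see why a smaller maximum speeds things up, the sketch below enumerates every contiguous n-gram up to a given length. The function make_ngrams is hypothetical and is not how CEDARS necessarily builds its candidates; it only shows how the number of strings to look up grows with the maximum length, and it assumes that a maximum of zero yields no candidates at all.

```r
# Illustrative only: enumerate all contiguous n-grams of up to max_n tokens
make_ngrams <- function(tokens, max_n) {
  # A maximum below 1 (or an empty input) yields no candidates
  if (max_n < 1 || length(tokens) == 0) return(character(0))
  out <- character(0)
  for (n in seq_len(min(max_n, length(tokens)))) {
    starts <- seq_len(length(tokens) - n + 1)
    grams <- vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

tokens <- c("acute", "myocardial", "infarction")
length(make_ngrams(tokens, 1))  # single tokens only
length(make_ngrams(tokens, 3))  # adds the two- and three-token phrases
length(make_ngrams(tokens, 0))  # nothing to match
```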

negex_depth

Maximum distance between negation item and token to negate. Shorter distances will result in decreased sensitivity but increased specificity for negation.
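The trade-off can be illustrated with a toy example. The sketch below flags every token that lies within a given number of positions of a negation cue. It assumes single-word cues and a symmetric window, which may not match the negex logic CEDARS actually applies; flag_negated is a hypothetical name.

```r
# Illustrative only: flag tokens within `depth` positions of a negation cue.
# Assumes single-word cues and a symmetric window (both are assumptions).
flag_negated <- function(tokens, negation_cues, depth) {
  cue_pos <- which(tolower(tokens) %in% negation_cues)
  vapply(
    seq_along(tokens),
    # A token is flagged if any cue other than itself is close enough
    function(i) any(abs(i - cue_pos) <= depth & i != cue_pos),
    logical(1)
  )
}

tokens <- c("no", "evidence", "of", "pneumonia", "but", "fever", "present")

# A short window misses "pneumonia" (lower sensitivity)
tokens[flag_negated(tokens, "no", depth = 2)]

# A long window also negates "fever" (lower specificity)
tokens[flag_negated(tokens, "no", depth = 5)]
```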

select_cores

How many CPU cores should be used for parallel processing? Max allowed is total number of cores minus one. If 1 is entered, parallel processing will not be used.
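One way to respect this limit before calling the function is sketched below. The helper resolve_cores is hypothetical and not part of CEDARS. It caps the request at the total core count minus one, whereas the package itself may handle an out-of-range value differently.

```r
# Illustrative only: cap the requested core count at (total cores - 1)
resolve_cores <- function(select_cores, total_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to serial processing
  if (is.na(total_cores)) return(1L)
  max_allowed <- max(1L, as.integer(total_cores) - 1L)
  min(max(1L, as.integer(select_cores)), max_allowed)
}

resolve_cores(8, total_cores = 4)  # request exceeds the limit, so it is capped
resolve_cores(1, total_cores = 4)  # a value of 1 means no parallel processing
```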


CEDARS documentation built on Feb. 7, 2021, 5:06 p.m.