createTritonCovariateSettings: createTritonCovariateSettings
In mi-erasmusmc/Triton: Generating Text Representation Features for a Cohort in the OMOP CDM

View source: R/CreateTritonSettings.R

Create a covariateSettings object for constructing text representation (Triton) covariates from the note table in the OMOP CDM. Possible representations are text statistics (TextStats), bag-of-words (BoW; binary, frequency, or TF-IDF), topic models (TopicModel), and averaged embeddings (DocEmb) using trained models.

createTritonCovariateSettings(
  useNoteData = TRUE,
  startDay = -30,
  endDay = 0,
  idrange = NULL,
  parallel = FALSE,
  analysisId = 999,
  note_databaseschema = NULL,
  note_tablename = "note",
  note_customWhere = "",
  pipe_preprocess_function = NULL,
  pipe_tokenizer_function = "word",
  pipe_ngrams = 1,
  pipe_saveVocab = FALSE,
  pipe_outputFolder = NULL,
  filter_stopwords = NULL,
  filter_custom_regex = NULL,
  filter_vocab_term_max = NULL,
  filter_term_count_min = NULL,
  filter_term_count_max = NULL,
  filter_doc_count_min = NULL,
  filter_doc_count_max = NULL,
  filter_doc_proportion_max = NULL,
  filter_doc_proportion_min = NULL,
  representations = c("TextStats"),
  BoW_type = c("binary"),
  BoW_validationVarImpTable = NULL,
  DocEmb_word_embeddings = NULL,
  TopicModel_type = c("lsa"),
  TopicModel_model = NULL,
  covariateDataSave = "",
  covariateDataLoad = ""
)

`startDay`	integer; start day, relative to the index date, of the window for which the text representations are computed. Negative values are days before the index date. Default is `-30`.
`endDay`	integer; end day, relative to the index date, of the window for which the text representations are computed. Default is `0`.
`idrange`	(optional) integer vector; the range of integers that can be used to generate the covariate IDs. The maximum is 2147482. Default is `c(1,2147482)`.
`parallel`	logical; whether multi-threading should be used (not available on Windows). Default is `FALSE`.
`note_databaseschema`	character; database schema of the note table, if different from the one passed through FeatureExtraction. Default is `NULL`.
`note_tablename`	character; name of the note table, if different from the OMOP CDM default. Default is `"note"`.
`note_customWhere`	(optional) character; a SQL WHERE clause used to filter the notes that are imported. Example: `"WHERE note_source_value='communication'"`. Default is `""`.
`pipe_preprocess_function`	function; used to preprocess the strings before tokenization. Default is `tolower`.
`pipe_tokenizer_function`	character or function; used to tokenize the strings. Default is the quanteda tokenizer (`tokens`) with argument `"word"`. Other possible arguments are `"fasterword"`, `"fastestword"`, `"sentence"`, and `"character"`. A custom tokenizer function can also be provided; it should take the document strings as input and return a list of character vectors (tokens).
`pipe_ngrams`	integer vector; the number of elements to be concatenated in each n-gram. For example, `c(1,2)` creates all unigrams and bigrams; `c(1:3)` creates all unigrams, bigrams, and trigrams. Default is `1`: unigrams only.
`pipe_saveVocab`	logical; whether to save the generated vocabulary as an RDS file in the output folder. Default is `FALSE`.
`pipe_outputFolder`	(optional) character; file path and name for saving output files. Default is `NULL`.
`filter_stopwords`	character vector; list of stopwords that will be removed. Default is `NULL`. See `stopwords` for generating stopwords.
`filter_custom_regex`	(optional) character; regular expression (regex) selecting the tokens that will be removed. Default is `NULL`.
`filter_vocab_term_max`	integer; maximum number of terms in the vocabulary; the most frequent terms are kept. Default is `NULL`.
`filter_term_count_min`	integer; minimum number of occurrences of a term over all documents. Default is `NULL`.
`filter_term_count_max`	integer; maximum number of occurrences of a term over all documents. Default is `NULL`.
`filter_doc_count_min`	integer; a term is kept when the number of documents that contain it is larger than this value. Default is `NULL`.
`filter_doc_count_max`	integer; a term is kept when the number of documents that contain it is lower than this value. Default is `NULL`.
`filter_doc_proportion_max`	numeric; maximum proportion (0 to 1) of documents that may contain a term. Default is `NULL`.
`filter_doc_proportion_min`	numeric; minimum proportion (0 to 1) of documents that should contain a term. Default is `NULL`.
`representations`	character vector; the text representations that should be constructed. Choose from `"TextStats"` (default), `"BoW"`, `"TopicModel"`, and `"DocEmb"`. Multiple representations can be constructed at once: `c("BoW","TextStats")`.
`BoW_type`	character vector; the BoW types to be constructed. Choose from `"binary"` (default), `"frequency"`, and `"tfidf"`. Multiple types can be constructed at once: `c("binary","frequency")`.
`BoW_validationVarImpTable`	(optional) data.frame; used when validating a model with bag-of-words covariates. A varImp data.frame with the covariate names and covariate values of a trained model, which can be found in `plpResult$model$varImp` or `plpModel$varImp`.
`DocEmb_word_embeddings`	(optional) character; name of a data.frame loaded in the R environment that contains the word embeddings. The first column must contain the word; the other n-1 columns contain the embedding values.
`TopicModel_type`	character vector; type of topic model. Default is `"lsa"`.
`TopicModel_model`	character; name of a topic model object loaded in the R environment.
`covariateDataSave`	(optional) character; location and file name where the created covariateData is stored.
`covariateDataLoad`	(optional) character; location and file name from which previously created covariateData is loaded. When set, all other settings are ignored; only the covariateData is loaded and returned.
`useNoteData`	logical; set to `FALSE` to disable the creation of text representation covariates. Default is `TRUE`.
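
The input shapes described for `pipe_preprocess_function`, `pipe_tokenizer_function`, and `DocEmb_word_embeddings` can be sketched in base R. This is a minimal illustration only: the names `my_preprocess`, `my_tokenizer`, and `my_embeddings`, as well as the toy documents and embedding values, are hypothetical, and how Triton calls these objects internally is not shown here.

```r
# Hypothetical preprocessing function: lower-case the text and
# collapse runs of whitespace into a single space.
my_preprocess <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Hypothetical custom tokenizer: takes a character vector of documents
# and returns a list with one character vector of tokens per document.
my_tokenizer <- function(x) {
  tokens <- strsplit(x, "[^[:alnum:]]+")
  # Drop empty strings that appear when a document starts with a delimiter.
  lapply(tokens, function(t) t[nzchar(t)])
}

docs <- c("Patient reports  chest pain.", "No fever; cough persists.")
tokens <- my_tokenizer(my_preprocess(docs))

# Toy word embeddings (made-up values): the first column holds the word,
# the remaining columns hold the embedding values.
my_embeddings <- data.frame(
  word = c("pain", "fever", "cough"),
  V1 = c(0.1, 0.4, -0.2),
  V2 = c(0.3, -0.1, 0.5),
  stringsAsFactors = FALSE
)
```

Going by the argument descriptions, the functions would be passed as objects (`pipe_preprocess_function = my_preprocess`, `pipe_tokenizer_function = my_tokenizer`), while the embeddings would be passed by name as a character string (`DocEmb_word_embeddings = "my_embeddings"`).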

A covariateSettings object that can be used by the OHDSI FeatureExtraction package.
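
As a hedged sketch of how the settings might be created, the call below requests text statistics together with binary and TF-IDF bag-of-words covariates over unigrams and bigrams. The argument names come from the usage above; the chosen values (for example the 0.9 document proportion cut-off) are illustrative assumptions, and the call is guarded so that it only runs when the Triton package is installed.

```r
# Check whether Triton is installed before calling it.
triton_available <- requireNamespace("Triton", quietly = TRUE)

if (triton_available) {
  covariateSettings <- Triton::createTritonCovariateSettings(
    startDay = -30,                           # 30 days before the index date
    endDay = 0,                               # up to the index date
    pipe_preprocess_function = tolower,
    pipe_tokenizer_function = "word",
    pipe_ngrams = c(1, 2),                    # unigrams and bigrams
    filter_doc_proportion_max = 0.9,          # illustrative value
    representations = c("TextStats", "BoW"),
    BoW_type = c("binary", "tfidf")
  )
}
```

The resulting object would then typically be supplied as the `covariateSettings` argument of FeatureExtraction functions such as `getDbCovariateData()`, alongside the database connection and cohort details.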

mi-erasmusmc/Triton documentation built on Feb. 15, 2022, 10:37 a.m.
