getDbCohortBasedCovariatesData: Get covariate information from the database based on other...
In FeatureExtraction: Generating Features for a Cohort

View source: R/GetCovariatesFromOtherCohorts.R

getDbCohortBasedCovariatesData

R Documentation

Get covariate information from the database based on other cohorts

Description

Constructs covariates using other cohorts.

Usage

getDbCohortBasedCovariatesData(
  connection,
  oracleTempSchema = NULL,
  cdmDatabaseSchema,
  cohortTable = "#cohort_person",
  cohortId = -1,
  cohortIds = c(-1),
  cdmVersion = "5",
  rowIdField = "subject_id",
  covariateSettings,
  targetDatabaseSchema = NULL,
  targetCovariateTable = NULL,
  targetCovariateContinuousTable = NULL,
  targetCovariateRefTable = NULL,
  targetAnalysisRefTable = NULL,
  targetTimeRefTable = NULL,
  aggregated = FALSE,
  minCharacterizationMean = 0,
  minCharacterizationCount = 0,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema")
)

Arguments

`connection`	A connection to the server containing the schema, as created using the `connect` function in the `DatabaseConnector` package.
`oracleTempSchema`	DEPRECATED: use `tempEmulationSchema` instead.
`cdmDatabaseSchema`	The name of the database schema that contains the OMOP CDM instance. Requires read permissions to this database. On SQL Server, this should specify both the database and the schema, for example 'cdm_instance.dbo'.
`cohortTable`	Name of the table holding the cohort for which we want to construct covariates. If it is a temp table, the name should have a hash prefix, e.g. '#temp_table'. If it is a non-temp table, it should include the database schema, e.g. 'cdm_database.cohort'.
`cohortId`	DEPRECATED: For which cohort ID should covariates be constructed? If set to -1, covariates will be constructed for all cohorts in the specified cohort table.
`cohortIds`	For which cohort ID(s) should covariates be constructed? If set to c(-1), covariates will be constructed for all cohorts in the specified cohort table.
`cdmVersion`	The version of the Common Data Model used. Currently only `cdmVersion = "5"` is supported.
`rowIdField`	The name of the field in the cohort temp table that is to be used as the row_id field in the output table. This can be especially useful if there is more than one period per person.
`covariateSettings`	An object of type `covariateSettings` as created using the `createCohortBasedCovariateSettings` or `createCohortBasedTemporalCovariateSettings` functions.
`targetDatabaseSchema`	(Optional) The name of the database schema where the resulting covariates should be stored. If not provided, results will be fetched to R.
`targetCovariateTable`	(Optional) The name of the table where the resulting covariates will be stored. If not provided, results will be fetched to R. The table can be a permanent table in the `targetDatabaseSchema` or a temp table. If it is a temp table, do not specify `targetDatabaseSchema`.
`targetCovariateContinuousTable`	(Optional) The name of the table where the resulting continuous covariates should be stored.
`targetCovariateRefTable`	(Optional) The name of the table where the covariate reference will be stored.
`targetAnalysisRefTable`	(Optional) The name of the table where the analysis reference will be stored.
`targetTimeRefTable`	(Optional) The name of the table where the time reference will be stored.
`aggregated`	Should aggregate statistics be computed instead of covariates per cohort entry?
`minCharacterizationMean`	The minimum mean value for binary characterization output. Values below this will be cut off from output. This will help reduce the file size of the characterization output, but will remove information on covariates that have very low values. The default is 0.
`minCharacterizationCount`	The minimum count value for binary characterization output. Values below this will be cut off from output. This will help reduce the file size of the characterization output, but will remove information on covariates that occur in very few cohort entries. The default is 0.
`tempEmulationSchema`	Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
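
As a rough illustration of how the arguments above fit together, the following sketch wraps a direct call in a helper function. The helper name `buildCohortCovariates`, the cohort IDs, the cohort name, and the analysis ID are hypothetical, and the argument names passed to `createCohortBasedCovariateSettings` are recalled from the package and should be checked against its help page. Running the helper requires the FeatureExtraction package and a live connection created with `DatabaseConnector::connect`.

```r
# Sketch only: defining this helper needs nothing beyond base R, but calling it
# requires FeatureExtraction and an open DatabaseConnector connection.
buildCohortCovariates <- function(connection, cdmDatabaseSchema) {
  # Hypothetical cohort to turn into a covariate (ID and name are made up).
  covariateCohorts <- data.frame(cohortId = 2, cohortName = "Type 2 diabetes")

  # Binary covariate: was the person in cohort 2 in the 365 days before index?
  # Argument names here are an assumption to verify against the package help.
  covariateSettings <- FeatureExtraction::createCohortBasedCovariateSettings(
    analysisId = 999,
    covariateCohorts = covariateCohorts,
    valueType = "binary",
    startDay = -365,
    endDay = 0
  )

  # Argument names below follow the Usage section of this page.
  FeatureExtraction::getDbCohortBasedCovariatesData(
    connection = connection,
    cdmDatabaseSchema = cdmDatabaseSchema,
    cohortTable = "#cohort_person",
    cohortIds = c(1),
    rowIdField = "subject_id",
    covariateSettings = covariateSettings,
    aggregated = FALSE
  )
}
```

Because the target table arguments are left at `NULL`, the results would be fetched to R rather than written to the database.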

Details

This function uses the data in the CDM to construct a large set of covariates for the provided cohort. The cohort is assumed to be in an existing temp table with these fields: 'subject_id', 'cohort_definition_id', 'cohort_start_date'. Optionally, an extra field can be added containing the unique identifier that will be used as rowID in the output. Typically, users don't call this function directly but rather use the getDbCovariateData function instead.
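
The expected cohort layout can be pictured with a small in-memory stand-in. This is a sketch with made-up rows, not the temp table itself; the extra column name `row_id` is a hypothetical choice that would then be passed as `rowIdField`.

```r
# Hypothetical cohort rows: person 101 enters cohort 1 twice.
cohort <- data.frame(
  subject_id = c(101, 101, 102),
  cohort_definition_id = c(1, 1, 1),
  cohort_start_date = as.Date(c("2020-01-15", "2021-06-01", "2020-03-10"))
)

# With more than one period per person, subject_id no longer identifies
# a single cohort entry.
hasRepeatedSubjects <- anyDuplicated(cohort$subject_id) > 0

# Optional extra field holding a unique identifier per cohort entry.
cohort$row_id <- seq_len(nrow(cohort))
```

With a field like this in place, each cohort entry keeps its own row in the output instead of being merged by person.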

Value

Returns an object of type CovariateData, which is an Andromeda object containing information on the baseline covariates. Information about multiple outcomes can be captured at once for efficiency reasons. This object is a list with the following components:

covariates: A table in the Andromeda object listing the baseline covariates per person in the cohorts. This is done using a sparse representation: covariates with a value of 0 are omitted to save space. The covariates object will have three columns: rowId, covariateId, and covariateValue. The rowId is usually equal to the person_id, unless specified otherwise in the rowIdField argument.
covariateRef: A table describing the covariates that have been extracted.

The CovariateData object will also have a metaData attribute, a list of objects with information on how the covariateData object was constructed.
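
To make the sparse representation concrete, the sketch below expands a small, made-up covariates table into a dense matrix using base R only. The data values and the `rowIds` vector are hypothetical; a real covariates table would first have to be collected from the CovariateData object.

```r
# Hypothetical sparse covariates: only non-zero values are stored.
covariates <- data.frame(
  rowId = c(1, 1, 3),
  covariateId = c(1001, 2001, 1001),
  covariateValue = c(1, 1, 1)
)

# All cohort entries, including row 2, which has no non-zero covariates.
rowIds <- c(1, 2, 3)
covariateIds <- sort(unique(covariates$covariateId))

# Start from all zeros, then fill in the stored values.
dense <- matrix(
  0,
  nrow = length(rowIds),
  ncol = length(covariateIds),
  dimnames = list(rowIds, covariateIds)
)
index <- cbind(
  match(covariates$rowId, rowIds),
  match(covariates$covariateId, covariateIds)
)
dense[index] <- covariates$covariateValue
```

Row 2 never appears in the sparse table, so it comes out as a row of zeros in the dense matrix.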

FeatureExtraction documentation built on June 26, 2026, 5:07 p.m.