sampleCohortDefinitionSet | R Documentation |
Create one or more samples of size n from a cohort definition set
Subsetted cohorts can be sampled, as with any other subset form. However, subsetting a sampled cohort is not recommended and is not currently supported. If n is greater than the cohort count, the entire cohort is copied unmodified.
Because database platforms implement randomness differently, the random selection is computed in R, based on the count for each cohort. The sampling is therefore independent of the database platform.
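The idea of computing the selection in R can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: `sampleRowIndices` is a hypothetical helper, and it assumes the sample is drawn as row positions between 1 and the cohort count.

```r
# Hypothetical helper (not part of CohortGenerator): draw up to n row
# positions from a cohort with `cohortCount` rows, reproducibly for a seed.
sampleRowIndices <- function(cohortCount, n, seed) {
  if (n >= cohortCount) {
    # Sample size is at least the cohort size: keep every row
    return(seq_len(cohortCount))
  }
  # Seed R's pseudorandom number generator so the draw is repeatable
  set.seed(seed)
  # Draw n distinct positions without replacement and return them in order
  sort(sample.int(cohortCount, size = n))
}

idx <- sampleRowIndices(cohortCount = 1000, n = 10, seed = 64374)
length(idx)
```

Because the draw depends only on the cohort count and the seed, the same positions are selected regardless of which database platform holds the cohort table.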
Note, this function assumes cohorts have already been generated.
Lifecycle Note: This functionality is considered experimental and is not intended for use inside analytic packages.
sampleCohortDefinitionSet(
  cohortDefinitionSet,
  cohortIds = cohortDefinitionSet$cohortId,
  connectionDetails = NULL,
  connection = NULL,
  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
  cohortDatabaseSchema,
  outputDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = getCohortTableNames(),
  n = NULL,
  sampleFraction = NULL,
  seed = 64374,
  seedArgs = NULL,
  identifierExpression = "cohortId * 1000 + seed",
  incremental = FALSE,
  incrementalFolder = NULL
)
cohortDefinitionSet |
The
Optionally, this data frame may contain:
|
cohortIds |
Optional subset of cohortIds to sample. By default, this function samples all cohorts. |
connectionDetails |
An object of type |
connection |
An object of type |
tempEmulationSchema |
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created. |
cohortDatabaseSchema |
Schema name where your cohort tables reside. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'. |
outputDatabaseSchema |
Optional schema to write the sampled cohorts to (if different from cohortDatabaseSchema). |
cohortTableNames |
The names of the cohort tables. See |
n |
Sample size. Ignored if sampleFraction is set. |
sampleFraction |
Fraction of each cohort to sample. |
seed |
Vector of seeds to give to the R pseudorandom number generator |
seedArgs |
Optional arguments to pass to set.seed. |
identifierExpression |
Optional string containing an R expression used to compute the output cohort id. It can only use the variables cohortId and seed. The default is "cohortId * 1000 + seed", which is substituted and evaluated. |
incremental |
Create only cohorts that haven't been created before? |
incrementalFolder |
If |
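To make the identifierExpression argument above more concrete, here is a minimal base R sketch of how such a string could be evaluated for a given cohortId and seed. The helper `computeSampleId` is hypothetical and is not the package's own code; it only assumes that the expression sees the two variables named in the argument description.

```r
# Hypothetical helper (not part of CohortGenerator): evaluate an identifier
# expression with only `cohortId` and `seed` available as variables.
computeSampleId <- function(cohortId, seed,
                            identifierExpression = "cohortId * 1000 + seed") {
  # Expose only the two permitted variables to the expression
  vars <- list(cohortId = cohortId, seed = seed)
  # Operators such as * and + are resolved from the base environment
  eval(parse(text = identifierExpression), envir = vars, enclos = baseenv())
}

# Default expression with the default seed: 1 * 1000 + 64374
defaultId <- computeSampleId(cohortId = 1, seed = 64374)

# A vector of seeds yields one output id per seed
multiIds <- computeSampleId(cohortId = 3, seed = c(1, 2))
```

With the default expression, distinct cohortId and seed pairs can still map to the same output id (for example, cohortId 1 with seed 1000 and cohortId 2 with seed 0), so it is worth checking for collisions when choosing seeds.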
sampledCohortDefinitionSet - a data.frame-like object that contains the resulting identifiers and modified names of the sampled cohorts.