View source: R/SPARRAfairness_simulation.R
sim_pop_data | R Documentation |
Simulates population data with a reasonably realistic joint distribution
sim_pop_data(
npop,
coef_adjust = 4,
offset = 1,
vcor = NULL,
coefs = c(2, 1, 0, 5, 3, 0, 0),
seed = 12345,
incl_id = TRUE,
incl_reason = TRUE
)
npop |
population size |
coef_adjust |
inverse scale for all (true) coefficients (default 4): lower means that hospital admissions are more predictable from covariates. |
offset |
offset for logistic model (default 1): higher means a lower overall prevalence of admission |
vcor |
a valid 5x5 correlation matrix (default NULL), giving the correlation between variables. If NULL, the values used roughly represent realistic data. |
coefs |
coefficients of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile on hospital admissions. The default in the usage above is c(2, 1, 0, 5, 3, 0, 0). All coefficients are divided by coef_adjust. |
seed |
random seed (default 12345) |
incl_id |
include an ID column (default TRUE) |
incl_reason |
include a column indicating the reason for admission (default TRUE) |
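Because vcor must be a valid correlation matrix, it may help to check a candidate before passing it in. The sketch below uses base R to test for symmetry, a unit diagonal, and positive semi-definiteness. The helper name is_valid_cor and the example matrix are hypothetical and not part of the package; the order of the five variables within vcor is not stated here, so the matrix is purely illustrative.

```r
# Hypothetical helper (not part of SPARRAfairness): returns TRUE if m is a
# valid d x d correlation matrix, up to numerical tolerance tol.
is_valid_cor <- function(m, d = 5, tol = 1e-8) {
  is.matrix(m) &&
    all(dim(m) == c(d, d)) &&
    isSymmetric(unname(m), tol = tol) &&
    all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

# Illustrative matrix: every off-diagonal correlation set to 0.2
vcor_example <- matrix(0.2, nrow = 5, ncol = 5)
diag(vcor_example) <- 1

is_valid_cor(vcor_example)
```

A matrix that passes this check could then be supplied as the vcor argument, provided its rows and columns follow the variable order the function expects.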
Simulates data for a range of people for the variables:
Age (age)
Sex (sexM: 1 if male)
Race/ethnicity (raceNW: 1 if non-white ethnicity)
Number of previous hospital admissions (PrevAdm)
Deprivation decile (SIMD: 1 most deprived, 10 least deprived. NOTE - opposite to English IMD)
Urban-rural residence status (urban_rural: 1 for rural)
Mainland-island residence status (mainland_island: 1 for island)
Hospital admission (target: 1/TRUE if admitted to hospital in year following prediction date)
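The argument descriptions suggest a logistic model in which the coefficients are divided by coef_adjust and offset shifts the baseline downwards. The following is a minimal sketch under that assumption, not the package's actual implementation: the standardised covariates, the independent normal draws, the helper name sim_target, and the exact form of the linear predictor are all illustrative assumptions.

```r
set.seed(12345)
npop <- 10000

# Assumed stand-ins for standardised covariates (one column per coefficient);
# the real function draws correlated, realistically distributed variables.
coefs <- c(2, 1, 0, 5, 3, 0, 0)
X <- matrix(rnorm(npop * length(coefs)), nrow = npop)

# Assumed linear predictor: coefficients scaled down by coef_adjust, with the
# offset subtracted so that a larger offset lowers the admission prevalence.
sim_target <- function(X, coefs, coef_adjust = 4, offset = 1) {
  lp <- as.vector(X %*% (coefs / coef_adjust)) - offset
  rbinom(nrow(X), size = 1, prob = plogis(lp))
}

prev_default <- mean(sim_target(X, coefs))
prev_high_offset <- mean(sim_target(X, coefs, offset = 3))
c(default = prev_default, high_offset = prev_high_offset)
```

Under this assumed form, raising offset reduces the simulated prevalence, and lowering coef_adjust makes the linear predictor more spread out, so the outcome depends more strongly on the covariates.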
Can optionally add an ID column.
Optionally includes an admission reason for samples with target=1. These admission reasons
roughly correspond to the first letters of ICD10 categories, and can correspond to either an
admission or a death. Admission reasons are simulated from a non-constant multinomial distribution
which varies across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly
chosen way. However, the distributions of admission reasons are not chosen to reflect real
distributions, nor are systematic changes in the frequency of admission types across categories
intended to appear realistic.
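To illustrate what a non-constant multinomial distribution means in this context, the sketch below draws one admission reason per admitted individual, using category probabilities that depend on a covariate group. It is an illustration only: the group variable, the letter labels, and the softmax weights are assumptions and do not reproduce the distributions the package chooses.

```r
set.seed(1)
n <- 1000
reasons <- LETTERS[1:5]  # placeholder labels, loosely in the style of ICD10 letters

# Hypothetical covariate group for each admitted individual (e.g. an age band)
group <- sample(1:3, n, replace = TRUE)

# Randomly chosen weights: one row of category probabilities per group
w <- matrix(rnorm(3 * length(reasons)), nrow = 3)
probs <- exp(w) / rowSums(exp(w))  # softmax, so each row sums to 1

# Draw one reason per individual from that individual's group distribution
reason <- vapply(
  group,
  function(g) sample(reasons, size = 1, prob = probs[g, ]),
  character(1)
)

# Proportion of each reason within each group
round(prop.table(table(group, reason), margin = 1), 2)
```

Because the weights are drawn at random, the differences between groups carry no real-world meaning, which mirrors the caveat in the description above.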
data frame with realistic values.
# Simulate data
dat <- sim_pop_data(10000)
# Correlation matrix of the first seven columns
cor(dat[, 1:7])
# See vignette