Point prevalence at a specific index date is estimated using contributions from both the available registry data and Monte Carlo simulations of the incidence and survival processes, as outlined by Crouch et al. (2014) (see References).
prevalence(
index,
num_years_to_estimate,
data,
inc_formula = NULL,
inc_model = NULL,
surv_formula = NULL,
surv_model = NULL,
registry_start_date = NULL,
death_column = NULL,
incident_column = NULL,
age_column = "age",
age_dead = 100,
status_column = "status",
N_boot = 1000,
population_size = NULL,
proportion = 1e+05,
level = 0.95,
dist = c("exponential", "weibull", "lognormal"),
precision = 2
)

index 
The date at which to estimate point prevalence as a string in the format YYYYMMDD. 
num_years_to_estimate 
Number of years of data to consider when
estimating point prevalence; multiple values can be specified in a vector.
If any values are greater than the number of years of registry data
available before 
data 
A data frame with the corresponding column names provided in

inc_formula 
A formula specifying the columns used in the incidence process.
The LHS should be the name of the column holding the incident dates,
with the RHS specifying any variables that should be stratified by, or 1 if no
stratification. For example, with the supplied

inc_model 
An object that has a 
surv_formula 
A formula used to specify a survival model, where the
LHS is a Surv object, as used by 
surv_model 
An object that has a 
registry_start_date 
The starting date of the registry. If not supplied then defaults to the earliest incidence date in the supplied data set. 
death_column 
A string providing the name of the column which holds the death date information. If not provided then prevalence cannot be counted and estimates will be solely derived from simulation. 
incident_column 
A string providing the name of the column which holds the diagnosis
date. If not provided either in this argument or in 
age_column 
A string providing the name of the column that holds patient age. If provided
then patients alive at 
age_dead 
The age at which patients are set to be dead if they are still alive, to prevent
'immortal' patients. Used in conjunction with 
status_column 
A string providing the name of the column that holds patient event status at
the event time. If not provided in 
N_boot 
Number of bootstrapped calculations to perform. 
population_size 
Integer corresponding to the size of the population at risk. 
proportion 
The population ratio to estimate prevalence for. 
level 
Double representing the desired confidence interval width. 
dist 
The distribution used by the default parametric survival model. 
precision 
Integer representing the number of decimal places required. 
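As a rough illustration of how population_size, proportion and precision relate, the sketch below converts an absolute prevalence count into a rate per proportion people. The helper prevalence_rate and the input numbers are hypothetical, and the assumption that precision is applied by rounding the rate is ours rather than something this page states.

```r
# Hypothetical helper (not part of the package): express an absolute
# prevalence count as a rate per `proportion` people, rounded to
# `precision` decimal places.
prevalence_rate <- function(count, population_size,
                            proportion = 1e5, precision = 2) {
  round(count / population_size * proportion, precision)
}

# Illustrative numbers only: 1234 prevalent cases in a population of 3.5 million.
rate <- prevalence_rate(count = 1234, population_size = 3.5e6)
print(rate)
```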
The most important parameter is num_years_to_estimate, which governs the number of previous years of data to use when estimating the prevalence at the index date. If this parameter is greater than the number of years of known incident cases available in the supplied registry data (specified with argument num_registry_years), then the remaining num_years_to_estimate - num_registry_years years of incident data will be simulated using Monte Carlo simulation.
The larger num_years_to_estimate, the more accurate the prevalence estimate will be, provided an adequate survival model can be fitted to the registry data. It is therefore important to provide as much clean registry data as possible.
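The split between counted and simulated years described above can be sketched as follows. The helper split_years is hypothetical and only mirrors the arithmetic in the text; it is not a function exported by the package.

```r
# Hypothetical helper (not part of the package): for each requested window,
# work out how many years the registry covers and how many must be simulated.
split_years <- function(num_years_to_estimate, num_registry_years) {
  data.frame(
    requested = num_years_to_estimate,
    counted   = pmin(num_years_to_estimate, num_registry_years),
    simulated = pmax(num_years_to_estimate - num_registry_years, 0)
  )
}

# Illustrative values: a registry holding 9 years of incident cases.
years <- split_years(c(3, 5, 10, 20), num_registry_years = 9)
print(years)
```

Under these illustrative values, only the 10- and 20-year windows extend past the registry and so need simulated incident cases.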
Prevalence arises from two stochastic processes: incidence and survival. This is reflected in the function arguments by multiple options for each of these processes.
The incidence process is specified by an object that has an associated draw_incident_population method, which produces the new incident population. The default implementation is a homogeneous Poisson process, whereby interarrival times are distributed according to an exponential distribution. The inc_formula argument specifies the nature of this process; see its description for more details. See the vignette for guidance on providing a custom incidence object.
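To make the default incidence process concrete, here is a minimal base-R sketch of a homogeneous Poisson process built from exponential interarrival times. The function simulate_incidence, the daily rate and the window length are all hypothetical; this illustrates the idea and is not the package's draw_incident_population implementation.

```r
# Hypothetical helper: simulate arrival times (in days from the start of the
# window) of a homogeneous Poisson process with a constant daily rate.
simulate_incidence <- function(rate_per_day, n_days) {
  # Draw a generous batch of exponential gaps; extend if it falls short.
  n_draw <- ceiling(rate_per_day * n_days * 1.5) + 50
  arrivals <- cumsum(rexp(n_draw, rate = rate_per_day))
  while (arrivals[length(arrivals)] < n_days) {
    extra <- cumsum(rexp(n_draw, rate = rate_per_day))
    arrivals <- c(arrivals, arrivals[length(arrivals)] + extra)
  }
  # Keep only the arrivals that fall inside the window.
  arrivals[arrivals <= n_days]
}

set.seed(1)
arrivals <- simulate_incidence(rate_per_day = 0.5, n_days = 3650)
# The empirical rate should be close to the assumed rate of 0.5 per day.
print(length(arrivals) / 3650)
```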
The survival process is characterised by a method, predict_survival_probability, that estimates the probability of a given individual being alive at the index date. The default object is a parametric distribution, with the functional form specified in surv_formula and the distribution given in dist. See the vignette for guidance on providing a custom survival model.
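The role of the survival step can be sketched in base R as below, assuming a Weibull model without covariates. The function surv_prob_at_index, the shape and scale values, and the use of a Bernoulli draw to decide who is alive at the index date are illustrative assumptions; the package's own predict_survival_probability method may work differently.

```r
# Hypothetical helper: probability that a case diagnosed `days_before_index`
# days before the index date is still alive, under a Weibull survival model.
surv_prob_at_index <- function(days_before_index, shape, scale) {
  pweibull(days_before_index, shape = shape, scale = scale,
           lower.tail = FALSE)
}

set.seed(1)
# Simulated diagnosis times, expressed as days before the index date.
days_before_index <- runif(1000, min = 0, max = 3650)

# Illustrative parameters, not fitted to any data.
p_alive <- surv_prob_at_index(days_before_index, shape = 0.8, scale = 2000)

# One Monte Carlo draw of which simulated cases are alive at the index date.
alive <- rbinom(length(p_alive), size = 1, prob = p_alive)

print(sum(alive))    # simulated contribution to prevalence in this draw
print(sum(p_alive))  # its expected value under the assumed model
```

Repeating the draw many times would give a distribution for the simulated contribution, which is the idea behind the bootstrap estimates controlled by N_boot.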
A prevalence object containing the following attributes:
estimates 
Prevalence estimates at the specified years, as both absolute counts and rates. 
simulated 
A 
counted 
The number of incident cases present in the registry data set. 
full_surv_model 
The survival model built on the complete registry data set. 
full_inc_model 
The incidence model built on the complete registry data set. 
surv_models 
A list of the survival models fitted to each bootstrap iteration. 
inc_models 
A list of the incidence models fitted to each bootstrap iteration. 
index_date 
The index date. 
est_years 
The years at which prevalence is estimated. 
counted_incidence_rate 
The overall incidence rate in the registry data set. 
registry_start 
The date at which the registry was identified as starting. 
proportion 
The denominator to use for estimating prevalence rates. 
status_col 
The column in the registry data containing the survival status. 
N_boot 
The number of bootstrap iterations that were run. 
means 
Covariate means, used when plotting Kaplan-Meier estimators using 
max_event_time 
The maximum time-to-event in the registry data. Again, used in

pval 
The p-value resulting from a hypothesis test on the difference between the simulated and counted prevalence over the time span covered by the registry. This tests the prevalence fit; if a significant result is found then further diagnostics are required. 
Crouch, Simon, et al. "Determining disease prevalence from incidence and survival using simulation techniques." Cancer Epidemiology 38.2 (2014): 193-199.
Other prevalence functions:
test_prevalence_fit()