View source: R/posteriorsize.R
posteriorsize | R Documentation |
posteriorsize
computes the posterior distribution of the
population size based on data collected by Respondent Driven Sampling. The
approach approximates the RDS via the Sequential Sampling model of Gile
(2008). As such, it is referred to as the Sequential Sampling - Population Size Estimate (SS-PSE).
It uses the order of selection of the sample to provide information
on the distribution of network sizes over the population members.
posteriorsize(
s,
s2 = NULL,
previous = NULL,
median.prior.size = NULL,
interval = 10,
burnin = 5000,
maxN = NULL,
K = FALSE,
samplesize = 1000,
quartiles.prior.size = NULL,
mean.prior.size = NULL,
mode.prior.size = NULL,
priorsizedistribution = c("beta", "flat", "nbinom", "pln", "supplied"),
effective.prior.df = 1,
sd.prior.size = NULL,
mode.prior.sample.proportion = NULL,
alpha = NULL,
visibilitydistribution = c("cmp", "nbinom", "pln"),
mean.prior.visibility = NULL,
sd.prior.visibility = NULL,
max.sd.prior.visibility = 4,
df.mean.prior.visibility = 1,
df.sd.prior.visibility = 3,
beta_0.mean.prior = -3,
beta_t.mean.prior = 0,
beta_u.mean.prior = 0,
beta_0.sd.prior = 10,
beta_t.sd.prior = 10,
beta_u.sd.prior = 10,
mem.optimism.prior = NULL,
df.mem.optimism.prior = 5,
mem.scale.prior = 2,
df.mem.scale.prior = 10,
mem.overdispersion = 15,
visibility = TRUE,
type.impute = c("median", "distribution", "mode", "mean"),
Np = 0,
n = NULL,
n2 = NULL,
mu_proposal = 0.1,
nu_proposal = 0.15,
beta_0_proposal = 0.2,
beta_t_proposal = 0.001,
beta_u_proposal = 0.001,
memmu_proposal = 0.1,
memscale_proposal = 0.15,
burnintheta = 500,
burninbeta = 50,
parallel = 1,
parallel.type = "PSOCK",
seed = NULL,
maxbeta = 90,
supplied = list(maxN = maxN),
max.coupons = NULL,
recruit.time = NULL,
recruit.time2 = NULL,
include.tree = TRUE,
unit.scale = FALSE,
optimism = TRUE,
reflect.time = FALSE,
equalize = TRUE,
verbose = FALSE
)
s |
either a vector of integers or an |
s2 |
either a vector of integers or an |
previous |
character; optionally, the name of the variable in |
median.prior.size |
scalar; A hyperparameter being the mode of the prior distribution on the population size. |
interval |
count; the number of proposals between sampled statistics. |
burnin |
count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number. |
maxN |
integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution. |
K |
count; the maximum visibility for an individual. This is usually
calculated as |
samplesize |
count; the number of Monte-Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm.The default is 1000. |
quartiles.prior.size |
vector of length 2; A pair of hyperparameters
being the lower and upper quartiles of the prior distribution on the
population size. For example, |
mean.prior.size |
scalar; A hyperparameter being the mean of the prior distribution on the population size. |
mode.prior.size |
scalar; A hyperparameter being the mode of the prior distribution on the population size. |
priorsizedistribution |
character; the type of parametric distribution
to use for the prior on population size. The options are |
effective.prior.df |
scalar; A hyperparameter being the effective number of samples worth of information represented in the prior distribution on the population size. By default this is 1, but it can be greater (or less!) to allow for different levels of uncertainty. |
sd.prior.size |
scalar; A hyperparameter being the standard deviation of the prior distribution on the population size. |
mode.prior.sample.proportion |
scalar; A hyperparameter being the mode
of the prior distribution on the sample proportion |
alpha |
scalar; A hyperparameter being the first parameter of the beta prior model for the sample proportion. By default this is NULL, meaning that 1 is chosen. it can be any value at least 1 to allow for different levels of uncertainty. |
visibilitydistribution |
count; the parametric distribution to use for the
individual network sizes (i.e., degrees). The options are |
mean.prior.visibility |
scalar; A hyper parameter being the mean visibility for the prior distribution for a randomly chosen person. The prior has this mean. |
sd.prior.visibility |
scalar; A hyper parameter being the standard deviation of the visibility for a randomly chosen person. The prior has this standard deviation. |
max.sd.prior.visibility |
scalar; The maximum allowed value of |
df.mean.prior.visibility |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information inherent in the prior. |
df.sd.prior.visibility |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation. |
beta_0.mean.prior |
scalar; A hyper parameter being the mean of the beta_0 parameter distribution in the model for the number of recruits. |
beta_t.mean.prior |
scalar; A hyper parameter being the mean of the beta_t parameter distribution in the model for the number of recruits. This corresponds to the time-to-recruit variable. |
beta_u.mean.prior |
scalar; A hyper parameter being the mean of the beta_u parameter distribution in the model for the number of recruits. This corresponds to the visibility variable. |
beta_0.sd.prior |
scalar; A hyper parameter being the standard deviation of the beta_0 parameter distribution in the model for the number of recruits. |
beta_t.sd.prior |
scalar; A hyper parameter being the standard deviation of the beta_t parameter distribution in the model for the number of recruits. This corresponds to the time-to-recruit variable. |
beta_u.sd.prior |
scalar; A hyper parameter being the standard deviation of the beta_u parameter distribution in the model for the number of recruits. This corresponds to the visibility variable. |
mem.optimism.prior |
scalar; A hyper parameter being the mean of the distribution of the optimism parameter. |
df.mem.optimism.prior |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the optimism parameter. This gives the equivalent sample size that would contain the same amount of information inherent in the prior. |
mem.scale.prior |
scalar; A hyper parameter being the scale of the concentration of baseline negative binomial measurement error model. |
df.mem.scale.prior |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the standard deviation of the dispersion parameter in the visibility model. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation. |
mem.overdispersion |
scalar; A parameter being the overdispersion of the negative binomial distribution that is the baseline for the measurement error model. |
visibility |
logical; Indicate if the measurement error model
is to be used, whereby latent visibilities are used in place of the reported
network sizes as the unit size variable. If |
type.impute |
The type of imputation to use for the summary visibilities
(returned in the component |
Np |
integer; The overall visibility distribution is a mixture of the
|
n |
integer; the number of people in the sample. This is usually computed from
|
n2 |
integer; If |
mu_proposal |
scalar; The standard deviation of the proposal distribution for the mean visibility. |
nu_proposal |
scalar; The standard deviation of the proposal distribution for the CMP scale parameter that determines the standard deviation of the visibility. |
beta_0_proposal |
scalar; The standard deviation of the proposal distribution for the beta_0 parameter of the recruit model. |
beta_t_proposal |
scalar; The standard deviation of the proposal distribution for the beta_t parameter of the recruit model. This corresponds to the visibility variable. |
beta_u_proposal |
scalar; The standard deviation of the proposal distribution for the beta_u parameter of the recruit model. This corresponds to the time-to-recruit variable. |
memmu_proposal |
scalar; The standard deviation of the proposal distribution for the log of the optimism parameter (that is, gamma). |
memscale_proposal |
scalar; The standard deviation of the proposal distribution for the log of the s.d. in the optimism model. |
burnintheta |
count; the number of proposals in the Metropolis-Hastings
sub-step for the visibility distribution parameters ( |
burninbeta |
count; the number of proposals in the Metropolis-Hastings
sub-step for the visibility distribution parameters ( |
parallel |
count; the number of parallel processes to run for the Monte-Carlo sample. This uses MPI or PSOCK. The default is 1, that is not to use parallel processing. |
parallel.type |
The type of parallel processing to use. The options are "PSOCK" or "MPI". This requires the corresponding type to be installed. The default is "PSOCK". |
seed |
integer; random number integer seed. Defaults to |
maxbeta |
scalar; The maximum allowed value of the |
supplied |
list; If supplied, is a list with components |
max.coupons |
The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruitees for any subject). By default it is taken by the attribute or data, else the maximum recorded number of coupons. |
recruit.time |
vector; An optional value for the data/time that the person was interviewed. It needs to resolve as a numeric vector with number of elements the number of rows of the data with non-missing values of the network variable. If it is a character name of a variable in the data then that variable is used. If it is NULL then the sequence number of the recruit in the data is used. If it is NA then the recruitment is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person. |
recruit.time2 |
vector; An optional value for the data/time that the person in the second RDS survey was interviewed. It needs to resolve as a numeric vector with number of elements the number of rows of the data with non-missing values of the network variable. If it is a character name of a variable in the data then that variable is used. If it is NULL, the default, then the sequence number of the recruit in the data is used. If it is NA then the recruitment is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person. |
include.tree |
logical; If |
unit.scale |
numeric; If not |
optimism |
logical; If |
reflect.time |
logical; If |
equalize |
logical; If |
verbose |
logical; if this is |
posteriorsize
returns a list consisting of the
following elements:
pop |
vector; The final posterior draw for the
degrees of the population. The first |
K |
count; the maximum visibility for an individual. This is usually calculated as twice the maximum observed degree. |
n |
count; the sample size. |
samplesize |
count; the number of Monte-Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm.The default is 1000. |
burnin |
count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number. |
interval |
count; the number of proposals between sampled statistics. |
mu |
scalar; The
hyper parameter |
sigma |
scalar; The hyper parameter |
df.mean.prior.visibility |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information inherent in the prior. |
df.sd.prior.visibility |
scalar; A hyper parameter being the degrees-of-freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation. |
Np |
integer; The
overall visibility distribution is a mixture of the |
mu_proposal |
scalar; The standard deviation of the proposal distribution for the mean visibility. |
nu_proposal |
scalar; The standard deviation of the proposal distribution for the CMP scale parameter of the visibility distribution. |
N |
vector of length 5; summary statistics for the posterior population size.
|
maxN |
integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution. |
sample |
matrix of dimension
|
vsample |
matrix of dimension |
lpriorm |
vector; the vector of (log) prior
probabilities on each value of |
burnintheta |
count; the number of
proposals in the Metropolis-Hastings sub-step for the visibility distribution
parameters ( |
verbose |
logical; if this is
|
predictive.visibility.count |
vector; a vector
of length the maximum visibility ( |
predictive.visibility |
vector; a vector of length the maximum visibility
( |
MAP |
vector of length 6
of MAP estimates corresponding to the output
|
mode.prior.sample.proportion |
scalar; A
hyperparameter being the mode of the prior distribution on the sample
proportion |
median.prior.size |
scalar; A hyperparameter being the mode of the prior distribution on the population size. |
mode.prior.size |
scalar; A hyperparameter being the mode of the prior distribution on the population size. |
mean.prior.size |
scalar; A hyperparameter being the mean of the prior distribution on the population size. |
quartiles.prior.size |
vector of length 2; A pair of hyperparameters being the lower and upper quartiles of the prior distribution on the population size. |
visibilitydistribution |
count; the
parametric distribution to use for the individual network sizes (i.e.,
visibilities). The options are |
priorsizedistribution |
character; the type of parametric distribution
to use for the prior on population size. The options are |
The best way to specify the prior is via the
hyperparameter mode.prior.size
which specifies the mode of the prior
distribution on the population size. You can alternatively specify the
hyperparameter median.prior.size
which specifies the median of the
prior distribution on the population size, or mean.prior.sample
proportion
which specifies the mean of the prior distribution on the
proportion of the population size in the sample or mode.prior.sample
proportion
which specifies the mode of the prior distribution on the
proportion of the population size in the sample. Finally, you can specify
quartiles.prior.size
as a vector of length 2 being the pair of lower
and upper quartiles of the prior distribution on the population size.
Gile, Krista J. (2008) Inference from Partially-Observed Network Data, Ph.D. Thesis, Department of Statistics, University of Washington.
Gile, Krista J. and Handcock, Mark S. (2010) Respondent-Driven Sampling: An Assessment of Current Methodology, Sociological Methodology 40, 285-327.
Gile, Krista J. and Handcock, Mark S. (2014) sspse: Estimating Hidden Population Size using Respondent Driven Sampling Data R package, Los Angeles, CA. Version 0.5, https://hpmrg.org/sspse/.
Handcock MS (2003). degreenet: Models for Skewed Count Distributions Relevant to Networks. Statnet Project, Seattle, WA. Version 1.2, https://statnet.org/.
Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2014) Estimating Hidden Population Size using Respondent-Driven Sampling Data, Electronic Journal of Statistics, 8, 1, 1491-1521
Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2015) Estimating the Size of Populations at High Risk for HIV using Respondent-Driven Sampling Data, Biometrics.
network, statnet, degreenet
data(fauxmadrona)
# Here interval=1 so that it will run faster. It should be higher in a
# real application.
fit <- posteriorsize(fauxmadrona, median.prior.size=1000,
burnin=20, interval=1, samplesize=100)
summary(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.