Estimating hidden population size using RDS data
Description
posteriorsize
computes the posterior distribution of the
population size based on data collected by Respondent Driven Sampling. The
approach approximates the RDS via the Sequential Sampling model of Gile
(2008). As such, it is referred to as the Sequential Sampling Population Size Estimate (SSPSE).
It uses the order of selection of the sample to provide information
on the distribution of network sizes over the population members.
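The role of selection order can be illustrated with a small base R simulation. This is only a sketch of successive sampling (sampling without replacement with probability proportional to degree), not the package's internal code; the population size, sample size and degree probabilities below are made-up values.

```r
# Hypothetical population: 2000 people with degrees between 1 and 10,
# with low degrees more common than high degrees.
set.seed(42)
K <- 10
pop <- sample(1:K, size = 2000, replace = TRUE, prob = (K:1) / sum(K:1))

# Successive sampling: with replace = FALSE, sample() applies the weights
# sequentially, so each draw is proportional to degree among those remaining.
n <- 1000
idx <- sample(seq_along(pop), size = n, replace = FALSE, prob = pop)
s <- pop[idx]

# Compare the mean degree of early and late recruits with the population.
early <- mean(s[1:(n / 2)])
late <- mean(s[(n / 2 + 1):n])
c(population = mean(pop), early = early, late = late)
```

High-degree members tend to be selected early, so the mean degree typically declines along the sample. How quickly it declines carries information about how much of the population remains unsampled.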
Usage
posteriorsize(s, median.prior.size = NULL, interval = 10, burnin = 5000,
maxN = NULL, K = max(s, na.rm = TRUE), samplesize = 1000,
quartiles.prior.size = NULL, mean.prior.size = NULL,
mode.prior.size = NULL, priorsizedistribution = c("beta", "flat",
"nbinom", "pln", "supplied"), effective.prior.df = 1,
sd.prior.size = NULL, mode.prior.sample.proportion = NULL, alpha = NULL,
degreedistribution = c("cmp", "nbinom", "pln"), mean.prior.degree = NULL,
sd.prior.degree = NULL, max.sd.prior.degree = 4, df.mean.prior = 1,
df.sd.prior = 3, Np = 0, nk = NULL, n = length(s), muproposal = 0.1,
sigmaproposal = 0.15, burnintheta = 500, parallel = 1,
parallel.type = "MPI", seed = NULL, maxbeta = 120, dispersion = 0,
supplied = list(maxN = maxN), verbose = TRUE)

Arguments
s 
vector of integers; the vector of degrees from the RDS in the order they are recorded. 
median.prior.size 
scalar; A hyperparameter being the median of the prior distribution on the population size. 
interval 
count; the number of proposals between sampled statistics. 
burnin 
count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number. 
maxN 
integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution. 
K 
count; the maximum degree for an individual. This is usually
calculated as twice the maximum observed degree. 
samplesize 
count; the number of Monte Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm. The default is 1000. 
quartiles.prior.size 
vector of length 2; A pair of hyperparameters
being the lower and upper quartiles of the prior distribution on the
population size. For example, 
mean.prior.size 
scalar; A hyperparameter being the mean of the prior distribution on the population size. 
mode.prior.size 
scalar; A hyperparameter being the mode of the prior distribution on the population size. 
priorsizedistribution 
character; the type of parametric distribution
to use for the prior on population size. The options are "beta", "flat", "nbinom", "pln" and "supplied". 
effective.prior.df 
scalar; A hyperparameter being the effective number of samples worth of information represented in the prior distribution on the population size. By default this is 1, but it can be greater (or less!) to allow for different levels of uncertainty. 
sd.prior.size 
scalar; A hyperparameter being the standard deviation of the prior distribution on the population size. 
mode.prior.sample.proportion 
scalar; A hyperparameter being the mode of the prior distribution on the sample proportion n/N. 
alpha 
scalar; A hyperparameter being the first parameter of the beta prior model for the sample proportion. By default this is NULL, meaning that 1 is chosen. It can be any value of at least 1 to allow for different levels of uncertainty. 
degreedistribution 
character; the parametric distribution to use for the
individual network sizes (i.e., degrees). The options are "cmp", "nbinom" and "pln". 
mean.prior.degree 
scalar; A hyperparameter being the mean degree for the prior distribution for a randomly chosen person. The prior has this mean. 
sd.prior.degree 
scalar; A hyperparameter being the standard deviation of the degree for a randomly chosen person. The prior has this standard deviation. 
max.sd.prior.degree 
scalar; The maximum allowed value of 
df.mean.prior 
scalar; A hyperparameter being the degrees of freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information inherent in the prior. 
df.sd.prior 
scalar; A hyperparameter being the degrees of freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation. 
Np 
integer; The overall degree distribution is a mixture of the

nk 
vector; the vector of counts for the number of people in the
sample with degree k. This is usually computed from s automatically as

n 
count; the sample size. This is usually computed from s automatically (as length(s)) and not usually specified by the user. 
muproposal 
scalar; The standard deviation of the proposal distribution for the mean degree. 
sigmaproposal 
scalar; The standard deviation of the proposal distribution for the standard deviation of the degree. 
burnintheta 
count; the number of proposals in the Metropolis-Hastings sub-step for the degree distribution parameters (θ) before any MCMC sampling is done. It typically is set to a modestly large number. 
parallel 
count; the number of parallel processes to run for the Monte Carlo sample. This uses PVM or MPI. The default is 1, that is, not to use parallel processing. 
parallel.type 
The type of parallel processing to use. The options are "PVM" or "MPI". This requires the corresponding type to be installed. 
seed 
integer; random number integer seed. Defaults to NULL. 
maxbeta 
scalar; The maximum allowed value of the 
dispersion 
scalar; dispersion to use in the reported network size compared to the actual network size. 
supplied 
list; If supplied, is a list with components 
verbose 
logical; if this is 
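As an illustration of how the derived arguments relate to s, the following base R sketch tabulates a degree vector. The exact expressions used by posteriorsize are not shown on this page, so tabulate() is an assumed equivalent, and the values of s are made up.

```r
# Hypothetical recorded degrees, in order of recruitment.
s <- c(3, 1, 4, 1, 5, 2, 2, 1, 3)

n <- length(s)             # sample size, matching the default n = length(s)
K <- max(s, na.rm = TRUE)  # maximum degree, matching the default in Usage

# nk[k] is the number of sampled people reporting degree k, for k = 1..K.
nk <- tabulate(s, nbins = K)
nk
# 3 2 2 1 1
```

The counts sum to n whenever every recorded degree lies between 1 and K.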
Value
posteriorsize
returns a list consisting of the
following elements:
pop 
vector; The final posterior draw for the degrees of the population. The first n are the sample in sequence and the remainder are non-sequenced. 
K 
count; the maximum degree for an individual. This is usually calculated as twice the maximum observed degree. 
n 
count; the sample size. 
samplesize 
count; the number of Monte Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm. The default is 1000. 
burnin 
count; the number of proposals before any MCMC sampling is done. It typically is set to a fairly large number. 
interval 
count; the number of proposals between sampled statistics. 
mu 
scalar; The hyperparameter 
sigma 
scalar; The hyperparameter 
df.mean.prior 
scalar; A hyperparameter being the degrees of freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information inherent in the prior. 
df.sd.prior 
scalar; A hyperparameter being the degrees of freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation. 
Np 
integer; The
overall degree distribution is a mixture of the 
muproposal 
scalar; The standard deviation of the proposal distribution for the mean degree. 
sigmaproposal 
scalar; The standard deviation of the proposal distribution for the standard deviation of the degree. 
N 
vector of length 5; summary statistics for the posterior population size.

maxN 
integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution. 
sample 
matrix of dimension

lpriorm 
vector; the vector of (log) prior
probabilities on each value of m = N - n, that is, the number of
unobserved members of the population. The values are

burnintheta 
count; the number of proposals in the Metropolis-Hastings sub-step for the degree distribution parameters (θ) before any MCMC sampling is done. It typically is set to a modestly large number. 
verbose 
logical; if this is

predictive.degree.count 
vector; a vector
of length the maximum degree ( 
predictive.degree 
vector; a vector of length the maximum degree
( 
MAP 
vector of length 6
of MAP estimates corresponding to the output

mode.prior.sample.proportion 
scalar; A hyperparameter being the mode of the prior distribution on the sample proportion n/N. 
median.prior.size 
scalar; A hyperparameter being the median of the prior distribution on the population size. 
mode.prior.size 
scalar; A hyperparameter being the mode of the prior distribution on the population size. 
mean.prior.size 
scalar; A hyperparameter being the mean of the prior distribution on the population size. 
quartiles.prior.size 
vector of length 2; A pair of hyperparameters being the lower and upper quartiles of the prior distribution on the population size. 
degreedistribution 
character; the
parametric distribution to use for the individual network sizes (i.e.,
degrees). The options are "cmp", "nbinom" and "pln". 
priorsizedistribution 
character; the type of parametric distribution
to use for the prior on population size. The options are "beta", "flat", "nbinom", "pln" and "supplied". 
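The returned burnin, interval and samplesize together indicate how much MCMC work was done. The arithmetic can be sketched as follows, assuming one draw is retained every interval proposals once burn-in is complete. That reading is an interpretation of the argument descriptions, not a statement about the package's internals.

```r
burnin <- 5000      # proposals discarded before sampling (default in Usage)
interval <- 10      # proposals between retained draws (default in Usage)
samplesize <- 1000  # retained draws (default in Usage)

# Approximate total number of proposals under the assumption above.
total_proposals <- burnin + interval * samplesize
total_proposals
# 15000
```

Under this reading, doubling interval roughly doubles the post-burn-in cost while leaving the number of retained draws unchanged.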
Details on priors
The best way to specify the prior is via the hyperparameter
mode.prior.size, which specifies the mode of the prior distribution on
the population size. You can alternatively specify the hyperparameter
median.prior.size, which specifies the median of the prior distribution
on the population size; mean.prior.sample.proportion, which specifies
the mean of the prior distribution on the proportion of the population
size in the sample; or mode.prior.sample.proportion, which specifies the
mode of the prior distribution on the proportion of the population size
in the sample. Finally, you can specify quartiles.prior.size as a vector
of length 2 being the pair of lower and upper quartiles of the prior
distribution on the population size.
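To see how a prior on the sample proportion n/N translates into a prior on the population size N, the following base R sketch draws proportions from a beta distribution and converts them. The second beta parameter (beta = 3) and the sample size are made-up values, and this illustrates the idea rather than the computation performed by posteriorsize.

```r
set.seed(1)
n <- 100     # hypothetical sample size
alpha <- 1   # first beta parameter (the documented default)
beta <- 3    # second beta parameter: an assumed value for illustration

# Draw sample proportions p = n/N and convert to population sizes N = n/p.
p <- rbeta(10000, shape1 = alpha, shape2 = beta)
N <- n / p

# Implied prior quartiles and median of N; every draw satisfies N >= n.
quantile(N, probs = c(0.25, 0.5, 0.75))
```

Summaries like these are what median.prior.size and quartiles.prior.size describe directly on the scale of N.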
References
Gile, Krista J. (2008) Inference from Partially-Observed Network Data, Ph.D. Thesis, Department of Statistics, University of Washington.
Gile, Krista J. and Handcock, Mark S. (2010) Respondent-Driven Sampling: An Assessment of Current Methodology, Sociological Methodology 40, 285-327.
Gile, Krista J. and Handcock, Mark S. (2014) sspse: Estimating Hidden Population Size using Respondent Driven Sampling Data. R package, Los Angeles, CA. Version 0.5, http://hpmrg.org.
Handcock, Mark S. (2003) degreenet: Models for Skewed Count Distributions Relevant to Networks. Statnet Project, Seattle, WA. Version 1.2, http://statnetproject.org.
Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2014) Estimating Hidden Population Size using Respondent-Driven Sampling Data, Electronic Journal of Statistics, 8, 1, 1491-1521.
Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2015) Estimating the Size of Populations at High Risk for HIV using Respondent-Driven Sampling Data, Biometrics.
See Also
network, statnet, degreenet
Examples
## Not run:
library(sspse)
N0 <- 200
n <- 100
K <- 10
# Create probabilities for a Waring distribution
# with scaling parameter 3 and mean 5, but truncated at K=10.
probs <- c(0.33333333,0.19047619,0.11904762,0.07936508,0.05555556,
           0.04040404,0.03030303,0.02331002,0.01831502,0.01465201)
probs <- probs / sum(probs)
# Look at the degree distribution for the prior
# Plot these if you want
# plot(x=1:K,y=probs,type="l")
# points(x=1:K,y=probs)
#
# Create a sample
#
set.seed(1)
# Population degrees, then a degree-weighted sample without replacement
pop <- sample(1:K, size=N0, replace = TRUE, prob = probs)
s <- sample(pop, size=n, replace = FALSE, prob = pop)
out <- posteriorsize(s=s,interval=10)
plot(out, HPD.level=0.9,data=pop[s])
summary(out, HPD.level=0.9)
# Let's look at some MCMC diagnostics
plot(out, HPD.level=0.9,mcmc=TRUE)
## End(Not run)
