# Estimating hidden population size using RDS data

### Description

posteriorsize computes the posterior distribution of the population size based on data collected by Respondent Driven Sampling. The approach approximates the RDS via the Sequential Sampling model of Gile (2008). As such, it is referred to as the Sequential Sampling - Population Size Estimate (SS-PSE). It uses the order of selection of the sample to provide information on the distribution of network sizes over the population members.
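To see why the order of selection is informative, here is a minimal base R sketch of successive sampling, in which units are drawn without replacement with probability proportional to degree. The population size, sample size, and degree distribution below are hypothetical values chosen for illustration; this is not the package's internal code.

```r
# Successive sampling sketch: draw without replacement, with selection
# probability proportional to degree. All values here are hypothetical.
set.seed(1)
N0 <- 5000                      # assumed population size
n  <- 4000                      # assumed sample size
K  <- 10                        # maximum degree
probs <- (1 / (1:K)) / sum(1 / (1:K))   # assumed skewed degree distribution

pop <- sample(1:K, size = N0, replace = TRUE, prob = probs)
s   <- sample(pop, size = n, replace = FALSE, prob = pop)

# High-degree units tend to be drawn first, so the mean degree falls
# as sampling proceeds and the population is depleted.
early_mean <- mean(s[1:1000])
late_mean  <- mean(s[3001:4000])
c(early = early_mean, late = late_mean)
```

The faster the mean degree drops along the sample, the larger the fraction of the population that has already been sampled, which is the signal the population size estimate draws on.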

### Usage

    posteriorsize(s, median.prior.size = NULL, interval = 10, burnin = 5000,
      maxN = NULL, K = max(s, na.rm = TRUE), samplesize = 1000,
      quartiles.prior.size = NULL, mean.prior.size = NULL,
      mode.prior.size = NULL,
      priorsizedistribution = c("beta", "flat", "nbinom", "pln", "supplied"),
      effective.prior.df = 1, sd.prior.size = NULL,
      mode.prior.sample.proportion = NULL, alpha = NULL,
      degreedistribution = c("cmp", "nbinom", "pln"), mean.prior.degree = NULL,
      sd.prior.degree = NULL, max.sd.prior.degree = 4, df.mean.prior = 1,
      df.sd.prior = 3, Np = 0, nk = NULL, n = length(s), muproposal = 0.1,
      sigmaproposal = 0.15, burnintheta = 500, parallel = 1,
      parallel.type = "MPI", seed = NULL, maxbeta = 120, dispersion = 0,
      supplied = list(maxN = maxN), verbose = TRUE)

### Arguments

- s: vector of integers; the vector of degrees from the RDS in the order they are recorded.
- median.prior.size: scalar; a hyperparameter being the median of the prior distribution on the population size.
- interval: count; the number of proposals between sampled statistics.
- burnin: count; the number of proposals before any MCMC sampling is done. It is typically set to a fairly large number.
- maxN: integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution.
- K: count; the maximum degree for an individual. This is usually calculated as round(quantile(s,0.80)).
- samplesize: count; the number of Monte-Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm. The default is 1000.
- quartiles.prior.size: vector of length 2; a pair of hyperparameters being the lower and upper quartiles of the prior distribution on the population size. For example, quartiles.prior.size=c(1000,4000) corresponds to a prior where the lower quartile (25%) is 1000 and the upper quartile (75%) is 4000.
- mean.prior.size: scalar; a hyperparameter being the mean of the prior distribution on the population size.
- mode.prior.size: scalar; a hyperparameter being the mode of the prior distribution on the population size.
- priorsizedistribution: character; the type of parametric distribution to use for the prior on population size. The options are beta (a Beta prior on the sample proportion, i.e. n/N), flat (uniform), nbinom (Negative-Binomial), pln (Poisson-log-normal), and supplied (a user-supplied prior; see the supplied argument). The default is beta.
- effective.prior.df: scalar; a hyperparameter being the effective number of samples' worth of information represented in the prior distribution on the population size. By default this is 1, but it can be greater (or less) to allow for different levels of uncertainty.
- sd.prior.size: scalar; a hyperparameter being the standard deviation of the prior distribution on the population size.
- mode.prior.sample.proportion: scalar; a hyperparameter being the mode of the prior distribution on the sample proportion n/N.
- alpha: scalar; a hyperparameter being the first parameter of the Beta prior model for the sample proportion. By default this is NULL, meaning that 1 is chosen. It can be any value of at least 1 to allow for different levels of uncertainty.
- degreedistribution: character; the parametric distribution to use for the individual network sizes (i.e., degrees). The options are cmp, nbinom, and pln. These correspond to the Conway-Maxwell-Poisson, Negative-Binomial, and Poisson-log-normal distributions. The default is cmp.
- mean.prior.degree: scalar; a hyperparameter being the mean degree for the prior distribution for a randomly chosen person. The prior has this mean.
- sd.prior.degree: scalar; a hyperparameter being the standard deviation of the degree for a randomly chosen person. The prior has this standard deviation.
- max.sd.prior.degree: scalar; the maximum allowed value of sd.prior.degree. If the passed or computed value is higher, it is reduced to this value. This is done for numerical stability.
- df.mean.prior: scalar; a hyperparameter being the degrees of freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information as is inherent in the prior.
- df.sd.prior: scalar; a hyperparameter being the degrees of freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information as is inherent in the prior for the standard deviation.
- Np: integer; the overall degree distribution is a mixture of the Np rates for 1:Np and a parametric degree distribution model truncated below Np. Thus the model fits the proportions of the population with degree 1:Np, each with a separate parameter. This should adjust for a lack of fit of the parametric degree distribution model at lower degrees, although it also moves the model away from the parametric degree distribution model.
- nk: vector; the vector of counts of the number of people in the sample with degree k. This is usually computed from s automatically as tabulate(s,nbins=K) and is not usually specified by the user.
- n: integer; the number of people in the sample. This is usually computed from s automatically (as length(s)) and is not usually specified by the user.
- muproposal: scalar; the standard deviation of the proposal distribution for the mean degree.
- sigmaproposal: scalar; the standard deviation of the proposal distribution for the standard deviation of the degree.
- burnintheta: count; the number of proposals in the Metropolis-Hastings sub-step for the degree distribution parameters (θ) before any MCMC sampling is done. It is typically set to a modestly large number.
- parallel: count; the number of parallel processes to run for the Monte-Carlo sample. This uses PVM or MPI. The default is 1, that is, no parallel processing.
- parallel.type: character; the type of parallel processing to use. The options are "PVM" or "MPI". This requires the corresponding type to be installed.
- seed: integer; random number seed. Defaults to NULL, which uses whatever the state of the random number generator is at the time of the call.
- maxbeta: scalar; the maximum allowed value of the beta parameter. If the implied or computed value is higher, it is reduced to this value. This is done for numerical stability.
- dispersion: scalar; dispersion to use in the reported network size compared to the actual network size.
- supplied: list; if supplied, a list with components maxN and sample. In this case sample is a matrix with a column named N being a sample from a prior distribution for the population size. The value maxN specifies the maximum value of the population size, a priori.
- verbose: logical; if this is TRUE, the program will print out additional information, including goodness-of-fit statistics.
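As a small illustration of how nk and n relate to s, the following base R sketch uses a made-up degree vector (the values in s are hypothetical) and the tabulate(s, nbins = K) expression quoted in the description of nk.

```r
# Hypothetical degree vector, in the order the respondents were recruited.
s <- c(3L, 1L, 4L, 1L, 5L, 2L, 2L, 1L)

K  <- max(s, na.rm = TRUE)    # maximum degree, as in the Usage default
nk <- tabulate(s, nbins = K)  # nk[k] = number of respondents with degree k
n  <- length(s)               # sample size

nk  # 3 2 1 1 1
n   # 8
```

Because every respondent is counted in exactly one bin, sum(nk) equals n whenever K is at least max(s).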

### Value

posteriorsize returns a list consisting of the following elements:

- pop: vector; the final posterior draw for the degrees of the population. The first n are the sample in sequence and the remainder are non-sequenced.
- K: count; the maximum degree for an individual. This is usually calculated as twice the maximum observed degree.
- n: count; the sample size.
- samplesize: count; the number of Monte-Carlo samples to draw to compute the posterior. This is the number returned by the Metropolis-Hastings algorithm. The default is 1000.
- burnin: count; the number of proposals before any MCMC sampling is done. It is typically set to a fairly large number.
- interval: count; the number of proposals between sampled statistics.
- mu: scalar; the hyperparameter mean.prior.degree being the mean degree for the prior distribution for a randomly chosen person. The prior has this mean.
- sigma: scalar; the hyperparameter sd.prior.degree being the standard deviation of the degree for a randomly chosen person. The prior has this standard deviation.
- df.mean.prior: scalar; a hyperparameter being the degrees of freedom of the prior for the mean. This gives the equivalent sample size that would contain the same amount of information as is inherent in the prior.
- df.sd.prior: scalar; a hyperparameter being the degrees of freedom of the prior for the standard deviation. This gives the equivalent sample size that would contain the same amount of information as is inherent in the prior for the standard deviation.
- Np: integer; the overall degree distribution is a mixture of the 1:Np rates and a parametric degree distribution model truncated below Np. Thus the model fits the proportions of the population with degree 1:Np, each with a separate parameter. This should adjust for a lack of fit of the parametric degree distribution model at lower degrees, although it also moves the model away from the parametric degree distribution model.
- muproposal: scalar; the standard deviation of the proposal distribution for the mean degree.
- sigmaproposal: scalar; the standard deviation of the proposal distribution for the standard deviation of the degree.
- N: vector of length 5; summary statistics for the posterior population size.
  - MAP: maximum a posteriori value of N.
  - Mean AP: mean a posteriori value of N.
  - Median AP: median a posteriori value of N.
  - P025: the 2.5th percentile of the posterior distribution for N, that is, the lower point of a 95% probability interval.
  - P975: the 97.5th percentile of the posterior distribution for N, that is, the upper point of a 95% probability interval.
- maxN: integer; maximum possible population size. By default this is determined from an upper quantile of the prior distribution.
- sample: matrix of dimension samplesize × 10; a matrix of summary statistics from the posterior. This is also an object of class mcmc, so it can be plotted and summarized via the mcmc.diagnostics function in the ergm package (and also the coda package). The statistics are:
  - N: population size.
  - mu: scalar; the mean degree for the prior distribution for a randomly chosen person. The prior has this mean.
  - sigma: scalar; the standard deviation of the degree for a randomly chosen person. The prior has this standard deviation.
  - degree1: scalar; the number of nodes of degree 1 in the population (it is assumed all nodes have degree 1 or more).
  - lambda: scalar; only present for the cmp model. It is the λ parameter in the standard parametrization of the Conway-Maxwell-Poisson model for the degree distribution.
  - nu: scalar; only present for the cmp model. It is the ν parameter in the standard parametrization of the Conway-Maxwell-Poisson model for the degree distribution.
- lpriorm: vector; the vector of (log) prior probabilities on each value of m=N-n, that is, the number of unobserved members of the population. The values are n:(length(lpriorm)-1+n).
- burnintheta: count; the number of proposals in the Metropolis-Hastings sub-step for the degree distribution parameters (θ) before any MCMC sampling is done. It is typically set to a modestly large number.
- verbose: logical; if this is TRUE, the program printed out additional information, including goodness-of-fit statistics.
- predictive.degree.count: vector; a vector of length the maximum degree (K) (by default K=2*max(sample degree)). The kth entry is the posterior predictive number of persons with degree k. That is, it is the posterior predictive distribution of the number of people with each degree in the population.
- predictive.degree: vector; a vector of length the maximum degree (K) (by default K=2*max(sample degree)). The kth entry is the posterior predictive proportion of persons with degree k. That is, it is the posterior predictive distribution of the proportion of people with each degree in the population.
- MAP: vector of length 6 of MAP estimates corresponding to the output sample. These are:
  - N: population size.
  - mu: scalar; the mean degree for the prior distribution for a randomly chosen person. The prior has this mean.
  - sigma: scalar; the standard deviation of the degree for a randomly chosen person. The prior has this standard deviation.
  - degree1: scalar; the number of nodes of degree 1 in the population (it is assumed all nodes have degree 1 or more).
  - lambda: scalar; only present for the cmp model. It is the λ parameter in the standard parametrization of the Conway-Maxwell-Poisson model for the degree distribution.
  - nu: scalar; only present for the cmp model. It is the ν parameter in the standard parametrization of the Conway-Maxwell-Poisson model for the degree distribution.
- mode.prior.sample.proportion: scalar; a hyperparameter being the mode of the prior distribution on the sample proportion n/N.
- median.prior.size: scalar; a hyperparameter being the median of the prior distribution on the population size.
- mode.prior.size: scalar; a hyperparameter being the mode of the prior distribution on the population size.
- mean.prior.size: scalar; a hyperparameter being the mean of the prior distribution on the population size.
- quartiles.prior.size: vector of length 2; a pair of hyperparameters being the lower and upper quartiles of the prior distribution on the population size.
- degreedistribution: character; the parametric distribution to use for the individual network sizes (i.e., degrees). The options are cmp, nbinom, and pln. These correspond to the Conway-Maxwell-Poisson, Negative-Binomial, and Poisson-log-normal distributions. The default is cmp.
- priorsizedistribution: character; the type of parametric distribution to use for the prior on population size. The options are beta (a Beta prior on the sample proportion, i.e. n/N), nbinom (Negative-Binomial), pln (Poisson-log-normal), flat (uniform), and continuous (the continuous version of the Beta prior on the sample proportion). The default is beta.
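The five summaries reported in the N element can be illustrated on any vector of posterior draws. The sketch below uses simulated draws (N_draws is hypothetical, not output from posteriorsize) and a kernel density mode as a stand-in for the MAP; the package may compute its MAP differently.

```r
# Hypothetical posterior draws of the population size N.
set.seed(42)
N_draws <- round(rlnorm(1000, meanlog = log(500), sdlog = 0.3))

# Approximate the MAP by the mode of a kernel density estimate
# (an assumption made for this sketch).
dens <- density(N_draws)

N_summary <- c(
  MAP         = dens$x[which.max(dens$y)],
  "Mean AP"   = mean(N_draws),
  "Median AP" = median(N_draws),
  P025        = unname(quantile(N_draws, 0.025)),
  P975        = unname(quantile(N_draws, 0.975))
)
N_summary
```

P025 and P975 together give the end points of a 95% probability interval for N.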

### Details on priors

The best way to specify the prior is via the hyperparameter mode.prior.size, which specifies the mode of the prior distribution on the population size. Alternatively, you can specify median.prior.size, which specifies the median of the prior distribution on the population size; mean.prior.sample.proportion, which specifies the mean of the prior distribution on the proportion of the population that is in the sample; or mode.prior.sample.proportion, which specifies the mode of that distribution. Finally, you can specify quartiles.prior.size as a vector of length 2 giving the lower and upper quartiles of the prior distribution on the population size.
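To build intuition for how a statement about the population size translates into a Beta prior on the sample proportion p = n/N, here is a rough base R sketch. It assumes alpha = 1, treats N as continuous, and ignores truncation at maxN; the values of n and M are hypothetical, and this is not necessarily how posteriorsize maps its hyperparameters internally.

```r
n     <- 100    # sample size (hypothetical)
M     <- 1000   # desired prior median of N (hypothetical)
alpha <- 1      # first Beta parameter, the documented default

# With alpha = 1 the Beta CDF is 1 - (1 - p)^beta, so the median of p
# equals n / M when beta = log(0.5) / log(1 - n / M).
beta <- log(0.5) / log(1 - n / M)

# N = n / p is decreasing in p, so quantiles of p map to quantiles of N
# in reverse order.
median_N    <- n / qbeta(0.5, alpha, beta)
quartiles_N <- n / qbeta(c(0.75, 0.25), alpha, beta)

median_N     # approximately 1000
quartiles_N  # implied lower and upper quartiles of N
```

Comparing quartiles_N with your own beliefs about plausible population sizes is one way to check whether a chosen prior is reasonable before running the sampler.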

### References

Gile, Krista J. (2008) Inference from Partially-Observed Network Data, Ph.D. Thesis, Department of Statistics, University of Washington.

Gile, Krista J. and Handcock, Mark S. (2010) Respondent-Driven Sampling: An Assessment of Current Methodology, Sociological Methodology 40, 285-327.

Gile, Krista J. and Handcock, Mark S. (2014) sspse: Estimating Hidden Population Size using Respondent Driven Sampling Data. R package, Los Angeles, CA. Version 0.5, http://hpmrg.org.

Handcock, Mark S. (2003) degreenet: Models for Skewed Count Distributions Relevant to Networks. Statnet Project, Seattle, WA. Version 1.2, http://statnetproject.org.

Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2014) Estimating Hidden Population Size using Respondent-Driven Sampling Data, Electronic Journal of Statistics, 8, 1, 1491-1521.

Handcock, Mark S., Gile, Krista J. and Mar, Corinne M. (2015) Estimating the Size of Populations at High Risk for HIV using Respondent-Driven Sampling Data, Biometrics.

### See Also

network, statnet, degreenet

### Examples

    ## Not run: 
    N0 <- 200
    n <- 100
    K <- 10

    # Create probabilities for a Waring distribution
    # with scaling parameter 3 and mean 5, but truncated at K=10.
    probs <- c(0.33333333,0.19047619,0.11904762,0.07936508,0.05555556,
               0.04040404,0.03030303,0.02331002,0.01831502,0.01465201)
    probs <- probs / sum(probs)

    # Look at the degree distribution for the prior
    # Plot these if you want
    # plot(x=1:K,y=probs,type="l")
    # points(x=1:K,y=probs)
    #
    # Create a sample
    #
    set.seed(1)
    pop<-sample(1:K, size=N0, replace = TRUE, prob = probs)
    s<-sample(pop, size=n, replace = FALSE, prob = pop)

    out <- posteriorsize(s=s,interval=10)
    plot(out, HPD.level=0.9,data=pop[s])
    summary(out, HPD.level=0.9)
    # Let's look at some MCMC diagnostics
    plot(out, HPD.level=0.9,mcmc=TRUE)

    ## End(Not run)
