Gile's SS Estimates

Description

This function computes Gile's sequential sampling (SS) estimates for a categorical or numeric variable.

Usage

RDS.SS.estimates(rds.data, outcome.variable, N = NULL, subset = NULL,
  number.ss.samples.per.iteration = 500, number.ss.iterations = 5,
  control = control.rds.estimates(), hajek = TRUE, empir.lik = TRUE)

Arguments

rds.data

An rds.data.frame that indicates recruitment patterns by a pair of attributes named “id” and “recruiter.id”.

outcome.variable

A string giving the name of the variable in the rds.data that contains a categorical or numeric variable to be analyzed.

N

An estimate of the number of members of the population being sampled. If NULL it is read as the population.size.mid attribute of the rds.data frame. If that is missing it defaults to 1000.
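The fallback order described for N can be sketched in base R as follows. This is only an illustration of the documented behavior, not the package's own code; the helper name resolve_N and the toy data frame are hypothetical.

```r
# Hypothetical helper mirroring the documented fallback for N:
# explicit argument, then the "population.size.mid" attribute, then 1000.
resolve_N <- function(rds.data, N = NULL) {
  if (is.null(N)) N <- attr(rds.data, "population.size.mid")
  if (is.null(N)) N <- 1000
  N
}

# Toy data frame standing in for an rds.data.frame (assumption for illustration)
toy <- data.frame(id = 1:3)
n_default <- resolve_N(toy)            # no argument, no attribute

attr(toy, "population.size.mid") <- 750
n_attr <- resolve_N(toy)               # taken from the attribute
n_arg  <- resolve_N(toy, N = 200)      # explicit argument takes precedence
```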

subset

An optional criterion by which to subset rds.data. It is a character string giving an R expression which, when evaluated, subsets the data. For example, it can be something like "seed > 0" to exclude seeds. It can also be the name of a logical vector of the same length as the outcome variable, where TRUE means include the observation in the analysis. If NULL, no subsetting is done.

number.ss.samples.per.iteration

The number of samples to take in estimating the inclusion probabilities in each iteration of the sequential sampling algorithm. The default shown under Usage is 500. If NULL, it is read from the attribute of rds.data with the same name. If that is missing, it defaults to 5000.

number.ss.iterations

The number of iterations of the sequential sampling algorithm. If missing, it defaults to 5.

control

A list of control parameters for algorithm tuning. Constructed using control.rds.estimates.

hajek

Logical; if TRUE, use the standard Hajek-type estimator of Gile (2011); if FALSE, use the standard Horvitz-Thompson estimator. The default is TRUE.
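As a rough illustration of the difference between the two estimator forms, the base R sketch below computes both from a set of inclusion probabilities. The outcome values, inclusion probabilities, and population size are made-up toy numbers; in the SS estimator the inclusion probabilities are themselves estimated by the sequential sampling algorithm, which this sketch does not attempt.

```r
# Toy inputs (hypothetical, not taken from the RDS package or any data set)
y         <- c(1, 0, 1, 1, 0)                # binary outcome for 5 sampled units
incl_prob <- c(0.5, 0.25, 0.5, 0.2, 0.25)    # assumed inclusion probabilities
N         <- 16                              # assumed population size

w <- 1 / incl_prob                           # inverse-probability weights

# Horvitz-Thompson form: weighted total divided by the known population size
ht_mean <- sum(w * y) / N

# Hajek form: weighted total divided by the sum of the weights
hajek_mean <- sum(w * y) / sum(w)
```

The Hajek form normalizes by the estimated population size sum(w) rather than by N, so it always yields a proportion between 0 and 1 for a binary outcome.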

empir.lik

If true, and outcome.variable is numeric, standard errors based on empirical likelihood will be given.

Value

If outcome.variable is numeric, the Gile SS estimate of the mean is returned; otherwise a vector of proportion estimates is returned. If empir.lik is TRUE, an object of class rds.interval.estimate is returned. This is a list with components:

  • estimate: The numerical point estimate of the proportion of outcome.variable.

  • interval: A matrix with six columns and one row per category of outcome.variable:

    • point estimate: The HT estimate of the population mean.

    • 95% Lower Bound: Lower 95% confidence bound.

    • 95% Upper Bound: Upper 95% confidence bound.

    • Design Effect: The design effect of the RDS.

    • s.e.: Standard error.

    • n: Count of the number of sample values with that value of the trait.

Otherwise, an object of class rds.SS.estimate is returned.

Author(s)

Krista J. Gile with help from Mark S. Handcock

References

Gile, Krista J., 2011. Improved Inference for Respondent-Driven Sampling Data with Application to HIV Prevalence Estimation. Journal of the American Statistical Association 106, 135-146.

Gile, Krista J., Handcock, Mark S., 2010. Respondent-driven Sampling: An Assessment of Current Methodology. Sociological Methodology 40, 285-327.

Gile, Krista J., Handcock, Mark S., 2011. Network Model-Assisted Inference from Respondent-Driven Sampling Data. ArXiv Preprint.

Salganik, M., Heckathorn, D. D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology 34, 193-239.

Volz, E., Heckathorn, D., 2008. Probability based estimation theory for Respondent Driven Sampling. The Journal of Official Statistics 24 (1), 79-97.

See Also

RDS.I.estimates, RDS.II.estimates

Examples

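library(RDS)
data(fauxmadrona)
# SS estimates for the "disease" variable, with the population size set to 1000
RDS.SS.estimates(rds.data = fauxmadrona, outcome.variable = "disease", N = 1000)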
