Estimate.b.k: RDS population size estimation

Description Usage Arguments Details Value References See Also Examples

View source: R/estimating_functions.R

Description

Estimate population size from respondent driven samples (RDS) using maximum likelihood, and several variation. The underlying idea is that the sample spreads like an epidemic in the target population as described in the reference.

Usage

1
Estimate.b.k(rds.object, type = "mle", jack.control = NULL)

Arguments

rds.object

A object of class rds-object as constructed by initializeRdsObject or outputted by Estimate.b.k (depending on the type used).

type

A character vector with the type of estimation. Possible values:

  • mle Maximum likelihood.

  • integrated Integrated maximum likelihood.

  • observed Estimate with observed degrees.

  • jeffreys MAP estimation with Jeffreys prior.

  • parametric Assume β[k]:=β * θ^k.

  • rescaling Naive rescaling heuristic estimation.

  • leave-d-out Leave-d-out resampling estimator.

jack.control

A object of class jack.control as constructed by makeJackControl.

Details

As of version 0.95, this function is the main workhorse of the chords package. Given an rds-class object, it will return population size estimates for each degree. Note that for the rescaling and parametric estimators, the input rds-object is expected to contain some initial estimate in the estimates slot.

See the reference for a description of the likelihood problem solved. Optimization is performed by noting that likelihood is coordinate-wise convex, thus amounts to a series of line-searches.

Value

An rds-class object with an updated estimates slot. The estiamtes slot is list with the following components:

call

The function call.

Nk.estimates

The estimated degree frequencies.

log.bk.estimates

The estimated sampling rates for each degree. In log scale.

convergence

0 if estimation of N[k]'s converged. Otherwise, 1 or -1, depending on the sign of the score function at the MLE.

References

[1] Berchenko, Y., Rosenblatt D.J., and S.D.W. Frost. "Modeling and Analyzing Respondent Driven Sampling as a Counting Process." arXiv:1304.3505,

See Also

initializeRdsObject, makeRdsSample, getTheta.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
# Preliminaries
data(brazil)
rds.object2<- initializeRdsObject(brazil)
see <- function(x) plot(x$estimates$Nk.estimates, type='h')

# Maximum likelihood
rds.object <- Estimate.b.k(rds.object = rds.object2 )
see(rds.object)

# View estimates:
plot(rds.object$estimates$Nk.estimates, type='h')
# Population size estimate:
sum(rds.object$estimates$Nk.estimates)
plot(rds.object$estimates$log.bk.estimates, type='h')

## Recover theta assuming b.k=b_0*k^theta
getTheta(rds.object)

# How many degrees were imputed?:
table(rds.object$estimates$convergence)


# Observed degree estimation:
rds.object.4 <- Estimate.b.k(rds.object = rds.object, type='observed')
see(rds.object.4)

# Naive rescaling
rds.object.5 <- Estimate.b.k(rds.object = rds.object, type='rescaling')
see(rds.object.5)

# Parametric rates
rds.object.6 <- Estimate.b.k(rds.object = rds.object, 
                             type='parametric')
see(rds.object.6)
jack.control <- makeJackControl(3, 1e1)
rds.object.7 <- Estimate.b.k(rds.object = rds.object, 
                             type='leave-d-out', 
                             jack.control = jack.control)
see(rds.object.7)
rds.object.8 <- Estimate.b.k(rds.object = rds.object, 
                             type='integrated', 
                             jack.control = jack.control)
see(rds.object.8)
rds.object.9 <- Estimate.b.k(rds.object = rds.object, 
                             type='jeffreys')
see(rds.object.9)



## Not run: 
## Simulated data example:
    dk <- c(2, 1e1) # unique degree classes
    true.dks <- rep(0,max(dk)); true.dks[dk] <- dk
    true.Nks <- rep(0,max(dk)); true.Nks[dk] <- 1e3
    beta <- 1 #5e-6
    theta <-  0.1
    true.log.bks <- rep(-Inf, max(dk))
    true.log.bks[dk] <- theta*log(beta*dk)
    sample.length <- 4e2
    nsims <- 1e2
    
    simlist <- list()
    for(i in 1:nsims){
      simlist[[i]] <- makeRdsSample(
        N.k =true.Nks , 
        b.k = exp(true.log.bks),
        sample.length = sample.length)
    }
    
    
    # Estimate betas and theta with chords:
    llvec <- rep(NA,nsims)
    bklist <- list()
    for(i in 1:nsims){
      # i <- 2
      simlist[[i]] <- Estimate.b.k(rds.object = simlist[[i]])
      # llvec[i] <- simlist[[i]]$estimates$likelihood
      bklist[[i]] <- simlist[[i]]$estimates$log.bk.estimates
    }
    b1vec <- bklist 
    b2vec <- bklist 
    
    hist(b1vec)
    abline(v=true.log.bks[2])
    hist(b2vec)
    abline(v=true.log.bks[10])
    
    beta0vec <- rep(-Inf,nsims)
    thetavec <- rep(-Inf,nsims)
    nvec <- rep(-Inf,nsims)
    converged <- rep(9999,nsims)
    
    for(i in 1:nsims){
      # i <- 2
      nvec[i] <- sum(simlist[[i]]$estimates$Nk.estimates)
      converged[i] <- sum(simlist[[i]]$estimates$convergence, na.rm=TRUE)
      # tfit <- getTheta(simlist[[i]])
      # beta0vec[i] <- tfit$log.beta_0
      # thetavec[i] <- tfit$theta
    }
    summary(beta0vec)
    summary(nvec)
    # summary(thetavec)
    # hist(thetavec)
    # abline(v=theta)
    hist(nvec)
    abline(v=sum(true.Nks), col='red')
    abline(v=median(nvec, na.rm = TRUE), lty=2)
    table(converged)
    
    # Try various re-estimatinons:
    rds.object2 <- simlist[[which(is.infinite(nvec))[1]]]
    
    rds.object <- Estimate.b.k(rds.object = rds.object2 )
    see(rds.object)
    rds.object$estimates$Nk.estimates
    
    rds.object.5 <- Estimate.b.k(rds.object = rds.object, type='rescaling')
    see(rds.object.5) # will not work. less than 2 converging estimates.
    rds.object.5$estimates$Nk.estimates
    
    rds.object.6 <- Estimate.b.k(rds.object = rds.object, type='parametric')
    see(rds.object.6) # will not work. less than 2 converging estimates.
    
    
    jack.control <- makeJackControl(3, 1e2)
    rds.object.7 <- Estimate.b.k(rds.object = rds.object, 
                             type='leave-d-out', 
                             jack.control = jack.control)
    see(rds.object.7)
    rds.object.7$estimates$Nk.estimates
    
    
    rds.object.8 <- Estimate.b.k(rds.object = rds.object, type='integrated')
    see(rds.object.8)
    rds.object.8$estimates$Nk.estimates
    
    
    rds.object.9 <- Estimate.b.k(rds.object = rds.object, type='jeffreys')
    see(rds.object.9)
    rds.object.9$estimates$Nk.estimates

## End(Not run)

chords documentation built on May 30, 2017, 3:39 a.m.