R/Dorfman.Functions.R
In binGroup: Evaluation and Experimental Design for Binomial Group Testing

#######################################################################################
# Used to find the characteristics of an Informative Dorfman decoding process specified  
# by method= OD (Optimal Dorfman), TOD (Thresholded Optimal Dorfman), and PSOD (Pool Specific Optimal Dorfman)
# given p (a vector of all subjects' infection probabilities), 
# se (sensitivity of diagnostic test),
# sp (specificity of diagnostic test), 
# max.pool (maximum allowable pool size),
# thresh.pool (initial pool size used for TOD if threshold is not specified), and 
# threshold (threshold value for TOD, note if a threshold value is not specified one is found algorithmically). 
# Function returns 
# p.star (threshold value used, only applies when using TOD),
# res.e (the expected expenditure of the decoding process),
# res.v (the variance of expenditure of the decoding process), and 
# res.mat a matrix of summary measures which includes each subject's infection probability , pool (pool to which they belong),
# PSe (pooling sensitivity), PSp (pooling specificity), PPV (pooling positive predictive value), and 
# NPV (pooling negative predictive value).
#
#EXAMPLE: opt.info.dorf(prob=rbeta(1000,1,10), se = 1, sp = 1, method ="OD", max.pool=15, thresh.pool=8, threshold=NULL)

#' @title Find the characteristics of an informative two-stage 
#' hierarchical (Dorfman) algorithm
#' 
#' @description Find the characteristics of an informative 
#' two-stage hierarchical (Dorfman) decoding process using Optimal 
#' Dorfman (OD), Thresholded Optimal Dorfman (TOD), or Pool-Specific 
#' Optimal Dorfman (PSOD) algorithms.
#' 
#' @param prob a vector of all subjects' infection probabilities.
#' @param se the sensitivity of the diagnostic test.
#' @param sp the specificity of the diagnostic test.
#' @param method character string defining the specific screening
#' procedure for implementation of Dorfman retesting in a 
#' heterogeneous population. Options include Optimal Dorfman 
#' ("\kbd{OD}"), Thresholded Optimal Dorfman ("\kbd{TOD}"), and
#' Pool-Specific Optimal Dorfman ("\kbd{PSOD}"). Further details 
#' are given under 'Details'.
#' @param max.pool the maximum allowable pool size. Further details 
#' are given under 'Details'.
#' @param thresh.pool the initial pool size used for TOD, if 
#' \kbd{threshold} is not specified. Further details are given 
#' under 'Details'.
#' @param threshold the threshold value for TOD. If a threshold
#' value is not specified, one is found algorithmically. Further 
#' details are given under 'Details'.
#' 
#' @details This function finds the characteristics of an informative
#' two-stage hierarchical (Dorfman) decoding process. Characteristics
#' found include the expected expenditure of the decoding process, 
#' the variance of the expenditure of the decoding process, and the 
#' pooling sensitivity, pooling specificity, pooling positive predictive
#' value, and pooling negative predictive value for each individual.
#' Calculations of these characteristics are done using equations
#' presented in McMahan et al. (2012). 
#' 
#' Optimal Dorfman (OD) is an informative Dorfman algorithm in 
#' which the common pool size \eqn{c=c_{opt}} minimizes 
#' \eqn{E(T^(c))}, the expected number of tests needed to decode 
#' all \eqn{N} individuals when pools of size \eqn{c} are used. 
#' 
#' Thresholded Optimal Dorfman (TOD) is an informative Dorfman 
#' algorithm in which all \eqn{N} individuals are partitioned 
#' into two classes, low-risk and high-risk individuals, based 
#' on whether their risk probability falls below or above a 
#' particular threshold value. The threshold can be specified 
#' using the \kbd{threshold} argument or the TOD algorithm can 
#' identify the optimal threshold value. The low-risk individuals
#' are tested using a optimal common pool size,  and high-risk 
#' individuals are tested individually.
#' 
#' Pool-Specific Optimal Dorfman (PSOD) is an informative Dorfman
#' algorithm in which optimal sizes are determined for each pool. 
#' A total of \eqn{N} individuals are tested in pools that minimize
#' the expected number of tests per individual, on a pool-by-pool 
#' basis. If desired, the user can add the constraint of a maximum 
#' allowable pool size, so that each pool will contain no more 
#' than the maximum allowable number of individuals.
#' 
#' All three informative Dorfman procedures described above require
#' individuals to be ordered from smallest to largest probability 
#' of infection. See McMahan et al. (2012) for additional details
#' on the implementation of informative two-stage hierarchical
#' (Dorfman) testing algorithms.
#' 
#' @return A list containing:
#' \item{tv}{the threshold value used for TOD, if applicable.}
#' \item{e}{the expected expenditure of the decoding process.}
#' \item{v}{the variance of the expenditure of the decoding 
#' process.}
#' \item{summary}{a matrix of summary measures that includes
#' each individual's infection probability, pool (pool to which
#' they belong), pooling sensitivity, pooling specificity, 
#' pooling positive predictive value, and pooling negative
#' predictive value.}
#' 
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{Dorfman1943}{binGroup}
#' 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting}
#' 
#' @family Informative Dorfman functions
#' 
#' @examples 
#' # Find the characteristics of an informative
#' #   Dorfman algorithm, using the OD procedure.
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' opt.info.dorf(prob=rbeta(1000,1,10), se=1, sp=1, 
#' method ="OD", max.pool=15, thresh.pool=8, threshold=NULL)
#' 
#' # Find the characteristics of an informative 
#' #   Dorfman algorithm, using the TOD procedure.
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' set.seed(1002)
#' p.vec <- p.vec.func(p=0.01, alpha=2, grp.sz=20)
#' opt.info.dorf(prob=p.vec, se=0.95, sp=0.95, 
#' method="TOD", max.pool=5, threshold=0.015)

opt.info.dorf<-function(prob, se = 1, sp = 1, method ="OD", max.pool=15, thresh.pool=8, threshold=NULL){

# Saves original ordering
ind.order<-order(prob)

# Orders subjects, required under all Informative measures
prob<-sort(prob)

# Determines number of subjects being screened, and sets up vectors for 
# storing summary measures. Also initializes the threshold p.star
N<-length(prob)
pool.id<-rep(-1000,N)
PSe<-rep(-100,N)
PSp<-rep(-100,N)
PPV<-rep(-100,N)
NPV<-rep(-100,N)
p.star<-threshold


# If method is TOD this finds the threshold value and divides the subjects into 
# the high and low risk classes
if(method=="TOD"){
  if(is.null(p.star)){
  p.star<-thresh.val.dorf(p=prob, psz=thresh.pool, se=se, sp=sp)
  }
  if(p.star < 1){
  N.high<-length(prob[prob > p.star])
  N<-N-N.high
  }
}


# Finds optimal pool size to be used with OD or, finds optimal pool size to be used 
# to decode low risk class in TOD
if(method=="OD" | method=="TOD"){
   if(N==0){psz<-NULL}
   if(N > 0){
     res.psz<-opt.pool.size(p=prob[1:N] ,max.p=max.pool, se=se, sp=sp)
     J<-ceiling(N/res.psz)
     rem<-N-(J-1)*res.psz
     if(rem!=0){
     psz<-c(rep(res.psz,(J-1)),rem)
     }  
     if(rem==0){
     psz<-rep(res.psz,J)
     }
  }
  if(method=="TOD"){
   if(p.star < 1){
   psz<-c(psz,rep(1,N.high))
   J<-length(psz)
   N<-N+N.high
   }
  }
}


# Finds pool sizes to be used with PSOD
if(method=="PSOD"){
  psz<-pool.specific.dorf(p=prob , max.p=max.pool , se=se , sp=sp)
  J<-length(psz)
}

# Finds measures pool by pool
psz<-c(psz,0)
lower<-1
upper<-psz[1]
vec.e<-rep(-1000,J)
vec.v<-rep(-1000,J)
for(i in 1:J){
p.pool<-prob[lower:upper]
pool.id[lower:upper]<-rep(i,length(p.pool))

res<-characteristics.pool(p=p.pool,se=se,sp=sp)
vec.e[i]<-res$e
vec.v[i]<-res$v

res.acc<-accuracy.dorf(p=p.pool,se=se,sp=sp)
PSe[lower:upper]<-res.acc$PSe
PSp[lower:upper]<-res.acc$PSp
PPV[lower:upper]<-res.acc$PPV
NPV[lower:upper]<-res.acc$NPV

lower<-1+upper
upper<-upper+psz[i+1]
}

# Finds total expectation and variation
res.e<-sum(vec.e)
res.v<-sum(vec.v)

# Returns all subjects to original ordering, along with their corresponding measures
prob<-prob[order(ind.order)]
pool.id<-pool.id[order(ind.order)]
PSe<-PSe[order(ind.order)]
PSp<-PSp[order(ind.order)]
PPV<-PPV[order(ind.order)]
NPV<-NPV[order(ind.order)]

res.mat<-matrix(c(pool.id, prob, PSe, PSp, PPV, NPV),nrow=N,ncol=6,byrow=FALSE, 
                dimnames=list(as.character(1:N) , c("pool","probability","PSe", "PSp", "PPV", "NPV")))

prob<-prob[ind.order]
return(list("tv"=p.star, "e"=res.e, "v"=res.v, "summary"=res.mat))
}



######################################
### DORFMAN DECODING FUNCTIONS #######
######################################

############################################################################
# Function for the expectation and variation in testing expenditure 
# of a pool of size greater than or equal to one
# here p is a vector of all subjects probability of infection,
# se is sensitivity, and
# sp is specificity.
# res.e is the expected expenditure of the pool and
# res.v is the variance of expenditure of the pool.

#' @title Testing expenditure for informative Dorfman testing
#' 
#' @description Calculate the expectation and variation of the testing
#' expenditure of a pool used with informative Dorfman testing.
#' 
#' @param p a vector of each individual's probability of infection.
#' @inheritParams opt.info.dorf
#' 
#' @details This function calculates the expected value and variance
#' of the testing expenditure of a pool of size greater than or equal 
#' to one used with informative Dorfman testing. Calculations of these 
#' measures are done using the equations presented in McMahan et al. (2012). 
#' 
#' @return a list containing:
#' \item{e}{the expected testing expenditure of the pool.}
#' \item{v}{the variation of the testing expenditure of the pool.}
#'  
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting} 
#' 
#' @family Informative Dorfman functions
#' 
#' @examples 
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' set.seed(8135)
#' p.vec <- p.vec.func(p=0.02, alpha=1, grp.sz=10)
#' characteristics.pool(p=p.vec[1:3], se=0.90, sp=0.90)

characteristics.pool<-function(p,se,sp){
n<-length(p)
if(n>1){
prob<-se+(1-se-sp)*prod(1-p)
res.e<-1+n*prob
res.v<-n^2*prob*(1-prob)
}
if(n==1){
res.e<-1
res.v<-0
}
return(list("e"=res.e, "v"=res.v))
}



############################################################################
# Function that returns PSe, PSp, PPV, and NPV for all individuals
# belonging to a pool of size greater than or equal to one
# here p is a vector of all subject probabilities,
# se is sensitivity, and
# sp is specificity.

#' @title Accuracy measures for informative Dorfman testing
#' 
#' @description Calculate the accuracy measures for each individual 
#' in a pool used with informative Dorfman testing.
#' 
#' @param p a vector of each individual's probability of infection.
#' @inheritParams opt.info.dorf
#' 
#' @details This function calculates the pooling sensitivity, pooling
#' specificity, pooling positive predictive value, and pooling negative
#' predictive value for each individual belonging to a pool of size 
#' greater than or equal to one used with informative Dorfman testing. 
#' Calculations of these measures are done using the equations presented 
#' in McMahan et al. (2012). 
#' 
#' @return a list containing:
#' \item{PSe}{a vector containing each individual's pooling sensitivity.}
#' \item{PSp}{a vector containing each individual's pooling specificity.}
#' \item{PPV}{a vector containing each individual's pooling positive 
#' predictive value.}
#' \item{NPV}{a vector containing each individual's pooling negative 
#' predictive value.}
#'  
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting}
#' 
#' @family Informative Dorfman functions
#' 
#' @examples 
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' set.seed(8135)
#' p.vec <- p.vec.func(p=0.02, alpha=1, grp.sz=10)
#' accuracy.dorf(p=p.vec[1:3], se=0.90, sp=0.90)

accuracy.dorf<-function(p,se,sp){
cj<-length(p)
se.vec<-rep(-100,cj)
sp.vec<-rep(-100,cj)
ppv.vec<-rep(-100,cj)
npv.vec<-rep(-100,cj)

if(cj==1){
se.vec<-se
sp.vec<-sp
ppv.vec<-p*se/(p*se+(1-p)*(1-sp))
npv.vec<-(1-p)*sp/((1-p)*sp + p*(1-se))
}

if(cj>1){
for(i in 1:cj){
se.vec[i]<-se^2
sp.vec[i]<-1-(1-sp)*(se+(1-se-sp)*prod(1-p[-i]))
ppv.vec[i]<-p[i]*se^2/(p[i]*se^2+(1-p[i])*(1-sp.vec[i]))
npv.vec[i]<-(1-p[i])*sp.vec[i]/((1-p[i])*sp.vec[i] + p[i]*(1-se^2))
}
}

return(list("PSe"=se.vec, "PSp"=sp.vec, "PPV"=ppv.vec, "NPV"=npv.vec))
}



##############################################
# Pooling Algorithm Minimizes Individual 
# Risk on a per pool basis, so it allows for multiple 
# pooling sizes (psz) here p is a vector of all subject probabilities,
# se is sensitivity,
# sp is specificity, and
# max.p is the maximum allowable pool size.

#' @title Find the optimal pool sizes for Pool-Specific Optimal Dorfman 
#' (PSOD) testing
#' 
#' @description Find the set of optimal pool sizes for Pool-Specific 
#' Optimal Dorfman (PSOD) testing.
#' 
#' @param p a vector of each individual's probability of infection.
#' @param max.p the maximum allowable pool size.
#' @inheritParams opt.info.dorf
#' 
#' @details This function finds the set of optimal pool sizes for PSOD
#' testing. PSOD testing uses a greedy algorithm and does not consider
#' all possible sets of pool sizes. See McMahan et al. (2012) for 
#' additional details on the implementation of PSOD testing.
#' 
#' @return The optimal set of pool sizes for PSOD testing.
#'  
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting}
#' 
#' @family Informative Dorfman functions
#' 
#' @examples 
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' set.seed(8135)
#' p.vec <- p.vec.func(p=0.02, alpha=1, grp.sz=10)
#' pool.specific.dorf(p=p.vec, max.p=3, se=0.95, sp=0.95)

pool.specific.dorf<-function(p,max.p,se,sp){
p<-sort(p)
N<-length(p)
psz<-rep(-99,N)
k<-0
ind<-1

while(k!=N){
et<-1
max=min(length(p),max.p)
if(length(p)>1){
et<-c(et,rep(99,max))
  for(i in 2:max){
  et[i]<-((i)*(se-(se+sp-1)*prod(1-p[1:(i)]))+1)/(i)
  }}
  m<-1:max
  m<-m[order(et)[1]]
  psz[ind]<-m
  ind<-ind+1
  p<-p[-(1:m)]
  k<-k+m
}
psz<-psz[psz!=-99]
return(psz)
}



####################################################
# Function for finding the optimal pool size (psz),                                               
# here p is a vector of all subject probabilities,
# se is sensitivity,
# sp is specificity, and
# max.p is the maximum allowable pool size.

#' @title Find the optimal pool size for Optimal Dorfman or
#' Thresholded Optimal Dorfman
#' 
#' @description Find the optimal common pool size for Optimal Dorfman
#' (OD) or Thresholded Optimal Dorfman (TOD) testing.
#' 
#' @param p a vector of each individual's probability of infection.
#' @param max.p the maximum allowable pool size.
#' @inheritParams opt.info.dorf
#' 
#' @details This function finds the optimal common pool size for OD or
#' TOD testing. Using OD testing, all individuals are tested using an 
#' optimal common pool size. Using TOD testing, individuals are partitioned 
#' into low-risk and high-risk groups, and all low-risk individuals are 
#' tested using an optimal common pool size. See McMahan et al. (2012) for 
#' additional details on the implementation of OD or TOD testing.
#' 
#' @return The optimal common pool size for OD or TOD testing.
#'  
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting}
#' 
#' @family Informative Dorfman functions
#' 
#' @examples 
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' set.seed(8135)
#' p.vec <- p.vec.func(p=0.02, alpha=1, grp.sz=10)
#' opt.pool.size(p=p.vec, max.p=3, se=0.95, sp=0.95)

opt.pool.size<-function(p, max.p, se=1, sp=1){
               
N<-length(p)
M<-min(N, max.p)
p<-sort(p)
e0<-N
psz<-2
L<-ceiling(N/psz)
if(N<(L*psz)){psz.vec<-c(rep(psz,L-1),(N-psz*(L-1)))}
if(N==(L*psz)){psz.vec<-rep(psz,L)}
e1<-0
sub.ind<-c(0,cumsum(psz.vec))
for(m in 1:L){
e1<-e1+(1+psz.vec[m]*(se+(1-se-sp)*prod(1-p[(sub.ind[m]+1):sub.ind[m+1]])))
}

while(e1<e0 & psz<=max.p){
e0<-e1
psz<-psz+1
L<-ceiling(N/psz)
if(N<(L*psz)){psz.vec<-c(rep(psz,L-1),(N-psz*(L-1)))}
if(N==(L*psz)){psz.vec<-rep(psz,L)}
e1<-0
sub.ind<-c(0,cumsum(psz.vec))
for(m in 1:L){
e1<-e1+(1+psz.vec[m]*(se+(1-se-sp)*prod(1-p[(sub.ind[m]+1):sub.ind[m+1]])))
}}

return(psz-1)
}



######################################################################
# Thresholding function for Dorfman Procedure, finds threshold (p.star).                                                                   
# Here p is a vector of all subject probabilities,
# se is sensitivity,
# sp is specificity, and
# psz is the initial pool size.

#' @title Find the optimal threshold value for Thresholded Optimal 
#' Dorfman testing
#' 
#' @description Find the optimal threshold value for Thresholded Optimal
#' Dorfman (TOD) testing.
#' 
#' @param p a vector of each individual's probability of infection.
#' @param psz the initial pool size.
#' @inheritParams opt.info.dorf
#' 
#' @details This function finds the optimal threshold value for TOD 
#' testing for situations where the threshold value is not specified. 
#' See McMahan et al. (2012) for additional details on the implementation 
#' of TOD testing.
#' 
#' @return The optimal threshold value for TOD testing.
#'  
#' @author This function was originally written by Christopher S. 
#' McMahan for McMahan et al. (2012). The function was obtained 
#' from \url{http://chrisbilder.com/grouptesting}.
#' 
#' @references 
#' \insertRef{McMahan2012a}{binGroup}
#' 
#' @seealso 
#' \url{http://chrisbilder.com/grouptesting}
#' 
#' @family Informative Dorfman functions
#' 
#' @examples
#' # This example takes approximately 4 seconds to run. 
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' \dontrun{
#' set.seed(3154)
#' p.vec <- p.vec.func(p=0.10, alpha=0.5, grp.sz=1000)
#' thresh.val.dorf(p=p.vec, psz=10, se=0.95, sp=0.95)}
#' 
#' # This example takes less than 1 second to run.
#' # Estimated running time was calculated using a 
#' #   computer with 16 GB of RAM and one core of an 
#' #   Intel i7-6500U processor.
#' p.vec <- p.vec.func(p=0.15, alpha=2, grp.sz=100)
#' thresh.val.dorf(p=p.vec, psz=10, se=0.95, sp=0.95)

thresh.val.dorf<-function(p, psz, se=1, sp=1){
  N<-length(p)
  p<-sort(p)
  L<-ceiling(N/psz)
  p.star<-1
  upper<-N
  lower<-upper-psz+1
  lower<-max(1,lower)
  Ex.pool<-length(p[lower:upper])*(se+(1-se-sp)*prod(1-p[lower:upper]))+1
    if(Ex.pool > length(p[lower:upper])){
        while(Ex.pool > length(p[lower:upper]) & lower > 1){
        upper<-upper-psz
        lower<-lower-psz
        lower<-max(1,lower)
        Ex.pool<-length(p[lower:upper])*(se+(1-se-sp)*prod(1-p[lower:upper]))+1
        }
    p.star<-(p[upper]+p[upper+1])/2
    }
    if(lower==1){p.star<-0}
return(p.star)

}
Any scripts or data that you put into this service are public.
binGroup documentation built on May 2, 2019, 8:57 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
binGroup
Evaluation and Experimental Design for Binomial Group Testing

R/Dorfman.Functions.R
In binGroup: Evaluation and Experimental Design for Binomial Group Testing

Try the binGroup package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

binGroup Evaluation and Experimental Design for Binomial Group Testing

R/Dorfman.Functions.R In binGroup: Evaluation and Experimental Design for Binomial Group Testing

Try the binGroup package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

binGroup
Evaluation and Experimental Design for Binomial Group Testing

R/Dorfman.Functions.R
In binGroup: Evaluation and Experimental Design for Binomial Group Testing