# R/clearanceEstimatorBayes.R
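
## ---------------------------------------------------------------------------
## Illustrative sketch (hypothetical helper, not part of the package API):
## simulates one clearance profile under the three-phase change-point model
## documented for clearanceEstimatorBayes below. The mean log density is flat
## before the lag/decay change point, decreases linearly through the decay
## phase, and is flat again after the decay/tail change point; clamping time
## to [t.lag, t.tail] mirrors how theta and theta2 are derived from
## (alpha, beta) inside the sampler. All argument names and defaults here are
## assumptions chosen for illustration only.
simulateClearanceProfile <- function(times = seq(0, 72, by = 6),
                                     alpha = 11,       # intercept of the decay line (log scale)
                                     beta = -0.15,     # decay-phase slope; -beta is the clearance rate
                                     t.lag = 6,        # lag/decay change point
                                     t.tail = 60,      # decay/tail change point
                                     sd.noise = 0.5,   # measurement noise on the log scale
                                     detect.limit = 40) {
  mu <- alpha + beta * pmin(pmax(times, t.lag), t.tail)
  count <- exp(mu + rnorm(length(times), mean = 0, sd = sd.noise))
  # record non-detections as 0; counts below the detection limit are treated
  # as censored by the sampler
  count[count < detect.limit] <- 0
  data.frame(id = 1L, time = times, count = count)
}
## The returned data frame has the id/time/count layout required of `data`.
## ---------------------------------------------------------------------------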

#' @title Bayesian Hierarchical Regression on Clearance Rates
#'
#' @description
#' \code{clearanceEstimatorBayes} estimates parasite clearance rates using a Bayesian hierarchical model
#' and provides a regression analysis of these rates on given covariates.
#' 
#' @param data a data frame containing the profiles of patients.
#' This data frame must contain \code{id}, \code{time}, and \code{count} columns,
#' in that order: the first column holds the IDs of patients, the second the measurement
#' times, and the third the parasite densities (per microliter) at those times.
#' @param covariates an optional data frame containing individual level covariates. This argument may be \code{NULL}, 
#' in which case estimation of clearance rates is of primary interest.
#' @param seed a user-specified number used to initialize the pseudorandom number generator.
#' The default value is 1234, for reproducibility. If \code{seed = NULL}, the seed is
#' obtained automatically from the system clock.
#' @param detect.limit detection limit of the parasite density in blood (parasites per microliter)
#' @param outlier.detect indicator of whether or not to use Flegg's outlier detection method.
#' \code{outlier.detect = TRUE} is recommended.
#' @param conf.level required confidence level for reporting credible intervals
#' @param niteration total number of simulations after the burn-in period
#' @param burnin length of the burn-in period in the MCMC used in \code{clearanceEstimatorBayes}
#' @param thin step size of the thinning process in the MCMC used in \code{clearanceEstimatorBayes}
#' @param filename the name of the csv file used to store some output elements. This file contains
#' \code{id}, \code{clearance.mean}, \code{lag.median}, and \code{tail.median}.
#'  
#' @import stats
#' @import MCMCpack
#' @import MASS
#' @import msm
#' @import mvtnorm
#' @import survival
#' @import AER
#' @import Cairo
#' @import utils
#' 
#' @export
#' 
#' @return \code{clearanceEstimatorBayes} returns an object of class "bhrcr", which is a list
#' containing the components below. The function \link{summary} (i.e., \link{summary.bhrcr})
#' can be used to obtain a summary of the results.
#' \item{CALL}{function call}
#' \item{clearance.post}{posterior distributions of clearance rates}
#' \item{clearance.mean}{mean values of the posterior distributions of clearance rates}
#' \item{clearance.median}{median values of the posterior distributions of clearance rates}
#' \item{intercept.post}{posterior distributions of the intercepts (alpha_i's) in the model}
#' \item{gamma.post}{posterior distribution of gamma}
#' \item{gamma.post.thin}{thinned posterior sample of gamma}
#' \item{gamma.mean}{mean values of the posterior distribution of gamma}
#' \item{gamma.median}{median values of the posterior distribution of gamma}
#' \item{gamma.CI}{credible intervals for gamma}
#' \item{halflifeslope.post}{posterior distribution for the effect of covariates on log half-lives}
#' \item{halflifeslope.mean}{mean values of the posterior distribution for the effect of covariates on log half-lives}
#' \item{halflifeslope.median}{median values of the posterior distribution for the effect of covariates on log half-lives}
#' \item{halflifeslope.CI}{credible intervals for the effect of covariates on log half-lives}
#' \item{predicted.pce}{WWARN Parasite Clearance Estimator (PCE) estimates}
#' \item{eta.post}{posterior distribution of eta}
#' \item{changelag.post}{posterior distributions of changetime between lag and decay phases}
#' \item{changetail.post}{posterior distributions of changetime between decay and tail phases}
#' \item{lag.median}{median values of the posterior distributions of changetime between lag and decay phases}
#' \item{tail.median}{median values of the posterior distributions of changetime between decay and tail phases}
#' \item{var.epsilon.post}{posterior distribution of the variance of epsilon}
#' \item{var.error.post}{thinned posterior sample of the variance of epsilon}
#' \item{var.alpha.post}{posterior distribution of variance of alpha}
#' \item{var.beta.post}{posterior distribution of variance of beta}
#' \item{index}{a list containing each patient's indices in the data}
#' \item{counts}{original parasite counts of all patients}
#' \item{counts.current}{parasite counts of all patients after sampling censored measurements}
#' \item{t.overall}{measurement times of all patients}
#' \item{p.lag}{posterior sample of the prior probability of there being a lag phase}
#' \item{p.lag.thin}{thinned posterior sample of the prior probability of there being a lag phase}
#' \item{p.tail}{posterior sample of the prior probability of there being a tail phase}
#' \item{p.tail.thin}{thinned posterior sample of the prior probability of there being a tail phase}
#' \item{var1.post}{posterior distribution of c^2}
#' \item{var2.post}{posterior distribution of d^2}
#' \item{mu1.post}{posterior distribution of a}
#' \item{mu2.post}{posterior distribution of b}
#' \item{detect.limit}{the detection limit of parasitemia}
#' \item{lag.post}{posterior distributions of index of changetime between lag and decay phases}
#' \item{lag2.post}{posterior distributions of index of changetime between decay and tail phases}
#' \item{theta.post}{posterior distributions of log-parasite-count's mean in lag phase}
#' \item{theta2.post}{posterior distributions of log-parasite-count's mean in tail phase}
#' \item{burnin}{length of the burn-in period}
#' 
#' @details
#' This function estimates parasite clearance rates, along with the effect of covariates on them,
#' using the Bayesian hierarchical model introduced in Fogarty et al. (2015).
#' A change-point model is applied to the log parasite densities to account for three potential phases:
#' (1) a constant phase (the lag phase); (2) a phase of linear decrease (the decay phase);
#' and (3) another constant phase (the tail phase). The parasite clearance rate is therefore
#' estimated only from observations within the decay phase. The Bayesian approach treats the
#' boundaries between the lag, decay, and tail phases of an individual's clearance profile
#' as random variables, thus accounting for the additional uncertainty in where the phases
#' begin and end. Details are in Fogarty et al. (2015).
#' 
#' @author Colin B. Fogarty <cfogarty@mit.edu>, Saeed Sharifi-Malvajerdi <saeedsh@wharton.upenn.edu>, Feiyu Zhu <feiyuzhu@sas.upenn.edu>
#' 
#' @references Flegg, J. A., Guerin, P. J., White, N. J., & Stepniewska, K. (2011). 
#' Standardizing the measurement of parasite clearance in falciparum malaria: the parasite clearance estimator. 
#' Malaria journal, 10(1), 339.
#' @references Fogarty, C. B., Fay, M. P., Flegg, J. A., Stepniewska, K., Fairhurst, R. M., & Small, D. S. (2015). 
#' Bayesian hierarchical regression on clearance rates in the presence of "lag" and "tail" phases 
#' with an application to malaria parasites. Biometrics, 71(3), 751-759.
#' 
#' @examples
#' \dontshow{
#' data("pursat")
#' data("pursat_covariates")
#' data = pursat[pursat["id"] <= 80 & pursat["id"] > 70,]
#' covariates = pursat_covariates[71:80,]
#' out <- clearanceEstimatorBayes(data = data, covariates = covariates, outlier.detect = TRUE,
#'                               niteration = 3, burnin = 1, thin = 1)
#' }
#' \donttest{
#' data("pursat")
#' data("pursat_covariates")
#' out <- clearanceEstimatorBayes(data = pursat, covariates = pursat_covariates, outlier.detect = TRUE,
#'                                niteration = 200, burnin = 50, thin = 10)
#' }
#' 


clearanceEstimatorBayes = function(data, covariates = NULL, seed = 1234, detect.limit = 40, outlier.detect = TRUE, conf.level = .95, niteration = 100000, burnin = 500, thin = 50, filename = "output.csv"){

base.log = exp(1) # all logarithms below are natural logs (base e)

if (!identical(
  dimnames(data)[[2]][1:3],
  c("id","time","count"))){
  stop("first three columns of data must be named 'id', 'time', and 'count' in that order, names are case sensitive")
}

if(!is.null(seed)) set.seed(seed)

# show users this message earlier than the actual start of MCMC
print("Progress starts. It may take a while...")

# save the current working directory so it can be restored at the end
dir.old <- getwd()
dir <- dir.old

if(is.null(data$Predicted)) {
  par(ask = FALSE)
  #cat("\n Calculating WWARN PCE Estimates... \n")
  print("Calculating WWARN PCE Estimates...")
  if (!dir.exists(paste(dir, "/PceEstimates", sep="")))
  {
    dir.create(paste(dir, "/PceEstimates", sep=""))
  }
  setwd(paste(dir, "/PceEstimates", sep=""))
    sink("PceEstimates.txt")
    suppressWarnings(
        suppressMessages(calculateParasiteClearance(Name = "pceestimates", minpara = detect.limit, data1 = data))
    )
    sink()
  data = read.csv("pceestimates_Estimates.csv")
  data = as.data.frame(data)

  # If users want to use Flegg's method to detect outliers (outlier.detect == TRUE),
  # then after running the PCE method we exclude points with outlier status 1 or 2;
  # outlier status 3 appears in a tail phase, so we do not remove it.
  if (outlier.detect == TRUE) {
    data = data[(data$Outlier != 1 & data$Outlier != 2), ]
  }

  # If users opted out of outlier detection in this case (i.e. the data lack a
  # Predicted column), report a warning.
  if (outlier.detect == FALSE) {
    warning("You may have forgotten to detect outliers.",
            "\n If you already removed the outliers before the tail phase from the input data yourself, please ignore this warning.",
            "\n Otherwise, please re-run the function with outlier.detect = TRUE and the outliers will be removed automatically.")
  }
}
# restore the original working directory
setwd(dir.old)

# If data contains predicted PCE values (generated by the user or by our program) but type 1 or
# type 2 outliers have not been eliminated, the program stops with an error message.
if(!is.null(data$Predicted) & (is.element(1, data$Outlier) | is.element(2, data$Outlier))) {
  stop("There are still outliers before the tail phase; please remove them and then predict Flegg's PCE.",
       "\nAlternatively, delete the old predicted PCE column and re-run the program with outlier.detect = TRUE,",
       "\nwhich will automatically clean the data set and generate a predicted PCE column by Flegg's method.")
}

#cat("\n Conducting Bayesian Analysis... \n")
print("Conducting Bayesian Analysis...")
id = data$id
unique.id = unique(id)
t.overall = data[,2]
if(is.null(t.overall))
{
  t.overall = data$time
}
counts = data[,3]
predicted.flegg = data$Predicted/log(base.log)
# disable the within-sampler outlier detection below; outliers have already been
# handled (or deliberately kept) by the PCE preprocessing step above
outlier.detect = F

if(!is.null(covariates)){
  d = c()
  covariates = as.data.frame(covariates)
  # record indices of constant all-ones columns (lm() below adds an intercept)
  for(i in 1:ncol(covariates))
  {
    if(all(covariates[,i] == 1))
    {
      d = c(d, i)
    }
  }
  if(!is.null(d)){
    covariates = covariates[,-c(d)]
    
  }
  covariates = as.data.frame(covariates)
  names.cov = c(colnames(covariates))
  # fit a throwaway regression on random noise purely to extract the design
  # matrix (with factor expansion) via lm(..., x = TRUE)
  b = runif(nrow(covariates))
  X.cov = lm(b ~., data = as.data.frame(covariates), x = T)$x
  X.cov = as.matrix(X.cov)
  
} 


if(is.null(covariates)){ 
  X.cov = t(t(rep(1, length(unique.id))))
}

nobs = length(unique(id))

lag2.current = rep(0, nobs)

nsim = burnin + niteration

index = vector("list", nobs)
alpha.current <- rep(1, nobs)
beta.current <- rep(-.05, nobs)
outlier = vector("list",nobs)
maximum.t = rep(0, nobs)
for(i in 1:nobs)
{
  
  temp = counts[id==unique.id[i]]
  count.temp = temp
  count.temp[count.temp == 0] = detect.limit
  index[[i]] = which(id == unique.id[i]) #indices for the ith individual
  maximum.t[i] = max(t.overall[index[[i]]]) #maximum t observed for ith individual
  outlier[[i]] = rep(0,length(index[[i]]))
  lag2.current[i] = length(count.temp)
  vec = lm(log(count.temp, base = base.log)~t.overall[index[[i]]])$coef
  alpha.current[i] <- vec[1]
  beta.current[i] <- vec[2]
  if(beta.current[i] >= 0 || is.na(beta.current[i]))
  {
    beta.current[i] = -.05/log(base.log, 10)
  }
  if(alpha.current[i] <= 0 || is.na(alpha.current[i]))
  {
    alpha.current[i] = 5/log(base.log, 10)
  }
  
  
}
prior.lag = 1 
prior.tail = 1 
prior.nolag = 1 
prior.notail = 1
change.current = rep(0, nobs)
change2.current = maximum.t
lag.current = rep(0, nobs)
outlier.post = rep(list(outlier),nsim)
theta.current = alpha.current
theta2.current = rep(0, nobs)
gamma.current = rep(0, ncol(X.cov))
eta.current = rep(0, ncol(X.cov))
alpha.pop.current = rep(0, nobs)
beta.pop.current = rep(0, nobs)
var.epsilon.current = 1/(log(base.log, 10))^2
p.lag = .25
p.tail = .25
c.out = 2

var.epsilonout.current = 5/(log(base.log, 10))^2
var.theta.current = 1
var.theta2.current = 1
var.beta.current = 1
var.alpha.current = 1
mu1.current =  log(3) 
mu2.current = log(1)
var1.current = 1
var2.current = 1
mu1.post = rep(0, nsim)
mu2.post = rep(0, nsim)
var1.post = rep(0, nsim)
var2.post = rep(0, nsim)
probs = rep(0, length(t.overall))
var.epsilon.post = rep(1, nsim)
var.epsilonout.post = rep(1, nsim)
var.theta.post = rep(1, nsim)
var.theta2.post = rep(1, nsim)
alpha.post = matrix(0, nobs, nsim)
beta.post = matrix(0, nobs, nsim)
beta.pop.post = matrix(0, nobs, nsim)
alpha.pop.post = matrix(0, nobs, nsim)
gamma.post = matrix(0, ncol(X.cov), nsim)
halflifeslope.post = matrix(0, ncol(X.cov), nsim)
eta.post = matrix(0, ncol(X.cov), nsim)
theta.post = matrix(0, nobs, nsim)
theta2.post = matrix(0, nobs, nsim)
var.alpha.post = rep(1, nsim)
c.out.post = rep(2, nsim)
var.beta.post = rep(1, nsim)
p.lag.post = rep(0, nsim)
p.tail.post = rep(0, nsim)
shape.post = rep(0,nsim)
rate.post = rep(0, nsim)
lag.post = matrix(0, nobs, nsim)
lag2.post = matrix(0, nobs, nsim)
change.post = matrix(0, nobs, nsim)
change2.post = matrix(0, nobs, nsim)
counts.current = counts
X = cbind(rep(1, length(t.overall)), t.overall)
prob.out = .2
same = rep(0, nobs)
prob.out.post = rep(0,nsim)
tripX = solve(t(X.cov)%*%X.cov)%*%t(X.cov)

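# Main MCMC loop: one sweep per iteration. For each patient j we
# (1) impute censored counts below the detection limit,
# (2) update (alpha_j, beta_j) via a Metropolis-Hastings step, and
# (3) resample the lag/decay and decay/tail change points;
# afterwards the shared variance components, the regression coefficients
# gamma and eta, and the lag/tail prior probabilities are updated.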
for(i in 1:nsim)
{
  SSE = 0
  sum.delta = 0
  sum.delta2 = 0
  n.total = 0
  n.delta = 0
  n.delta2 = 0
  SSEout = 0
  sum.deltaout = 0
  sum.delta2out = 0
  s.delta2out= rep(0, nobs)
  n.totalout = 0
  n.deltaout = 0
  n.delta2out = 0
  nindivid.out = rep(0, nobs)
  for(j in 1:nobs)
  {	
    ind = index[[j]]
    out = outlier[[j]]
    cc = counts[ind]
    t = t.overall[ind]
    lt = length(t)
    #################### sample Censored Observations ######################
    s = which(cc < detect.limit)
    if(length(s) > 0)
    {
      for(k in s)
      {
        if(lag2.current[j] < k)
        {
          new = base.log^(rtnorm(1, theta2.current[j], sqrt(var.epsilon.current)*(1-out[k]) + sqrt(var.epsilonout.current)*out[k], upper = log(detect.limit,base=base.log)))
        }
        else
        {
          new = base.log^(rtnorm(1, alpha.current[j] + beta.current[j]*t[k], sqrt(var.epsilon.current)*(1-out[k]) + sqrt(var.epsilonout.current)*out[k], upper = log(detect.limit,base = base.log)))
        }
        counts.current[ind[k]] = new
      }
    }
    #########################################################################
    
    lc = unlist(log(counts.current[index[[j]]], base = base.log))
    beta.old = -beta.current[j]
    alpha.old = alpha.current[j]
    
    t.ind = ((lag.current[j]+1):lag2.current[j])
    
    ########## Defining the z_ij values ############################
    if(lag.current[j] != lag2.current[j])
    {
      tnew = c(rep(change.current[j], lag.current[j]), t[t.ind], rep(change2.current[j], (length(t)-lag2.current[j])))
    }else{
      tnew = c(rep(change.current[j], lag.current[j]), rep(change2.current[j], (length(t)-lag2.current[j])))
    }
    ################################################################
    outnew = out
    Xt = cbind(rep(1, length(t)), -tnew)
    
    m.a = exp(alpha.pop.current[j] + .5*var.alpha.current)
    m.b = exp(beta.pop.current[j] + .5*var.beta.current)
    v.a = (exp(var.alpha.current) - 1)*exp(2*alpha.pop.current[j] + var.alpha.current)
    v.b = (exp(var.beta.current) - 1)*exp(2*beta.pop.current[j] + var.beta.current)
    y.new = unlist(c(lc, m.a, m.b))
    X.new = rbind(Xt, diag(2))
    V.new = var.epsilon.current*diag((lt+2))
    if(lt != 1)
    {
      diag(V.new[1:lt,1:lt]) = var.epsilon.current*(1-outnew) + var.epsilonout.current*outnew
    }else {
      V.new[1:lt,1:lt] = var.epsilon.current*(1-outnew) + var.epsilonout.current*outnew
    }	   
    Vinv = diag(1/diag(V.new[1:lt, 1:lt])) 
    if(lt == 1)
    {
      Vinv = 1/V.new[1,1]
    }
    diag(V.new)[(lt+1):(lt+2)] = c(v.a, v.b)
    X3 = solve(t(X.new)%*%solve(V.new)%*%X.new)%*%t(X.new)%*%solve(V.new)
    Var.b = 1.3*solve(t(X.new)%*%solve(V.new)%*%X.new)
    mean = X3%*%y.new
    t.list = -t.overall[index[[j]]]
    vec = c(0,0)
    
    vec.old = c(alpha.old, beta.old)
    vec[2] = rtnorm(1, mean[2], sqrt(Var.b[2,2]), lower = 0)
    condvar = Var.b[1,1] - Var.b[2,1]^2/Var.b[2,2]
    condmean = mean[1] + Var.b[2,1]/Var.b[2,2]*(vec[2] - mean[2])
    vec[1] = rtnorm(1, condmean, sqrt(condvar), lower = 0)
    condmean.old = mean[1] + Var.b[2,1]/Var.b[2,2]*(vec.old[2] - mean[2])
    beta.new = vec[2]
    alpha.new = vec[1]
    # Metropolis-Hastings log acceptance ratio for (alpha_j, beta_j):
    # log posterior at the proposal minus log posterior at the current values,
    # plus the log proposal densities of the current values minus those of the proposal.
    logpost.new = -.5*t(lc - Xt%*%vec)%*%Vinv%*%(lc - Xt%*%vec) - log(beta.new) - log(alpha.new) -
      (log(beta.new) - beta.pop.current[j])^2/(2*var.beta.current) -
      (log(alpha.new) - alpha.pop.current[j])^2/(2*var.alpha.current)
    logpost.old = -.5*t(lc - Xt%*%vec.old)%*%Vinv%*%(lc - Xt%*%vec.old) - log(beta.old) - log(alpha.old) -
      (log(beta.old) - beta.pop.current[j])^2/(2*var.beta.current) -
      (log(alpha.old) - alpha.pop.current[j])^2/(2*var.alpha.current)
    lratio = logpost.new - logpost.old +
      dtnorm(beta.old, mean[2], sqrt(Var.b[2,2]), lower = 0, log = T) +
      dtnorm(alpha.old, condmean.old, sqrt(condvar), lower = 0, log = T) -
      dtnorm(beta.new, mean[2], sqrt(Var.b[2,2]), lower = 0, log = T) -
      dtnorm(alpha.new, condmean, sqrt(condvar), lower = 0, log = T)
    vec[2] = -beta.old
    vec[1] = alpha.old
    if(log(runif(1)) <= lratio)
    {
      beta.current[j] = -beta.new
      alpha.current[j] = alpha.new
      theta2.current[j] = alpha.current[j]+ beta.current[j]*change2.current[j]
      theta.current[j] = alpha.current[j] + beta.current[j]*change.current[j]
    }
    
    if(i > 2)
    {
      if(beta.current[j] == beta.post[j,(i-1)])
      {
        same[j] = same[j] + 1
      }else{same[j] = 0}
    }				
    
    lc = unlist(log(counts.current[index[[j]]], base = base.log))
    t = t.overall[index[[j]]]
    int = rep(0, lag2.current[j]+1)
    for(k in 0:(lag2.current[j]))
    {
      
      if(k == 0)
      {
        int[(k+1)] = densityintegrate.LN(0, change2.current[j], 0, lag2.current[j], alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu1.current, var1.current)*(1-p.lag)
      }else if(k == length(t)){
        int[(k+1)] = densityintegrate.LN(t[length(t)], change2.current[j], k, lag2.current[j], alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu1.current, var1.current)*(p.lag)
      }else{
        int[(k+1)] = integrate(densityintegrate.LN, lower = t[k], upper = min(t[(k+1)], change2.current[j]), change2.current[j], k, lag2.current[j], alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu1.current, var1.current)$value*(p.lag)
      }	
    }
    
    
    probs = int/sum(int)
    lag.current[j] = sample(0:(lag2.current[j]),size = 1, prob = probs)
    choice = lag.current[j] + 1
    
    k = lag.current[j]
    if(k == 0)
    {
      change.current[j] = 0
    }else if(k==length(index[[j]])){change.current[j] = t[length(t)]
    
    }else{	
      logmaxval = optimize(densitymax.LN, c(t[k], min(t[(k+1)], change2.current[j])), change2.current[j], k, lag2.current[j], alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu1.current, var1.current, maximum = TRUE)$objective
      stop = FALSE
      while(stop == FALSE)
      {
        prop = runif(1, t[k], min(t[(k+1)], change2.current[j]))
        logdensity.value = densitymax.LN(prop, change2.current[j], k, lag2.current[j], alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out,mu1.current, var1.current)
        ratio = exp(logdensity.value- logmaxval)
        if(runif(1) <= ratio){
          change.current[j] = prop
          stop = TRUE
        }
      }
    }
    
    
    
    
    N = length((lag.current[j]):length(index[[j]]))
    int = rep(0, N)
    for(k in (lag.current[j]):(length(index[[j]])))
    {
      
      if(k == length(index[[j]]))
      {
        int[(k - (lag.current[j]-1))] = densityintegrate2.LN(maximum.t[j], change.current[j], lag.current[j], k, alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu2.current, var2.current)*(1-p.tail)
      }else if(k == 0){
        int[(k - (lag.current[j]-1))] = densityintegrate2.LN(0, change.current[j], lag.current[j], k, alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu2.current, var2.current)*(p.tail)
      }else{
        
        int[(k - (lag.current[j]-1))] = integrate(densityintegrate2.LN, lower = max(t[k], change.current[j]), upper = t[(k+1)], change.current[j], lag.current[j], k, alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu2.current, var2.current)$value*(p.tail)
      }
      
    }
    prior = c(rep((1-p.tail), (N-1)), p.tail)
    probs2 = int/sum(int)
    if(length(probs2) > 1)
    {
      lag2.current[j] = sample((lag.current[j]):(length(index[[j]])), size = 1, prob = probs2)
    }
    if(length(probs2)==0)
    {
      lag2.current[j] = length(index[[j]])
    }
    
    k = lag2.current[j]
    
    if(k == length(index[[j]]))
    {
      change2.current[j] = t[length(t)]
    }else if(k==0){
      change2.current[j] = 0
    }else{
      logmaxval2 = optimize(densitymax2.LN, c(max(t[k], change.current[j]), t[k+1]), change.current[j], lag.current[j], k, alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu2.current, var2.current, maximum = TRUE)$objective
      
      
      stop = FALSE
      while(stop == FALSE)
      {
        prop = runif(1, max(t[k], change.current[j]), t[(k+1)])
        logdensity.value = densitymax2.LN(prop, change.current[j], lag.current[j], k, alpha.current[j], beta.current[j], t, lc, var.epsilon.current, var.epsilonout.current, out, mu2.current, var2.current)
        ratio = exp(logdensity.value- logmaxval2)
        if(runif(1) <= ratio)
        {
          change2.current[j] = prop
          stop = TRUE
        }
      }
    }
    
    
    
    
    if(outlier.detect == T)
    {
      outlier[[j]] = identify.outlier(alpha.current[j], beta.current[j],theta.current[j], theta2.current[j], var.epsilon.current, var.epsilonout.current, var.epsilon.current, var.epsilonout.current, var.epsilon.current, var.epsilonout.current,t, lc, lag.current[j], lag2.current[j], prob.out)
    }
    out.current = outlier[[j]]
    
    ind = index[[j]]
    t.ind = (lag.current[j]+1):(lag2.current[j])	
    if(lag.current[j] != lag2.current[j])
    {
      tnew = c(rep(change.current[j], lag.current[j]), t[t.ind], rep(change2.current[j], (length(t)-lag2.current[j])))
    }else{
      tnew = c(rep(change.current[j], lag.current[j]), rep(change2.current[j], (length(t)-lag2.current[j])))
    }
    
    
    fit = alpha.current[j] + beta.current[j]*tnew
    
    SSE = SSE + sum((1-out.current)*(lc - fit)^2)
    SSEout = SSEout + sum((out.current)*(lc - fit)^2)
    n.total = n.total + sum(1-out.current)
    n.totalout = n.totalout + sum(out.current)
  }
  
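  # Gibbs update of the measurement-error variance from its inverse-gamma
  # full conditional, using the sums of squares pooled over patients above.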
  shape = .5*(n.total + n.delta + n.delta2 +n.totalout + n.deltaout + n.delta2out) 
  rate = .5*(SSE + sum.delta + sum.delta2) + .5*(SSEout + sum.deltaout + sum.delta2out)/c.out
  var.epsilon.current = 1/rgamma(1, shape, rate)
  
  
  
  
  if(outlier.detect == T)
  {
    var.epsilonout.current = var.epsilon.current
    shapeout = .5*(n.totalout + n.deltaout + n.delta2out) + 2.5
    rateout = .5*(SSEout + sum.deltaout + sum.delta2out)/var.epsilon.current + 5
    e = 1
    while(var.epsilonout.current <= var.epsilon.current & e <= 40)
    {
      c.out = 1/rgamma(1, shapeout, rateout)
      var.epsilonout.current = c.out*var.epsilon.current
      if(e == 40)
      {
        var.epsilonout.current = var.epsilon.current
      }
      e = e + 1
    }
    if(var.epsilonout.current > 1e3)
    {
      var.epsilonout.current = 1e3
    }
  }	
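  # Gibbs updates of the regression coefficients: gamma regresses the log
  # clearance rates log(-beta) on the covariates; eta does the same for the
  # log intercepts log(alpha).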
  gamma.current = mvrnorm(1, tripX%*%log(-beta.current), var.beta.current*solve(t(X.cov)%*%X.cov))
  eta.current = mvrnorm(1, tripX%*%log(alpha.current), var.alpha.current*solve(t(X.cov)%*%X.cov))
  theta.pop.current = rnorm(1,mean(theta.current), sqrt(var.theta.current/nobs))
  
  beta.pop.current = X.cov%*%gamma.current
  alpha.pop.current = X.cov%*%eta.current
  shape.beta = .5*nobs - 1/2
  rate.beta = .5*sum((log(-beta.current) - beta.pop.current)^2)
  var.beta.current = 1/rgamma(1, shape.beta, rate.beta)
  shape.alpha = .5*nobs - 1/2
  rate.alpha = .5*sum((log(alpha.current) - alpha.pop.current)^2)
  var.alpha.current = 1/rgamma(1, shape.alpha, rate.alpha)
  
  
  lcurrent = log(counts.current, base = base.log)
  
  c1.temp = change.current[change.current > 0]
  c2.temp = change2.current[change.current > 0]
  if(length(c1.temp) >= 0)
  {
    
    v.post = 1/(length(c1.temp)/var1.current + 1/.25)
    m.post = v.post*(sum(log(c1.temp))/var1.current + log(6)/.25) 		
    
    
    mu1.prop = rnorm(1, m.post, sqrt(v.post))
    
    # log-ratio of the truncation normalizing terms for the lag-changepoint prior
    lratio = sum(pnorm(log(c2.temp), mu1.current, sqrt(var1.current), log.p = T)) - sum(pnorm(log(c2.temp), mu1.prop, sqrt(var1.current), log.p = T))
    if(log(runif(1)) < lratio)
    {
      mu1.current = mu1.prop
    }
    var1.prop = 1/rgamma(1, length(c1.temp)/2+1, .5*sum((log(c1.temp) - mu1.current)^2) + 1 )
    lratio = sum(pnorm(log(c2.temp), mu1.current, sqrt(var1.current), log.p = T)) - sum(pnorm(log(c2.temp), mu1.current, sqrt(var1.prop), log.p = T))
    if(log(runif(1)) < lratio)
    {
      var1.current= var1.prop
    }
    
  }
  c2.temp = change2.current[change2.current < maximum.t]
  c1.temp =  change.current[change2.current < maximum.t]
  maxt.temp = maximum.t[change2.current < maximum.t]
  if(length(c2.temp) >= 0)
  {
    v.post = 1/(length(c2.temp)/var2.current + 1/.25)
    m.post = v.post*(sum(log(maxt.temp - c2.temp))/var2.current + log(6)/.25)
    
    mu2.prop = rnorm(1, m.post, sqrt(v.post))
    # log-ratio of the truncation normalizing terms for the tail-changepoint prior
    lratio = sum(pnorm(log(maxt.temp - c1.temp), mu2.current, sqrt(var2.current), log.p = T)) - sum(pnorm(log(maxt.temp - c1.temp), mu2.prop, sqrt(var2.current), log.p = T))
    if(log(runif(1)) < lratio)
    {
      mu2.current = mu2.prop
    }
    var2.prop = 1/rgamma(1, length(c1.temp)/2 + 1, .5*sum((log(maxt.temp - c2.temp) - mu2.current)^2) + 1 )
    lratio = sum(pnorm(log(maxt.temp - c1.temp), mu2.current, sqrt(var2.current), log.p = T)) - sum(pnorm(log(maxt.temp - c1.temp), mu2.current, sqrt(var2.prop), log.p = T))
    if(log(runif(1)) < lratio)
    {
      var2.current= var2.prop
    }
    
  }
  
  
  
  if(outlier.detect == T)
  {	
    prob.out = rbeta(1,length(t.overall)*.005 +n.deltaout + n.delta2out + n.totalout, length(t.overall)*.095 + n.delta + n.delta2 + n.total)
  }
  
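  # Conjugate beta updates for the prior probabilities of a lag phase and of
  # a tail phase, given how many patients currently exhibit each.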
  nlag = sum(lag.current!=0)
  ntail = sum(change2.current != maximum.t)
  p.lag = rbeta(1, nlag+ prior.lag, nobs - nlag + prior.nolag)
  p.tail = rbeta(1, ntail+prior.tail, nobs - ntail + prior.notail)
  beta.post[,i] = beta.current
  alpha.post[,i] = alpha.current
  theta.post[,i] = theta.current
  theta2.post[,i] = theta2.current
  beta.pop.post[,i] = beta.pop.current
  alpha.pop.post[,i] = alpha.pop.current
  gamma.post[,i] = gamma.current
  eta.post[,i] = eta.current	
  var.epsilon.post[i] = var.epsilon.current
  var.epsilonout.post[i] = var.epsilonout.current
  c.out.post[i] = c.out
  p.lag.post[i] = p.lag
  p.tail.post[i] = p.tail
  var.beta.post[i] = var.beta.current
  var.alpha.post[i] = var.alpha.current
  var.theta.post[i] = var.theta.current
  var.theta2.post[i] = var.theta2.current
  lag.post[,i] = lag.current
  lag2.post[,i] = lag2.current
  change.post[,i] = change.current
  change2.post[,i] = change2.current
  prob.out.post[i] = prob.out
  outlier.post[[i]] = outlier
  var1.post[i] = var1.current
  var2.post[i] = var2.current
  mu1.post[i] = mu1.current
  mu2.post[i] = mu2.current
  
  
  if(i %% 50 == 0)
  {
    print(paste("Progress:", i, "out of", nsim))
  }
  
}


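# Indices of the retained draws: every `thin`-th iteration after the burn-in
# (the argument `thin` is reused here as the vector of retained indices).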
thin = seq.int(burnin+1, nsim, thin)

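# Covariate effects on log half-lives: half-life = log(2)/rate and
# log(rate) = X.cov %*% gamma, so the log-half-life slope is the least-squares
# projection of the constant log(log(2)) onto the covariates, minus gamma.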
halflifeslope.post = as.vector( solve(t(X.cov)%*%X.cov)%*%t(X.cov)%*%rep(log(log(2)), nrow(X.cov)))- gamma.post

clearance.median = -apply(beta.post[,thin], 1, median)
clearance.mean = -apply(beta.post[,thin], 1, mean)
gamma.median = apply(gamma.post[,thin], 1, median)
gamma.mean = apply(gamma.post[,thin], 1, mean)
halflifeslope.median = apply(halflifeslope.post[,thin], 1, median)
halflifeslope.mean = apply(halflifeslope.post[,thin], 1, mean)

outlier.freq = outlier
for(i in 1:nobs)
{
  temp = matrix(0, length(thin),length(index[[i]]))
  for(j in 1:length(thin))
  {
    temp[j,] = outlier.post[[thin[j]]][[i]]
  }
  outlier.freq[[i]] = colMeans(temp)
}
lag.median = apply(change.post[,thin], 1, median)
tail.median = apply(change2.post[,thin], 1, median)

a = (1-conf.level)/2
A = 100*a

if(!is.null(covariates))
{
  gamma.CI = apply(gamma.post[,thin], 1, quantile, c(a, .5, 1-a))
  gamma.CI = t(gamma.CI)
  rownames(gamma.CI) = colnames(X.cov)
  colnames(gamma.CI) = c(paste("CI ", A, "%", sep = ""), "Median", paste("CI ", 100-A, "%", sep = ""))
  
  halflifeslope.CI = apply(halflifeslope.post[,thin], 1, quantile, c(a, .5, 1-a))
  halflifeslope.CI = t(halflifeslope.CI)
  rownames(halflifeslope.CI) = colnames(X.cov)
  colnames(halflifeslope.CI) = c(paste("CI ", A, "%", sep = ""), "Median", paste("CI ", 100-A, "%", sep = ""))
}else{
  
  
  gamma.CI = quantile(gamma.post, c(a, .5, 1-a))
  names(gamma.CI) = c(paste("CI ", A, "%", sep = ""), "Median", paste("CI ", 100-A, "%", sep = ""))
  
  halflifeslope.CI = quantile(halflifeslope.post, c(a, .5, 1-a))
  names(halflifeslope.CI) = c(paste("CI ", A, "%", sep = ""), "Median", paste("CI ", 100-A, "%", sep = ""))
  
}

write.csv(cbind(unique.id, clearance.mean, lag.median, tail.median), filename)

rownames(change.post)  <- unique.id
rownames(change2.post) <- unique.id
names(clearance.mean) <- unique.id
names(clearance.median) <- unique.id
clearance.post = -beta.post[,thin]
dimnames(clearance.post) <- list(unique.id, NULL)
dimnames(alpha.post) <- list(unique.id, NULL)

rownames(lag.post) <- unique.id
rownames(lag2.post) <- unique.id
rownames(theta.post) <- unique.id
rownames(theta2.post) <- unique.id
names(lag.median) <- unique.id
names(tail.median) <- unique.id

## We define a class called "bhrcr" for the output, following R's S3 class conventions.
results <- list(CALL = match.call(), clearance.median = clearance.median, clearance.mean = clearance.mean, clearance.post = clearance.post, 
                intercept.post = alpha.post[,thin], gamma.post = gamma.post, gamma.post.thin = gamma.post[,thin], gamma.median = gamma.median, gamma.mean = gamma.mean, 
                gamma.CI = gamma.CI, halflifeslope.post = halflifeslope.post[,thin], halflifeslope.median = halflifeslope.median,
                halflifeslope.mean = halflifeslope.mean, halflifeslope.CI = halflifeslope.CI,
                eta.post = eta.post[,thin], var.epsilon.post = var.epsilon.post, var.error.post = var.epsilon.post[thin], changelag.post = change.post[,thin], 
                changetail.post = change2.post[,thin], index = index, lag.post = lag.post[,thin], 
                lag2.post = lag2.post[,thin], counts.current = counts.current, t.overall = t.overall, 
                theta.post = theta.post[,thin], theta2.post = theta2.post[,thin], predicted.pce = predicted.flegg, 
                counts = counts, var.beta.post = var.beta.post[thin], var.alpha.post = var.alpha.post[thin], 
                p.lag = p.lag.post, p.lag.thin = p.lag.post[thin], p.tail = p.tail.post, p.tail.thin = p.tail.post[thin], detect.limit = detect.limit,
                var1.post = var1.post, var2.post = var2.post, mu1.post = mu1.post, mu2.post = mu2.post, 
                lag.median = lag.median, tail.median = tail.median, burnin = burnin, id = unique.id)
class(results) <- c("bhrcr", "list")
return(results)
}
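
## ---------------------------------------------------------------------------
## Usage sketch (hedged; wrapped in `if (FALSE)` so sourcing this file has no
## side effects). Clearance rates are slopes of the natural-log parasite
## density per unit of the `time` column, so a parasite half-life can be
## derived as log(2) / rate. `summary` dispatches to summary.bhrcr.
if (FALSE) {
  data("pursat")
  data("pursat_covariates")
  fit <- clearanceEstimatorBayes(data = pursat, covariates = pursat_covariates,
                                 outlier.detect = TRUE,
                                 niteration = 2000, burnin = 500, thin = 10)
  summary(fit)
  halflife <- log(2) / fit$clearance.mean  # per-patient half-lives (hours, if time is in hours)
  head(halflife)
}
## ---------------------------------------------------------------------------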
