# R/stb.R In STB: Simultaneous Tolerance Bounds


#' Simultaneous Tolerance Bands (STB).
#'
#' Compute And/Or Plot Simultaneous Tolerance Bands for numeric vectors.
#'
#' Function takes a numeric vector 'vec' and computes the 100(1-alpha)\%-simultaneous tolerance band (STB) for
#' the (default) null-hypothesis H0: vec ~ N(mu, sigma^2), which is equal to checking whether the residuals of
#' the simplest linear model y = mu + e (y~1) are normally distributed, i.e. 'e ~ N(0, sigma^2)'. By specifying argument \code{rand.func},
#' other null-distributions can be used. One has to specify a function with a single argument 'n', which returns a random sample with 'n'
#' observations, randomly drawn from the desired null-distribution (see description of argument \code{rand.func} below).
#' Note that all random samples as well as vector \code{vec} will be centered to mean=0 and scaled to sd=1.
#'
#' One can choose between three methods for constructing the 100(1-alpha)\% STB. There are two implementations of the quantile-based algorithm
#' ("C", "R", see 1st reference) and one of the rank-based algorithm (see 2nd reference). Methods "C" and "R" can be run in parallel. The rank-based
#' algorithm does not benefit from parallel processing, at least not in the current implementation. It is still the default and recommended for small to medium
#' sized vectors and 10000 <= N <= 20000 simulations, which should be sufficient to accurately reflect the null-distribution. The "C" and "R" options refer
#' to implementations of the quantile-based algorithm. The "C" implementation benefits most from using multiple cores, i.e. the larger 'Ncpu' the better,
#' and should be used for large problems, i.e. a rather large number of elements and a large number of simulations.
#'
#' The table below gives an impression of how these algorithms perform. Runtimes were measured under Windows 7 on an Intel Xeon E5-2687W 3.1 GHz workstation with 16
#' logical cores and 16 GB RAM. The runtime of the C-implementation of the quantile-based algorithm, run in parallel on 12 cores, is denoted as "t_qC12".
#' Each setting was repeated 10 times and the overall run time was then divided by 10, providing sufficiently robust simulation results.
#' Note that for smaller problem sizes a large proportion of the overall runtime is due to simulating, i.e. drawing from the null-distribution.
#'
#' \tabular{rrrr}{
#' 	\strong{_____N_obs} \tab \strong{_____N_sim} \tab \strong{____t_rank} \tab \strong{____t_qC12} \cr
#'  25    \tab 5000  \tab  0.4s \tab   0.5s \cr
#'  25	  \tab 10000 \tab  0.8s \tab   1.3s \cr
#'  50	  \tab 10000 \tab  1.0s \tab   3.2s \cr
#'  100	  \tab 10000 \tab  1.7s \tab   2.9s \cr
#'  100	  \tab 20000 \tab  3.0s \tab   4.8s \cr
#'  225	  \tab 20000 \tab  5.1s \tab   8.3s \cr
#'  300	  \tab 30000 \tab  9.6s \tab  17.2s \cr
#'  300	  \tab 50000 \tab 16.1s \tab  24.9s \cr
#'  1000  \tab 50000 \tab 47.8s \tab 123.5s \cr
#' }
#'
#' @param obj 			(numeric) vector, which is supposed to be N(mu, sigma^2)-distributed
#' @param N 			(integer) value specifying the number of random samples to be used for constructing the STB
#' @param alpha 		(numeric) value specifying the simultaneous tolerance level, i.e. 100(1-alpha)\% of all 'N'
#'              		random samples have to be completely enclosed by the bounds of the STB
#' @param rand.func 	(function) a function generating random samples, e.g. \code{rand.func=rnorm}, which corresponds to random
#'                  	sampling from the standard normal distribution. Another example is defining \code{func=function(n){rchisq(n=n, df=3, ncp=2)}}
#'                  	and using \code{rand.func=func}. See the examples section for further use cases.
#' @param tol 			(numeric) value specifying the max. acceptable deviation from 'alpha' used in the bisection algorithm
#' @param max.iter 		(integer) value specifying the max. number of iterations for finding the bounds of the STB
#' @param algo			(character) string specifying the method to be used for constructing a 100(1-alpha)\% STB,
#' 						choose "rank" for the rank-based, "C" for a C-implementation of the quantile-based, and
#' 						"R" for an R-implementation of the quantile-based algorithm (see details). "C" uses the SAS PCTLDEF5 definition of quantiles,
#' 						whereas "R" can use any of the built-in R types of quantiles (see \code{\link{quantile}}).
#' @param Ncpu			(integer) specifying the number of cores/CPUs to be used, for Ncpu>1 multi-processing is applied
#' @param q.type 		(integer) the quantile-type used if \code{algo="R"}, see ? quantile for details.
#' @param stb.col 		(character) string, a valid specification of a color to be used for the STB
#' @param main 			(character) string for a main title appearing over the plot
#' @param add.pwb 		(logical) should the point-wise tolerance band be plotted for comparison?
#' @param quiet 		(logical) TRUE = no additional output is printed (progress bar etc.)
#' @param add 			(logical) TRUE = the 100(1-alpha)\% STB is added to an existing plot
#' @param col.points 	(character) color for the points in the QQ-plot
#' @param col.out 		(character) color for points outside of the 100(1-alpha)\% STB
#' @param col.pwb 		(character) color for the point-wise STB (not adjusted for multiplicity), defaults to "#0000FF40", which is "blue"
#'                		with alpha-channel 0x40, i.e. about 75\% transparency
#' @param plot 			(logical) TRUE = either a QQ-plot with STB (add=FALSE) or a STB added to an existing plot (add=TRUE) is plotted.
#'                      FALSE = only computations are carried out without plotting; an object of class 'STB' is returned, which
#'                              can be stored and plotted later on, e.g. to avoid computing an STB every time a Sweave/mWeave report is updated
#' @param legend 		(logical) TRUE = a legend is added at position "topleft"
#' @param timer 		(logical) TRUE = the time spent for computing the STB will be printed
#' @param pch 			(integer) plotting symbols for the QQ-plot
#' @param pch.out 		(integer) plotting symbols for outlying points
#' @param seed 			(numeric) value interpreted as integer, setting the random number generator (RNG) to a defined state
#' @param ... further graphical parameters passed on
#'
#' @return invisibly returns a list-type object of class \code{STB}, which comprises all arguments accepted by this function.
#'
#' @author  Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#'
#' @method stb default
#'
#' @references
#'
#' Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2011), Checking assumptions of normality and homoscedasticity in the general linear model.
#'             Communications in Statistics - Simulation and Computation; S. 141-154
#'
#' Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach.
#' Computational Statistics and Data Analysis, 56, 1405-1416
#'
#' @examples
#' ### log-normal vector to be checked for normality
#' \dontrun{
#' set.seed(111)
#' stb(exp(rnorm(30)), col.out="red", legend=TRUE)
#'
#' ### uniformly distributed sample checked against a Chi-Squared distribution with 1 degree of freedom
#' set.seed(707)
#' stb(runif(25, -5, 5), rand.func=function(n){rchisq(n=n, df=1)},
#'     col.out="red", legend=TRUE, main="Chi-Squared with DF=1")
#'
#' ### check whether a Chi-Squared (DF=1) random sample fits better
#' stb(rchisq(25, df=1), rand.func=function(n){rchisq(n=n, df=1)},
#'     col.out="red", legend=TRUE, main="Chi-Squared with DF=1")
#'
#' ### add an STB to an existing plot
#' plot(sort(rnorm(30)), sort(rnorm(30)))
#' stb(rnorm(30), add=TRUE)
#'
#' ### compute STB for later use and prevent plotting
#' STB <- stb(rnorm(30), plot=FALSE)
#' }

stb.default <- function(obj, N=10000L, alpha=.05, rand.func=rnorm, tol=.0001, max.iter=100L, algo=c("rank", "C", "R"),
		Ncpu=1, q.type=2L, stb.col="#0000FF40", col.points="black", col.out="red", col.pwb="#0000FF40",
		main=NULL, add.pwb=FALSE, quiet=FALSE, add=FALSE, plot=TRUE, legend=FALSE, timer=FALSE, pch=16, pch.out=16, seed=NULL, ...)
{
	ARGS <- list(...)
	stopifnot(alpha > 0 && alpha < 1)
	algo <- toupper(algo)
	algo <- match.arg(algo, c("RANK", "C", "R"))
	stopifnot(q.type %in% 1:9)
	vec <- obj
	if(any(is.na(vec)))
		vec <- na.omit(vec)
	vec  <- c(scale(vec, center=TRUE, scale=TRUE))                              # center and scale 'vec' to have mean=0 and sd=1
	vec  <- sort(vec)
	Nobs <- length(vec)

	if(is.null(seed))
		seed <- runif(1, -2^30, 2^30)

	set.seed(seed)

	Sample <- eval(call(name="rand.func", n=N*Nobs))                            # random sampling according to 'rand.func'

	if(!isTRUE(all.equal(rand.func, rnorm)))
		fun <- function(x){sort(scale(x, scale=TRUE, center=TRUE))}             # mean-centering, scaling to sd=1 and sorting if 'rand.func' is not N(0,1)
	else
		fun <- sort                                                             # only sorting if N(0,1)

	mc.mat <- t(apply(matrix(Sample, ncol=Nobs), 1, fun))                       # 'N' random samples from the null-distribution

	if(algo == "C")
	{
		stb <- fastSTB(mc.mat, alpha=alpha, tol=tol, max.iter=max.iter, timer=timer, Ncpu=Ncpu)
	}
	else if(algo == "R")
	{
		stb <- getSTB(mc.mat, alpha=alpha, output=quiet, tol=tol, max.iter=max.iter,    # compute bounds of the STB
				q.type=q.type, timer=timer, Ncpu=Ncpu)
	}
	else
	{
		stb <- rankSTB(mc.mat, alpha=alpha)
	}

	means <- apply(mc.mat, 2, mean)                                             # means per order-statistic, i.e. X-values for the QQ-plot
	Q <- stb$Q                                                                  # lower and upper bound of the simultaneous tolerance band (STB)

	# extend STB beyond smallest and largest observation for a nicer look
	ll <- -10 * ( (Q[1,2]-Q[1,1]) / (means[2]-means[1]) )                       # Y-value for x=-10 corresponding to the left-lower bound
	lu <- -10 * ( (Q[2,2]-Q[2,1]) / (means[2]-means[1]) )                       # left-upper
	rl <-  10 * ( (Q[1,Nobs]-Q[1,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )   # right-lower
	ru <-  10 * ( (Q[2,Nobs]-Q[2,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )   # right-upper

	if(add.pwb)
	{
		pw.Q <- apply(mc.mat, 2, quantile, probs=c(alpha/2, 1-alpha/2))
		ll2 <- -10 * ( (pw.Q[1,2]-pw.Q[1,1]) / (means[2]-means[1]) )            # Y-value for x=-10 corresponding to the left-lower bound
		lu2 <- -10 * ( (pw.Q[2,2]-pw.Q[2,1]) / (means[2]-means[1]) )            # left-upper
		rl2 <-  10 * ( (pw.Q[1,Nobs]-pw.Q[1,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )    # right-lower
		ru2 <-  10 * ( (pw.Q[2,Nobs]-pw.Q[2,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )    # right-upper
		pw.Q <- cbind(c(ll2, lu2), pw.Q, c(rl2, ru2))
	}

	if(!add && plot)
		plot(means, means, ylim=c(min(Q[1,]), max(Q[2,])), type="n", main=main,
				xlab="Theoretical Quantiles", ylab="Sample Quantiles")

	Q <- cbind(c(ll, lu), Q)
	Q <- cbind(Q, c(rl, ru))
	means <- c(-10, means, 10)

	if(plot)
	{
		polygon(x=c(means, rev(means)), y=c(Q[1,], rev(Q[2,])), border=NA, col=stb.col)
		if(!add)
		{
			abline(0, 1)
			box()
			points(means[-c(1, length(means))], vec, pch=pch, col=col.points, ...)
			out.high <- which(vec > Q[2, -c(1, ncol(Q))])
			out.low  <- which(vec < Q[1, -c(1, ncol(Q))])
			out <- unique(c(out.high, out.low))
			if(length(out) > 0)
				points(means[out+1], vec[out], pch=pch.out, col=col.out)        # means[1] and means[length(means)] have no counterpart in vec
			if(legend)
				legend("topleft", fill=stb.col,
						legend=c(paste("STB (", 100*round(stb$cov, 5), "% coverage)", sep=""),
								paste(N, " Simulations", sep="")),
						box.lty=0, border=NA)
		}
	}

	if(add.pwb && plot)
	{
		polygon(x=c(means, rev(means)), y=c(pw.Q[1,], rev(pw.Q[2,])), border=NA, col=col.pwb)
		if(legend)
			legend("topleft",
					fill=c(stb.col, rgb(t(col2rgb("blue")), maxColorValue=255, alpha=120), NA),
					legend=c(paste("STB (", 100*round(stb$cov, 5), "% coverage)", sep=""),
							paste("PWB (", 100*(1-alpha), "% coverage)", sep=""),
							paste(N, " Simulations", sep="")),
					box.lty=0, border=c("black", "black", NA))
	}

	res <- list(vec=vec, means=means, Q=Q, N=N, alpha=alpha, stb.col=stb.col, col.pwb=col.pwb,
			main=main, add.pwb=add.pwb, quiet=quiet, add=add, col=col.points, col.out=col.out,
			coverage=stb$cov, pch=pch, pch.out=pch.out,
			rand.func=rand.func, quantile.type=q.type, stb.algo=algo, legend=legend, ARGS=ARGS)
	class(res) <- "STB"
	invisible(res)
}
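The simulation step at the top of 'stb.default' above (draw N*Nobs null values, reshape to an N x Nobs matrix, center/scale and sort each row) can be sketched in isolation. A minimal base-R sketch with assumed toy sizes 'N' and 'Nobs', not the package's exported code:

```r
## Sketch of the simulation step: build the N x Nobs matrix of sorted,
## standardized null samples that the STB algorithms operate on.
## 'N' and 'Nobs' are arbitrary toy sizes chosen for illustration.
set.seed(42)
N      <- 200
Nobs   <- 25
Sample <- rnorm(N * Nobs)                              # draws from the null-distribution
mc.mat <- t(apply(matrix(Sample, ncol=Nobs), 1,
                  function(x) sort(c(scale(x)))))      # center/scale each sample, then sort it
## each row now holds the order statistics of one standardized null sample
```

Each column of 'mc.mat' then approximates the distribution of one order statistic, which is exactly what getSTB, fastSTB and rankSTB consume.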

#' Plot Objects of Class 'STB'.
#'
#' Standard plotting method for objects of class 'STB'.
#'
#' This function plots objects of class 'STB' as generated by function \code{\link{stb}}.
#' Objects of S3-class 'STB' are list-type objects storing all the information
#' needed to plot QQ-plots with simultaneous tolerance bounds.
#'
#' @param x     (object) of class 'STB' as generated by function \code{getSTB}
#' @param ...   arguments passed to other methods
#'
#' @author  Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#' @method plot STB
#'
#'
#' @examples
#' \dontrun{
#' ### generate an 'STB' object without plotting
#' obj <- stb(rnorm(30), plot=FALSE)
#' plot(obj)
#'
#' ### manipulate the 'STB' object for plotting
#' obj$legend <- TRUE
#' plot(obj)
#'
#' ### add a previously generated STB object to an existing plot
#' plot(sort(rnorm(30)), sort(rnorm(30)))
#' obj$add <- TRUE
#' plot(obj)
#' }

plot.STB <- function(x, ...)
{
	obj <- x
	stopifnot(class(obj) == "STB")
	N <- length(obj$means)
	if(!obj$add)
		plot(obj$means[-c(1, N)], obj$means[-c(1, N)], ylim=c(min(obj$Q[1, -1]), max(obj$Q[2, -N])),
				type="n", main=obj$main, xlab="Theoretical Quantiles", ylab="Sample Quantiles")
	polygon(x=c(obj$means, rev(obj$means)), y=c(obj$Q[1,], rev(obj$Q[2,])), border=NA, col=obj$stb.col)

	if(!obj$add)
	{
		abline(0, 1)
		box()
		points(obj$means[-c(1, length(obj$means))], obj$vec, pch=obj$pch, col=obj$col)
		out.high <- which(obj$vec > obj$Q[2, -c(1, ncol(obj$Q))])
		out.low  <- which(obj$vec < obj$Q[1, -c(1, ncol(obj$Q))])
		out <- unique(c(out.high, out.low))
		if(length(out) > 0)
			points(obj$means[out+1], obj$vec[out], pch=obj$pch.out, col=obj$col.out)    # means[1] and means[N] have no counterpart in vec
		if(obj$legend)
			legend("topleft", fill=obj$stb.col,
					legend=c(paste("STB (", 100*round(obj$coverage, 5), "% coverage)", sep=""),
							paste(obj$N, " Simulations", sep="")),
					box.lty=0, border=NA)
	}
}

#' Simultaneous Tolerance Bands Using a Fast C-Implementation.
#'
#' This function represents the interface to a C-implementation of the bisection algorithm for computing
#' simultaneous tolerance bounds as described in Schuetzenmeister et al. 2012.
#'
#' Quantiles will be computed according to the SAS PCTLDEF5 definition of quantiles, which is identical to 'type=2'
#' in function \code{\link{quantile}}. This is also the default option throughout this package. Function \code{\link{SASquantile}}
#' is an R-implementation identical to the C-implementation used here.
#'
#' @param mat 			(numeric) matrix with rows representing simulations of 'ncol' values, rows are supposed to be sorted
#' @param alpha 		(numeric) 100(1-alpha)\% simultaneous tolerance level (coverage)
#' @param tol 			(numeric) convergence tolerance level for the bisection algorithm
#' @param max.iter 		(integer) maximum number of bisection steps of the algorithm
#' @param Ncpu			(integer) specifying the number of processors (CPUs, cores, ...) to be used
#' @param timer 		(logical) TRUE = the time spent for computing the STB will be printed
#'
#' @return (list) with elements:
#' \item{mat}{(numeric) matrix with sorted random vector stored in rows}
#' \item{nCol}{(integer) number of columns of 'mat'}
#' \item{nRow}{(integer) number of rows of 'mat'}
#' \item{alpha}{(numeric) value used for the 100(1-alpha)\% STB}
#' \item{tol}{(numeric) the tolerance value used as stopping criterion for the bisection algorithm}
#' \item{max.iter}{(integer) the max. number of iterations of the bisection algorithm}
#' \item{Q}{(numeric) matrix with 2 rows, corresponding to the bounds of the STB (1st row = lower bounds, 2nd row = upper bounds)}
#' \item{coverage}{(numeric) value corresponding to the actual coverage of the STB}
#'
#' @author  Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#'
#' @references Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2011), Checking assumptions of normality and homoscedasticity in the general linear model.
#'             Communications in Statistics - Simulation and Computation; S. 141-154
#'
#' @examples
#' \dontrun{
#' ### example for 10000 x 30 matrix
#'  set.seed(333)
#'  mat <- t(apply(matrix(rnorm(10000*30), ncol=30), 1, sort))
#'  stb.obj1 <- fastSTB(mat, timer=TRUE)
#'  stb.obj1$coverage
#'  stb.obj2 <- fastSTB(mat, timer=TRUE, Ncpu=4)
#'  stb.obj3 <- fastSTB(mat, timer=TRUE, Ncpu=6)
#'  stb.obj4 <- fastSTB(mat, timer=TRUE, Ncpu=8)
#' }

fastSTB <- function(mat, alpha=.05, tol=0.0001, max.iter=100L, Ncpu=2, timer=FALSE)
{
	t1 <- Sys.time()
	stopifnot(alpha > 0 && alpha < 1)
	stopifnot(is.matrix(mat) && is.numeric(mat))
	stopifnot(is.numeric(max.iter) && max.iter > 1)
	stopifnot(is.numeric(tol) && tol > 0)
	stopifnot(is.numeric(Ncpu) && Ncpu > 0)
	stopifnot(is.logical(timer))

	NumCores <- detectCores()                       # from package 'parallel'
	if(Ncpu <= 0)
		Ncpu <- 1                                   # sequential processing
	if(Ncpu > NumCores)                             # rule of thumb for parallel processing
	{
		warning("There are only ", NumCores, " cores available but 'Ncpu' was set to ", Ncpu, "! Using ", NumCores, " cores instead.")
		Ncpu <- NumCores
	}

	res <- .C("getSTB", mat=as.double(mat), nCol=as.integer(ncol(mat)), nRow=as.integer(nrow(mat)),
			alpha=as.double(alpha), tol=as.double(tol), max.iter=as.integer(max.iter),
			Ncpu=as.integer(Ncpu), Q=double(2*ncol(mat)), coverage=double(1), PACKAGE="STB")

	res$Q <- matrix(res$Q, nrow=2)
	rownames(res$Q) <- c("lower", "upper")
	res$mat <- matrix(res$mat, ncol=ncol(mat))
	if(timer)
		print(Sys.time() - t1)
	return(res)
}
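fastSTB computes quantiles according to SAS PCTLDEF5, i.e. 'type=2' of \code{quantile}. That rule can be written out in a few lines of base R; 'pctldef5' is a hypothetical helper name for illustration only, not the package's C code (see SASquantile below for the package's own R version):

```r
## SAS PCTLDEF5 (= quantile type=2) sketch: with np = n*p, j = floor(np)
## and fractional part g = np - j, the quantile is the mean of the j-th and
## (j+1)-th order statistic if g is (numerically) zero, else the (j+1)-th.
pctldef5 <- function(x, p, tol=1e-12) {
    xo <- sort(x)
    n  <- length(xo)
    np <- n * p
    j  <- floor(np)
    g  <- np - j
    if (g < tol) mean(xo[c(j, j + 1)]) else xo[j + 1]
}
pctldef5(1:100, 0.75)    # 75.5, matching quantile(1:100, 0.75, type=2)
```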


#' Simultaneous Tolerance Bands Using R-Implementation.
#'
#' Compute simultaneous tolerance bounds (STB) from a matrix.
#'
#' Function computes 100(1-alpha)\% simultaneous tolerance bounds (intervals, bands) from a matrix, where rows correspond
#' to order statistics (sorted) of random samples of, e.g., a normal distribution. It starts by computing the joint coverage
#' of the naive, unadjusted (point-wise) alpha-level intervals, computed as percentiles across each order statistic (columns).
#' Alpha is decreased using a bisection search until the joint coverage becomes at least 100(1-alpha)\%, depending on arguments
#' \code{max.iter} and \code{tol}. As the number of simulated random samples goes to infinity, the STB becomes exact.
#' Note that checking whether a numeric vector is normally distributed is equal to checking whether the residuals of the simplest
#' linear model y = mu + e (y~1) are normally distributed [e ~ N(0, sigma^2)].
#'
#' @param mat 		(numeric) matrix with nrow=N_simulation, ncol=N_points, where each row is sorted in increasing order
#' @param alpha 	(numeric) 100(1-alpha)\% simultaneous tolerance-level will be computed
#' @param tol 		(numeric) value controlling the termination of the bisection algorithm, i.e. the condition \code{coverage-(1-alpha)<=tol}
#'            		has to be TRUE.
#' @param max.iter 	(integer) maximum number of iterations of the bisection algorithm. If this number is reached the algorithm terminates
#'                 	and returns the best approximation of lower and upper bounds reached up to this iteration.
#' @param q.type 	(integer) the quantile-type to be used, see \code{?quantile} for details
#' @param logfile 	(character) string specifying the name of the (optional) log-file, storing the iteration history
#' @param output 	(logical) TRUE a txtProgressBar will be printed as well as additional status information
#' @param timer 	(logical) TRUE = the time spent for computing the STB will be printed
#' @param Ncpu		(integer) specifying the number of cores/CPUs to be used in computing the coverage on C-level,
#' 							  for Ncpu>1 multi-processing is applied
#'
#' @return (list) with elements:
#' \item{Q}{(numeric) matrix with two rows, where the 1st row = lower bounds of the STB, 2nd row = upper bounds of the STB}
#' \item{cov}{(numeric) value indicating the actual coverage of the STB}
#'
#' @author  Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#'
#' @references Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2011), Checking assumptions of normality and homoscedasticity in the general linear model.
#'             Communications in Statistics - Simulation and Computation; S. 141-154
#'
#' @examples
#' \dontrun{
#' set.seed(333)
#' mat <- t(apply(matrix(rnorm(10000*30), ncol=30), 1, sort))
#' stb.obj <- getSTB(mat, timer=TRUE)
#' stb.obj$cov
#' }

getSTB <- function(mat, alpha=.05, tol=.0001, max.iter=100L, q.type=2L, logfile=NULL,
		output=TRUE, timer=FALSE, Ncpu=2)
{
	t1 <- Sys.time()
	stopifnot(alpha > 0 && alpha < 1)
	stopifnot(is.matrix(mat) && is.numeric(mat))
	stopifnot(is.numeric(max.iter) && max.iter > 1)
	stopifnot(is.numeric(tol) && tol > 0)
	stopifnot(is.numeric(q.type) && q.type %in% 1:9)
	stopifnot(is.numeric(Ncpu) && Ncpu > 0)
	stopifnot(is.logical(timer))
	stopifnot(is.logical(output))

	if(!is.null(logfile) && is.character(logfile) && length(logfile) > 0 && file.exists(logfile))
		file.remove(logfile)

	check     <- TRUE
	include   <- 1 - alpha                          # requested confidence limit
	alpha.old <- alpha                              # for the 1st iteration
	alpha     <- alpha/2                            # actual 1st alpha value to be used
	iter      <- 0

	if(output)
	{
		cat("\n\nConstruct simultaneous tolerance bounds - Progress:\n\n")
		pb <- txtProgressBar(min=0, max=max.iter, width=getOption("width")*.25, char="*", style=3)   # progress bar
	}
	nc <- ncol(mat)
	nr <- nrow(mat)
	best.cov <- 1

	while(check)
	{
		iter <- iter + 1
		Qs <- apply(mat, 2, quantile, probs=c(alpha, 1-alpha), type=q.type)         # point-wise confidence-interval bounds
		coverage <- .C("getCoverage", mat=as.double(mat), lower=as.double(Qs[1,]),  # coverage computed as C-function
				upper=as.double(Qs[2,]), nCol=as.integer(ncol(mat)),
				nRow=as.integer(nrow(mat)), nCPU=as.integer(Ncpu),
				cov=double(1), PACKAGE="STB")$cov

		if(iter == 1)
			best.Qs <- Qs

		if( coverage >= include )                               # at least (1-alpha)*100% coverage
		{
			if( coverage < best.cov )                           # new best coverage
			{
				best.alpha <- alpha
				best.Qs    <- Qs
				best.cov   <- coverage
			}
		}
		if( !is.null(logfile) && is.character(logfile) && length(logfile) > 0 )     # create or update logfile
		{
			if( !file.exists(logfile) )                         # create log-file with header line
				cat( paste("Iter", "alpha_i", "alpha_i-1", "stride", "%coverage", "Diff", sep="\t"), file=logfile, append=FALSE, sep="\n" )
			line <- paste(iter, alpha, alpha.old, abs(alpha-alpha.old)/2, (coverage * 100), coverage - include, sep="\t" )
			cat( line, file=logfile, append=TRUE, sep="\n" )
		}
		if( abs(alpha-alpha.old)/2 == 0 )                       # difference between local alpha of current and previous iteration is 0
		{
			if(output)
			{
				setTxtProgressBar(pb, max.iter)
				close(pb)
				cat(paste(best.cov*100, "%", sep=""), "of", nrow(mat), "simulations covered simultaneously.\n\n")
			}
			if(!is.null(logfile) && is.character(logfile))
				cat("\nAlgorithm stopped because the value abs(alpha-alpha.old)/2 == 0!\n", file=logfile, append=TRUE, sep="\n" )
			if(timer)
				print(Sys.time()-t1)
			return( list(mat=mat, Q=best.Qs, coverage=best.cov) )
		}
		if( iter == max.iter )                                  # max. number of iterations reached --> use best coverage obtained so far
		{
			if(output)
			{
				setTxtProgressBar(pb, max.iter)
				close(pb)
				cat("\n\nMaximum number of iterations reached (max.iter =", max.iter, ").\n\n")
				cat(paste(best.cov*100, "%", sep=""), "of", nrow(mat), "simulated samples simultaneously covered.\n\n")
			}
			if(timer)
				print(Sys.time()-t1)
			return( list(mat=mat, Q=best.Qs, coverage=best.cov) )
		}
		if( (abs(coverage - include) <= tol) && (coverage - include >= 0) )     # tolerance level reached AND difference is non-negative
		{
			if(output)
			{
				setTxtProgressBar(pb, max.iter)
				close(pb)
				cat("\n\n", (coverage*100), "% of", nrow(mat), "simulated samples simultaneously covered.\n\n")
			}
			if(!is.null(logfile) && is.character(logfile))
				cat(paste("\n\nAlgorithm terminated because the tolerance ", tol, " was reached.", sep=""), file=logfile, append=TRUE, sep="\n" )
			if(timer)
				print(Sys.time()-t1)
			return( list(mat=mat, Q=best.Qs, coverage=best.cov) )
		}
		else                                                    # bisection step
		{
			if( coverage - include < 0 )                        # alpha too large
			{
				tmp <- alpha
				alpha <- alpha - abs(alpha.old-alpha)/2
				alpha.old <- tmp
			}
			else                                                # alpha too small
			{
				tmp <- alpha
				alpha <- alpha + abs(alpha.old-alpha)/2
				alpha.old <- tmp
			}
		}
		if(output)
			setTxtProgressBar(pb, iter)
	}
}
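The coverage value that getSTB obtains from the C-routine 'getCoverage' is simply the proportion of simulated rows lying entirely within the current lower/upper bounds. A base-R illustration ('jointCoverage' is a hypothetical helper, not the exported C routine; whether the C code treats the bounds as open or closed intervals is not specified here, the sketch assumes closed intervals):

```r
## Joint (simultaneous) coverage: fraction of rows of 'mat' that lie
## completely inside [lower, upper] at every position.
jointCoverage <- function(mat, lower, upper) {
    inside <- apply(mat, 1, function(row) all(row >= lower & row <= upper))
    mean(inside)
}
m <- rbind(c(0, 1), c(2, 3), c(0, 5))
jointCoverage(m, lower=c(-1, 0), upper=c(2, 4))    # 2 of 3 rows lie fully inside
```

The bisection in getSTB then adjusts the point-wise quantile bounds until this value is as close as possible to, but not below, 1-alpha.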

#' Implements SAS-quantile Calculation (PCTLDEF=5) as Described by SAS-Help.
#'
#' This implementation appears to be identical to 'type=2' in function \code{\link{quantile}},
#' but is less efficient, i.e. for large vectors 'x' it is much slower than the built-in
#' quantile function.
#'
#' @param x     (numeric) vector
#' @param prob  (numeric) value(s) with 0 < prob < 1
#' @param tol	(numeric) value used for checking numerical equivalence
#' @param type  (character) "R" = uses the R-implementation, "C" = calls the C-implementation
#'
#' @author Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#' @examples
#' \dontrun{
#' SASquantile(1:100, .75)
#'
#' ### compare to R-default
#' quantile(1:100, .75)
#'
#' ### or to what R calls SAS-definition
#' quantile(1:100, .75, type=3)
#'
#' # should work for any vector (no seed)
#' v <- rnorm(50000,20,7)
#' Q.R2    <- quantile(v, probs=c(.01, .025, .05, .1, .25, .5, .75, .9, .95, .975, .99), type=2)
#' Q.SAS.R <- SASquantile(v, prob=c(.01, .025, .05, .1, .25, .5, .75, .9, .95, .975, .99), type="R")
#' Q.SAS.C <- SASquantile(v, prob=c(.01, .025, .05, .1, .25, .5, .75, .9, .95, .975, .99), type="C")
#'
#' Q.R2
#' Q.SAS.R
#' Q.SAS.C
#' }

SASquantile <- function(x, prob, tol=1e-12, type=c("R", "C"))
{
v <- x

type <- match.arg(toupper(type), c("R", "C"))

if(type == "C")
{
tmp.res <- .C(	"getSASquantile", v=as.double(v), p=as.double(prob), len=as.integer(length(v)),
numQ=as.integer(length(prob)), tol=as.double(tol), Q=double(length(prob)), PACKAGE="STB")

res <- tmp.res$Q names(res) <- paste(100*prob, "%", sep="") } else { if( length(prob) > 1) return(sapply(prob, function(x) return(SASquantile(v, x, type=type)))) v <- na.omit(v) n <- length(v) vo <- sort(v) # sort increasing order np <- n * prob j <- floor(np) # integer part g <- np - j # fractional part res <- ifelse(g < tol, mean(vo[c(j,j+1)]), vo[j+1]) # can g be treated as zero? names(res) <- paste(100*prob, "%", sep="") } return(res) } #' Simultaneous Tolerance Bounds on Residuals and Random Effects for 'VCA' Objects. #' #' Simulate \eqn{N}-times data incorporating the estimated variance-covariance #' matrix of observations \eqn{y} and construct a 100(1-alpha)\% simultaneous tolerance band. #' #' A Linear Mixed Models, noted in standard matrix notation, can be written as \eqn{y = Xb + Zg + e}, where #' \eqn{y} is the column vector of observations, \eqn{X} and \eqn{Z} are design matrices assigning fixed (\eqn{b}), #' respectively, random (\eqn{g}) effects to observations, and \eqn{e} is the column vector of residual errors. #' #' Here, simulation is performed incorporating the variance-covariance matrix \eqn{V = ZGZ^{T}+R}{V = ZGZ'+R} of observations \eqn{y}. #' There are two types of random variates in a mixed model, random effects \eqn{g} and residual errors \eqn{e}. These #' follow a multivariate normal distribution with expectation zero and covariance matrices \eqn{G} and \eqn{R}. See the 1st #' reference for a detailed description of the properties. #' Following Schuetzenmeister and Piepho (2012), a Cholesky decomposition \eqn{V = CC'} is applied to \eqn{V}, #' yielding the upper triangular matrix \eqn{C}, which can be used to simulate a new set of observations #' \eqn{y_{sim}=Cz}{y_sim = Cz}, where \eqn{z} is a vector of independent standard normal deviates of the same size as \eqn{y}. #' Actually, \eqn{y_sim = C'z} is used here, because the Cholesky decomposition in \code{R} is defined as \eqn{V=C'C}. 
#' For each simulated dataset, random variates of interest ('term') are extracted, possibly transformed ('mode') and
#' stored in ordered form (order statistics) in a \eqn{N x n} matrix, \eqn{N} being the number of simulated datasets and
#' \eqn{n} being the number of random variates of interest. For each column of this matrix, tolerance intervals
#' are computed iteratively until the joint coverage is as close as possible to, but at least, 100(1-alpha)\%, or the
#' max. number of iterations is reached. This quantile-based algorithm is exact for \eqn{N} approaching infinity.
#'
#' SAS-quantile definition PCTLDEF=5 is used in the fast C-implementation of the STB-algorithm (\code{\link{SASquantile}}),
#' i.e. in case \code{algo="C"}. One can compute and plot two types of plots (see argument 'type'). Simultaneous tolerance
#' bands (STB) are helpful in assessing the general distribution of a random variate, i.e. checking for departure from
#' the normality assumption. Outlying observations may also be detected using STB. Simultaneous tolerance intervals (STI)
#' are tailored for the identification of extreme values (possible outliers). STI are a simplification of STB, where simultaneous
#' coverage is only required for the extreme values of each simulation, i.e. an STB is constructed from the min and max values
#' of all N simulations. This results in lower and upper bounds, which can be used in residual plots for assessing outliers.
#'
#' One can choose between 3 methods for constructing the 100(1-alpha)\% STB. The fastest one is the rank-based algorithm ("rank"), which
#' should only be applied for a reasonably large number of simulations (rule of thumb: N > 5000). For fewer simulations,
#' the quantile-based algorithm is recommended. It exists in two flavours, a native R-implementation ("R") and a pure C-implementation ("C").
#' Both can be applied using parallel-processing (see arguments 'parallel' and 'Ncpu'). Only the R-implementation allows specifying
#' a quantile-definition other than \code{type=2} of function \code{\link{quantile}}.
#'
#' @param obj			(VCA) object
#' @param term			(character, integer) specifying a type of residuals if one of c("conditional",
#' 						"marginal"), or, the name of a random term (one of obj$re.assign$terms). If 'term'
#' 						is an integer, it is interpreted as the i-th random term in 'obj$re.assign$terms'.
#' @param mode			(character) string specifying a possible transformation of random effects or
#' 						residuals (see \code{\link{residuals.VCA}} and \code{\link{ranef.VCA}} for details)
#' @param N				(integer) specifying the number of simulated datasets \eqn{y_sim}
#' @param alpha			(numeric) value 0 < alpha < 1 specifying the min. 100(1-alpha)\% coverage of the
#' 						simultaneous tolerance band (STB)
#' @param algo			(character) string specifying the method to be used for constructing a 100(1-alpha)\% STB,
#' 						choose "rank" for the rank-based, "C" for a C-implementation of the quantile-based, and
#' 						"R" for an R-implementation of the quantile-based algorithm (see details).
#' @param q.type		(integer) value specifying the quantile type to be used as offered in \code{\link{quantile}}
#' 						in case 'algo="R"'. Whenever 'algo="C"', quantiles are computed according to SAS PCTLDEF5,
#' 						which is identical to type 2. The rank-based algorithm does not employ quantiles.
#' @param plot			(logical) TRUE = create 'stbVCA' object and plot it, FALSE = only create the 'stbVCA' object
#' @param legend		(logical) TRUE = add legend to the plot(s)
#' @param main1			(character) string specifying a user-defined main-title of the 'type=1' plot (STB)
#' @param main2			(character) string specifying a user-defined main-title of the 'type=2' plot (STI)
#' @param seed			(integer) value used as seed for the RNG
#' @param orient		(integer) in QQ-plot, specifying whether to plot expected vs. observed values (1)
#' 						or observed vs. expected values (2)
#' @param type			(integer) 1 = QQ-plot with simultaneous tolerance band (STB), 2 = residual plot with simultaneous
#' 						tolerance interval (STI), 3 = both plots at once
#' @param pb			(logical) TRUE = a text-based progress bar will display the simulation progress
#' @param parallel		(logical) TRUE = parallel processing will be attempted on 'Ncpu' cores of the local machine.
#' 						FALSE = no parallel processing applied; only in this case will 'pb' have an effect, since
#' 						the progress bar is only available for non-parallel processing.
#' @param Ncpu			(integer) specifying the number of CPUs on which the parallelization will be carried out.
#' 						In case 'Ncpu' is larger than the number of existing CPUs, the max. number of CPUs will be used
#' 						instead. Note that setting 'Ncpu' to the max. number available may not result in the min. time
#' 						spent on computing.
#' @param ...			additional arguments passed to other methods
#'
#' @return (stbVCA) object, a list with all information needed to create the QQ-plot with ~100(1-alpha)\% STB.
#'
#' @author Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#' @seealso \code{\link{getSTB}}, \code{\link{fastSTB}}, \code{\link{rankSTB}}
#'
#' @examples
#' \dontrun{
#' library(VCA)
#' data(dataEP05A2_1)
#' fit <- anovaVCA(y~day/run, dataEP05A2_1)
#' fit
#'
#' # use studentized conditional residuals
#' stb.obj1 <- stb(fit, term="cond", mode="student", N=1000)
#'
#' # plot it again
#' plot(stb.obj1)
#'
#' # now request also plotting the corresponding residual plot
#' # capture additional computation results which are invisibly
#' # returned
#' stb.obj1 <- plot(stb.obj1, type=3)
#'
#' # use other type of legend in QQ-plot
#' plot(stb.obj1, stb.lpos="topleft")
#'
#' # use random effects "day" and apply standardization
#' stb.obj2 <- stb(fit, term="day", mode="stand", N=1000)
#'
#' # plot it again
#' plot(stb.obj2)
#'
#' # more complex example
#' data(Orthodont)
#' Ortho <- Orthodont
#' Ortho$age2 <- Ortho$age - 11
#' Ortho$Subject <- factor(as.character(Ortho$Subject))
#' fit.Ortho <- anovaMM(distance~Sex+Sex:age2+(Subject)+(Subject):age2-1, Ortho)
#'
#' # studentized conditional residuals
#' stb.cr.stud <- stb(fit.Ortho, term="cond", mode="stud", N=1000)
#'
#' # same model fitted via REML (same covariance structure of random effects by
#' # constraining it to be diagonal)
#' fit.Ortho.reml1 <- remlMM(distance~Sex*age2+(Subject)*age2, Ortho, cov=FALSE)
#'
#' # allow block-diagonal covariance structure of random effects due to non-zero
#' # correlation between intercept and slope of random regression part,
#' # note, 'cov=TRUE' is the default
#' fit.Ortho.reml2 <- remlMM(distance~Sex*age2+(Subject)*age2, Ortho)
#' fit.Ortho.reml1
#' fit.Ortho.reml2
#'
#' # covariance matrices of random effects 'G' differ
#' getMat(fit.Ortho.reml1, "G")[1:10, 1:10]
#' getMat(fit.Ortho.reml2, "G")[1:10, 1:10]
#'
#' # therefore, (conditional) residuals differ
#' resid(fit.Ortho.reml1)
#' resid(fit.Ortho.reml2)
#'
#' # therefore, STB differ
#'
#' # studentized conditional residuals
#' system.time({
#' stb.cr.stud.reml1 <- stb(fit.Ortho.reml1, term="cond", mode="stud",
#'                          N=5000, Ncpu=2, seed=11) })
#' system.time({
#' stb.cr.stud.reml2 <- stb(fit.Ortho.reml2, term="cond", mode="stud",
#'                          N=5000, Ncpu=4, seed=11) })
#'
#' # same seed-value should yield identical results
#' system.time({
#' stb.cr.stud.reml3 <- stb(fit.Ortho.reml2, term="cond", mode="stud",
#'                          N=5000, Ncpu=4, seed=11) })
#'
#' par(mfrow=c(1,2))
#' plot(stb.cr.stud.reml2)
#' plot(stb.cr.stud.reml3)
#'
#' # both types of plots side-by-side
#' plot(stb.cr.stud.reml2, type=3)
#'
#' # and enabling identification of points
#' # identified elements in the 1st plot will
#' # be automatically added to the 2nd one
#' plot(stb.cr.stud.reml2, type=3, pick=TRUE)
#'
#' # raw "Subject" random effects
#' stb.re.subj <- stb(fit.Ortho, term="Subject", N=1000)
#'
#' # identify points using the mouse
#' stb.re.subj <- plot(stb.re.subj, pick=TRUE, type=3)
#'
#' # now click on points
#' }
#'
#' @method stb VCA
#'
#' @seealso \code{\link{fastSTB}}
#'
#' @references
#'
#' Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach.
#' Computational Statistics and Data Analysis, 56, 1405-1416
#'
#' Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2012). Checking the assumptions of normality and homoscedasticity in
#' the general linear model using diagnostic plots. Communications in Statistics - Simulation and Computation, 41, 141-154.

stb.VCA <- function(obj, term=NULL, mode=c("raw", "student", "standard", "pearson"), N=5000,
                    alpha=.05, algo=c("rank", "R", "C"), q.type=2L, plot=TRUE, legend=TRUE,
                    orient=1, main1=NULL, main2=NULL, seed=NULL, type=1, pb=TRUE,
                    parallel=TRUE, Ncpu=2, ...)
{
stopifnot(class(obj) == "VCA")
stopifnot(orient %in% 1:2)
stopifnot(!is.null(term))
stopifnot(type %in% 1:3)
stopifnot(q.type %in% 1:9)
stopifnot(alpha > 0 && alpha < 1)
stopifnot(is.logical(plot))
stopifnot(is.logical(legend))
stopifnot(is.logical(parallel))
stopifnot(is.numeric(Ncpu) && Ncpu > 0)
stopifnot(is.logical(pb))
stopifnot(is.null(main1) || is.character(main1))
stopifnot(is.null(main2) || is.character(main2))
stopifnot(is.numeric(N) && N > 0)

algo <- toupper(algo)								# use 1st option if multiple specified
algo <- match.arg(algo, c("RANK", "R", "C"))

if(algo == "RANK" && N < 5000)
warning("The rank-based algorithm is most robustly used with a larger number of simulations!")

if(is.null(obj$re.assign))							# solve mixed model equations
obj <- solveMME(obj)

if(length(mode) > 1)
mode <- mode[1]

if(is.null(seed))
seed <- runif(1, -2^30, 2^30)

if(!is.character(term))
{
term <- obj$re.assign$terms[as.integer(term)]
RE <- TRUE
mode.opts <- c("raw", "student", "standard")
}
else
{
term <- match.arg(term, choices=c("conditional", "marginal", obj$re.assign$terms))

if(term %in% c("conditional", "marginal"))
{
RE <- FALSE
mode.opts <- c("raw", "student", "standard", "pearson")
}
else
{
RE <- TRUE
mode.opts <- c("raw", "student", "standard")
}
}

mode <- match.arg(mode, choices=mode.opts)
m.names <- c(raw="raw", student="studentized", standard="standardized", pearson="Pearson-type")

if(is.null(main1))
main1 <- paste("QQ-Plot of ", m.names[mode], " ", ifelse(RE, "", term), ifelse(RE, " Random Effects for", " Residuals"), ifelse(RE, paste(" '", term, "'", sep=""), ""), sep="")
if(is.null(main2))
main2 <- "Residual Plot"

sti.ylab <- paste(term, " - ", m.names[mode], ifelse(RE, "' Random Effects", "' Residuals"), sep="")

V <- getMat(obj, "V")
C <- Matrix(t(chol(V)))  					# Cholesky decomposition of V = C'C, transpose C since in the reference it is V = CC'

if(RE)
{
vec  <- ranef(obj, term=term, mode=mode)
nam <- rownames(vec)
vec  <- vec[,1]
names(vec) <- nam
}
else
{
vec  <- resid(obj, type=term, mode=mode)
}
Nvec <- length(vec)

G  <- getMat(obj, "G")
Z  <- getMat(obj, "Z")
Vi <- getMat(obj, "Vi")

REasgn <- obj$re.assign

set.seed(seed)
seeds <- runif(N, -2^30, 2^30)

if(parallel)
{
Ncores <- detectCores()								# number of available cores

if(is.null(Ncpu) || Ncpu > Ncores)
Ncpu <- Ncores

# function to be called when parallel processing is activated
loopFunc <- function(iter, obj, Z, C, Vi, G, RE, term, mode, seeds)		# applied for each simulation cycle
{
set.seed(seeds[iter])
obj <- obj											# force evaluation of 'obj' on the worker

#y <- C %*% Matrix:::Matrix(rnorm(obj$Nobs), ncol=1)
y <- C %*% Matrix(rnorm(obj$Nobs), ncol=1)

#simRE <- G %*% Matrix:::t(Z) %*% Vi %*% y
simRE <- G %*% t(Z) %*% Vi %*% y

obj$RandomEffects <- simRE
obj$FixedEffects[,"Estimate"] <- 0
obj$Matrices$y <- y

#if(RE)
#	return(sort(as.numeric(VCA:::ranef.VCA(obj, term=term, mode=mode)[,1])))
#else
#	return(sort(as.numeric(VCA:::residuals.VCA(obj, type=term, mode=mode))))

if(RE)
return(sort(as.numeric(ranef.VCA(obj, term=term, mode=mode)[,1])))
else
return(sort(as.numeric(residuals.VCA(obj, type=term, mode=mode))))
}

cluster <- makePSOCKcluster(Ncpu)
vecs <- t(parSapply(cluster, 1:N, loopFunc, obj, Z, C, Vi, G, RE, term, mode, seeds))	# actually do parallel-processing of N loops
stopCluster(cluster)
}
else
{
if(pb)
{
cat("\n Simulation Progress\n")
PB <- txtProgressBar(max=N, style=3, width=.5*unlist(options("width")), char=">")
}

for(i in 1:N)
{
y <- C %*% Matrix(rnorm(obj$Nobs), ncol=1)

#simRE <- G %*% Matrix:::t(Z) %*% Vi %*% y
simRE <- G %*% t(Z) %*% Vi %*% y

obj$RandomEffects <- simRE
obj$FixedEffects[,"Estimate"] <- 0
obj$Matrices$y <- y

if(RE)
sim <- sort(ranef(obj, term=term, mode=mode))
else
sim <- sort(resid(obj, type=term, mode=mode))

if(i == 1)
{
vecs <- matrix(sim, nrow=1)
}
else
{
vecs <- rbind(vecs, sim)
}

if(pb)
setTxtProgressBar(PB, i)
}

if(pb)
close(PB)
}

res <- list()
res$VCAobj <- obj
res$mat <- vecs
res$alpha <- alpha
res$vec <- vec
res$term <- term
res$mode <- mode
res$plot$pch <- 16
res$plot$col <- "black"
res$plot$col.out <- "red"
res$plot$pch.out <- 16
res$plot$sti.lty <- 2
res$plot$sti.lwd <- 1
res$plot$sti.col <- "red"
res$plot$sti.ylab <- sti.ylab
res$plot$sti.lpos <- "title"
res$plot$sti.main <- main2
res$plot$stb.col <- "#0000FF40"
res$plot$stb.lpos <- "title"
res$plot$stb.main <- main1
res$legend <- legend
res$RE <- RE
res$seed <- seed

if(type %in% c(1,3))
{
if(algo == "C")
stbObj <- fastSTB(vecs, alpha=alpha, Ncpu=Ncpu)
else if(algo == "R")
stbObj <- getSTB(vecs, alpha=alpha, Ncpu=Ncpu, q.type=q.type)
else
{
stbObj <- rankSTB(vecs, alpha=alpha)
stbObj$mat <- vecs
}
STB <- unclass(stbObj)

means <- apply(STB$mat, 2, mean)
res$N <- N
STB$add <- FALSE
STB$legend <- legend

# extend STB beyond smallest and largest observation for a nicer look

Q    <- STB$Q
Nobs <- length(vec)
ll <- -10 * ( (Q[1,2]-Q[1,1]) / (means[2]-means[1]) )						# Y-value for x=-10 corresponding to the left-lower bound
lu <- -10 * ( (Q[2,2]-Q[2,1]) / (means[2]-means[1]) )						# left-upper
rl <-  10 * ( (Q[1,Nobs]-Q[1,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )	# right-lower
ru <-  10 * ( (Q[2,Nobs]-Q[2,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )	# right-upper
Q <- cbind(c(ll,lu), Q)
Q <- cbind(Q, c(rl,ru))
means <- c(-10, means, 10)
STB$means <- means
STB$Q <- Q
STB$mat <- NULL
res$STB <- STB
}

if(type %in% c(2,3))
{
if(algo == "C")
stbObj <- fastSTB(vecs[,c(1, ncol(vecs))], alpha=alpha, Ncpu=Ncpu)
else if(algo == "R")
stbObj <- getSTB(vecs[,c(1, ncol(vecs))], alpha=alpha, Ncpu=Ncpu)
else
stbObj <- rankSTB(vecs[,c(1, ncol(vecs))], alpha=alpha)

STI <- unclass(stbObj)
STI$mat <- NULL
res$STI <- STI
}

if(type == 3)
{
old.par <- par(mfrow=c(1,2))
}
else
old.par <- par("mfrow")

res$type <- type
class(res) <- "stbVCA"

if(plot)
plot(res, orient=orient)

par(old.par)
invisible(res)
}
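The iterative quantile-based construction described in the details above can be sketched in a few lines of plain R. This is a minimal illustration only, not the package's `getSTB`/`fastSTB` implementation; the function name `sketchSTB` and the bisection tolerance are assumptions made here for demonstration. Per-column pointwise quantile limits are narrowed or widened (bisection on the pointwise level) until the joint coverage over all N simulated, ordered samples is at least 100(1-alpha)%:

```r
# Illustrative sketch (hypothetical name, NOT the package implementation):
# quantile-based STB via bisection on the pointwise level 'a'.
sketchSTB <- function(mat, alpha = 0.05, tol = 1e-4)
{
	N <- nrow(mat)
	cover <- function(a)							# joint coverage of the pointwise band at level 'a'
	{
		lo <- apply(mat, 2, quantile, probs = a/2,     type = 2)
		hi <- apply(mat, 2, quantile, probs = 1 - a/2, type = 2)
		inside <- apply(mat, 1, function(x) all(x >= lo & x <= hi))
		list(cov = mean(inside), Q = rbind(lower = lo, upper = hi))
	}
	lwr <- 0										# a -> 0 gives columnwise min/max, coverage 1
	upr <- alpha									# pointwise level lies in (0, alpha]
	repeat
	{
		a   <- (lwr + upr) / 2
		res <- cover(a)
		if(upr - lwr < tol)
			break
		if(res$cov >= 1 - alpha)					# band wide enough -> try a narrower one
			lwr <- a
		else										# coverage lost -> widen again
			upr <- a
	}
	cover(lwr)										# last level known to reach the coverage
}
```

For finite N the achieved joint coverage is returned alongside the bounds; as stated in the details, the construction becomes exact as N approaches infinity.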

#' Plot Objects of Class 'stbVCA'.
#'
#' Standard plotting method for objects of Class 'stbVCA'.
#'
#' This function plots objects of class 'stbVCA' as generated by function \code{\link{stb.VCA}}.
#' Objects of S3-class 'stbVCA' are list-type objects storing all the information
#' needed to plot QQ-plots with simultaneous tolerance bounds. Additionally to the information
#' contained in ordinary 'STB' objects, a copy of the 'VCA' object is stored as well as the
#' type of random variate and the mode, i.e. the type of transformation applied.
#'
#' One can specify additional parameters for changing the appearance of the plot(s). Any of the following
#' parameters can be set to a value different from the default: \cr
#' \tabular{lcl}{
#' \code{legend}   \tab ... \tab (logical) TRUE = will add legend to plot(s) (is default) \cr
#' \code{pch}      \tab ... \tab plotting symbol for non-outlying points (default=16) \cr
#' \code{col}      \tab ... \tab point color for non-outlying points (default="black") \cr
#' \code{col.out}  \tab ... \tab point color for outlying points (default="red") \cr
#' \code{pch.out}  \tab ... \tab plotting symbol for outlying points (default=16) \cr
#' \code{stb.col}  \tab ... \tab color of the STB in the QQ-plot (default="#0000FF40") \cr
#' \code{stb.lpos} \tab ... \tab position placement as done in function 'legend', or, character string "title" indicating\cr
#'                 \tab     \tab that legend information should be displayed as (sub-main) title (default="title") \cr
#' \code{stb.main} \tab ... \tab character string used as main title in the QQ-plot with STB \cr
#' \code{sti.lty}  \tab ... \tab line type for STI-bounds in the residual plot (default=2) \cr
#' \code{sti.lwd}  \tab ... \tab line width for STI-bounds (default=1) \cr
#' \code{sti.col}  \tab ... \tab line color for STI bounds (default="red") \cr
#' \code{sti.ylab} \tab ... \tab character string specifying the Y-axis label in the residual plot with STI \cr
#' \code{sti.lpos} \tab ... \tab position placement as done in function 'legend' (default="topright") \cr
#' \code{sti.main} \tab ... \tab character string used as main title in the residual plot with STI \cr
#' }
#'
#' @param x     		(object) of class 'STB' as generated by function \code{getSTB}
#' @param orient		(integer) in 'type=1' plots, 1 = expected vs. observed values, 2 = observed vs. expected values
#' @param pick			(logical, integer) TRUE = triggers function \code{\link{identify}} for identification
#' 						of points using the mouse. Stop interactive identification by pressing 'Esc' or
#' 						"Right mouse-click --> Stop". If 'TRUE' and 'type=3', identified points will be added to the residual
#'                      plot with STI automatically. If 'pick=1', identification will only be possible in the QQ-plot; if
#' 						'pick=2', identification will only be possible in the residual plot.
#' @param type			(integer) 1 = plot simultaneous tolerance band (STB), 2 = plot simultaneous tolerance interval (STI),
#' 						3 = plot STB and STI
#' @param ...   		additional arguments changing the visual appearance of the plot, e.g.
#' 						'pch', 'pch.out', 'col', 'col.out', 'stb.col', 'stb.main', 'sti.main', 'legend', ...
#' 						in fact all elements of 'x'
#'
#' @return (stbVCA) object is invisibly returned, any additional computation results are added
#'
#' @author  Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#'
#' @method plot stbVCA
#'
#' @examples
#' \dontrun{
#' library(VCA)
#' data(dataEP05A2_1)
#' fit <- anovaVCA(y~day/run, dataEP05A2_1)
#' fit
#'
#' # use studentized conditional residuals
#' stb.obj1 <- stb.VCA(fit, term="cond", mode="student", N=1000)
#'
#' # plot it again
#' plot(stb.obj1)
#'
#' # use random effects "day" and apply standardization
#' stb.obj2 <- stb.VCA(fit, term="day", mode="stand", N=1000)
#'
#' # plot it again
#' plot(stb.obj2)
#'
#' # initially, request QQ-plot with STB
#' stb.obj3 <- stb.VCA(fit, term="day", mode="stand", N=1000, type=1)
#'
#' # now request plotting of the residual plot as well
#' # catch computation result which are invisibly returned
#' stb.obj4 <- plot(stb.obj3, type=3)
#'
#' # individualize the appearance of the plot
#' plot(stb.obj4, sti.lpos="top", col="darkblue", pch.out=17, col.out="green")
#' }

plot.stbVCA <- function(x, orient=1, pick=FALSE, type=NULL, ...)
{
if(!is.null(type))
stopifnot(type %in% 1:3)
else
type <- x$type

args <- list(...)
x$plot[names(args)] <- args

XLIM <- YLIM <- NULL

if("ylim" %in% names(args))
YLIM <- args[["ylim"]]
if("xlim" %in% names(args))
XLIM <- args[["xlim"]]

obj <- x
stopifnot(class(obj) == "stbVCA")

if(type == 3)
old.par <- par(mfrow=c(1,2))
else
old.par <- par("mfrow")

m.names <- c(raw="raw", student="studentized", standard="standardized", pearson="Pearson-type")

stbYlim <- Ident <- NULL

if(type %in% c(1,3))
{
STB <- x$STB

if(is.null(STB))									# STB results not yet computed
{
STB <- unclass(fastSTB(x$mat, alpha=x$alpha))
means <- apply(STB$mat, 2, mean)
STB$N <- x$N
STB$add <- FALSE
STB$stb.col <- "#0000FF40"
STB$legend <- x$legend

# extend STB beyond smallest and largest observation for a nicer look

Q    <- STB$Q
Nobs <- length(x$vec)
ll <- -10 * ( (Q[1,2]-Q[1,1]) / (means[2]-means[1]) )						# Y-value for x=-10 corresponding to the left-lower bound
lu <- -10 * ( (Q[2,2]-Q[2,1]) / (means[2]-means[1]) )						# left-upper
rl <-  10 * ( (Q[1,Nobs]-Q[1,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )	# right-lower
ru <-  10 * ( (Q[2,Nobs]-Q[2,(Nobs-1)]) / (means[Nobs]-means[(Nobs-1)]) )	# right-upper
Q <- cbind(c(ll,lu), Q)
Q <- cbind(Q, c(rl,ru))
means <- c(-10, means, 10)
STB$means <- means
STB$Q <- Q
STB$mat <- NULL										# simulated data stored in upper-most level
x$STB <- STB
}

N <- length(STB$means)

nam <- names(x$vec)
ord <- order(x$vec)
vec <- x$vec[ord]
names(vec) <- nam[ord]

out.high <- which(vec > STB$Q[2,-c(1,ncol(STB$Q))])
out.low  <- which(vec < STB$Q[1,-c(1,ncol(STB$Q))])
out <- unique(c(out.high, out.low))

if(orient == 1)
{
Xval <- STB$means[-c(1,length(STB$means))]
Yval <- vec
Xpol <- c(STB$means, rev(STB$means))
Ypol <- c(STB$Q[1,], rev(STB$Q[2,]))
xlim <- range(Xval)
ylim <- c(min(c(STB$Q[1,-1], Yval)), max(c(STB$Q[2,-N], Yval)))
xlab <- "Expected Values"
ylab <- "Observed Values"
}
else
{
Yval <- STB$means[-c(1,length(STB$means))]
Xval <- vec
Ypol <- c(STB$means, rev(STB$means))
Xpol <- c(STB$Q[1,], rev(STB$Q[2,]))
ylim <- range(Yval)
xlim <- c(min(c(STB$Q[1,-1], Xval)), max(c(STB$Q[2,-N], Xval)))
ylab <- "Expected Values"
xlab <- "Observed Values"
}

if(is.null(x$plot$stb.main))
x$plot$stb.main <- paste("QQ-Plot of ", m.names[x$mode], " ", ifelse(x$RE, "", x$term), ifelse(x$RE, " Random Effects for", " Residuals"), ifelse(x$RE, paste(" '", x$term, "'", sep=""), ""), sep="")

if(!is.null(XLIM))
xlim <- XLIM
if(!is.null(YLIM))
ylim <- YLIM

plot(Xval, Yval, ylim=ylim, xlim=xlim,
type="n", main=x$plot$stb.main, xlab=xlab, ylab=ylab)

polygon(x=Xpol, y=Ypol, border=NA, col=x$plot$stb.col )

if(!STB$add)
{
abline(0, 1)
box()

if(length(out) > 0)
{
points(Xval[-out], Yval[-out], pch=x$plot$pch, col=x$plot$col)
points(Xval[out], Yval[out], pch=x$plot$pch.out, col=x$plot$col.out)		# means[1] and means[N] have no counterpart in vec
}
else
points(Xval, Yval, pch=x$plot$pch, col=x$plot$col)

if(x$legend)
{
if(x$plot$stb.lpos == "title")
{
mtext(text=bquote(paste("Simultaneous Tolerance Band (coverage = ", .(100*round(STB$coverage,5)),"% ;", N[sim]==.(x$N), ")", sep="")),
side=3, line=.25, cex=.9)
}
else
{
legend(x$plot$stb.lpos, fill=c(x$plot$stb.col, par("bg")),
legend=c(paste("STB (", 100*round(STB$coverage,5),"% coverage)",sep=""), paste("N=", x$N," Simulations", sep="")),
box.lty=0, border=NA)
}
}
}

if(orient == 1 && type == 3)
stbYlim <- ylim

if(pick && pick != 2)
{
Ident <- identify(x=Xval, y=Yval, labels=names(vec), pos=TRUE)
nam <- names(vec[Ident$ind])
Ident$ind <- which(names(x$vec) %in% nam)
}
}

if(type %in% c(2,3))
{
STI <- x$STI

if(is.null(STI))
{
STI <- unclass(fastSTB(x$mat[,c(1, ncol(x$mat))], alpha=x$alpha))
STI$mat <- NULL
x$STI <- STI

if(is.null(x$plot$sti.main))
x$plot$sti.main <- "Residual Plot"
}
else
{
STI <- x$STI
}

Xval <- 1:length(x$vec)
Yval <- x$vec

if(!is.null(stbYlim))
ylim <- stbYlim
else
ylim <- range(c(x$vec, c(STI$Q)))

plot(Xval, Yval, type="n", main=x$plot$sti.main, ylim=ylim,
ylab=x$plot$sti.ylab, xlab="i-th random variate")

abline(h=STI$Q[1,1], col=x$plot$sti.col, lty=x$plot$sti.lty, lwd=x$plot$sti.lwd)
abline(h=STI$Q[2,2], col=x$plot$sti.col, lty=x$plot$sti.lty, lwd=x$plot$sti.lwd)

out.high <- which(x$vec > STI$Q[2,2])
out.low  <- which(x$vec < STI$Q[1,1])
out  <- unique(c(out.high, out.low))

if(length(out) > 0)
{
points( Xval[-out], Yval[-out], pch=x$plot$pch, col=x$plot$col )
points(Xval[out], Yval[out], pch=x$plot$pch.out, col=x$plot$col.out)     # means and means[N] have no counterpart in vec
}
else
points( Xval, Yval, pch=x$plot$pch, col=x$plot$col )

if(x$legend)
{
if(x$plot$sti.lpos == "title")
{
mtext(text=bquote(paste("Simultaneous Tolerance Interval (coverage = ", .(100*round(STI$coverage,5)),"% ;", N[sim]==.(x$N), ")", sep="")),
side=3, line=.25, cex=.9)
}
else
{
legend(x$plot$sti.lpos, lty=c(x$plot$sti.lty, 0), col=c(x$plot$sti.col, par("bg")),
legend=c(paste("STI (", 100*round(STI$coverage,5),"% coverage)", sep=""),
paste("N=", x$N, " Simulations", sep="")),
box.lty=1, bg="white")
}
}

if(pick)
{
if(is.logical(pick))
{
if(is.null(Ident))
identify(x=Xval, y=Yval, labels=names(x$vec))
else
{
if(length(Ident$ind) > 0)
text(Xval[Ident$ind], Yval[Ident$ind], labels=names(x$vec[Ident$ind]), pos=Ident$pos)
}
}
else
{
if(pick == 2)
identify(x=Xval, y=Yval, labels=names(x$vec))
}
}
}

x$type <- type
par(old.par)
invisible(x)
}
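As described in the details of `stb.VCA`, an STI only requires simultaneous coverage for the extreme values of each simulation, i.e. it is built from the first and last columns of the ordered simulation matrix. A minimal sketch, assuming the hypothetical name `sketchSTI` (illustrative, not the `fastSTB`-based code path used above):

```r
# Illustrative sketch (hypothetical name, NOT the package implementation):
# STI bounds from the extreme order statistics of each simulated sample.
sketchSTI <- function(mat, alpha = 0.05)
{
	ext <- mat[, c(1, ncol(mat))]					# min and max of each simulation
	# widen symmetric empirical quantile bounds until at least 100(1-alpha)%
	# of the simulations have BOTH extremes inside
	for(a in seq(alpha, 0, by = -alpha/100))
	{
		lower <- quantile(ext[,1], probs = a/2,     type = 2)
		upper <- quantile(ext[,2], probs = 1 - a/2, type = 2)
		cov   <- mean(ext[,1] >= lower & ext[,2] <= upper)
		if(cov >= 1 - alpha)						# joint coverage reached
			break
	}
	list(lower = unname(lower), upper = unname(upper), coverage = cov)
}
```

Any observation below `lower` or above `upper` would be flagged as a possible outlier in the residual plot, mirroring the use of `STI$Q` above.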

#' Rank-Based Algorithm for Computing 100(1-alpha)\% Simultaneous Tolerance Bounds.
#'
#' Implementation of a rank-based algorithm for constructing 100(1-alpha)\% STBs as outlined in the reference.
#'
#' This function is a performance-optimized version of the original rank-based algorithm, avoiding the time-consuming
#' iteration. In principle, it discards simulation results having at least one extreme order statistic until
#' exactly 100(1-alpha)\% of all simulation results remain. From these, the bounds of the STB are constructed by
#' determining the extreme values per order statistic (column).
#'
#' This implementation also corrects step 4) of the published algorithm, which has to find those indices of elements being
#' equal to "min(c)" OR being equal to "N-min(c)+1". This reflects the construction of vector "c", where max. rank-values
#' are transformed to min. rank-values. In step 6), the "N_{k-1}-(1-alpha)*N largest" elements of "d_{l}^{theta}" have to
#' be selected, which also needs correction.
#'
#' In contrast to the quantile-based algorithms, parallel processing did not reduce the computation time here.
#' Thus, parallel processing is not supported for this algorithm.
#'
#' @param mat			(numeric) matrix of dimension (N, n), where the i-th row corresponds to the ordered values
#'                      of the i-th simulation
#' @param alpha			(numeric) value defining the desired coverage as 100(1-alpha)\%
#'
#' @return (list) with two elements:\cr
#' \item{Q}{(matrix) 1st row stores lower bounds, 2nd row upper bounds}
#' \item{coverage}{(numeric) value corresponding to the achieved coverage}
#'
#' @author Andre Schuetzenmeister \email{andre.schuetzenmeister@@roche.com}
#'
#' @references
#'
#' Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach.
#' Computational Statistics and Data Analysis, 56, 1405-1416
#'
#' @examples
#' \dontrun{
#' # for the following problem size the rank-based algo
#' # outperforms the quantile-based one, although
#' # run serially
#' mat <- matrix(rnorm(10000*100), ncol=100)
#' mat <- t(apply(mat, 1, sort))
#' system.time(stb.rank <- rankSTB(mat))
#' system.time(stb.q.R  <- getSTB(mat))
#' system.time(stb.q.C  <- fastSTB(mat))
#' x <- apply(mat, 2, mean)
#' plot(x,x, ylim=c(-5,5))
#' lines(x, stb.q.R$Q[1,], col="blue", lwd=2)
#' lines(x, stb.q.R$Q[2,], col="blue", lwd=2)
#' lines(x, stb.q.C$Q[1,], col="red", lwd=2)
#' lines(x, stb.q.C$Q[2,], col="red", lwd=2)
#' lines(x, stb.rank$Q[1,], col="cyan", lwd=2)
#' lines(x, stb.rank$Q[2,], col="cyan", lwd=2)
#' legend("top", legend=c("R-quantile", "C-quantile", "rank-based"),
#'        fill=c("blue", "red", "cyan"))
#'
#' # varying Ncpu for the C-implementation of the quantile-based algo
#' system.time(stb.q.C  <- fastSTB(mat, Ncpu=4))
#' system.time(stb.q.C  <- fastSTB(mat, Ncpu=6))
#' system.time(stb.q.C  <- fastSTB(mat, Ncpu=8))
#' system.time(stb.q.C  <- fastSTB(mat, Ncpu=10))
#' system.time(stb.q.C  <- fastSTB(mat, Ncpu=12))
#' }

rankSTB <- function(mat, alpha=0.05)
{
stopifnot(is.matrix(mat) && is.numeric(mat))
stopifnot(alpha>0 && alpha<1)

ind <- min_v <- NULL
N   <- nrow(mat)
N0  <- N1 <- N													# is N_{k-1} in the paper
n <- ncol(mat)

D <- apply(mat, 2, function(x) abs(scale(x)))	# scaled absolute values per column
C <- apply(mat, 2, rank)						# per-column rank values of mat
v <- apply(C, 1, function(x){
return(min(min(x), N-max(x)+1))})

v2 <- sort(v)												# this code replaces the loop in the paper

thresh <- N*alpha
cs <- 0
for(i in 1:length(v2))										# determines the rank-value needed for exact coverage
{
cs <- cs + 1
if(cs == thresh)
{
min_v <- v2[i]-1
break
}
}

if(min_v > 0)												# case: number of rows with rank = 1 > N*(1-alpha)
{
ind <- which(v <= min_v)
mat <- mat[-ind,]
C   <- C[-ind,]
D   <- D[-ind,]
v   <- v[-ind]
N0  <- nrow(mat)
}

ind <- which(v == (min_v+1))								# if reached exact coverage N0-N1 == N*(1-alpha)
min_v <- min_v + 1
N1  <- length(ind)											# loop-replacing code ends here

if(N0-N1 < N*(1-alpha))										# apply post-processing
{
D0 <- D[ind,,drop=F]									# is D^{theta} in the paper
C0 <- C[ind,,drop=F]									# is C^{theta} in the paper
ind0 <- apply(C0, 1, function(x) which(x %in% c(min_v, N-min_v+1)))		# column-indices

if(is.list(ind0))										# multiple elements in row corresponding to min_v
{
d0 <- NULL
for(i in 1:length(ind0))
{
d0 <- c(d0, max(D0[i,ind0[[i]]]))
}
}
else
d0   <- D0[,ind0]

num  <- N0 - N*(1-alpha)								# that many items need to be removed
ind1 <- order(d0)[1:num]
mat <- mat[-ind1,]										# remove as many rows as needed to have exact coverage
}
else
mat <- mat[-ind,]										# exact coverage without post-processing required

lower <- apply(mat, 2, min)
upper <- apply(mat, 2, max)
res <- list(Q=rbind(lower, upper), coverage=nrow(mat)/N)
return(res)
}
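The screening statistic at the heart of the algorithm above (vector `v`) can be isolated for illustration. A minimal sketch, assuming the hypothetical helper name `rowExtremeness` (not part of the package): for each simulated row it computes min(min_j rank_ij, N - max_j rank_ij + 1), i.e. how extreme the row's most extreme order statistic is; rows with the smallest value are the first to be discarded by \code{rankSTB}.

```r
# Illustrative helper (hypothetical name, not exported by the package):
# per-row "extremeness" as used for screening in the rank-based algorithm.
rowExtremeness <- function(mat)
{
	N <- nrow(mat)
	C <- apply(mat, 2, rank)					# per-column rank values of mat
	# smallest of: lowest rank in the row, and (N - highest rank + 1);
	# max. rank-values are thereby transformed to min. rank-values
	apply(C, 1, function(r) min(min(r), N - max(r) + 1))
}
```

For example, in a 3-row matrix a row holding the smallest or largest value of some column scores 1, while a row sitting in the middle of every column scores 2; this matches the construction of `v` via `apply(C, 1, ...)` above.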

#' Generic Method for computing Simultaneous Tolerance Bounds (bands and intervals)
#' for fitted models or numeric vectors.
#'
#' If the generic method is applied to an object of class \code{VCA} function \code{\link{stb.VCA}}
#' is called and all its arguments may be specified. Otherwise, function \code{\link{stb.default}}
#' is called and all its arguments may be specified. The latter was developed for numeric vectors
#' (see ?stb.default). Therefore, this method will most likely produce strange results for types
#' of objects other than \code{VCA} objects and numeric vectors.
#'
#' @param obj		(object) passed on