#' Plots a regression nomogram showing covariate distribution
#' @description
#' \code{regplot} plots enhanced regression nomograms. Covariate distributions are superimposed on nomogram scales and the plot
#' can be animated to allow on-the-fly changes to distribution representation and to
#' enable interactive outcome calculation.
#' @param reg An R regression object from a regression command (see Details for allowed regressions)
#' @param plots Specifies type of covariate plot. Default \code{plots=c("density","boxes")} specifies
#' density plots for numeric covariates and boxes for factors (see Details for
#' other options).
#' @param center If \code{TRUE} the mean values of continuous variables
#' and reference categories of factors are aligned vertically.
#' Otherwise continuous distributions are vertically aligned at zero
#' together with reference categories of factors.
#' @param observation An observation to superimpose, shown in (default) red.
#' If \code{TRUE}, the first row of the data used to build \code{reg} is superimposed.
#' Otherwise it may be specified as any row of the \code{reg} data,
#' or as a dataframe conforming to the structure of the regression data.
#' \code{FALSE} omits any superimposition.
#' @param title A string title for the plot. If omitted the regression object name and class are output.
#' @param points If \code{FALSE} the regression scores of each \eqn{\beta}\eqn{x} contribution are shown.
#' Otherwise contributions are represented by a 0-100 "points" scale.
#'
#' @param failtime For survival models only, otherwise ignored. Used to specify
#' cutoff times for risk probabilities or for quantiles of survival time. For the former
#' \code{failtime=c(5,10)}, for example,
#' specifies two probability scales for survival to 5 and 10 time units while
#' \code{failtime=c("50\%","10\%")} specifies scales for \code{50\%} and \code{10\%} quantiles.
#' If \code{failtime} is omitted or \code{NULL}, a probability scale for a cutoff at the
#' median of the time variable is adopted.
#'
#' @param prfail For survival models, otherwise ignored. If \code{TRUE} the
#' probability scale is of failure before \code{failtime}, otherwise after \code{failtime}.
#'
#' @param baseS For \code{coxph} and \code{cph} regressions only. If non-null, it specifies the baseline
#' survival probability, for a non-centered model, corresponding to value(s) of \code{failtime}. If
#' \code{NULL} the baseline probability is established from the regression object \code{reg}.
#' Specifying \code{baseS} can be used to coerce alternative baselines.
#' @param odds If \code{TRUE}, for probability outcomes the nomogram scale shows odds rather than probability.
#' @param nsamp The size of a random sub-sample of data to plot covariate distributions
#' (as plotting huge data may be slow and graphical precision, beyond a certain point,
#' unnecessary).
#' @param showP Whether P-value asterisk codes are to be displayed. For factors,
#' the code for the most highly significant level is shown.
#' @param rank Positions the nomogram scales by importance, top down. Two options: \code{rank="range"} is by the
#' range of the \eqn{\beta}\eqn{x}'s, and \code{rank="sd"} is by the standard deviation of the \eqn{\beta}\eqn{x}'s.
#' If \code{NULL} nomogram scales are arranged by order of main effects in the formula, and
#' with interactions at top of the page.
#'
#' @param subticks Puts minor tick marks on axes, where possible.
#'
#' @param interval Draws \code{95\%} confidence and prediction intervals. Values
#' \code{"confidence"} or \code{"prediction"} place intervals on the
#' calculated outcome for a specified \code{observation} (if \code{observation} is non-\code{NULL}).
#' A value \code{"coefficients"} draws confidence intervals on \eqn{\beta}\eqn{x}
#' for some values of \eqn{x}.
#'
#' @param clickable \code{TRUE} if the graphic is active for on-the-fly mouse input (see Details).
#' @param ... Additional graphics control parameters for font sizes,
#' colours, layout (see Details).
#'
# @import survival
# @import vioplot
# @import beanplot
# @import graphics
# @import stats
# @import sm
# @importFrom grDevices recordPlot replayPlot col2rgb rgb
#'
#' @author Roger Marshall <rj.marshall@@auckland.ac.nz> School of Population Health, The University of Auckland, New Zealand
#'
#' @return If \code{points=TRUE}, an object is returned that
#' is a list of dataframes, each frame giving points per covariate, and the last on the list a
#' total points-to-outcome look-up table.
#'
# @export
#'
#' @details Creates a nomogram representation of a fitted regression.
#' The regression object \code{reg} can be of different types from the \code{stats}, \code{survival},
#' \code{rms}, \code{MASS} and \code{lme4} libraries. Specifically models generated by
#' the commands: \code{glm, Glm, lm, ols, lrm,
#' survreg, psm, coxph, cph, glm.nb, polr} or mixed model regressions \code{lmer},
#' \code{glmer}, and \code{glmer.nb}. For \code{glm, Glm} and \code{glmer}
#' the supported family/link pairings are: gaussian/identity, binomial/logit, quasibinomial/logit, poisson/log and quasipoisson/log.
#' For ordinal regression, using \code{polr}, logit and probit models
#' are supported. For \code{survreg} and \code{psm} the distribution may be lognormal, gaussian, weibull, exponential or loglogistic.
#' For \code{glm.nb} (from package \code{MASS}) and \code{glmer.nb} only log-link is allowed.
#'
#' The plot can be made active for mouse input if \code{clickable=TRUE}
#' so allowing on-the-fly changes to distribution plot type
#' (frequency boxes, bars, spikes, box plot, density, empirical cdf, violin and bean plots). These
#' options are presented by a temporary heading menu bar.
#' Individual plots may be selected. Also values of \code{observation} (if non-null) can be changed by clicking new values,
#' effectively making \code{regplot} an interactive regression calculator.
#'
#' The \code{plots} parameter specifies initial plot types. It is length 2.
#' The first item specifies a plot type for non-factor variables as one of:
#' \code{"no plot"}, \code{"density"}, \code{"boxes"}, \code{"spikes"}, \code{"ecdf"}, \code{"bars"},
#' \code{"boxplot"}, \code{"violin"} or \code{"bean"}.
#' The second item is for factors and is one of: \code{"no plot"}, \code{"boxes"},
#' \code{"bars"} or \code{"spikes"}.
#'
#' The graphic shows a scale for all main effects in the regression formula.
#' Interactions are shown by separate nomogram scales.
#' Factor-by-factor interactions are considered as factors and displayed with factor combinations.
#' Factor-by-numeric interactions are displayed on the scale of
#' the numeric variable(s), with a separate scale for each factor level.
#' Numeric-by-numeric interactions are shown with respect to the interaction
#' product scale.
#'
#' For random effects models (\code{lmer} and \code{glmer})
#' an additional random effects scale is included.
#'
#' If models are stratified, by a \code{strata()} (or \code{strat()} for \code{rms} models)
#' in the model formula, the
#' behaviour differs depending on the model class. For survival models
#' each stratum has its own outcome scale, otherwise it is included as a
#' term in the linear score with its own nomogram scale.
#'
#' If a model formula includes a function (e.g. \code{log()} or a spline \code{rcs()})
#' a thumbnail plot of the shape of the transformation is placed on
#' the right of the nomogram scale. It can be toggled on and off by clicking on it (if \code{clickable=TRUE}).
#'
#' Additional \code{...} parameters may include items to control the look of the plot if users wish
#' to change default settings:
#' \code{dencol} colour fill of density plots and other
#' representations of numeric data,
#' \code{boxcol} fill of factor/logical frequency boxes, \code{obscol} colour of superimposed observation,
#' \code{spkcol} colour of spikes. Also font sizes can be set: \code{cexscales} for font size of points and nomogram scales,
#' \code{cexvars} for variable names, \code{cexcats} for category and variable values.
#' To place scale labels immediately adjacent to each scale (rather than on the left) use \code{leftlabel=FALSE}.
#' To draw dotted vertical lines that show more clearly the score contributions of an observation
#' use \code{droplines=TRUE} (see example 3 below).
#'
#' @examples
#' ## 1. Simulation
#' n <- 500
#' X <- cbind(rnorm(n, sd = 1),rnorm(n, sd = 0.5))
#' ## make outcome Y with intercept 10 + random variation
#' Y <- 10 + X %*% c(0.2, 0.1) + rnorm(n, sd = 1)
#' D <- as.data.frame(cbind( Y,X)); colnames(D) <- c("Y","x1","x2")
#' model <- lm( Y ~ x1 + x2, data = D)
#' regplot(model, observation = D[1,], interval = "confidence")
#' ## 2 Survival model for pbc data
#' library(survival)
#' data(pbc)
#' pbccox <- coxph(formula = Surv(time,status==2) ~ age
#' + cut(bili,breaks=c(-Inf, 2, 4, Inf)) + sex
#' + copper +as.factor(stage),data=pbc)
#' regplot(pbccox,observation=pbc[1,], clickable=TRUE,
#' points=TRUE, rank="sd",failtime = c(730,1825))
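#' ## 3. Customised plot types, colours and a points table
#' ## (a sketch reusing the pbc model above; the plot types and colour
#' ## values here are arbitrary illustrations, not package defaults)
#' pts <- regplot(pbccox, observation=pbc[1,], points=TRUE,
#' failtime=1825, plots=c("violin","bars"),
#' dencol="#DDEEFF", boxcol="#FFF4CC", obscol="#CC0000",
#' droplines=TRUE)
#' ## with points=TRUE the return value is a list of per-covariate points
#' ## tables, the last element being the total points-to-outcome look-up table
#' str(pts[[length(pts)]])
#' ## 4. Mixed effects model (a sketch assuming the lme4 package and its
#' ## sleepstudy data are available); a random effects scale is added
#' library(lme4)
#' sleepfit <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
#' regplot(sleepfit)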
regplot <- function(reg, plots=c("density","boxes"), center=TRUE,observation=NULL,
title=NULL,points=FALSE,failtime=NULL, prfail=NULL,
baseS=NULL,odds=FALSE, nsamp=10000,showP=TRUE,
rank=NULL, subticks=FALSE, interval=NULL, clickable=FALSE,
...)
{
##########################################################
## This is regplot v 1.0.
## This file, regplot.R, contains main regplot() function
## Functions called by regplot() are in separate file
## regplot_functions.R
##########################################################
## Is argument "reg" a valid regression object?
if(is.null(reg)) {return(message("Error: NULL regression object"))}
## test for NA reg for an S3 object
if(!isS4(reg)){if(is.na(reg[1]))
{return(message("Error: NA S3 type regression object"))}}
modelname <- deparse(substitute(reg))
## extract type of regression in "reg" and create
## logical type indicators.
rtype <- class(reg)[1]
rms <- (class(reg)[2]=="rms")
if(is.na(rms)) rms <- FALSE
## rms has call output if(rms){ message(reg$call)}
cox <- (rtype=="coxph") | (rtype=="cph")
lm <- (rtype=="lm" ) | (rtype=="ols")
glm <- (rtype=="glm") | (rtype=="Glm")
survreg <- (rtype=="survreg") | (rtype=="psm")
glm.nb <- (rtype=="negbin")
lmer <- (rtype=="lmerMod")
glmer <- (rtype=="glmerMod")
lrm <- (rtype=="lrm")
## note glmer and glmer.nb have same class "glmerMod"
## so cannot be distinguished at this point.
lme4 <- glmer | lmer
polr <- (rtype=="polr")
if(!( polr | glm | lm | cox | survreg | glm.nb | lme4 | lrm )){
return(message(paste("regplot does not support class",
paste("\"",rtype,"\"",sep=""), "models")) )
}
if(lrm){
## coerce so that lrm seen as logistic model
reg$family <- c("","")
reg$family[1] <- "binomial"
reg$family[2] <- "logit"
glm <- TRUE
}
##==============================================
## checks on model classes and types allowed
## note: survreg here is either "survreg" or "psm(rms)"
## similarly glm is "glm" or "Glm(rms)"
## cox is "coxph" or "cph(rms)"
## lm is "lm" or "ols(rms")
if(survreg){
dist <- reg$dist
## check on survreg() distribution types
if(!(dist == "lognormal" | dist == "loglogistic" | dist == "weibull"|
dist == "exponential" | dist == "gaussian" ) ){
return(message(paste("regplot does not support survreg distribution:",dist)))
}
}
if( cox | survreg ){
if(rms & ( is.null(reg$x) | is.null(reg$y) ) ){
return(message(" \"x=TRUE\" and \"y=TRUE\" required in rms model build"))
}
}
ols <- FALSE
logistic <- FALSE
poisson <- FALSE
negbin <- FALSE
if(glm | glm.nb){
## is glm class done an allowed family/link?
fam <- reg$family[1]
link <- reg$family[2]
ols <- (fam=="gaussian" & link=="identity")
logistic <- (fam=="binomial" & link=="logit")
quasibinomial <- (fam=="quasibinomial" & link=="logit")
poisson <- (fam=="poisson" & link=="log")
quasipoisson <- (fam=="quasipoisson" & link=="log")
negbin <- (glm.nb & link=="log")
if(!(ols | logistic | poisson | negbin | quasipoisson | quasibinomial)){
return(message(paste("Cannot do for glm family/link combination:",
fam,link)))
}
## quasipoisson is essentially poisson model with identical beta-coefficients
## and only standard errors differ
if(quasipoisson) poisson <- TRUE
## and same for quasibinomial
if(quasibinomial) logistic <- TRUE
}
polr_logistic <- FALSE
polr_probit <- FALSE
## ordinal probit or logistic regression with polr
if(polr){
if(reg$method == "logistic"){
logistic <- TRUE
polr_logistic <- TRUE
}
if(reg$method == "probit"){
logistic <- FALSE
polr_probit <- TRUE
}
if(!(polr_logistic|polr_probit)){
return(message(paste("polr method ", reg$method,"not supported")))
}
}
## lme4 mixed random effects models
olser <- FALSE
logisticer <- FALSE
poissoner <- FALSE
negbiner <- FALSE
##for glmer (glmer, glmer.nb)
if(glmer ){
## has glmer done an allowed family/link?
fam <- unlist(reg@resp$family)[1]
link <- unlist(reg@resp$family)[2]
olser <- (fam == "gaussian" & link == "identity")
logisticer <- (fam == "binomial" & link == "logit")
poissoner <- (fam == "poisson" & link == "log")
negbiner <- (link == "log") ## fam of form "Negative Binomial(xxxx)" for glmer.nb()
if(!(olser | logisticer | poissoner | negbiner)){
return(message(paste("glm family/link combination not supported:",
fam,link)))
}
}
if(!(polr | cox | survreg | lm | ols | logistic | poisson | negbin | lmer | glmer )){
return(message("regplot does not support this regression model class"))}
FIRSTRUN <- TRUE
##=========================================================
## check on existence of locator(), for clickable=TRUE
if(clickable ){
if(!exists("locator") ){
message(paste("\"clickable=TRUE\" requires locator() function. It is not present"))
clickable=FALSE
}
}
##===============================================
## ellipsis ... additional allowed parameters
##defaults:
#font sizes
cexvars <- 1 ## variables on lhs of plot
cexcats <- 0.8 ## labels attached to category values
cexscales <- 0.9 ## fonts on points, total points, nomogram scales
# default colours
dencol <- "#EEEEEE" ## colour of density plot fill
boxcol <- "#B3EADF" ## colour of factor boxes plot fill
obscol <- "#FF0000" ## colour of observation points (red)
spkcol <- "#669999" ## colour of spike plots
leftlabel <- TRUE ## label of scale panels on the left
droplines <- FALSE ## vertical faint lines of observation
## ellipsis parameters
## overrides with any ellipsis parameters:
if(length(list(...)) !=0 ){
XXX <- unlist(list(...))
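## note: unlist() coerces mixed ... values to a common type (usually character),
## hence the as.single()/as.character() conversions applied below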
## test for allowed additional ... parameters
allowed <- c("cexscales","cexvars","cexcats", "dencol",
"boxcol", "spkcol","obscol","leftlabel","droplines")
allow <- is.element(names(XXX),allowed)
if(!all(allow)){
message(paste("Note: there are non-allowed ... additional parameter(s):"))
message(paste( names(XXX)[which(allow==FALSE)], collapse=" " ))
}
## possible ellipsis three dot parameters:
V <- is.element(names(XXX),"cexscales")
if(length(which(V==TRUE))>0 ) {
cexscales <- as.single(XXX[which(V==TRUE)])
}
V <- is.element(names(XXX),"cexvars")
if(length(which(V==TRUE))>0 ) {
cexvars <- as.single(XXX[which(V==TRUE)])
}
V <- is.element(names(XXX),"cexcats")
if(length(which(V==TRUE))>0 ) {
cexcats <- as.single(XXX[which(V==TRUE)] )
}
V <- is.element(names(XXX),"boxcol")
if(length(which(V==TRUE))>0 ) {
boxcol <- as.character(XXX[which(V==TRUE)] )
}
V <- is.element(names(XXX),"obscol")
if(length(which(V==TRUE))>0) {
obscol <- as.character(XXX[which(V==TRUE)] )
}
V <- is.element(names(XXX),"dencol")
if(length(which(V==TRUE)) >0) {
dencol <- as.character(XXX[which(V==TRUE)] )
}
V <- is.element(names(XXX),"spkcol")
if(length(which(V==TRUE)) >0) {
spkcol <- as.character(XXX[which(V==TRUE)] )
}
V <- is.element(names(XXX),"leftlabel")
if(length(which(V==TRUE)) >0) {
leftlabel <- XXX[which(V==TRUE)]
}
V <- is.element(names(XXX),"droplines")
if(length(which(V==TRUE)) >0) {
droplines <- XXX[which(V==TRUE)]
}
}
##==========================================================
## check on plots argument (plot types)
if(length(plots) !=2) {
return(message(" \"plots\" argument should be length 2"))
}
allowedc <- c("no plot","density","boxes","ecdf","bars","boxplot","violin","bean","spikes")
if(!(plots[1] %in% allowedc)){
return(message(paste("plots[1] must be one of: "), paste0("\"",allowedc,"\"", collapse=" ")))
}
allowedf <- c("no plot", "boxes","bars", "spikes")
if(!(plots[2] %in% allowedf)){
return(message(paste("plots[2] must be one of: "), paste0("\"",allowedf,"\"", collapse=" ")))
}
##==========================================================
## is a RED (or obscol) "person" to be added, is observation non-NULL?
person <- !is.null(observation)
logicalperson <- FALSE
if(person){
if(is.logical(observation) ){
## entered an observation=TRUE or FALSE value
logicalperson <- observation
if(observation == FALSE) person <- FALSE
}
else #if(is.logical(observation)
{
if(is.data.frame(observation)){
if(nrow(observation)>1) {
message(paste("\"observation\" has >1 row. The first row provides plotted values"))
observation <- observation[1,]
}
## are there NAs in the observation?
if(!all(!is.na(observation))) {
## Note: need a way to check on variables only required for regression
## Cannot do this at this stage, because these not yet identified.
## but will crash if used vars are NA. Trap added later.
NAobs <- names(observation)[which(is.na(observation))]
#message(paste("Note: NAs in \"observation\" for: ",paste( NAobs ,collapse=", ") ))
## message("Cannot do \"observation\" now FALSE")
## person <- FALSE
}
}
else ## if(is.data.frame(observation))
{return(message(paste(" \"observation\" is not a data frame")))
}
} #if(is.logical(observation))
} ##if(person
##==========================================================
## other initialisations
strata_levels <- NULL
new_obs <- TRUE
adddata <- NULL
weighted <- FALSE
nudist <- TRUE
splineplot <- TRUE
clickedonP <- ""
pointscale <- points
points_values <- list()
##=============================================================
## default graph title, if not specified
## browser()
## modelname <- deparse(substitute(reg))
modelname <- paste(modelname, rtype, collapse=" ")
if(survreg){modelname <- paste0(modelname,"(",dist,")", collapse=" ")
}
if(is.null(title)) {title <- modelname}
FORMULA <- formula(reg)
message("Regression ", modelname," formula:")
message(paste(c(FORMULA[[2]],FORMULA[[1]],FORMULA[[3]]),collapse=" "))
##=====================================================
## test for NA coefficients in model. Stop if so?
## NO. Now (june 2020) continues.
## reg$coefficients not defined for lme4 class
## must extract coefficients from summary(reg)
if(lme4){
Sreg <- summary(reg)
reg_coefficients <- Sreg$coefficients[,1]
NAcoef <- which(is.na(reg_coefficients))
}
else
{
NAcoef <- which(is.na(reg$coefficients))
}
if(length(NAcoef)>0){
message("Model ",deparse(substitute(reg))," has NA beta-coefficient(s) for: ",
paste(names(reg$coefficients)[NAcoef], collapse=", ") )
message("Will attempt to continue. Preferable to reformulate model without NA coefficients")
}
##==================================================================
## extract key quantities required from each of regression classes
## 1: setting up extraction required for lme4 models
if(lme4){
set1 <- lmer_1(reg)
if(is.null(set1))return(message("quitting"))
xlevels <- set1[[1]]
weighted <- set1[[2]]
W <- set1[[3]]
yvar <- set1[[4]]
coefficients <- set1[[5]]
offset <- set1[[6]]
SEbeta <- set1[[7]]
Lcoefficients <- set1[[8]]
Ucoefficients <- set1[[9]]
variable_names <- set1[[10]]
randomv <- set1[[11]]
fixed.varnames <- set1[[12]]
actualdata <- set1[[13]]
fix_frame <- set1[[14]]
Pval <- set1[[15]]
x <- set1[[16]]
nointercept_term <- set1[[17]]
nointercept <- nointercept_term
} ## if(lme4)
##-----------------------------------------------------
## 2: extraction required for polr models
if(polr){
set1 <- polr_1(reg)
xlevels <- set1[[1]]
weighted <- set1[[2]]
W <- set1[[3]]
yvar <- set1[[4]]
coefficients <- set1[[5]]
offset <- set1[[6]]
SEbeta <- set1[[7]]
Lcoefficients <- set1[[8]]
Ucoefficients <- set1[[9]]
variable_names <- set1[[10]]
actualdata <- set1[[11]]
Pval <- set1[[12]]
x <- set1[[13]]
offsetvar <- set1[[14]]
intercepts <- set1[[15]]
nointercept_term <- FALSE
nointercept <- FALSE
}
##----------------------------------------
## 3: extraction required quantities for glm models
##browser()
if(glm | glm.nb| lm ) {set1 <- glm_1(reg)
if(is.null(set1)){return(message(""))}
xlevels <- set1[[1]]
weighted <- set1[[2]]
W <- set1[[3]]
yvar <- set1[[4]]
coefficients <- set1[[5]]
offset <- set1[[6]]
SEbeta <- set1[[7]]
Lcoefficients <- set1[[8]]
Ucoefficients <- set1[[9]]
variable_names <- set1[[10]]
actualdata <- set1[[11]]
Pval <- set1[[12]]
x <- set1[[13]]
offsetvar <- set1[[14]]
nointercept_term <- set1[[15]]
nointercept <- nointercept_term
} ##if(if(glm | glm.nb
##---------------------------------------------------
## 4: extraction for cox and survreg classes
## (which includes cph | coxph and survreg | psm)
if( cox | survreg ){
set1 <- coxsurv_1(reg,cox)
if(is.null(set1))return(message("quitting"))
xlevels <- set1[[1]]
weighted <- set1[[2]]
W <- set1[[3]]
yvar <- set1[[4]]
coefficients <- set1[[5]]
offset <- set1[[6]]
SEbeta <- set1[[7]]
Lcoefficients <- set1[[8]]
Ucoefficients <- set1[[9]]
variable_names <- set1[[10]]
deadvar <- set1[[11]]
actualdata <- set1[[12]]
Pval <- set1[[13]]
x <- set1[[14]]
strata_levels <- set1[[15]]
slevels <- set1[[16]]
vstrat <- set1[[17]]
nointercept_term <- set1[[18]]
## note slevels strata_levels distinction
## strata_levels is of form "x=0, y=a"
## slevels is the strata levels extracted from the reg object
## which could be of form "0, a", omitting variable names
nointercept <- !cox & nointercept_term
##
} ##if(cox |survreg)
#=====================================================================
## make checks on regplot arguments, and rearrange as required
if( !(cox | survreg) & !is.null(failtime) ){
##message("note: failtime parameter ignored for non-survival model")
failtime <- NULL}
if(!(cox | survreg) & !is.null(prfail)){
##message("note: prfail parameter ignored for non-survival model")
prfail <- NULL}
if( is.null(prfail) ) {fail <- TRUE} else {fail <- prfail}
if( is.null(failtime) ) {tcut <- NA} else {tcut <- failtime}
ntimes <- 1
medsurv <- FALSE
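## medsurv flags a quantile failtime specification (e.g. failtime=c("50%"))
## rather than a cutoff time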
if(!is.na(tcut[1])) {
ntimes <- length(tcut)
if(!is.numeric(failtime) ) {
## possibility of %s indicating medsurv TRUE
## quantiles, < time. But need > survival %s
Spercent <- 100 - as.numeric(unlist(strsplit(failtime,"%")) )
medsurv <- TRUE
}
}
## ntimes parameter (which counts number of output response scales)
## allowed for survival models with possibly >1 failtime scales
## and for polr ordinal scales
if(ntimes >1 & !(cox | survreg)) ntimes <- 1
if(polr) ntimes <- length(intercepts)
## NB "s5" leftover name for "5 year survival". But is
## generic vector of baseline survival times
s5 <- vector(length=ntimes)
##==========================================================
## checks on valid call parameter arguments
## replace NULL arguments by their defaults
if( is.null(droplines) ) {droplines <- FALSE}
if( is.null(nsamp) ) {nsamp <- 5000}
if(!is.null(baseS)){ nbaseS <- length(baseS)
if(nbaseS != ntimes){
return(message("non-NULL \"baseS\" and \"failtime\" must have same length"))}
}
if( is.null(showP) ) {showP <- TRUE }
if( is.null(odds) ) {odds <- FALSE }
if( is.null(rank) ) {rank <- FALSE } else {howrank <- rank
rank <- TRUE
if(howrank != "sd" & howrank != "range"){
return(message("mis-specified \"rank\" option, must be \"sd\" or \"range\""))
}
}
if( is.null(subticks) ) {showsubticks <- FALSE } else {showsubticks <- subticks}
if( is.null(interval) ) {interval <- "none"}
if(!all(is.element(interval,c("conf","pred","none","coeff","coefficients", "confidence","prediction") )) ){
return(message("mispecified \"interval\" parameter" ))
}
coeffCI <- is.element("coeff",interval) | is.element("coefficients",interval)
confI <- is.element("conf",interval) | is.element("confidence",interval)
predI <- is.element("pred",interval) | is.element("prediction",interval)
## cannot specify both confidence and prediction intervals
if(predI & !lm ){
return(message(" \"prediction\" intervals not allowed for non-lm models. "))
}
if(polr & confI){
message("interval = \"confidence\" not supported for polr models")
confI <- FALSE
}
if(confI & predI){
message("Cannot specify both \"confidence\" and \"prediction\" intervals. Precedence to \"confidence\"")
predI <- FALSE
}
if((confI|predI) & !person){
message(" \"confidence\" or \"prediction\" intervals ignored for NULL \"observation\" ")
}
if((confI|predI) & lme4){
message(" \"confidence\" or \"prediction\" intervals not supported for lme4 models ")
confI <- FALSE
predI <- FALSE
}
if(cox & !is.null(baseS) & (confI | predI | coeffCI)){
message("non-NULL \"baseS\" precludes \"interval\". \"interval\" ignored")
confI <- FALSE
predI <- FALSE
coeffCI <- FALSE}
if(medsurv){
##pointscale <- FALSE
if(!is.null(baseS)){
return(message(
" non-NULL \"baseS\" cannot be specifed for quantile \"failtime\""))}
if(confI | predI){
message("confidence or prediction intervals cannot be done for for quantile \"failtime\"")
confI <- FALSE
predI <- FALSE}
}
##==========================================================
##----- offset in model? ---------------------------------------------
if( offset){
if(is.null(offsetvar)) { return(message("no offset variable")) }
message(deparse(substitute(reg))," has an offset variable: ", offsetvar)
## ensure not a function
gX <- getX(offsetvar)
if(gX[2] !=""){
return(message("function offset ",offsetvar," not supported. Create a logged variable"))
}
## need to bind the offset data into x, and augment coefficients and variable_names
x <- cbind(x,reg$offset)
variable_names <- c(variable_names,offsetvar)
coefficients <- c(coefficients,1.0)
##
} #if(offset)
##============================================================
num <- length(variable_names)
## get raw data with this command
dname <- getCall(reg)$data
##browser()
if(is.null(dname)){
message("A data frame must be specified in the regression command")
return(message("Using \"with(data,..)\" construction? Not supported"))
}
rawdata <- eval(dname,environment(formula(reg)) )
if(class(rawdata)[1] == "table")
{message("Data not data.frame but is: ",
paste( class(rawdata), ". Will coerce to dataframe" ))
## try coercion
rawdata <- as.data.frame(rawdata)
}
## check that it is same length
## for polr models rawdata does not seem to include outcome ?
allv <- all.vars(FORMULA)
if(nrow(rawdata) != nrow(actualdata)){
ccobs <- complete.cases(rawdata[,allv])
rawdata <- rawdata[ccobs,]}
## check on this
if(nrow(rawdata) != nrow(actualdata)){
message("Data has ", nrow(rawdata), " rows. Data used in fitting has ", nrow(actualdata)," rows")
return( message("Data and fitted data nrow mis-match not supported"))
}
##==================================================================
## checks on specified observation value
if(person){
if(logicalperson) {
observation <- rawdata[1,]}
else
{
## check on specified observation matching data
requiredvars <- colnames(actualdata)
## not the names of actualdata, which may contain a Surv() object
## use colnames of rawdata
rawdatavars <- colnames(rawdata)
obs_indata <- colnames(observation) %in% rawdatavars
data_inobs <- rawdatavars %in% colnames(observation)
if(!all(data_inobs)){
## there are variables that are not in observation
exc <- rawdatavars[which(!data_inobs)]
## but are these required in regression?
req <- exc %in% requiredvars
if(any(req)){
message("\"observation\" excludes required regression variable(s):")
message(paste( exc[which(req==TRUE)] ," " ))
person <- FALSE }
}
## observation has required variables, but may not
## precisely match rawdata, which may include other unused vars
## so need to exclude these else problem with
## app <- rbind(observation,rawdata)
if(person){
reord <- rawdatavars[which(data_inobs)]
observation <- observation[reord]
rawdata <- rawdata[,reord]
}
}
} ##if(person)
##passed check, and person still TRUE. Further checks
## test for NA's of required variables in model (exclude Y)
##browser()
if(person){
req <- all.vars(FORMULA)[-1]
if((survreg | cox)){
if(!is.na(deadvar))req <- req[-1]
}
## print(observation[req])
## check on character observation variables matching data
obsclass <- lapply(observation[req],class)
dataclass <- lapply(rawdata[req],class)
if(!identical(obsclass,dataclass)){
message("warning: classes of data \"observation\" do not agree")
person <- TRUE
}
chars <- which(obsclass=="character")
if(length(chars) > 0){
for(chv in 1:length(chars)){
vnm <- names(chars)[chv]
allowedvalues <- unique(rawdata[,vnm])
if(!(observation[vnm]) %in% allowedvalues){
message(
"inadmissible observation value ",colnames(observation[req])[chars[chv]],"=",
observation[vnm])
person <- FALSE
}
}
}
## check on NAs in observation
nas <- which(is.na(observation[req]))
if(length(nas) > 0){
message(
"NA values in \"observation\" for required variable(s): ",
colnames(observation[req])[nas])
person <- FALSE
}
## make a note of person shut down
if(person == FALSE){
message("\"observation = FALSE\" is assumed")
}
} ##if(person
##===============================================================
## possibility of :: package prefix added to variable_names. Strip away.
## the :: operator interferes with using ":" as the interaction marker,
## and clicking an observation does not work because reg$terms
## contains the package name prefix, so prohibit it
dblcolon <- grep("::", variable_names)
if(length(dblcolon>0)){
return(message("package prefix \"::\" detected in formula. Reformulate regression without"))
}
##=================================================================
## set up inverses of functions in formula
inv <- checkvarsnew(variable_names,TRUE)
# if(is.na(inv)) {
# return(message(" unsupported function") )
# }
raw_variable_names <- unlist(inv[1])
inv_variable_names <- unlist(inv[2])
## note: raw_variable_names gives, for example for rcs(X1) + log(X2+10),
## the values "rcs(X1)" and "X2"
ff <- unlist( getXmulti(variable_names)[2])
vv <- unlist( getXmulti(variable_names)[1])
## how to deal with interactions of functions? These returned with
## vv=NA. Replace all such ff with ""
ff[ which(is.na(vv) ) ] <- ""
pseud <- rep("",times=length(ff))
## pseudo variables of these types only, replace with the associated variable name
## why log()??
##browser()
wh <- which( ff=="cut()" | ff=="abs()" | ff=="rcs()" | ff=="bs()" | ff == "ns()" | ff== "poly()" )
pseud[wh] <- vv[wh]
nvars <- length(variable_names)
#
#
ordrd <- NULL
for(k in 1:nvars){
# #
whichv <- which(names(actualdata)==variable_names[k])
type_v <- class(actualdata[, whichv ] )
#
## possibility of an "ordered" class of variable (as opposed to "factor" or "numeric")
if(type_v[1] == "ordered"){
ordrd <- c(ordrd,variable_names[k])
}
} ## end for(k in 1:nvars)
#
if(!is.null(ordrd)){
return(message(paste("regplot() does not support \"ordered\" class: ",
paste(ordrd,collapse=" ") ) ) )
}
##=====================================================================
## Start endless "idling" loop while(TRUE) for mouse clicks and Esc
##=====================================================================
while(TRUE){
if(person){
## trying to rationalise model.frame() calls with
## new observation. Required argument pseudovarnames in added.obs()
## not known at this point. Fashion alt version pseud
## allowing for possible spline type functions
## raw_variable_names may have spline rcs() or bs()
## split into variable and function , only need do on FIRSTRUN
##======lm/glm=========================================================
if( glm | lm | glm.nb | polr) {
if(FIRSTRUN){
##===============================================================
# ## FIRSTRUN creates base plot and other computations on
# ## first cycle of the idling
##===============================================================
vactual <- colnames( actualdata )
## if( glm | lm | glm.nb )
adddata <- added.obs(reg,observation,rawdata,actualdata ,pseud,NA)[[1]]
vadded <- colnames( adddata )
included <- is.element( vactual, vadded )
vin <- vactual[which( !included )]
if(weighted){
## need to coerce in a weights value (=1) into adddata
adddata[which(names(adddata)=="(weights)")] <- 1
}
} ##if(FIRSTRUN)
if(offset ){
## need to coerce name of offset variable in adddata,
## which is returned by model.frame from added.obs() to be "(offset)"
names(adddata)[ which(names(adddata)=="(offset)") ] <- offsetvar
}
both <- rbind(actualdata,adddata)
## augment the added observation with the whole data
#
## NTM. This model.matrix() call occasionally fails; unsure why.
## It really should only require data=adddata, but seems to need more
## than one observation, so the actual data are augmented by adding the
## extra adddata row. This works but may be slow if the actual data are large
## glm....
MM <- model.matrix(reg$terms,data=both)
## last row corresponds to adddata since both=rbind(actualdata,adddata)
newX <- MM[nrow(MM),]
if(offset) {
newX <- c(newX, as.numeric(observation[offsetvar]) )
}
## remove the intercept term, or if a no-intercet rms model that
## includes afactor (length(xlevels)>0, test also used in glm_1)
## within if(person)
if(!nointercept_term | (nointercept_term & rms & length(xlevels)>0) ){newX <- newX[-1]}
##nointercept <- nointercept_term
##
} ##if(glm | glm.nb| lm)
##====== survival models =========================================================
if( cox | survreg ){
## this is within an if(person)...
if(FIRSTRUN){
# ## check on time in observation
if(!all(is.element( yvar, names(observation) ) ) ){
return( message(paste(yvar)," missing in \"observation\" "))
}
#
actualdata <- model.frame(reg)
#remove first Surv(time, censor) element from vactual
vactual <- colnames( actualdata )[-1]
adddata <- added.obs(reg,observation,rawdata,actualdata ,pseud,NA)[[1]]
if(weighted){
## need to coerce in a "(weights)" value into adddata
## ladd <- length(adddata)
adddata[which(names(adddata)=="(weights)") ] <- 1
## names(adddata)[ladd+1] <- "(weights)"
}
vadded <- colnames( adddata )[-1]
actualdata1 <- actualdata[1,]
} ##if(FIRSTRUN cox
##
both <- rbind(actualdata1,adddata)
##===============================================================
## why does it need "both" and not just data=adddata?
## answer: because clicked-on factors are returned as character.
## fix is to coerce them to factors by merging into the actualdata dataframe
## but don't need whole thing (which slows things up)
##===============================================================
##cox survreg
## This was working fine but now fails for cph; cause unresolved
MM <- model.matrix(reg,data=both)
## use last row of combined data for plotting red points
if(ncol(MM)==1){
newX <- MM[nrow(MM),,drop=FALSE]
}else{
newX <- MM[nrow(MM),]
}
if(rms & !is.null(vstrat)){
## model.matrix for rms objects includes strat() variable
## with value attached e.g "strat(stage_4)1" try removing
newX <- newX[ - which( substr(names(newX),start=1,stop=nchar(vstrat))==vstrat) ]
}
## remove intercept term of newX (only for survreg, not Cox)
if(survreg & !nointercept_term){newX <- newX[-1]}
##
} #endif(cox | survreg)
##====lme4 mixed effects models===========================================================
if(lme4){
## this is within an if(person)...
if(FIRSTRUN){
nc <- ncol(fix_frame)
## need to manually build the fixed effect part of the regression formula
connect <- "~"
if(nointercept_term) connect <- "~ 0 + "
form <- c( paste(fixed.varnames[1]),connect,
paste(variable_names,collapse="+") )
## check for strata() in lme4
##browser()
for(i in 1:length(variable_names)){
check <- getX(variable_names[i])
if(check[2]=="strata()" | check[2]=="strat()"){
return( message(paste("stratified not supported for lme4 models: ", variable_names[i])))
## strat() in the formula breaks model.matrix(form,adddata) below;
## not worth the effort to debug??
}
}
form <- paste(form, collapse="")
form <- as.formula(form)
## keep the cut down formula as THE formula
FORMULA <- form
adddata <- added.obs(reg,observation,rawdata,actualdata ,pseud,FORMULA)[[1]]
###
} ##if(FIRSTRUN lmer
newX <- model.matrix(form,adddata)
if(nointercept) {
## form has connect = ~ + 0....
## newX <- model.matrix(form,adddata)
betas <- coefficients[1:(length(coefficients))]
Lbetas <- Lcoefficients[1:(length(Lcoefficients))]
Ubetas <- Ucoefficients[1:(length(Lcoefficients))]
}
else
{
# form generates an intercept, so trim off first element
newX <- newX[-1]
betas <- coefficients[2:(length(coefficients))]
Lbetas <- Lcoefficients[2:(length(Lcoefficients))]
Ubetas <- Ucoefficients[2:(length(Lcoefficients))]
}
##
names(newX) <- names(betas)
beta_order <- list(length=10)
} #endif(lmer
} #endif(person)
##==================================================================
if(FIRSTRUN){
if(!(ols|lm) & predI){
message("Cannot do \"prediction\" interval for ",
paste(rtype),". \"confidence\" will be done instead")
predI <- FALSE
confI <- TRUE
}
## glmer or lmer
if(!lme4){
if(is.null( reg$formula )){ reg$formula <- formula(reg) }
}
nregcoef <- length(coefficients)
npars <- nregcoef
ns <- ""
sigcodes <- ifelse( Pval<.05 ,"*" ,"" )
sigcodes <- ifelse( Pval<.01 ,"**" ,sigcodes )
sigcodes <- ifelse( Pval<.001 ,"***",sigcodes )
## patch to suppress P-value codes when showP=FALSE
if(!showP){sigcodes <- ifelse(Pval>0,"","")}
#}
if((cox | survreg) ){
## need actualdata - may not have been done if !person
##@if(!person) {actualdata <- model.frame(reg)}
SU <- actualdata[,1]
tmedian <- median(SU[,1])
if(is.null(tcut[1]) | is.na(tcut[1]) ){
##get Surv object in SU extract first element which is time
## and make time cutoff the median
tcut <- tmedian
}
else ##if(cox | survreg)
{ if(!medsurv & (max(tcut) > max(SU[,1]) | min(tcut) < min(SU[,1])) ){
return( message("failtime ",paste(failtime, collapse=" ")," out of range of observed: ",
paste(signif(min(SU[,1]),4),signif(max(SU[,1]),4) ) ) )}
}
} ##if(cox | survreg
##
if(lme4){
## predict.merMod gives the linear score of
## fixed effects + estimated random effects.
## the fixed-effect linear score is calculated as for a
## fixed-effect model; the difference is used to
## estimate the random effects
pred_tot_score <- predict(reg, type="link")
if(weighted){ pred_tot_score <- rep(pred_tot_score,times=W)}
}
## if replicate weighted model, expand out x matrix to unit records
## need to hang on to original unexpanded rawdata
## for use in added.obs() which relies on row matching
## of rawdata to model.frame(reg)
rawdata_original <- rawdata
if(weighted){
#
x <- x[rep(row.names(x),times =W), 1:ncol(x)]
rawdata <- rawdata[rep(row.names(rawdata),times =W), 1:ncol(rawdata)]
##actualdata <- actualdata[rep(row.names(actualdata),times =W), 1:ncol(actualdata)]
}
##----------------------------------------------------------------
## large data? Base distributions on subsample size nsamp (default 10000)
## take subsample. Do so on first pass through
if(nrow(x) > nsamp){
nobs <- nrow(x)
## repeatable subsample
set.seed(1234)
subs <- sample(1:nrow(x), size=nsamp)
x <- x[ subs, , drop = FALSE]
if(lme4) {pred_tot_score <- pred_tot_score[subs]}
message(paste0("Distributions estimated with \"nsamp=",nsamp,
"\" random sub-sample of ", nobs ))
rawdata <- rawdata[ subs , , drop=FALSE]
}
##===============================================================
## need to mess about with parameters for coxph as it has no intercept.
##===============================================================
if(cox ){
betas <- coefficients[1:npars]
Lbetas <- Lcoefficients[1:npars]
Ubetas <- Ucoefficients[1:npars]
## fix up if npars=1, when x is a vector, coerce to matrix
x <- as.matrix(x)
X <- x[,1:npars, drop=FALSE]
intercept <- 0
nointercept <- FALSE
}
else ##if(cox)
{
# nai <- names(coefficients[1])
##
#nointercept <- FALSE
# if( !( nai =="(Intercept)" | nai=="Intercept" ) ){
if(nointercept_term ){
## model with no intercept, missing the word "Intercept"
## in coefficient list
## note: this is important. rms no-intercept models differ
## wrt parameterisation
nointercept <- TRUE
## switch off the "nointercept" indicator, which is only
## used for the way non-rms objects deal with no intercept
## parametrisation
## if(rms) nointercept <- FALSE
intercept <- 0
betas <- coefficients[1:npars]
Lbetas <- Lcoefficients[1:npars]
Ubetas <- Ucoefficients[1:npars]
X <- x[,1:npars,drop = FALSE]
}
else
{
## model with intercept
betas <- coefficients[2:npars]
Lbetas <- Lcoefficients[2:npars]
Ubetas <- Ucoefficients[2:npars]
sigcodes <- sigcodes[2:npars]
intercept <- coefficients[1]
X <- x[,2:npars,drop = FALSE]
npars <- npars -1
}
} ## else if(cox
##---------------------------------------------
## patch for a remote possibility (but it has occurred) of
## beta-coefficients that are exactly equal, or equal to 10 significant
## figures
notNA <- which(!is.na(betas))
Rbetas <- signif(betas,10)
Rbetas <- Rbetas[notNA]
dupes <- duplicated(Rbetas)
if(any(dupes) ) {
t <- which(table(Rbetas)>1)
message("Note: duplicate or near duplicate beta-coefficients for:")
## can't see how to output duplicates except in a loop??
lt <- length(t)
for (it in 1:lt){
message(it,": ", paste(collapse=" ",names( which(Rbetas==names(t[it]) ) )) )
}
## this raises a problem with table(S[,i])
## seeing identical values and not distinguishing
## between factor levels. Case in point is:
# library(kulife)
# data(bees)
# model <- glm(Number ~ Locality + Type*Color,
# family=poisson, data=bees)
# XXX <- regplot(model,observation=TRUE,clickable=TRUE)
## fudge by ensuring betas are NOT equal, by adding a tiny random amount
wdupes <- which(dupes)
nw <- length(wdupes)
## print(betas[wdupes])
betas[wdupes] <- betas[wdupes] + runif(n=nw)*0.000001
}
##----------------------------------------
isinteraction <- vector(length=npars)
##
for (i in 1:(npars)){
## flag parameters that are interaction terms (assume interactions all
## contain ":" and are tagged last - have checked this to be the case)
if(length(grep(":", names( betas[i] ) ) ) == 0 ) {
isinteraction[i] <- FALSE}
else
{
isinteraction[i] <- TRUE}
}
##===============================================================
## check for logical variables in the data and coerce them into appearing
## as factors, by augmenting the reg$xlevels list, which gives levels of
## factor variables.
##===============================================================
nvars <- length(variable_names)
for(k in 1:nvars){
#
whichv <- which(names(actualdata)==variable_names[k])
type_v <- class(actualdata[, whichv ] )
if(type_v[1]=="bs" | type_v[1]=="ns" | type_v[1]=="rcs" | type_v[1]=="poly" ) {
if( getX(variable_names[k])[2] != paste0(type_v[1],"()") ){
return( message( paste0(type_v[1],"() function must be specified within the formula")) )
}}
if(type_v[1] == "logical"){
# message(paste("note: logical variable ",variable_names[k]," coerced to factor"))
# ### try coercing to look like a factor
xxx <- c(xlevels,list(c("FALSE","TRUE")))
names(xxx)[length(xxx)] <- variable_names[k]
xlevels <- xxx
}
#
} ## end for(k in 1:nvars)
## factorvarnames does not include interactions at this point
factorvarnames <- names(xlevels)
n_interaction <- sum(isinteraction)
firstfactor <- 0
isfact <- FALSE
## which interactions are interactions of factors?
## try to fudge in xlevels for factor interactions.
## nvars=length(variable_names)
## note : nvars may not be long enough because with interactions variable_names
## may be augmented. Set length to npars, should always be enough
isa_pseudo <- vector(length=npars)
DF <- vector(length=nvars)
isa_factor <- vector(length=nvars)
names(isa_factor) <- variable_names
names(DF) <- variable_names
#
irow <- 1
## determine type of variables in the raw data
## get names of variables used from actualdata
datavars <- names(actualdata)
##
dataclass <- lapply(actualdata,class)
factorvar <- ifelse(dataclass == "factor" ,TRUE,FALSE)
pseudofactor <- rep(FALSE,times=length(factorvar))
kernel_varname <- rep("",times=length(factorvar))
## factorvar is a logical indicator of whether variables are factors.
## its first element is the outcome Y,
## but it doesn't account for interactions;
## extend the length to include interactions
kernel_varname <- rep("",times=length(isinteraction) + 1)
names(pseudofactor) <- names(factorvar)
lvls <- vector(length=length(datavars))
names(lvls) <- datavars
## Note: datavars includes outcome in first position.
## and fix up logical variables to behave as factors. WHY???
prohib <- NULL
for(v in 1:length(datavars)){
if(factorvar[v]){lvls[v] <- nlevels(actualdata[,v])}
if(dataclass[v] == "logical"){
factorvar[v] <- TRUE
lvls[v] <- 2
}
cls <- unlist(dataclass[v])[1]
## for rms objects the above doesn't work (e.g. rcs is not returned, instead "rms")
cls <- getX(datavars[v],outer=TRUE)[2]
if(cls=="cut()" | cls == "bs()" | cls == "poly()" |
cls == "ns()" | cls=="rcs()"){
uses_bs <- cls == "bs()"
uses_poly <- cls == "poly()"
uses_ns <- cls == "ns()"
uses_rcs <- cls == "rcs()"
if(confI){
prohib <- c(prohib, cls)
}
factorvar[v] <- TRUE
##
if(rms ){
## rms changes names of coefficients e.g. for bs() names are "1", "2", "3"...
## need to patch in the data variable name e.g. to give "bs(age)1", "bs(age)2"...
## the rcs(Age) transformation in rms makes elements e.g. Age Age' Age'' etc
## need to change to the standard structure rcs(Age)1 rcs(Age)2 rcs(Age)3...
## can use element reg$assign which has standard names
## later: no longer needed with names(coefficients) <- colnames(x) for model matrix x
# zz <- which(names(reg$assign)==datavars[v])
# coeffv <- unlist(reg$assign[zz])
# names(coefficients)[coeffv] <- paste0(datavars[v],names(coefficients)[coeffv])
#
#
## and later: Simon T has an example where rcs() in a Glm model fails. This is the
## point of failure, specifying levels. Unclear why I need +1?
lvls[v] <- length(grep(datavars[v], colnames(X),fixed=TRUE)) +1
}
else
{
## patch for the possibility of, for example, bs(X,knots=4). The "=" sign has
## been removed from names(coefficients), which has the form
## bs(X, knots 3)1 bs(X, knots 3)2 ...
## but datavars (which is names(actualdata)) still includes "=".
## Need to strip it away to count the components of bs().
## not required for rms models, for which datavar names are pasted back
## in the previous patch!
### datavars <- gsub("=","",datavars)
lvls[v] <- length(grep(datavars[v], names(coefficients),fixed=TRUE)) + 1
}
##
pseudofactor[v] <- TRUE
## unless cut()??
if(cls=="cut()") {
pseudofactor[v] <- FALSE
## isa_factor[v] <- TRUE
## not allowed labels = in cut()
XXX <- grep(datavars[v], pattern="labels = ")
if(length(XXX) > 0){
XXX <- grep(datavars[v], pattern="labels = NULL")
if(length(XXX)==0){
return(message("\"labels = \" in cuts() not allowed. Use default"))
}}
}
}
} # for(v in 1:length(datavars))
if(!is.null(prohib)){
message(paste(" interval =\"confidence\" not supported for ",paste(list(prohib))))
confI <- FALSE
}
##
## fix up possibility of a character class.
## make it like a factor.
for(v in 1:length(datavars)){
if(dataclass[v] == "character") {
factorvar[v] <- TRUE
lvls[v] <- length(unique(actualdata[,v]))}
}
## relies on ordering of effects - interaction last
mixed_int <- FALSE
is.mixedint <- logical(length=nvars)
n_interaction_vars <- 0
parameter_names <- names(betas)
# ## bug detect here for rms model X1*(X2 >0)
# ## emanating from here. pars_per_var() not correctly
# ## working because variable_name and parameter_name mismatch.
# ## fix with setting parameter_names as variable_names
# ## KAEON!!
# ## later: yes, this results in failure for an interaction model:
# if(rms) parameter_names <- variable_names
## general case: just strip away intercept?
# parameter_names <- names(coefficients)
# if(!nointercept) parameter_names <- parameter_names[-1]
nmain_effects <- 0
for (k in 1:nvars){
isfact <- FALSE
nway <- length(grep(":", variable_names[k] ))
## is an interaction term:
if(nway>0){
## is an interaction term
n_interaction_vars <- n_interaction_vars +1
##browser()##
## DF[k] <- pars_per_var(variable_names[k],parameter_names)
## DF[k] <- pars_per_var(variable_names[k],variable_names[1:nmain_effects])
ivars <- unlist(strsplit(variable_names[k],":"))
## alt way to get DF?
## which of ivar are factors or logical
arefact <- which(dataclass[ivars]=="factor" | dataclass[ivars]=="logical")
## what are levels of these?
if(length(arefact) > 0 ){
# dfs <- 1
# ## assume dfof the term is (L1-1)(L2-1) for number of levels L1,L2
# ## of factors
# ## unless there are no main effects when dont subtract 1
#
# for(nfs in 1 : length(arefact)){
# one <- ivars[nfs] %in% variable_names
# dfs <- dfs * ( length(unlist( xlevels[names(arefact[nfs])] ) )- one )
# }
# DF[k] <- dfs
# ## browser()
## new way to establish DF using a function to match coefficients to variables
DF[k] <- inter_coeff(names(arefact),names(coefficients) )
}
else ## if(length(arefact) > 0 )
{
## no factors in interaction, so contributes 1 d.f
DF[k] <- 1}
for(v in 1:length(ivars)){
iv <- which(names(dataclass)==ivars[v])
class <- unlist(dataclass[iv])[1]
if(class== "poly" | class== "bs" | class== "ns" | class== "rcs" ){
return(message("interactions that include class ",paste(class), "() not supported"))
## why? Not sure but fails sum(DF) == npars checksum
}
}
isa_factor[k] <- FALSE
isa_pseudo[k] <- FALSE
if(all(factorvar[ivars])){
## factor*factor interactions
## all are factors in the interaction
## establish how many parameters for this combo (degrees of freedom)
## it is a factor-interaction of order "nv"
## unclear whether this bit of code necessary
## because labels of factor interactions
## obtained from the parameter name. Anyway, retain as is.
# EE <- expand.grid(xlevels[ivars], stringsAsFactors=FALSE)
# ## IS THIS REDUNDANT CODE?
# ## Yes I think so. "KANEON "xlevels" returned
# ## by routines setting up regression parameters e.g. glm_1, polr_1 etc
# browser()
# cross <- do.call(paste0, EE)
# xxx <- list(xxx=cross)
# names(xxx) <- variable_names[k]
# ## xlevels <- c(xlevels,xxx)
isfact <- TRUE
isa_factor[k] <- TRUE
##browser()
}
else ##if(all(isa_factor
{
## numeric*factor interactions
## prohibition of bs() and poly() interactions
classes <- unlist(dataclass[ivars])
if(is.element("poly",classes) & is.element("bs",classes) )
{return(message("poly() with bs() interaction not allowed"))
}
## this is not a good test, DF[k] can = 1
## patching this is a MUST-DO!
if(DF[k] > 1 | TRUE ){mixed_int <- TRUE
is.mixedint[k] <- TRUE}
}
}
else ##if(nway >0
{
## not an interaction
nmain_effects <- nmain_effects + 1
if(factorvar[variable_names[k]]){
DF[k] <- lvls[variable_names[k]] -1
isfact <- TRUE
isa_factor[k] <- TRUE
isa_pseudo[k] <- FALSE
##
if(pseudofactor[variable_names[k]] ) {
isa_pseudo[k] <- TRUE
isa_factor[k] <- FALSE
isfact <- FALSE
##establish raw data name for the pseudo factors
## eg. pick out "x" from "bs(x, knots=3)"
gx <- getX(names(pseudofactor[variable_names[k]]),outer=TRUE)
##
if(gx[2] !=""){
## strip away anything to right of comma , e.g rcs(Age,knots=3)
## gx[1]="Age,knots=3" leaving Age
kernel_varname[k] <- unlist(strsplit(gx[1],","))[1]
#
}
}
}
else ## of if(factorvar[variable_names[k]
{DF[k] <- 1
isa_pseudo[k] <- FALSE
isa_factor[k] <- FALSE
gx <- getX(variable_names[k])
if(gx[2] !=""){
## kernel_varname[k] <- unlist(strsplit(gx[1],","))[1]
## kernel_varname[k] <- raw_variable_names[k]
kernel_varname[k] <- gx[1]
}
}
}
## fudge in the additional parameter inserted with a no-intercept
## model when there is a factor. This is required only for non-rms models,
## and requires adding another degree of freedom for the
## additional category
if(nointercept & isfact & firstfactor == 0 & !rms) {
firstfactor <- k
## further patch for case of no main effects and only interaction
## e.g Y ~ 0 + sex:ethnicity, KANEON.
if(nmain_effects>0 ) DF[k] <- DF[k] +1
}
} ## end for(k in 1:nvars)
haslikefactors <- any(isa_pseudo)
##
##-----------------------------------------------------
## check whether main effects are present for interactions.
## this has a bearing on the Cox model: basehaz() won't
## work unless all main effects are included
if(n_interaction_vars>0){
nomain <- NULL
main_effects <- variable_names[1:nmain_effects]
## check for non-allowed pseudo variable in interaction?
if(any(pseudofactor)){
pseudof <- which(pseudofactor==TRUE) - 1
for(ps in 1:length(pseudof)){
lg <- grep(x=variable_names[(nmain_effects +1):nvars],pattern=names(pseudof)[ps],fixed=T)
if(length(lg) >0){
return(message( paste(names(pseudof)[ps])," main effect not supported in interaction"))
}
}
}
for (k in (nmain_effects +1):nvars){
ivars <- unlist(strsplit(variable_names[k],":"))
for(v in 1:length(ivars)){
if(!is.element(ivars[v],main_effects)) {nomain <- c(nomain," ",ivars[v])}
}
} #for(k in...)
if(!is.null(nomain)){
nomain <- unique(nomain)
message("Note: interaction but no main effects for:",paste(nomain,collapse=" "))
if(cox & is.null(baseS)){
return(message("Lower order effects required for Cox, or specify \"baseS\" parameter"))}
}
}
##-----------------------------------------------
## for mixed interactions (factor*numeric) in which the factor
## has more than one level, need to
## fudge variable_names, and update other arrays, after slotting
## in the names of the associated parameters
##
if(mixed_int){
new_variable_names <- NULL
new_DF <- NULL
new_isa_factor <- NULL
i <- 0
for(k in 1:nvars){
if(is.mixedint[k]){
## append the names of the interaction terms
mxname <- parameter_names[(i+1):(i+DF[k])]
new_variable_names <- c(new_variable_names, mxname)
n_interaction_vars <- n_interaction_vars +DF[k] -1
new_DF <- c(new_DF,rep(1,times=DF[k]) )
new_isa_factor <- c(new_isa_factor,rep(FALSE,times=DF[k]))
}
else
{
new_variable_names <- c(new_variable_names, variable_names[k])
new_DF <- c(new_DF,DF[k])
new_isa_factor <- c(new_isa_factor,isa_factor[k])
}
i <- i +DF[k]
}
variable_names <- new_variable_names
nvars <- length(variable_names)
isa_factor <- new_isa_factor
DF <- new_DF
}
## end of mixed terms
#----------------------------------------------
## check that number of parameters matches number of degrees of freedom
## if not something serious has gone wrong.
if(sum(DF) != npars ){
##browser()
message("Internal checksum fail. DF",paste0("(",sum(DF),")"),
" & no. parameters",paste0("(",npars,")")," mismatch. Contact developer")
message("quitting")
return("fail")
}
##=============================================================================
n_interaction_rows <- n_interaction_vars
##factorvarnames <- names(xlevels)
## number of main effect parameters
n_main_pars <- npars - n_interaction
n_factor_rows <- sum(isa_factor)
n_non_factor_rows <- sum(DF[which(!isa_factor)])
## set up the rows
n_non_factors <- npars - sum(DF,na.rm=TRUE)
## make sure factors are main effects (they might not be, could be just an interaction)
index_factornames <- which(is.element(factorvarnames,variable_names))
levels <- xlevels[ index_factornames]
## number of factors
n_main_factors <- length(levels)
if(n_main_pars ==0){
message("This model has no main effects")
}
n_factors <- length(xlevels)
nrows <- nvars
row_names <- variable_names
inv <- checkvarsnew(row_names,FALSE)
raw_row_names <- unlist(inv[1])
inv_row_names <- unlist(inv[2])
wisa_pseudo <- which(isa_pseudo==TRUE)
if(length(wisa_pseudo) >0 ){
## which pseudo variables are specified in formula e.g bs(age))
## these shown on nomogram with raw names
## w <- which(kernel_varname != "")
## raw_row_names[w] <- kernel_varname[w]
raw_row_names[wisa_pseudo] <- kernel_varname[wisa_pseudo]
## browser()
## inv_row_names[w] <- "xxx"
##message(paste( paste(" ",raw_row_names[wisa_pseudo]), " not clickable") )
}
##
## possibility that a functional form may not correspond to a data variable
## e.g. log(X1 + X2)
test <- which(is.element(kernel_varname,c("",colnames(rawdata) ) ) == FALSE)
if(length(test)>0){
##message("no variable named ", kernel_varname[test]," in data")
## but may still be computable ?? Strange code?
## later failing for log(X2+10) in formula. Because X2 not in data.
## actually it will pass this test but fails in look_up when clicking
## on a variable value. But thought I had the issue of log(X + c) sorted
## before look_up system introduced.
## return( message("no variable named ", kernel_varname[test]," in data") )
##browser()
test2 <- which(is.element(raw_variable_names[test],c("",colnames(rawdata) ) ) == FALSE)
if(length(test2) > 0){
message("Function ", raw_variable_names[test]," problem")
message("quitting")
return("fail")
}
#
}
vact <- vector(length = npars )
isadummy <- vector(length = npars )
vact_names <- vector(length = npars )
vact <- seq(1:npars)
isadummy <- rep(TRUE,times=npars)
i <- 0
k <- 0
for(k in 1:nvars ){
nlev <- DF[k]
## need to create vact() giving variable number of each parameter (i)
for(j in 1:nlev){
i <- i + 1
vact[i] <- k
if(isa_factor[k]) {
isadummy[i] <- TRUE
vact_names[i] <- variable_names[k]
}
else
{
isadummy[i] <- FALSE
vact_names[i] <- variable_names[k]
}
}
}
S <- X
## --------------------------------------------------------
## extract scores S[,i] associated with each beta-coefficient
## --------------------------------------------------------
for (i in 1:npars){
S[,i] <- X[,i]*betas[i]
}
##------------------------------------------------------------------------------------
## center the scores and data if center
if(center){m <- colSums(X)/nrow(X)
}
else
## default: align to minimum values?
## do I want this?? Later: no, align to zero so that the actual
## beta*X scores are shown
{ #m <- apply(X,2,min)
m <- rep(0,times=ncol(X))
}
## exclude factor variables from centering, make 0 always the reference
m[which(isadummy)] <- 0
##
X <- scale(X,center=m,scale=FALSE)
bm <- m*betas
S <- scale(S,center=bm,scale=FALSE)
intercept <- intercept +sum(bm, na.rm=TRUE)
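## collapse per-parameter scores into one column per nomogram row:
## SS[,k] = sum of the beta*x contributions of all parameters belonging to variable k
## (factor dummies and spline/poly components are summed into a single column)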
SS <- matrix(nrow=nrow(X),ncol=nrows)
XX <- matrix(nrow=nrow(X),ncol=nrows)
i <- 1
k <- 1
while (i <= npars) {
dummys <- which(vact==k)
if(length(dummys) >1) {
SS[,k] <- rowSums( S[,dummys], na.rm=TRUE)
XX[,k] <- SS[,k]
i <- i + length(dummys)
k <- k+1
}
else
{
SS[,k] <- S[,i]
XX[,k] <- X[,i]
## if isa_pseudo need to ensure XX is betaX ie. =SS
## why? usually with isa_pseudo e.g bs() dummys is
## multivalued
if(isa_pseudo[k] ) XX[,k] <- SS[,k]
i <- i+1
k <- k+1
}
} ##while (i <= npars)
colnames(SS) <- row_names
## replace S and X by collapsed values
S <- SS
X <- XX
## number of panels shown. Normally = nrows; an extra panel is added later for lme4 random effects
npanels <- nrows
##------------------------------------------------------------
## set graphic boundaries. Max and min of beta-scores or fraction thereof??
L <- min(S[,],na.rm=TRUE)
M <- max(S[,],na.rm=TRUE)
diff <- (M-L)
tickl <- (M-L)/50
tot_score <- rowSums(S, na.rm=TRUE)
##factr <- vector(length=nrows) + for possibility of mixed
isinteraction_row <- vector(length= (nrows ))
if(lme4){
#
# ## for lmer, ignore tot_score as computed, and use
# ## predict() in P-ranef values, noting tot_score is minus intercept
# ## can calculate actual random effects by difference of predict()
# ## and sum of fixed effects
random_effects <- pred_tot_score - (tot_score + intercept)
RER <- range(random_effects)
## adjust L and M boundary markers to accommodate a (possibly greater)
## spread of random effects
L <- min(L,RER[1])
M <- max(M,RER[2])
## override tot_score with predicted (less intercept, added back in later)
## to use Lm_scale() nomogram function
tot_score <- pred_tot_score - intercept
## add additional panel at top for random effects
npanels <- npanels +1
}
} ##@if(FIRSTRUN)
##==================================================================
nstrata <- 1
if(!is.null(strata_levels)) nstrata <- length(strata_levels)
#
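## make_space reserves vertical room at the bottom of the plot for the
## outcome (score-to-probability) scales, one per requested failtime/intercept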
make_space <- 2 +(ntimes-1)*0.6
if(FIRSTRUN) {
if(dev.cur() > 1) dev.off()
plot.new()
## not sure about plotmargin, value seems to have no effect
## whatsoever.
# mai A numerical vector of the form c(bottom, left, top, right)
# which gives the margin size specified in inches. Figure: mai.png
#
# mar A numerical vector of the form c(bottom, left, top, right)
# which gives the number of lines of margin to be specified on
# the four sides of the plot. The default is c(5, 4, 4, 2) + 0.1.
plotmargin <- 5
par(mar=c(0,0,0,0) + plotmargin, mai=c(0,0,0,0),bg = "#FFFFFF")
plot(c(L-0.3*(M-L),M+0.1*(M-L)), c(0.5,npanels+2 + make_space),col="white",
bty="n",yaxt="n",xaxt="n")
## "n" supresses border and x and y axes.
##nrows <- npars
## delta controls positioning of scale annotation and spacing, length of ticks etc
## one delta unit below
delta <- 0.05 + (0.2-0.05)*(npanels)/10
## long and short tick length
ltick <- 0.4*delta
stick <- 0.2*delta
##factr <- vector(length=nrows) + for possibility of lme4
##isinteraction_row <- vector(length= (nrows + 1))
}
else ##if(FIRSTRUN)
{
if(nudist ){
plot(c(L-0.3*(M-L),M+0.1*(M-L)), c(0.5,npanels+2 + make_space),col="white" )
}
else
{
replayPlot(bareplot)
}
}
sumcheck <- 0
aspect <- 1.4*(M-L)/(npanels+2 + make_space -0.5)
# #try scaling point axis 0,-100 (L,M)
if(FIRSTRUN){
lscale <- max(S[,],na.rm=TRUE)- min(S[,],na.rm=TRUE)
## set up ranking permutation for rank=TRUE
## (random effects not included in this)
if(rank){
Rng <- vector(length=nrows)
for(i in 1:nrows){
if(howrank=="sd"){
Rng[i] <- sd(S[,i])
}
else
{
r<- range(S[,i])
Rng[i] <- r[2]-r[1]
}
}
irank <- order(Rng,decreasing=TRUE)
}
else ## if(rank)
{irank <- seq_len(nrows)}
row_of_panel <- vector(length=(nrows ))
## checks on plot types
plot.type <- rep(plots[1], times = npanels + 2)
plot.typef <- rep(plots[2], times = npanels + 2)
selected <- rep(FALSE, times = npanels + 2)
} ##if(FIRSTRUN)
#=================================================================
## main graphics loop over potential nrows of the graphic.
## Those shown are "panels"
#=================================================================
if(nudist ){
none <- all(plot.type=="no plot")
g <- 1
## note: this "ghost" loop for(ghost in 1:g) is device to add confidence intervals
## to beta-coefficients. It repeats the same drawing code
## but with betas replaced by Lbetas (ghost=2) and
## Ubetas (ghost=3). U and L beta coefficients are upper and lower
## intervals, as previously calculated.
## the confidence bars are actually added on when ghost=3.
##
if(coeffCI) g <- 3
for(ghost in 1:g){
CIcol <- "#999999"
lwid <- 1
if(ghost==1) {gbetas <- betas
axcol <- "black"
subt <- showsubticks
cexc <- cexcats
reftick <- vector(length=nrows)
reftickval <- vector(length=nrows)
center_tickpos <- vector(length=nrows)
ref_ticks_pos <- vector(length=nrows)
beta_order <- list(length=nrows)
}
if(ghost==2) {gbetas <- Lbetas
## lower confidence scale
lowerCI <- vector(length=nrows)
axcol <- "white"
subt <- FALSE
cexc <- cexcats*0.75 }
if(ghost==3) {gbetas <- Ubetas
## upper confidence scale
axcol <- "white"
subt <- FALSE
cexc <- cexcats*0.75
}
npanel <- 0
spacegap <- paste(rep(" ",times=8),collapse="")
## try a trick to only do matrix work in caxisX()
## on the first run
if(FIRSTRUN){BXrawX <- list()
}
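## loop over nomogram rows in display order: irank[] permutes rows by
## importance when rank is requested, otherwise it is the identity ordering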
for (row_num in 1:nrows){
i <- irank[row_num]
## deal with stat significance, choose largest
## note: max("","*","**","***") returns "***"!
iv <- which(vact==i)
if(length(iv)==1){
ss <- sigcodes[iv]
}
else
{
if(all(is.na(sigcodes[iv]))){
ss <- NA
}
else
{
ss <- max(sigcodes[iv], na.rm=TRUE)
}
}
isint <- isinteraction[iv[1]]
## isint <- (row_num > nmain_effects)
isinteraction_row[i] <- isint
## patching switch to omit showing interaction panels
## showpanel <- ( !isinteraction_row[i] | showi )
## prior showi=TRUE ensures showpanel always TRUE
showpanel <- TRUE
isNApanel <- is.na(S[,i][1])
## row_names may include a function, e.g. log(age) or (race="black"), etc.
v <- row_names[i]
gX <- getX(v)
## except for logicals () use actual variable name
## eg as.factor(sex) makes v = sex
if( gX[2] !="()"){ v <- raw_row_names[i]}
numeric <- !isa_factor[i] | isa_pseudo[i]
if(showpanel) {npanel <- npanel + 1
row_of_panel[npanel] <- row_num}
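## ipos is the y-coordinate (in plot units) of the current panel's axis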
ipos <- npanel+0.5 +make_space
if(numeric ){
##====================================================================
### numeric (non-factor variable) or interaction
## make a distribution type
##====================================================================
## isNApanel indicates NA beta coefficient for a panel. Use to skip
## drawing distribution
if(!isNApanel){
uniq <- sort(unique(S[,i]))
Xuniq <- sort(unique(X[,i]))
luniq <- length(Xuniq)
B <- gbetas[which(vact==i)]
PT <- plot.type[npanel]
if( PT !="no plot" & showpanel & ghost==1){
if(PT=="density") { myd <- mydensity(S[,i],ipos,dencol)
if(myd) {
message(paste("singular density for ", row_names[i],"?"))}
}
if(PT=="bars") { myd <- myhist(S[,i],ipos,dencol)
if(myd) {
message(paste("singular value for ", row_names[i],"?"))}
}
if(PT=="ecdf") { mycumul(S[,i],ipos,tickl,dencol) }
## dencol or boxcol?? Maybe dencol?
if(PT=="boxes"){ myboxes(S[,i],ipos,dencol,aspect,L,M)}
if(PT=="spikes"){ myspikes(S[,i],ipos,spkcol,aspect,L,M) }
## (trialling myhist: "bars" is already handled by myhist() above)
if(PT=="violin"){
bandwidth <- 0.01*(max(S[,i])-min(S[,i]))
box <- vioplot(S[,i], add=TRUE,at=ipos+0.3,horizontal=TRUE,wex=0.5, border="blue",
h=bandwidth, colMed="black", pchMed=21, col=dencol)
## bug: vioplot() outputs NULL whether assigned to box or not.
## must be internal to vioplot's output;
## can't see how to suppress it
}
if(PT=="bean" ){
bandwidth <- 0.01*(max(S[,i])-min(S[,i]))
box <- beanplot(S[,i], add=TRUE,at=ipos+0.3,horizontal=TRUE, maxwidth=0.5,
what = c(0,1,0,1), bw=bandwidth, log="", overallline="median",beanlinewd=0,
kernel="epanechnikov", axes=FALSE, border="blue", col=dencol,method="overplot",
ll=0.1 , frame.plot=FALSE, pars = list(boxwex = 0.8, staplewex = 0.5,
outwex = 0.5,cex=0.5))
}
if(PT=="boxplot"){
box <- boxplot(S[,i], add=TRUE,at=ipos+0.3,horizontal=TRUE, border="blue",col=dencol,
pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5,cex=0.5))
}
} ##if(!none
## try adding confidence scale
## now add the variable scale ruler
## extract beta value for this numeric variable.
Beta <- gbetas[which(vact==i)]
##
mean <- m[which(vact==i)]
## nmax parameter used in pretty()
## what is 10? Maximum ticks, allows more of scale spaced?
nmax <- round(min(8, 1+10*(max(S[,i]) -min(S[,i]))/lscale ),0)
##nmax <- round(min(5, 1+5*(max(S[,i]) -min(S[,i]))/lscale ),0)
## fix nmax?
## continuous case
## if(showpanel){
## draw axis of continuous data, transferring code. Messy, but trying to isolate
#if(ghost==1){
if(ghost==1) {
##@
## fudge this for the case when i is isa_pseudo, in which case
## X[,i] is sum of beta*x over the components of X
if(!isa_pseudo[i] ){
## is it a functional value?? kernel_varname[i] != ""
if(kernel_varname[i]==""){
axe <- plot_caxis(X,i,mean,nmax,Beta, isint,row_names,
raw_row_names,inv_row_names,gX, subt,
L,M,ipos,ltick,stick,delta,cexc,axcol)}
else
{
## eg. function such as log(),
## rawX <- rawdata[,kernel_varname[i]]
## test for variable in data
if(!is.element(raw_variable_names[i],colnames(rawdata)) ){
return(message(paste("variable ",raw_variable_names[i],
"expected in data")))
}
rawX <- rawdata[,raw_variable_names[i]]
BXrawX_i <- NA
if(!FIRSTRUN){
BXrawX_i <- BXrawX[[i]]
}
axe <- plot_caxisX(X[,i]*Beta, raw_variable_names[i],row_names[i],
L,M,ipos,ltick,delta,cexc,axcol,rawX,splineplot,FIRSTRUN,BXrawX_i)
if(FIRSTRUN){
BXrawX[[i]] <- axe[[6]]
}
}
} ##if(!isa_pseudo[i])
else ##if(!isa_pseudo[i])
## isa_pseudo, ie. possible spline, bs or poly, rcs()
{
if(kernel_varname[i]==""){
## no attached raw variable, i.e. function defined external to
## regression call
axe <- plot_caxis(X,i,sum(mean*Beta),nmax,1.0, isint,row_names,
raw_row_names,inv_row_names,gX, subt,
L,M,ipos,ltick,stick,delta,cexc,axcol)}
else ##if(kernel_varname[i]==""
{
# associated raw variable name, try to form axis from raw data values
rawX <- rawdata[,kernel_varname[i]]
## X[,i] is sum of b*X's over the splines
BXrawX_i <- NA
if(!FIRSTRUN){
BXrawX_i <- BXrawX[[i]]
##BWrawX_i <- matrix(BXrawX_i,ncol=2,nrow=length(BXrawX_i)/2)
##
}
axe <- plot_caxisX(X[,i], raw_row_names[i],row_names[i],
L,M,ipos,ltick,delta,cexc,axcol,rawX,splineplot,FIRSTRUN, BXrawX_i )
if(FIRSTRUN){
## save the returned X,rawX matrix for subsequent use
## avoiding recalculating
BXrawX[[i]] <- axe[[6]]
}
}
raw_row_names[i] <- kernel_varname[i]
##fill in with any non-NA value
inv_row_names[i] <- "xxx"
} ##if(!isa_pseudo[i])
ticks_pos <- axe[[1]]
tickval <- axe[[2]]
ticks <- axe[[3]]
mid <- floor(length(ticks)*.5) +1
ref_ticks_pos[row_num] <- list(ticks_pos)
##
reftick[row_num] <- list(ticks)
reftickval[row_num] <- list(tickval)
## reftickpos[row_num] <- ticks_pos[mid]
center_tickpos[row_num] <- sum(mean*Beta)
##
##if(row_num==1)
}
#}
likeafunction <- (!isa_pseudo[i] & kernel_varname[i]!="")
##-------------------------------------------------------------
## note to myself. The current architecture looping over ghost=1,2,3 makes it impossible to
## get confidence intervals for a "likeafactor" variable e.g. rcs() or bs(),
## because the matrix X, previously built, has beta-values built into it,
## i.e. X[,i] is sum(beta*S) for splines S. The ghost=1,2,3 loop runs over
## lower (ghost=2) and upper (ghost=3) values of beta. I cannot see how to update
## the X matrix. Hence the !isa_pseudo[i] exclusion for ghost 2 and 3.
##-------------------------------------------------------------
if(ghost==2 & !isa_pseudo[i] ){
if(likeafunction) {
pos <- NULL
ticks <- unlist(reftick[row_num])
## use look up table to get graph positions for untransformed "ticks"
## eg. ticks are actual Ages of log(Age)
## do reorder once out of the loop, for efficiency
XB <- X[,i]*Beta
Raw <- rawX
o <- reorderxy(Raw,XB)
Raw <- unlist(o[1])
XB <- unlist(o[2])
for(j in 1:length(ticks) ){
pos <- c(pos,lookup(Raw,XB,ticks[j],reorder=FALSE))
}
lowerCI[row_num] <- list( pos )
} ##if(likeafunction)
else
{
lowerCI[row_num] <- list(unlist(reftick[row_num])*Beta - center_tickpos[row_num])
} ##else if(likeafunction)
}#if(ghost==2 & !isa_pseudo[i]
if(ghost==3 & !isa_pseudo[i]){
if(likeafunction) {
ticks <- unlist(reftick[row_num])
pos <- NULL
# ## eg. ticks are actual Ages of log(Age)
XB <- X[,i]*Beta
Raw <- rawX
o <- reorderxy(Raw,XB)
Raw <- unlist(o[1])
XB <- unlist(o[2])
for(j in 1:length(ticks) ){
pos <- c(pos,lookup(Raw,XB, ticks[j],reorder=FALSE))
}
upperCI <- pos
}
else
{
upperCI <- unlist(reftick[row_num])*Beta - center_tickpos[row_num]
}
low <- unlist(lowerCI[row_num])
##diff <- upperCI -low
nints <- length(low)
## draw intervals on beta*X at axis main tick marks
## at most 5 intervals
everyother <- ceiling(nints/5)
sq <- seq(from=1,to=nints,by=everyother)
ibar <- 0
for(intvx in 1:length(sq)){
intv <- sq[intvx]
## bar of continuous variables
## draw CI bar if wide enough?
## browser()
if( abs(low[intv]-upperCI[intv]) > 0.001 ){
ibar <- ibar +1
about <- ibar*0.07
# refv <- reftickval[row_num]
segments(low[intv], ipos+about, upperCI[intv], ipos+about,col=CIcol,lwd=lwid)
midp <- (low[intv]+upperCI[intv])*0.5
points(midp, ipos+about, col=CIcol,pch=19,cex=0.8,bg=CIcol)
}}
} ##if(ghost==3 & !isa_pseudo[i]
} ##if(!isNApanel)
## left hand variable labels
if(ghost==1){
# put in left hand axis labels
lcol <- "black"
## if(showpanel){
## use italics, font=3, for interactions and
## possible formula function gX[2] !=""
##if(isint | gX[2] != ""){
if(isint ){
## try this to determine whether a factor*numeric interaction.
## a factor term will be of the form sexF (for sex), so not in variable_names
ivars <- unlist(strsplit(v,":"))
## what is the logic of this? I think only want to
## invoke revo() for numeric*factor interactions.
## if numeric*numeric ivars match the
## variable_names??
if(!all(is.element(ivars,variable_names)) ) { v <- revo(v,variable_names) }
## replace : by * in displayed interaction names
v <- gsub(":", " * ", v, fixed = TRUE)
if(leftlabel){ text(x=L-0.27*(M-L),y= ipos+0.4,paste(v,ss,sep=""),cex=cexvars, font=3,adj=c(0,1),col=lcol)
}
else
{
text(x=min(ticks_pos),y= ipos,paste0(v,ss,spacegap),cex=cexvars, font=3,adj=c(1,0),col=lcol)
}
}
else #if(isint | gX[2] != "")
{
if(leftlabel){
text(x=L-0.27*(M-L),y= ipos+0.4,paste(v,ss,sep=""),cex=cexvars,
adj=c(0,1),col=lcol,font=1)
}
else
{
text(x=min(ticks_pos),y= ipos,paste0(v,ss,spacegap),cex=cexvars,
adj=c(1,0),col=lcol,font=1)
}
}
## } ##if(showpanel)
} ##if(ghost==1)
##} #if(showpanel)
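## points scale: a tick at beta*x position p maps to round(100*(p - L)/(M - L)),
## i.e. 0 points at the graphic minimum L and 100 at the maximum M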
if(pointscale){
## points corresponding to main ticks (for output, no other use):
tick_points <- round(100*(ticks_pos -L)/(M-L))
## assume negative X values are not allowed; keep only non-negative tick values
positv <- which(tickval >=0)
pointsframe <- as.data.frame( cbind(tickval[positv], tick_points[positv]) )
colnames(pointsframe) <- c(v,"Points")
points_values[[nrows+1-i]] <- pointsframe
} ##if(pointscale)
} ##if(numeric)
else ##if(numeric)
{
##================================================
## draw frequencies of factors as boxes or spikes
##================================================
PT <- plot.typef[npanel]
## patch for default "boxes" if first run, with value density
## which has been default assigned for all plots
par_nos <- which(vact_names==variable_names[i])
if(!isNApanel){
B <- gbetas[par_nos]
## make the labels for the categories
if(!isint){
## use levels of factor variables unless an interaction
cats <- unlist(xlevels[variable_names[i]])
}
else
{
## factor-by-factor interaction term:
## using beta coefficient names as labels
## and set a baseline "ref" label
## remove variable names from combination
## eg sexF:EthnicPacific <- F:Pacific
## (nb:variable_names[i] does not have category values attached)
##
cats <- remo(variable_names[i],names(B),xlevels)
}
## augment B with baseline value zero
if(nointercept & firstfactor==i){
beta_values <- B}
else
{
beta_values <- c(0,B)
## artificial beta=0 for the reference category
## pick up baseline category name
names(beta_values[1]) <- names(cats[1])
}
if(showpanel & ghost==1 ){
lb <- length(which(!is.na(beta_values) ) )
## can only do discrete boxes, bars, spikes or no plot
## so if selected, need to revert to previous type
## ensure that plot.type is saved
##
# if(!is.element(PT,c("spikes","boxes","bars","no plot"))){
# PT <- plot.typef[npanel]
# plot.type[npanel] <- PT }
PT <- plot.typef[npanel]
if(PT == "spikes" | PT == "boxes" | PT == "bars" ){
## need to order betas to match the order of the frequencies (frq) below,
## which are in increasing order of the beta coefficients
ob <- order(beta_values)
beta_values <- beta_values[ob]
cats <- cats[ob]
##
## need to save in case confidence intervals on coefficients are displayed;
## save anyway!
oblist <- as.list(ob)
beta_order[[row_num]] <- oblist
}
if(PT=="spikes" ){
## frq is ordered by increasing S[,i] ie.
## by increasing betas
frq <- as.numeric(table(S[,i]))
tot <- sum(frq)
frqx <- max(frq)
sqx <- tot/frqx
dy <- 0.6*(frq/tot)*sqx
dx <- 0.001* diff
xleft <- beta_values - dx
xright <- beta_values + dx
ybottom <- c(rep(ipos,times=lb))
ytop <- c(rep(ipos,times=lb)) + dy
segments(beta_values, ybottom,beta_values, ytop, lwd=3,col=spkcol)
## also add an axis
segments( min(beta_values),ipos,max(beta_values),ipos)
## put black labels at top of spike
text(beta_values,ipos+dy+0.15,paste(cats),cex=cexc,col="black")
}
## bar essentially same as spike but with a box.
if(PT=="bars" ){
## frq is ordered by increasing S[,i] ie.
## by increasing betas
frq <- as.numeric(table(S[,i]))
tot <- sum(frq)
frqx <- max(frq)
sqx <- tot/frqx
dy <- 0.6*(frq/tot)*sqx
halfwide <- min(diff(beta_values))/2.2
## but not too wide ??
halfwide <- min(halfwide, (M-L)/40)
## or fix width?? Yes, I think better
halfwide <- (M-L)/50
xleft <- beta_values - halfwide
xright <- beta_values + halfwide
ybottom <- c(rep(ipos,times=lb))
ytop <- c(rep(ipos,times=lb)) + dy
## sort by height, draw with largest first
o <- order(ytop, decreasing=TRUE)
ytop <- ytop[o]
ybottom <- ybottom[o]
xleft <- xleft[o]
xright <- xright[o]
rect(xleft, ybottom, xright,ytop,border="blue",col= boxcol)
## mark axis at actual value
points(beta_values,ybottom,pch=21,col="blue")
## also add an axis
segments( min(beta_values),ipos,max(beta_values),ipos)
## put black labels at top of spike
text(beta_values,ipos+dy+0.15,paste(cats),cex=cexc,col="black")
}
if(PT=="boxes") {
## draw boxes
myboxes(S[,i],ipos,boxcol,aspect,L,M)
## boxes drawn without labels, need to add labels, alternate positions
frq <- as.numeric(table(S[,i]))
tot <- sum(frq)
frqx <- max(frq)
sqx <- sqrt(tot/frqx)
dy <- 0.30*sqrt(frq/tot)*sqx
## NB NTM There is a problem here, identified with interactions,
## when there are zero frequencies of an interaction.
## If this happens the length of frq equals the number of non-zero
## interactions and is not the same as length(beta_values).
## It will throw a warning in
## (dy+0.8*delta)*plmi since length(dy) != length(plmi).
## Could truncate length(plmi), but that doesn't fix it
## as the labels (cats) won't necessarily match.
## How to fix??
##lb <- min(lb,length(frq))
plmi <- rep(c(-1,1), times=lb)
adj <- rep(c(1,0), times=lb)
plmi <- plmi[1:lb]
if(length(plmi) != length(dy)){
message("checksum error for plmi and dy. Contact developer")
}
posn <- ipos - (dy+0.8*delta)*plmi
text(beta_values,posn,paste(cats),cex=cexc,col="black")
}
if(PT=="no plot"){
{dx <- 0.01
dy <- 0.01
xleft <- beta_values - dx
xright <- beta_values + dx
ybottom <- c(rep(ipos,times=lb ) ) - dy
ytop <- c(rep(ipos,times=lb) ) + dy
ob <- order(beta_values)
plmi <- rep(c(-1,+1), times=lb)
plmi <- plmi[1:lb]
posn <- ipos - delta*plmi
##
## if(is.null(val)) val <- " "
##
text(beta_values[ob],posn,paste(cats[ob]),cex=cexc,col="black")
size <- 0.3
## draw vertical up/down ticks
## also add an axis
##
segments( min(beta_values),ipos,max(beta_values),ipos,col=axcol)
segments(beta_values[ob], ipos, beta_values[ob], ipos - 0.4*delta*plmi,col=axcol)
}
}
} ##showpanel & g=1 & lb >0
if(ghost==2){
lowerCI[row_num] <- list(beta_values )
}
if(ghost==3){
## coefficient confidence of factors
lwid <- 1
upperCI <- beta_values
lb <- length(beta_values)
low <- unlist(lowerCI[row_num])
plmi <- rep(c(1,-1), times=lb)
if(PT == "boxes" ){
##recall code for box position
frq <- as.numeric(table(S[,i]))
tot <- sum(frq)
frqx <- max(frq)
sqx <- sqrt(tot/frqx)
dy <- 0.3*sqrt(frq/tot)*sqx
#
}
if(PT == "spikes"){
frq <- as.numeric(table(S[,i]))
tot <- sum(frq)
frqx <- max(frq)
sqx <- tot/frqx
dy <- 0.6*(frq/tot)*sqx
}
if(PT == "boxes" | PT == "spikes" ) {
## need to reorder low[] and upperCI[] to agree with
## ordering of frequencies (saved as list beta_order)
ob <- unlist(beta_order[row_num])
low <- low[ob]
upperCI <- upperCI[ob]
}
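## draw one horizontal confidence bar per category, offset vertically so
## bars meet the top of spikes, the top/bottom of boxes, or are stacked for "no plot"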
for(cat in 1:lb){
## set about for "no plot"
about <- ipos + 0.1*cat
## if boxes put bar to align with box top or bottom
if(PT == "boxes" | PT == "spikes" ) {
## use dy[] to position vertically to meet with top of spike
## or bottom/top of box
if(PT == "boxes"){
about <- ipos - 1.2*( plmi[cat]*(dy[cat]) )}
if(PT == "spikes" ) {about <- ipos + dy[cat]}
}
## don't draw an interval for the baseline?? or when NA
if(!is.na(low[cat])){
if(low[cat] != upperCI[cat]){
segments(low[cat],about, upperCI[cat], about,col=CIcol,lwd=lwid)
midp <- 0.5*(low[cat] + upperCI[cat])
## segments(midp, ipos-1.3*about, midp, ipos-0.3*about,col=CIcol)
points(midp,about,pch=19,cex=0.8, col=CIcol,bg=CIcol)
## text(low[cat],ipos-about,paste(cats[cat]),cex=cexc,col="black")
## text(upperCI[cat],ipos-about,paste(cats[cat]),cex=cexc,col="black")
}}
}
}
} ##if(!isNApanel)
if(ghost==1){
lcol <- "black"
if(showpanel){
if(isint){
##reduce font size for complexity of interaction (number of colons)??
ncolons <- length(unlist (gregexpr(":",v) ))
## prefer to show * instead of : for interactions? Nah
##v <- gsub(":", " * ", v, fixed = TRUE)
size <- cexvars- ncolons*0.1
if(leftlabel){ text(x=L-0.27*(M-L),y= ipos+0.4,paste(v,ss,sep=""),cex=cexvars, font=3,adj=c(0,1),col=lcol)
}
else
{text(x=min(beta_values),y= ipos,paste0(v,ss,spacegap),cex=cexvars, font=3,adj=c(1,0),col=lcol)
}
}
else #if(isint)
{
if(leftlabel){ text(x=L-0.27*(M-L),y= ipos+0.4,paste(v,ss,sep=""),cex=cexvars, adj=c(0,1),col=lcol)
}
else {text(x=min(beta_values),y= ipos,paste0(v,ss,spacegap),cex=cexvars, adj=c(1,0),col=lcol)
}
}
## if(showP) text(x=L-0.27*(M-L),y= ipos,paste(ss),cex=cexvars, col=lcol)
} #if(showpanel)
## points table on ghost==1
if(pointscale){
tick_points <- round(100*(beta_values-L)/(M-L))
##
pointsframe <- as.data.frame(cbind(cats,tick_points) )
colnames(pointsframe) <- c(v,"Points")
## make sure output ordered list same as graphic
points_values[[nrows + 1-i]] <- pointsframe
}
} ##if(ghost==1
} ##else of if(numeric
}
## end of main graphics loop for (row_num in 1:(nrows))
} ## for(ghost in 1:g loop
##-----------------------------------------------------------
## if mixed model add additional panel at the top.
##--------------------------------------------------------
if( lme4 ){
ipos <- npanel+1 +0.5 +make_space
## make a pretty scale. Random effects may be small, so don't want bunching on the axis?
nmax <- 4
eps <- 0.0001
## very small range of random_effects
if((RER[2]-RER[1])/(M-L) < eps) {
ticks <- signif( (RER[2]+RER[1])/2,4)
}
else
{nmax <- round( 10* (RER[2]-RER[1])/(M-L) )
ticks <- pretty(random_effects,n=nmax)
}
segments(min(ticks),ipos,max(ticks),ipos,col="black",)
segments(ticks,ipos,ticks,ipos - ltick,col="black",)
if(showsubticks) subticksf(ticks,ticks,ipos,stick)
text(x=ticks,y= ipos-delta,paste(ticks),cex=cexcats,col="black",)
if(leftlabel){ text(x=L-0.27*(M-L), adj=c(0,1),y= ipos+0.4,
paste0("[RE:",randomv,"]"),cex=cexvars, font=3, col="black")
}
else
{
text(x=min(ticks),y= ipos,
paste(paste0("[RE:",randomv,"]"),spacegap),cex=cexvars, font=3, adj=c(1,0),col="black")
}
if(selected[npanels] != "no plot"){
## put on graphic of random effect distribution
allfact <- all(isa_factor)
PT <- plot.type[npanels]
RE_distribution(L,M,pointscale, random_effects,
PT,nrows,ltick, stick, center,delta,dencol,boxcol,
spkcol,tickl,aspect,ipos+1,showsubticks,allfact,cexscales,lme4)
}
} ##if(lme4)
#-----------------------------------------------------
# add beta X or points scale contributions at top of graphic
#-----------------------------------------------------
## include L and M so the scale spans the graphic boundaries
tickp <- pretty(c(S[,],L,M),n=8)
# values of the ticks, keep in tickval
tickval <- tickp
# change output scale to points,
if(pointscale) {
tickval <- 100*(tickp -L)/(M-L)
tickval <- pretty(tickval,n=8)
tickval <- tickval[which(tickval >= 0 & tickval <=100 )]
tickp <- ((M-L)*tickval)/100 +L
}
## limit extent to graph boundaries
limits <- which(tickp > L-0.1*(M-L) & tickp < M+0.11*(M-L) )
if(length(limits) > 2){
tickp <- tickp[limits]
tickval <- tickval[limits]
}
##
sv <- sieve(tickp,tickval, 0.04*(M-L))
tickp <- unlist(sv[1])
tickval <- unlist(sv[2])
npos <- npanels+1 + make_space
ypos <- npos+0.4
segments(min(tickp),ypos,max(tickp),ypos)
segments(tickp,ypos,tickp,ypos-ltick)
text(x=tickp,y= ypos- delta,paste(signif(tickval,3)),cex=cexscales)
# subticks?
if(showsubticks) subticksf(tickp, tickp,ypos,stick)
if(pointscale) {ex <- "Points "
}
else
{
if(center){ex <- expression(italic(paste(beta,"(X-m) terms ")))}
else
{ex <- expression(italic(paste(beta,"X terms ")))}
}
text(x=L-0.27*(M-L), adj=c(0,1), ypos+1.5*delta ,ex ,cex=cexscales, font=3)
#write title heading above upper scale
##titlepos <- min(ypos + 2.5*delta,npars+2+ make_space)
titlepos <- min(ypos + 2.5*delta,npanels+2+ make_space)
text((M+L)/2,titlepos, title ,cex=1.05 ,font=4)
# =======================================================
# add Total points scale and its distribution
## use yref_pos as reference point of "Total-points-to outcome"
#========================================================
yref_pos <- make_space + 1
allfact <- all(isa_factor)
PT <- plot.type[npanels + 2]
##
Total_distribution(L,M,pointscale, tot_score,none,
PT,nrows,ltick, stick, center,delta,dencol,boxcol,
spkcol,tickl,aspect, yref_pos, showsubticks,allfact,
cexscales,lme4)
#==========================================================
Max_tot_score <- max(tot_score)
Min_tot_score <- min(tot_score)
Range <- Max_tot_score - Min_tot_score
tickvalues <- pretty(tot_score,n=6)
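## map total-score tick values v onto the graphic x-range via
## tickp = (M - L)*(v - Min_tot_score)/Range + L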
if(!pointscale){
tickv <- tickvalues
tickp <- (M-L)* (tickv -Min_tot_score)/Range + L
## fix positions of the scale relative to values of pretty values
## draw line to fit L-M boundary, ticks having screen coordinates
}
else
{
## require "pts" for pointscale
tickv <- tickvalues
## convert to points,
tickv <- 100*(tickv -L*nrows)/(M-L)
tickv <- pretty(tickv,n=8)
## convert back into actual total scores
pts <- ((M-L)*tickv)/100 +L*nrows
tickp <- (M-L)* (pts -Min_tot_score)/Range + L
}
##==============================================================
## add score-to-probability nomograms scales at the bottom
##==============================================================
#
pos_of_nomo <- yref_pos-1.8
pos_of_nomo <- yref_pos-0.9*make_space
# ------logistic nomogram-----------------------------------------
if(logistic | logisticer ){
time_gap <- make_space /(ntimes +1)
for(time in 1:ntimes){
Y <- yvar
intercept_temp <- intercept
if(polr_logistic){intercept_temp <- intercepts[time] -intercept
## if(center) intercept_temp <- intercept_temp -intercept
Y <- names(intercepts)[time]
pos_of_nomo <- time*time_gap
}
Logistic_scale(tot_score,intercept_temp,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,Y,odds,showsubticks,cexscales,polr)
} ## for(time in 1:ntimes)
} ## if(logistic | logisticer
##----------probit polr nomogram----------------------
if(polr_probit){
## set up scales for probit polr
time_gap <- make_space /(ntimes +1)
for(time in 1:ntimes){
intercept_temp <- intercepts[time] -intercept
## if(center) intercept_temp <- intercept_temp -intercept
Y <- names(intercepts)[time]
pos_of_nomo <- time*time_gap
probit_scale(tot_score,intercept_temp,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,Y,odds,showsubticks,cexscales)
}
} ##if(polr_probit)
# ------lm nomogram-----------------------------------------
if(lm | ols | lmer | olser){
Lm_scale(tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,showsubticks,cexscales)
}
#
# ------poisson nomogram-----------------------------------------
if(poisson | negbin | poissoner | negbiner ){
Poisson_scale(tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,negbin,showsubticks,cexscales)
}
# ------Cox or survreg nomogram-----------------------------------------
if(cox | survreg){
time_gap <- make_space /(ntimes +1)
strata_gap <- time_gap/(nstrata +1)
for(time in 1:ntimes){
## this establishes position of each time point nomogram
## and if strata the first stratum scale
pos_of_nomo <- time*time_gap
ftime <- tcut[time]
base <- baseS[time]
if(medsurv) Spt <- Spercent[time]
if(cox){
## use basehaz() to get hazards at X=0. Only do FIRSTRUN as it
## may be time consuming
if(FIRSTRUN & is.null(baseS) & time==1) {
h <- basehaz(reg, centered=FALSE)
if(rms & ncol(h)==3 & names(h)[3] == "strata"){
## rms strata have the strata name (3rd col) as e.g. Sex="F"
## but coxph/survreg return just "F";
## need to strip away the "Sex=" of rms basehaz().
## Get the kernel from strat(x), find the number of characters + 1 for the = sign.
## NTM: this works ok for single-valued strat. Not sure how it will work
## with multivalued e.g. strat(age,sex), but I cannot get rms to do
## multivalued strata anyway!
nc <- nchar(getX(vstrat)[1] ) + 1
h[,3] <- substring(h[,3],first=nc+1)
}
hmax <- h$time[nrow(h)]
## NTM: should h() be reduced by taking out flat areas where the hazard is constant over times??
## how?
if(medsurv){
message("survival quantiles beyond time ", signif(hmax,3), " un-estimable")
}
## ?? patch in final time point. Current last row of h is max observed time
##rowh <- nrow(h)
## h <- rbind( h, c(h$hazard[rowh],5*h$time[rowh]) )
## note: if strata() in model h has 3rd column of "strata" value.
}
if(!is.null(baseS) & cox & nstrata >1){
return(message("cox model with strata and baseS non-null not allowed"))
}
if(!medsurv) {
cxp <- Cox_scale(reg, tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,odds,base,h,s5,betas,center,m,bm,ftime,
fail,showsubticks,cexscales,slevels,strata_levels,strata_gap,vstrat)
## time is the time cutoff in loop for(time in 1:ntimes)
## s5 the returned single values base survival for that time
##
s5[time] <- cxp$s5}
else
{cxp <- Cox_scalemedsurv(reg, tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,odds,base,h,betas,center,m,bm,ftime,
fail,showsubticks,cexscales,slevels,strata_levels,strata_gap,Spt,vstrat)}
## strata_levels <- cxp$strata_levels
}
## survreg nomogram -------------------------------------------
if(survreg){
p <- 1/reg$scale
## TRIAL PATCHING FOR MEDIAN SURVIVAL OUTCOME
if(medsurv){
Survreg_scale_medsurv(tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,betas,m,p,dist,showsubticks
,cexscales,slevels,strata_gap,Spt,vstrat)
}
else
{
##note: p can be multivalued if strata() specified, one parameter per strata
Survreg_scale(tot_score,intercept,L,M,Range,Min_tot_score,delta,
ltick,stick,pos_of_nomo,yvar,betas,m,ftime,fail,p,dist,odds,showsubticks
,cexscales,slevels,strata_gap,vstrat)
}
}
} ##if cox|survreg
}
##================================================
## finalise last total points table
if(FIRSTRUN & pointscale){
# make table for first [1] element of s5 and tcut
points_values[[nrows + 1]] <-
points_tables(survreg,dist,p[1],cox,s5[1],fail,tcut[1],poisson,
negbin,lm,ols,lmer,logistic,pts,intercept,npars,yvar,tickv,medsurv)
}
##======================================================
##save the "bare" plot, i.e. without any red observation data
bareplot <- recordPlot()
##========================================================
} #if(nudist)
nudist <- FALSE
##-===============================================================
## start overlay in red (default obscol) observation dots, text and lines
#=================================================================
if(person){
npanel <- 0
for (row_num in 1:(nrows)){
i <- irank[row_num]
# add in RED points of observation and value
dummys <- which(vact==i)
if(isa_factor[i]){
value_one <- which(newX[dummys] ==1)
## NTM need a patch here to retain name of value_one
## when only one single factor variable in model
## horribly messy
##
if(length(newX)==1){
## use is.matrix()/is.numeric(): class() of a matrix has length 2 in R >= 4
if(is.matrix(newX)) nms <- colnames(newX)[value_one]
if(!is.matrix(newX) & is.numeric(newX)) nms <- names(newX)[value_one]
names(value_one) <- nms}
levels <- unlist(xlevels[variable_names[i]])
if(length(value_one)==0){
level <- levels[1]
## clicked on reference category
addX <- 0
}
else
{
if(nointercept & firstfactor==i)
{
level <- levels[value_one]
}
else
{
level <- levels[1+value_one]}
addX <- betas[names(value_one)]
}
##write label text in bold (font=2)
plmi <- rep(c(1,-1), times=length(levels))
ilev <- which(levels==level)
plmi <- plmi[ilev]
## rather messy trying to overwrite black with red.
## do I need red text? Maybe fixed position
posn <- ipos + 1.5*delta
##or suppress altogether??
##text(addX,posn,paste(level),cex=0.7,font=2,col=obscol)
size <- 0.3
} ##if(isa_factor
else # of if(isa_factor
{
addX <- sum( betas[dummys]*(newX[dummys]-m[dummys]) )
}
## add RED dot point at observation value
npos <- npanels+1 + make_space
##
## if( !isinteraction_row[i]){
npanel <- npanel +1
ipos <- npanel+0.5 +make_space
cex <- 1
col <- obscol
##
if(isinteraction_row[i]){
## interaction red cross
##pch=4 is a multiplication X sign
points(addX,ipos, col=obscol,cex=cex, pch=4, lwd=2, bg=obscol)
## make dot a little smaller if many rows
}
else
{
## if(nrows>10){cex <- 0.8}
## output red dot
points(addX,ipos, col=obscol,cex=cex, pch=21, bg=obscol)
##
}
## also add red point to top of beta X points scale
## join by vertical dropline??
if(droplines){
segments(addX,ipos+0.02,addX,npos+0.38, col=obscol,lty="dotted")
}
if(pointscale){
sumcheck <- sumcheck + 100*( addX-L )/( M-L )
}
else
{ sumcheck <- sumcheck +addX}
cex <- 1
if(isinteraction_row[i]){
##add a X in red
points(addX,npos+0.4, col=obscol,cex=cex, pch=4,lwd=2, bg=obscol)}
else
{
##
points(addX,npos+0.4, col=obscol,cex=cex, pch=21, bg=obscol)}
} ## end of adding in RED points loop over row_num
##--------------------------------------------------------------------
if(lme4){
ipos <- npanel+1 +0.5 +make_space
## get the random effects using predict() with random.only
## for observation
RE <- predict(reg,type="link",random.only=TRUE, newdata=observation)
points(RE,ipos, col=obscol,cex=cex, pch=8, bg=obscol)
if(droplines){
segments(RE,ipos+0.02,RE,npos+0.38, col="pink")
}
## another point on top
points(RE,npos+0.4, col=obscol,cex=cex, pch=8, bg=obscol)
## output in RED random effects values of observation
ul = unlist(strsplit(randomv, split = ","))
reffs <- NULL
for(i in 1:length(ul)){
O <- observation[,ul[i]]
if(is.numeric(O)){
reffs <- c(reffs, paste0(ul[i],"=",signif(O,3) ))
}
else
{reffs <- c(reffs, paste0(ul[i],"=",O ))
}
}
reffs <- paste(reffs,collapse=" ")
## output in RED of random effect
text(x=L-0.27*(M-L), adj=c(0,1) ,y= ipos+2*delta,
reffs,cex=cexvars, font=3, col=obscol)
} ##if(lme4)
##-------------------------------------------------------------------
npos <- 1
if(length(newX) != length(betas)){
## this may be a problem with NAs in beta.
## Pick out only newX elements referring to m
newX <- newX[names(m)]
##message("Data inconsistency . Possible beta NAs, attempt to fix")
}
totS <- sum((newX-m) *betas, na.rm=TRUE)
if(lme4){
## use predict() instead to get total score instead
## invokes predict.merMod
## get the linear score (type="link" with random.only=FALSE
## random.only makes prediction using only random effects)
totS <- predict(reg,type="link", newdata=observation) - intercept
}
totSpos <- (M-L)* (totS -Min_tot_score)/Range + L
## filled diamond pch=23
sz <- 1.3
if(npars > 10) {sz <- 0.8}
## position of total score scale (same as in the Total_distribution() function)
Total_pos <- yref_pos - 1
## can't figure out the exact positions of the plot edge? It extends beyond
## L-0.3*(M-L) and M+0.1*(M-L),
## but by how much and why? Trial and error gives 0.36 and 0.16 as about right
# print(totSpos)
# print((totSpos-M)/(M-L))
# print((L-totSpos)/(M-L))
# if(totSpos < L-0.36*(M-L) | totSpos > M+0.16*(M-L))
# {message("outcome arrow off screen")}
## drawing the red pointer arrow on the scale
## ah - the plot edges can be got from par()$usr!
if(totSpos < par()$usr[1] | totSpos > par()$usr[2])
{message("outcome arrow off screen")}
points(totSpos,Total_pos, col=obscol,cex=sz, pch=23, bg=obscol)
## draw downward arrowhead pch=25 from total scale, arrow cursor
## total in red obscol
exval <- paste(signif(totS,3),sep="")
if(pointscale) {
## need to work around to get the output points value
## using the sumcheck variable, which is the incremental sum of
## the points of each variable. Messy!
exval <- paste(signif(sumcheck,3), sep="")
}
## output the total score (exval) in red next to the red diamond on the total score axis
text(totSpos,Total_pos + delta, exval,cex=cexscales ,font=3,col=obscol )
## add in red output probabilities/means
## loop over ntime - the number of scales. ntimes=1 except
## for survival models where different failtimes may have
## been specified.
if(cox | survreg){
for(time in 1:ntimes){
ftime <- tcut[time]
pos_of_nomo <- time*time_gap ## value position just above the pos-of-nomo-scale
redscore_ypos <- pos_of_nomo + 0.75*delta
##=============================================
## for Cox model
if(cox){
if(!medsurv){
ftime <- tcut[time]
stratum <- 1
if(nstrata>1){
## get stratum number of the data in "observation"
##
stratum <- which_stratum(strata_levels, vstrat, observation)
}
## need to identify which stratum "observation" is in.
if(is.null(baseS) ){
##get baseline survival for this stratum
surv <- baseSt(tcut[time],h,stratum, betas,m)
}
else
{
if(center){
surv <- baseS[time] ^exp(sum(betas*m))}
else
{surv <- baseS[time]}
}
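## Cox prediction of survival for the observation:
## S(t | x) = S0(t)^exp(beta'x), where totS is the (centered) linear predictor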
pred_mean <- (surv^exp(totS))
if(odds){
pred_mean <- pred_mean /(1-pred_mean)
if(fail){pred_mean <- 1/pred_mean}
}
else
{
##S(t)^exp(BX)=1-P
if(fail) { pred_mean <- 1-pred_mean }
}
## pos=2 to the left of, adjusted for strata scales if present
## yposx <- redscore_ypos + (stratum-1)*(1/(nstrata+1)) -ltick
yposx <- pos_of_nomo + (stratum-1)*strata_gap
## output RED value
text(totSpos,yposx + 0.1*strata_gap + delta, paste(signif(pred_mean,3)),pos=2,
cex=cexscales,font=3,col=obscol)
##-----------------------------------------------------------------
## For a Cox model we must coerce observation to have the specified failtime
## so that the predict() call with newdata=observation
## in pcheck() correctly fixes the estimated probabilities to tcut[time].
observation[yvar] <- tcut[time]
##----------------------------------------------------------------------
## cox arrowhead in red for Cox and vertical pointer
segments(totSpos,Total_pos, totSpos, yposx, col=obscol,cex=1.)
points(totSpos, yposx +0.1*strata_gap, col=obscol,cex=sz, pch=25, bg=obscol)
## put the stratum value of the observation in red (currently suppressed)
if(nstrata > 1){
##text(L -0.3*(M-L) ,yposx+0.5*delta,adj=0,paste(stratum),font=4,cex=cexscales, col=obscol)
}
## cox
if( confI | predI ) {
warn <- pcheck(new_obs,reg,confI,predI, observation,
intercept, lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,
odds,L,M,Range, Min_tot_score,delta,yposx,obscol,dist,tcut[time],
fail,p,surv,bm)
if(warn & !is.null(base)){
message("non null baseS parameter likely reason")
}
}
} ##if(!medsurv)
else
{
Sprop <- Spercent[time]/100
stratum <- 1
if(nstrata>1){
stratum <- which_stratum(strata_levels, vstrat,observation)
}
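## invert S0(t)^exp(beta'x) = Sprop for the baseline cumulative hazard:
## H0(t) = -log(Sprop)/exp(beta'x), then look up the corresponding time via baseht()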
H <- 1/( exp(totS) / (-log(Sprop) ) )
#
tS <- baseht(H,h,stratum, betas,m)
##
yposx <- pos_of_nomo + (stratum-1)*strata_gap
## output RED value
text(totSpos,yposx+0.1*strata_gap + delta, paste0(tS),pos=2,cex=cexscales,font=3,col=obscol )
## cox medsurv arrowhead in red
segments(totSpos,Total_pos, totSpos, yposx, col=obscol,cex=1.)
points(totSpos, yposx+0.1*strata_gap , col=obscol,cex=sz, pch=25, bg=obscol)
## cox
## overwrite stratum number in red for survreg. Suppress??
# if(nstrata>1){
# text(L -0.3*(M-L) ,yposx+0.5*delta,adj=0,paste(stratum),font=4,cex=cexscales, col=obscol)
# }
} ##else if(!medsurv)
}##if(cox)
##--------------------------------------------------------
if(survreg){
## NTM found this line?? It is doing nothing?
## tcut[time] <- tcut[time]
px <- p
stratum <- 1
if(!is.null(strata_levels)){
stratum <- which_stratum(strata_levels, vstrat, observation)
px <- p[stratum]}
if(!medsurv){
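## survival probability at time tcut[time] under the fitted AFT model,
## with px = 1/scale and linear predictor totS + intercept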
if(dist=="loglogistic"){
pred_mean <- 1/ (1+(exp(-totS-intercept)*ftime)^px) }
if(dist=="lognormal"){
pred_mean <- 1- pnorm(px*log( exp(-totS-intercept)*tcut[time]) )}
if(dist=="gaussian"){
pred_mean <- 1- pnorm(px*(-totS-intercept + tcut[time]) )}
if(dist=="weibull"| dist=="exponential"){
lam <- exp(-totS-intercept)
pred_mean <- exp( -(lam*tcut[time])^px )
}
if(fail) { pred_mean <- 1-pred_mean }
if(odds) {pred_mean <- pred_mean/(1-pred_mean)}
yposx <- pos_of_nomo +(stratum-1)*strata_gap
text(totSpos,yposx+delta, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
##survreg arrowhead
segments(totSpos,Total_pos, totSpos, yposx, col=obscol,cex=1.)
points(totSpos, yposx + 2*ltick , col=obscol,cex=sz, pch=25, bg=obscol)
# ## overwrite stratum number in red for survreg
# if(nstrata>1){
# text(L -0.3*(M-L) ,yposx+0.5*delta,adj=0,paste(stratum),font=4,cex=cexscales, col=obscol)
# }
if( confI | predI ) pcheck(new_obs,reg,confI,predI, observation,intercept,
lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,odds,
L,M,Range, Min_tot_score,delta, yposx, obscol,
dist,tcut[time],fail,px,s5,bm)
}
else ##if(!medsurv
{
Sprop <- Spercent[time]/100
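## invert S(T) = Sprop for each distribution to obtain the predicted
## quantile survival time pred_T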
if(dist=="loglogistic"){
lambda <- exp(-totS-intercept)
pred_T <- ( ((1-Sprop)/Sprop) ^ (1/px) )/lambda
}
if(dist=="lognormal"){
pred_T <- exp( qnorm(1-Sprop)/px + (totS+intercept) )}
if(dist=="gaussian"){
pred_T <- qnorm(1-Sprop)/px + totS+intercept}
if(dist=="weibull"| dist=="exponential"){
lambda <- exp(-totS-intercept)
pred_T <- ( (-log(Sprop))^(1/px) ) / lambda
}
## medsurv=TRUE
##survreg arrowhead for medsurv
yposx <- pos_of_nomo +(stratum-1)*strata_gap
segments(totSpos,Total_pos, totSpos, yposx, col=obscol,cex=1.)
points(totSpos, yposx + 2*ltick , col=obscol,cex=sz, pch=25, bg=obscol)
text(totSpos,yposx+delta, paste(signif(pred_T,3)),pos=2,cex=0.9,font=3,col=obscol )
## overwrite stratum number in red, suppress??
## if(nstrata>1){
##text(L -0.3*(M-L) ,yposx+0.5*delta,adj=0,paste(stratum),font=4,cex=cexscales, col=obscol)
##}
}
} ##if(survreg)
} #for(time i 1:ntimes)
} ##if(cox | survreg
##================================================
if(logistic | polr_logistic){
time_gap <- make_space /(ntimes +1)
if(polr) totS <- -totS
for(time in 1:ntimes){
intercept_temp <- intercept
if(polr) {intercept_temp <- intercepts[time] -intercept
##if(center)intercept_temp <- intercept_temp - intercept
}
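## predicted probability from the linear predictor eta = totS + intercept_temp:
## p = exp(eta)/(1 + exp(eta)); on the odds scale simply exp(eta)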
if(odds){
pred_mean <- exp(totS+intercept_temp)
}
else
{
pred_mean <- exp(totS+intercept_temp)/(1 + exp(totS+intercept_temp))
}
## browser()
##text(totSpos,redscore_ypos, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
if(polr)pos_of_nomo <- time*time_gap
text(totSpos,pos_of_nomo+delta, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
## logistic/polr arrowhead
segments(totSpos,Total_pos, totSpos, pos_of_nomo, col=obscol,cex=1.)
points(totSpos, pos_of_nomo + delta , col=obscol,cex=sz, pch=25, bg=obscol)
if(confI | predI ) pcheck(new_obs,reg,confI , predI , observation,intercept,
lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,odds,
L,M,Range, Min_tot_score,delta,pos_of_nomo, obscol,
dist,ftime,fail,p,NA,bm)
} #for(time in
} ##if(logistic polr)
#---polr probit observation----------------------------------------------
if( polr_probit){
time_gap <- make_space /(ntimes +1)
## reverse scale required for polr
totS <- -totS
for(time in 1:ntimes){
intercept_temp <- intercepts[time] -intercept
##if(center)intercept_temp <- intercept_temp - intercept
P <- pnorm(totS+intercept_temp)
pred_mean <- P
if(odds){
pred_mean <- P/(1-P)
}
## browser()
##text(totSpos,redscore_ypos, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
pos_of_nomo <- time*time_gap
text(totSpos,pos_of_nomo+delta, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
## logistic/polr arrowhead
segments(totSpos,Total_pos, totSpos, pos_of_nomo, col=obscol,cex=1.)
points(totSpos, pos_of_nomo + delta , col=obscol,cex=sz, pch=25, bg=obscol)
} #for(time in
} ##if(logistic polr)
##-------------------------------------------
if(logisticer){
## lme4, use predict()
pred_mean <- predict(reg,newdata=observation,type="response")
if(odds) {pred_mean <- pred_mean/(1-pred_mean)}
text(totSpos,pos_of_nomo+delta, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
## logisticer arrowhead
segments(totSpos,Total_pos, totSpos, pos_of_nomo, col=obscol,cex=1.)
points(totSpos, pos_of_nomo + delta , col=obscol,cex=sz, pch=25, bg=obscol)
if(confI | predI ) pcheck(new_obs,reg,confI , predI , observation,intercept,
lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,odds,
L,M,Range, Min_tot_score,delta,pos_of_nomo,obscol,
dist,ftime,fail,p,NA,bm)
}
##------------------------------------------
if(poisson | negbin | poissoner | negbiner){
if(poisson | negbin ){
pred_mean <- exp(totS+intercept)
}
else
{pred_mean <- predict(reg,newdata=observation,type="response")
}
text(totSpos,pos_of_nomo + delta, paste(signif(pred_mean,3)),pos=2,cex=0.9,font=3,col=obscol )
## poisson arrowhead
segments(totSpos,Total_pos, totSpos, pos_of_nomo, col=obscol,cex=1.)
points(totSpos, pos_of_nomo + 2*ltick , col=obscol,cex=sz, pch=25, bg=obscol)
if(confI | predI ) pcheck(new_obs,reg,confI , predI , observation,intercept,
lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,odds,
L,M,Range, Min_tot_score,delta,pos_of_nomo,obscol,
dist,ftime,fail,p,NA,bm)
}
if(lm | ols | lmer | olser ){
if(lm | ols){
pred_mean <- totS+intercept
}
else
{
## for lmer | olser uses predict.merMod
pred_mean <- predict(reg,newdata=observation,type="response")
}
text(totSpos,pos_of_nomo+ delta, paste(signif(pred_mean,4)),pos=2,cex=0.9,font=3,col=obscol )
## lm arrowhead
segments(totSpos,Total_pos, totSpos, pos_of_nomo, col=obscol,cex=1.)
points(totSpos, pos_of_nomo + 2*ltick , col=obscol,cex=sz, pch=25, bg=obscol)
if( confI | predI ) pcheck(new_obs,reg,confI , predI , observation,intercept,
lm,ols,cox,logistic,poisson,negbin,survreg,pred_mean,odds,
L,M,Range, Min_tot_score,delta,pos_of_nomo,obscol,
dist,ftime,fail,p,NA,bm)
}
if(lmer){
pred_mean <- predict(reg,newdata=observation,type="response")
text(totSpos,pos_of_nomo + delta, paste(signif(pred_mean,4)),pos=2,cex=0.9,font=3,col=obscol )
}
## } ##for(time in 1:ntimes)
## else
### {
} ##if(person)
## end of overlay red (obscol) observation section
##==========================================================
if(FIRSTRUN & clickable) {message('Click on graphic expected. To quit click Esc')}
##=========================================================
##function clicked_action to return action of a graphic mouse click
##==========================================================
esc <- TRUE
if(clickable) {
##
pre_splineplot <- splineplot
act <- clicked_action(npanels,row_of_panel,nrows,L,M,S,isadummy,
isinteraction_row, isa_factor,isa_pseudo,xlevels,row_names,vact,
vact_names, nointercept,firstfactor,betas,m,person,nudist,irank,
make_space,variable_names,lme4,selected,plot.type,plot.typef,
haslikefactors,splineplot,X,rawdata, kernel_varname,raw_variable_names)
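## clicked_action() returns a list:
## [[1]] esc flag, [[2]] nudist (whether to redraw distributions), [[3]] clicked variable,
## [[4]] clicked value, [[5]] selection state of panels, [[6]]/[[7]] plot types
## (numeric/factor), [[8]] splineplot toggle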
##---------------------------------------------------------------
## escaping clickable graphic
esc <- act[[1]]
if(esc){
## add title on Esc
text((L+M)/2,titlepos, title ,cex=1.05 ,font=4)
if(points) { output <- points_values }
else
{ output <- paste("note: points tables not constructed unless points=TRUE ")
}
return(output)
} #if(esc)
##---------------------------------------------------------------
nudist <- act[[2]]
Rvar <- act[[3]]
## keep copy of Rvar
##RvarX <- Rvar
Rval <- act[[4]]
##RvalX <- Rval
selected <- act[[5]]
plot.type <- act[[6]]
plot.typef <- act[[7]]
splineplot <- act[[8]]
##
new_obs <- FALSE
update <- FALSE
## allow update of observed (red-dot) points subject to condition
## allowed clickable values
## if it cannot be updated (and haven't clicked to toggle splineplot)
#
# if(is.na(Rval[2] ) & splineplot==pre_splineplot & !nudist){
#
# message("Unable to update clicked value")}
## temporarily override this
if((person & !is.na(Rval) & !is.nan(Rval) & !is.na(inverse(Rvar,FALSE)[2] ) ) ){
## update "observation". Need to invert any functions
update <- TRUE
#########################################
gX <- getX(Rvar)
inv <- inverse(Rvar,TRUE)
##browser()
if(gX[2]!=""){
## elements of inverse: [1] is raw variable name
## [2] is the inverted expression in terms of "x"
if(!is.na(inv[2])){
##
## has to ""see" x in the calculation of the inverse
## so inv[2] should have x in it
assign("x",Rval)
##RvalX <- Rval
Rval <- eval(parse(text=inv[2]) )
Rvar <- inv[1]
}
}
###########################################
if(is.na(isa_factor[Rvar]) | is.logical(Rval)){
observation[Rvar] <- Rval
}
else
{
if(isa_factor[Rvar]){
## ensure observation is made into factor,
## with levels as in xlevels()
## is it a cut()??
if(getX(Rvar,outer=TRUE)[2]=="cut()"){
XX <- factor(Rval, levels=unlist(xlevels[Rvar]))
## underlying variable of cut() is inv[1]
Rvar <- inv[1]
## how to assign a value for level XX??
## has the form (zz,ww]. Arbitrarily assign ww?
## whittle down with strsplit()
XX <- as.character(XX)
XX <- unlist(strsplit(XX,","))[2]
XX <- as.numeric(unlist(strsplit(XX,"]")))
## Note: this assumes the default interval labels of cut();
## with labels=FALSE (integer codes) it won't work.
observation[Rvar] <- XX
}
else
{
observation[Rvar] <- factor(Rval, levels=unlist(xlevels[Rvar]))
} ##if(getX(Rvar,outer=TRUE)[2]=="cut()")
}
else
{observation[Rvar] <- Rval }
}
#}
if(is.numeric(Rval)) Rval <- signif(Rval,3)
new_obs <- TRUE
prev_adddata <- adddata
## update the adddata array, of possibly transformed variables
##print(observation)
ao <- added.obs( reg,observation,rawdata_original,actualdata ,pseud,FORMULA)
adddata <- ao[[1]]
## possibility that the clicked value has changed if "snapped" to an observed spline datum
observation <- ao[[2]]
##browser()
## need double [[]] here!!
if(is.numeric( observation[[Rvar]] ) ) Rval <- signif( observation[Rvar], 3)
## no change is a possibility
new_obs <- !identical(adddata,prev_adddata)
if(new_obs ){
note_inv <- ""
clickedon <- paste("Clicked on ",paste0(Rvar,"=",Rval) )
if( !identical(clickedon,clickedonP)) message(clickedon)
clickedonP <- clickedon
} ##if(new_obs)
if(!update ) {message(paste("Non-clickable. Cannot update ", Rvar) ) }
} ##if(person & !is.na(Rval) & !is.nan(Rval))
##--------------------------------------------
} #if(clickable)
## Escape to close the program.
#
if(esc){
text((L+M)/2,titlepos, title ,cex=1.05 ,font=4)
if(points) { output <- points_values}
else
{output <- paste("note: points tables not constructed unless points=TRUE ")
}
return(output)
} #if(esc)
FIRSTRUN <- FALSE
} ##WHILE(TRUE)
} ## END OF MAIN regplot function
##############################################################################