gjamPredict: Predict gjam data

View source: R/gjamPredict.r

gjamPredictR Documentation

Predict gjam data

Description

Predicts data from a gjam object, including conditional and out-of-sample prediction.

Usage

  gjamPredict(output, newdata = NULL, y2plot = NULL, ylim = NULL, 
              FULL = FALSE)

Arguments

output

object of class "gjam".

newdata

a list of data for prediction, see Details.

y2plot

character vector of columns in output$y to plot.

ylim

vector of lower and upper bounds for prediction plot

FULL

will return full chains for predictions as output$ychains

Details

If newdata is not specified, the response is predicted from xdata as an in-sample prediction. If newdata is specified, prediction is either conditional or out-of-sample.

Conditional prediction on a new set of y values is done if newdata includes the matrix ycondData, which holds columns to condition on. ycondData must be a matrix and have column names matching those in y that it will replace. ycondData must have at least one column, but fewer than ncol(y) columns. Columns not included in ycondData will be predicted conditionally. Note that conditional prediction can be erratic if the observations on which the prediction is conditioned are unlikely given the model.

Alternatively, the list newdata can include a new version of xdata for out-of-sample prediction. The version of xdata passed in newdata has the columns with the same names and variable types as xdata passed to gjam. Note that factor levels must also match those included when fitting the model. All columns in y will be predicted out-of-sample.

For count composition data the effort (total count) is 1000.

Because there is no out-of-sample effort for 'CC' data, values are predicted on the [0, 1] scale.

See examples below.

Value

x

design matrix.

sdList

list of predictive means and standard errors includes yMu, yPe (predictive mean, SE), wMu, wSe (mean latent states and SEs)

piList

predictive intervals, only generated if length(y) < 10000, includes yLo, yHi (0.025, 0.975) prediction interval, wLo, wHi (0.025, 0.975) for latent states

prPresent

n x S matrix of probabilities of presence

ematrix

effort

ychains

full prediction chains if FULL = T

Author(s)

James S Clark, jimclark@duke.edu

References

Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2016. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34-56.

See Also

gjamSimData simulates data

A more detailed vignette is can be obtained with:

browseVignettes('gjam')

web site 'http://sites.nicholas.duke.edu/clarklab/code/'.

Examples

## Not run: 
S   <- 10
f   <- gjamSimData(n = 200, S = S, Q = 3, typeNames = 'CC') 
ml  <- list(ng = 1500, burnin = 50, typeNames = f$typeNames, holdoutN = 10)
out <- gjam(f$formula, f$xdata, f$ydata, modelList = ml)

# predict data
cols <- c('#018571', '#a6611a')
par(mfrow=c(1,3),bty='n')             
gjamPredict(out, y2plot = colnames(f$ydata)) # predict the data in-sample
title('full sample')

# out-of-sample prediction
xdata     <- f$xdata[1:20,]
xdata[,3] <- mean(f$xdata[,3])     # mean for x[,3]
xdata[,2] <- seq(-2,2,length=20)   # gradient x[,2]
newdata   <- list(xdata = xdata, nsim = 50 )
p1 <- gjamPredict( output = out, newdata = newdata)

# plus/minus 1 prediction SE, default effort = 1000
x2   <- p1$x[,2]
ylim <- c(0, max(p1$sdList$yMu[,1] + p1$sdList$yPe[,1]))
plot(x2, p1$sdList$yMu[,1],type='l',lwd=2, ylim=ylim, xlab='x2',
     ylab = 'Predicted', col = cols[1])
lines(x2, p1$sdList$yMu[,1] + p1$sdList$yPe[,1], lty=2, col = cols[1])
lines(x2, p1$sdList$yMu[,1] - p1$sdList$yPe[,1], lty=2, col = cols[1])

# upper 0.95 prediction error
lines(x2, p1$piList$yLo[,1], lty=3, col = cols[1])
lines(x2, p1$piList$yHi[,1], lty=3, col = cols[1])
title('SE and prediction, Sp 1')

# conditional prediction, assume first species is absent
ydataCond <- out$inputs$y[,1,drop=FALSE]*0                 # set first column to zero
newdata   <- list(ydataCond = ydataCond, nsim=50)
p0        <- gjamPredict(output = out, newdata = newdata)

ydataCond <- ydataCond + 10                                # first column is 10
newdata   <- list(ydataCond = ydataCond, nsim=50)
p1        <- gjamPredict(output = out, newdata = newdata)

s    <- 4         # species chosen at random to compare
ylim <- range( p0$sdList$yMu[,s], p1$sdList$yMu[,s] )
plot(out$inputs$y[,s],p0$sdList$yMu[,s], cex=.2, col=cols[1],
     xlab = 'Observed', ylab = 'Predicted', ylim = ylim)
abline(0,1,lty=2)
points(out$inputs$y[,s],p1$sdList$yMu[,s], cex=.2, col=cols[2])
title('Cond. on 1st Sp')
legend( 'topleft', c('first species absent', 'first species = 10'), 
        text.col = cols, bty = 'n')

# conditional, out-of-sample
n   <- 1000
S   <- 10
f   <- gjamSimData(n = n, S = S, Q = 3, typeNames = 'CA') 

holdOuts <- sort( sample(n, 50) )

xdata <- f$xdata[-holdOuts,] # fitted data
ydata <- f$ydata[-holdOuts,]

xx <- f$xdata[holdOuts,]     # use for prediction
yy <- f$ydata[holdOuts,]

ml  <- list(ng = 2000, burnin = 500, typeNames = f$typeNames)   # fit the non-holdouts
out <- gjam(f$formula, xdata, ydata, modelList = ml)

cdex <- sample(S, 4)        # condition on 4 species
ndex <- c(1:S)[-cdex]       # conditionally predict others

newdata <- list(xdata = xx, ydataCond = yy[,cdex], nsim = 200) # conditionally predict out-of-sample
p2      <- gjamPredict(output = out, newdata = newdata)

par(bty='n', mfrow=c(1,1))
plot( as.matrix(yy[,ndex]), p2$sdList$yMu[,ndex], 
      xlab = 'Observed', ylab = 'Predicted', cex=.3, col = cols[1])
abline(0,1,lty=2)
title('RMSPE')
mspeC <- sqrt( mean(  (as.matrix(yy[,ndex]) - p2$sdList$yMu[,ndex])^2 ) )

#predict unconditionally, out-of-sample
newdata   <- list(xdata = xx, nsim = 200 ) 
p1 <- gjamPredict(out, newdata = newdata)

points( as.matrix(yy[,ndex]), p1$sdList$yMu[,ndex], col=cols[2], cex = .3)
mspeU <- sqrt( mean(  (as.matrix(yy[,ndex]) - p1$sdList$yMu[,ndex])^2 ) )

e1 <- paste( 'cond, out-of-sample =', round(mspeC, 2) )
e2 <- paste( 'uncond, out-of-sample =', round(mspeU, 2) )

legend('topleft', c(e1, e2), text.col = cols, bty = 'n')

## End(Not run)

gjam documentation built on May 24, 2022, 1:06 a.m.