Detection of outliers in time courses of an experiment in PhenoArch greenhouse. In this vignette, we use a toy data set of the phisStatR library (anonymized real data set).
The example is conducted on plantHeight parameter, we modelise the plantHeight taking into account the temporality (thermal time) and the spatiality (lattice structure) with a bayesian spatio-temporal ANOVA model [3].
To detect outlier points, we retrieve the standardised residuals computed by the model and a point is an outlier if abs(res) > threshold (here threshold==4). the user will be able to change this threshold.
We produced the output of the model as well as some graphics.
The input dataset must contain some predefined columns, have a look to the structure of mydata:
library(lubridate) library(dplyr) library(phisStatR) myReport<-substr(now(),1,10) mydata<-plant1 str(mydata)
A bayesian approach to model these data where the spatio-temporal structure is modelled via sets of autocorrelated random effets. Conditional autoregressive (CAR) priors and spatio-temporal extensions thereof are typically assigned to these random effects to capture the autocorrelation, which are special cases of a Gaussian Markov random Filed (GMRF) [3].
ST.CARanova() decomposes the spatio-temporal variation into 3 components:
we can add others factors (here the scenario (quali) ...)
The spatio-temporal auto-correlation is modelled by a common set of spatial random effect and a common set of temporal random effects and both are modelled by the CAR prior (Conditional AutoRegressive).
Description of the results's table:
Model fit criteria
In model's comparison, the best fitting model is the one that minimises the DIC and WAIC but maximises the LMPL.
As the bayesian model can take a while, in this example, we initialise burnin=10 and n.sample=110 - not enough in real analysis!
# We suppress observations with missing data in time variable (here thermalTime) mydata<-filter(mydata,!is.na(mydata$thermalTime)) model<-fitCARBayesST(datain=mydata,xvar="thermalTime",trait="plantHeight",k=2, graphDist=TRUE,burnin=10,n.sample=110, formulaModel=as.formula(plantHeight~scenario+genotypeAlias),typeModel="anova",verbose=TRUE)
# print the result of the bayesian modelling printCARBayesST(modelin=model[[1]])
outtmp<-outlierCARBayesST(modelin=model[[1]],datain=model[[2]],threshold=4,trait="plantHeight")
The output report can be over-sized (more than 1Mb), for size of sub-directories in packages purposes, I choose to represent only the first genotypes...
mygeno<-as.character(unique(model[[2]][,"genotypeAlias"])) mygeno<-mygeno[1:6] for (i in seq(1,length(mygeno),by=15)){ myvec<-seq(i,i+14,1) plotCARBayesST(datain=model[[2]],outlierin=outtmp,myselect=myvec,trait="plantHeight",xvar="thermalTime") }
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.