Objective

Detection of outliers in time courses of an experiment in PhenoArch greenhouse. In this vignette, we use a toy data set of the phisStatR library (anonymized real data set).

Please, have a look on the names of the columns of the input data set. The function is not generic and needs specific columns names in the input data set: * Line: the x-coordinate in the lattice (numeric) * Position: the y-coordinate in the lattice (numeric) * Ref: a unique identifiant by pot, repetition and alias * scenario: the scenario applied to the experiment * genotypeAlias: the used genotypes in the experiment

The example is conducted on plantHeight parameter, we modelise the plantHeight taking into account the temporality (thermal time) and the spatiality (lattice structure) with a bayesian spatio-temporal ANOVA model [3].

To detect outlier points, we retrieve the standardised residuals computed by the model and a point is an outlier if abs(res) > threshold (here threshold==4). the user will be able to change this threshold.

We produced the output of the model as well as some graphics.

Import of data

The input dataset must contain some predefined columns, have a look to the structure of mydata:

  library(lubridate)
  library(dplyr)
  library(phisStatR)

  myReport<-substr(now(),1,10)

  mydata<-plant1
  str(mydata)

CARBayesST library

A bayesian approach to model these data where the spatio-temporal structure is modelled via sets of autocorrelated random effets. Conditional autoregressive (CAR) priors and spatio-temporal extensions thereof are typically assigned to these random effects to capture the autocorrelation, which are special cases of a Gaussian Markov random Filed (GMRF) [3].

ST.CARanova() decomposes the spatio-temporal variation into 3 components:

we can add others factors (here the scenario (quali) ...)

The spatio-temporal auto-correlation is modelled by a common set of spatial random effect and a common set of temporal random effects and both are modelled by the CAR prior (Conditional AutoRegressive).

Description of the results's table:

Model fit criteria

In model's comparison, the best fitting model is the one that minimises the DIC and WAIC but maximises the LMPL.

As the bayesian model can take a while, in this example, we initialise burnin=10 and n.sample=110 - not enough in real analysis!

  # We suppress observations with missing data in time variable (here thermalTime)
  mydata<-filter(mydata,!is.na(mydata$thermalTime))
  model<-fitCARBayesST(datain=mydata,xvar="thermalTime",trait="plantHeight",k=2,
          graphDist=TRUE,burnin=10,n.sample=110,
          formulaModel=as.formula(plantHeight~scenario+genotypeAlias),typeModel="anova",verbose=TRUE)
  # print the result of the bayesian modelling
  printCARBayesST(modelin=model[[1]])
  outtmp<-outlierCARBayesST(modelin=model[[1]],datain=model[[2]],threshold=4,trait="plantHeight")

Outlier detection

The output report can be over-sized (more than 1Mb), for size of sub-directories in packages purposes, I choose to represent only the first genotypes...

  mygeno<-as.character(unique(model[[2]][,"genotypeAlias"]))
  mygeno<-mygeno[1:6]
  for (i in seq(1,length(mygeno),by=15)){ 
      myvec<-seq(i,i+14,1)
      plotCARBayesST(datain=model[[2]],outlierin=outtmp,myselect=myvec,trait="plantHeight",xvar="thermalTime")
  }

Session info

  sessionInfo()

References

  1. R Development Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
  2. Roger S. Bivand, Edzer Pebesma, Virgilio Gomez-Rubio, 2013. Applied spatial data analysis with R, Second edition. Springer, NY. http://www.asdar-book.org/
  3. Duncan Lee, Alastair Rushworth and Gary Napier (2017). CARBayesST: Spatio-Temporal Generalised Linear Mixed Models for Areal Unit Data. R package version 2.5. https://CRAN.R-project.org/package=CARBayesST


sanchezi/phisStatR documentation built on Nov. 14, 2019, 7:10 p.m.