Objective

Detect outlier points in the time courses of an experiment in the PhenoArch greenhouse, using the locfit smoothing function from the locfit library [2]. For each time course of a dataset, a locfit smoothing is applied and a predictive confidence interval is calculated (Y_hat +/- threshold*Y_hat_se). Points that fall outside this confidence interval are declared outliers. The user chooses the threshold.
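The rule described above can be sketched for a single time course as follows. This is a minimal illustration on simulated data, not the internal code of flagPointLocfit: the data frame `d`, its columns `x` and `y`, and the injected aberrant point are all made up for the example, and it assumes that the standard error returned by `predict()` plays the role of Y_hat_se.

```r
library(locfit)

# Simulated time course (illustrative data, not from PhenoArch)
set.seed(1)
d <- data.frame(x = seq(0, 300, by = 10))
d$y <- 100 / (1 + exp(-(d$x - 150) / 30)) + rnorm(nrow(d), sd = 1)
d$y[15] <- d$y[15] + 40   # inject one aberrant point

threshold <- 8   # multiplier applied to the standard error
h <- 30          # constant component of the smoothing parameter

# Local regression smoothing of y against x
fit <- locfit(y ~ lp(x, h = h), data = d)

# Fitted values and their standard errors at the observed abscissa
pred <- predict(fit, newdata = d$x, se.fit = TRUE)

# Interval Y_hat +/- threshold * Y_hat_se and the resulting flag
d$lower <- pred$fit - threshold * pred$se.fit
d$upper <- pred$fit + threshold * pred$se.fit
d$outlier <- as.integer(d$y < d$lower | d$y > d$upper)

# Points flagged by the rule
d[d$outlier == 1, c("x", "y", "lower", "upper")]
```

A larger threshold widens the interval and flags fewer points, while a smaller one flags more.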

flagPointLocfit: detection of outlier points in time courses

* @param datain input dataframe of parameters
* @param trait character, parameter of interest (ex: plantHeight)
* @param xvar character, time variable (ex: thermalTime)
* @param loopID character, ID on which to make the loop
* @param locfit.h numeric, the constant component of the smoothing parameter
* @param threshold numeric, threshold to detect on the prediction interval
* @return a list:
    * 1 prediction and detection of outlier on observed data
    * 2 prediction on regular abscissa data
    * 3 time courses with not enough point to be evaluated

  library(lubridate)
  library(dplyr)
  library(locfit)
  library(phisStatR)

Import of data

In this vignette, we use a toy data set of the phisStatR library (anonymized real data set).

  mydata<-plant1
  str(mydata)

  # Drop the rows with a missing thermal time
  mydata<-filter(mydata,!is.na(thermalTime))

Outlier points detection

Here the smoothing parameter (locfit.h) is set to 30 and the threshold to 8 to detect the outlier points.

  resu1<-flagPointLocfit(datain=mydata,trait="biovolume",xvar="thermalTime",loopID="Ref",
                         locfit.h=30,threshold=8)

The output report can be large (more than 1 MB). To limit the size of the sub-directories in the package, only the first genotypes are plotted here.

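  # Keep at most the first 30 time course identifiers
  myindex<-as.character(unique(resu1[[1]][,"Ref"]))
  myindex<-head(myindex,30)
  # Plot the time courses in groups of 15
  for (i in seq(1,length(myindex),by=15)){ 
      myvec<-myindex[seq(i,min(i+14,length(myindex)))]
      plotFlagPoint(smoothin=resu1[[1]],loopID="Ref",myselect=myvec)
  }
  # List the points flagged as outliers
  filter(resu1[[1]],outlier==1)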
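
Once the points are flagged, a common next step is to count them per time course and to remove them before further analysis. The sketch below uses a small made-up data frame, `flagged_toy`, as a stand-in for resu1[[1]]. Only the column names `Ref` and `outlier` come from this vignette; the values and the `x` and `y` columns are hypothetical.

```r
# Toy stand-in for resu1[[1]] (values are invented for illustration)
flagged_toy <- data.frame(
  Ref = rep(c("plantA", "plantB", "plantC"), each = 4),
  x = rep(1:4, times = 3),
  y = c(1, 2, 9, 4, 1, 2, 3, 4, 1, 8, 3, 9),
  outlier = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1)
)

# Number of flagged points in each time course
n_flagged <- tapply(flagged_toy$outlier == 1, flagged_toy$Ref, sum)
n_flagged

# Data set with the flagged points removed
cleaned <- flagged_toy[flagged_toy$outlier != 1, ]
nrow(cleaned)
```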

Time courses with too few points to be evaluated

  # ggplot() and the other plotting functions below come from ggplot2
  library(ggplot2)
  # Replace the Ref column with the identifier column of your own dataframe
  if(is.null(resu1[[3]])){
    print("All the time courses have more than 4 points.")
  } else {
    ggplot(data=resu1[[3]],aes(x=x,y=y)) +
      geom_point() + facet_wrap(~Ref)
  }
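
Short time courses can also be spotted before running the detection by counting the observations per identifier. The sketch below is only an illustration on a made-up data frame, `counts_toy`. The cutoff of 5 points is an assumption inferred from the message "more than 4 points" above, not a documented parameter of flagPointLocfit.

```r
# Toy data: one row per observation, identified by Ref (invented values)
counts_toy <- data.frame(
  Ref = c(rep("plantA", 6), rep("plantB", 3), rep("plantC", 10))
)

min_points <- 5   # assumed minimum number of points per time course

# Number of observations in each time course
n_points <- table(counts_toy$Ref)

# Identifiers of the time courses below the assumed minimum
short_courses <- names(n_points)[n_points < min_points]
short_courses
```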

Session info

  sessionInfo()

References

  1. R Development Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
  2. Catherine Loader (2013). locfit: Local Regression, Likelihood and Density Estimation. R package version 1.5-9.1. https://CRAN.R-project.org/package=locfit


sanchezi/phisStatR documentation built on Nov. 14, 2019, 7:10 p.m.