library(dplyr)
library(tidyr)
library(openSilexStatR)
library(ggplot2)
This vignette describes how to detect outlier plants in a lattice experiment using a spline-based spatial model (SpATS library) [3]. The steps of this procedure, developed for maize experiments but easily adaptable to other species, are as follows:
We thus have a dataset with one row per plant in the experiment, containing the three phenotypes: biomass24, PH24 and Phy.
On the residuals, we can detect outlier plant(s) with a combined physiological criterion applying the following rules:
raw procedure: at a threshold=0.95 (can be modified)
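As a rough illustration of how a threshold of 0.95 could be applied to residuals, the sketch below flags values lying outside a normal-quantile interval. This is an assumption made for illustration only: the simulated residuals, the interval `mean(res) +/- qnorm(threshold) * sd(res)` and the names `flag_low` and `flag_high` are hypothetical and are not taken from the openSilexStatR source.

```r
# Sketch only: simulated residuals stand in for the residuals of a fitted spatial model.
set.seed(1)
res <- rnorm(200)
res[c(10, 50)] <- c(-6, 6)   # two planted extreme values

threshold <- 0.95

# Assumption: a residual is flagged when it falls outside
# mean(res) +/- qnorm(threshold) * sd(res).
half_width <- qnorm(threshold) * sd(res)
lower <- mean(res) - half_width
upper <- mean(res) + half_width

flag_low  <- res < lower    # unusually small values
flag_high <- res > upper    # unusually large values

# Indices of the flagged observations
which(flag_low | flag_high)
```

Raising the threshold widens the interval and flags fewer plants. A combined criterion would then cross such flags across several phenotypes.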
Use the FuncDetectOutlierPlantMaize() and plotDetectOutlierPlantMaize() functions to carry out the previous steps.
In this vignette, we use a toy data set of the openSilexStatR library (anonymized real data set).
This data set comes from a maize experiment performed in the Phenoarch greenhouse, which consists of a conveyor belt structure of 28 lanes carrying 60 carts with one pot each (i.e. 1680 pots) (Cabrera-Bosquet et al. 2016). The data contain one experiment with 90 genotypes (Genotype) from two genotypic panels (Population) and two water scenarios (Treatment): well watered (WW) and water deficit (WD).
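The platform layout can be pictured as a regular grid of pot positions. The sketch below builds such a grid in base R, assuming that lanes map to the Col coordinate (1 to 28) and cart positions to the Row coordinate (1 to 60). The object name `layout_grid` is hypothetical.

```r
# Sketch only: a regular grid of pot positions, 28 lanes x 60 carts.
n_lanes <- 28   # assumed to correspond to the Col coordinate
n_carts <- 60   # assumed to correspond to the Row coordinate

layout_grid <- expand.grid(Col = seq_len(n_lanes), Row = seq_len(n_carts))

# One line per pot position
nrow(layout_grid)
head(layout_grid)
```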
The leaf area and the biomass of individual plants are estimated from images taken in 13 directions. Briefly, pixels extracted from RGB images are converted into biomass and leaf area using linear models derived from regression of data from multiple side view images and destructive measurements performed at different phenological stages, from 5 to 14 appeared leaves (i.e. from 15 to 50 days at 20°C after emergence). Time courses of biomass (Biomass_Estimated) and leaf area (LA_Estimated) are expressed as a function of thermal time (TT). The height of each plant (Height_Estimated) is also estimated from the images. The number of visible leaves (count_leaf) is counted at least once a week on each plant. To prevent errors in leaf counting, leaves 5 and 10 of each plant are marked soon after appearance. The phyllochron (phyllocron) is calculated as the slope of the linear regression between the number of leaves and the thermal time at 2017-04-27, before the beginning of the water deficit. The unique ID of each plant is recorded (plantId), together with the pot position in row (Row) and in column (Col).
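The slope computation described above can be sketched as follows with base R. The data are simulated for two hypothetical plants; only the column names plantId, TT and count_leaf follow the text. On the real data, only observations recorded before the water deficit would be kept.

```r
# Sketch only: simulated leaf counts for two hypothetical plants.
obs <- data.frame(
  plantId    = rep(c("plantA", "plantB"), each = 5),
  TT         = rep(c(10, 15, 20, 25, 30), times = 2),   # thermal time
  count_leaf = c(4, 5, 6, 7, 8,   4, 6, 8, 10, 12)
)

# Per-plant slope of the linear regression of leaf number on thermal time
slopes <- sapply(split(obs, obs$plantId), function(d) {
  unname(coef(lm(count_leaf ~ TT, data = d))["TT"])
})

slopes
```

The result is a named vector with one slope per plant, which can then be merged back into a one-row-per-plant table.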
mydata <- PAdata
str(mydata)
test <- FuncDetectOutlierPlantMaize(datain = mydata, dateBeforeTrt = "2017-04-27",
                                    param1 = "Biomass_Estimated", param2 = "Height_Estimated",
                                    param3 = "phyllocron", paramGeno = "Genotype",
                                    paramCol = "Col", paramRow = "Row",
                                    threshold = 0.95, nCol = 28, nRow = 60,
                                    genotype.as.random = FALSE, timeColumn = "Time")
The FuncDetectOutlierPlantMaize() function returns a list of 6 elements:
plot(test$m1, spaTrend = "percentage")
plot(test$m2, spaTrend = "percentage")
plot(test$m3, spaTrend = "percentage")
ggplot(data=test$outputDataframe,aes(x=fittedP1,y=devResP1)) + geom_point()
test$smallOutlier
test$bigOutlier
The user can save the residuals and detected outliers in an output file using the write.table() function.
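A minimal sketch of such an export is given below. A small stand-in data frame replaces test$outputDataframe so that the example runs without the package. The columns fittedP1 and devResP1 appear in the vignette, while plantId in this table, the separator and the file name are assumptions.

```r
# Sketch only: stand-in for test$outputDataframe (plantId column is assumed).
out <- data.frame(
  plantId  = c("p1", "p2", "p3"),
  fittedP1 = c(1.2, 0.9, 1.5),
  devResP1 = c(0.1, -0.3, 0.2)
)

# Write to a temporary file; replace with a real path as needed
outfile <- tempfile(fileext = ".csv")
write.table(out, file = outfile, sep = ";", row.names = FALSE, quote = FALSE)

# Read the file back to check its content
back <- read.table(outfile, header = TRUE, sep = ";")
back
```

The same call can be repeated for test$smallOutlier and test$bigOutlier to keep a record of the flagged plants.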
plotDetectOutlierPlantMaize(datain = PAdata, outmodels = test$smallOutlier,
                            x = "Time", y = "Biomass_Estimated", genotype = "Genotype",
                            idColor = "Treatment", idFill = "plantId")
plotDetectOutlierPlantMaize(datain = PAdata, outmodels = test$bigOutlier,
                            x = "Time", y = "Biomass_Estimated", genotype = "Genotype",
                            idColor = "Treatment", idFill = "plantId")
sessionInfo()