```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(56)
```
The choice of the settings used for gap filling is important for the quality of the estimation. The best settings are specific to the user's study and dataset, so it is worth comparing different settings and choosing the one that performs best.
For this, we use the function CompareSim. This function "masks" the identification of fully identified trees and tests whether it can be successfully retrieved with the chosen settings. The proportion of correct associations is called the accuracy.
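As a purely hypothetical illustration of this metric (the numbers below are invented; CompareSim computes this internally):

```r
# Hypothetical illustration of the accuracy metric (invented numbers,
# not output of CompareSim): the identification of fully determined
# trees is masked, re-estimated, and compared with the true name
masked   <- 40   # number of masked, fully identified trees
correct  <- 34   # number retrieved with the correct name
accuracy <- correct / masked
accuracy
#> [1] 0.85
```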
We first prepare the data as in the Introduction vignette.
```r
library(vernabota)
data(Paracou6_2016)
Data2fill <- Paracou6_2016[Paracou6_2016$SubPlot==1,]
Data2fill <- PrepData(Data2fill)
```
```r
data(PriorAllFG_20220126)
PriorAllFG <- PriorAllFG_20220126
PriorAllFG <- PrepPrior(PriorAllFG)
data(PriorParacouNew_20220126)
PriorParacouNew <- PriorParacouNew_20220126
PriorParacouNew <- PrepPrior(PriorParacouNew)
```
```r
DataAsso <- Paracou6_2016
DataAsso <- PrepData(DataAsso)
```
NB: for these examples, a low number of simulations is used. For real tests, a higher number of simulations should be performed.
We first create lists for the priors and the observation data that we want to test:
```r
priors <- list(PriorAllFG, PriorParacouNew)  # priors
DAsso <- list(NULL, DataAsso)                # observation data
```
Then we create the Param dataframe to specify the different scenarios to test:
- priors: a vector with the rank of the priors to use in the priors list
- dataAsso: a vector with the rank of the observation data to use in the DAsso list (if DAsso is not provided, put 1)
- weights: a vector with the weights of the priors
- eps: a vector with the epsilon value for each scenario
- Determ: a vector with the value of Determ (boolean)
```r
Param <- data.frame(priors = c(1,1,2,1,1),    # here, we use the first prior of the
                                              # list for scenarios 1, 2, 4 and 5,
                                              # and the second for scenario 3
                    dataAsso = c(2,1,2,2,2),  # for the second scenario, dataAsso is
                                              # NULL (the data to gapfill are used)
                    weights = c(0.5,0.5,0.5,0.2,0.8),
                    eps = c(0.01,0.01,0.01,0.01,0.01),
                    Determ = c(FALSE,FALSE,FALSE,FALSE,FALSE))
Param
```
We can then run CompareSim, visualise the scenarios and their results using summary, and plot their accuracy using plot or autoplot. You can use the parallel argument of CompareSim to speed up the loop; in this case, you must first load the doParallel library (a short sketch follows the next chunk).
```r
system.time(VBS_test <- CompareSim(Param = Param, priors = priors,
                                   D2fill = Data2fill, DAsso = DAsso,
                                   pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                                   NbSim = 10, Results_Simulations = FALSE,
                                   parallel = FALSE))
#    user  system elapsed
#   19.55    0.20   19.86

# system.time(VBS_test <- CompareSim(Param = Param,
#                                    priors = priors, D2fill = Data2fill, DAsso = DAsso,
#                                    pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
#                                    NbSim = 10, Results_Simulations = FALSE,
#                                    parallel = TRUE))
#    user  system elapsed
#    0.19    0.56    8.96

summary(VBS_test)
autoplot(VBS_test)
```
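As mentioned above, a parallel run requires doParallel to be attached first. A minimal sketch, assuming CompareSim takes care of the cluster setup itself once the package is loaded:

```r
# Sketch: attach doParallel before requesting a parallel run
# (we assume CompareSim handles the cluster registration internally)
library(doParallel)
VBS_test_par <- CompareSim(Param = Param, priors = priors,
                           D2fill = Data2fill, DAsso = DAsso,
                           pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                           NbSim = 10, Results_Simulations = FALSE,
                           parallel = TRUE)
```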
```r
Param <- data.frame(priors = c(1,1,2,1,1),
                    dataAsso = c(2,1,2,2,2),
                    weights = c(0.5,0.5,0.5,0.2,0.8),
                    eps = c(0.01,0.01,0.01,0.01,0.01),
                    Determ = c(TRUE,TRUE,TRUE,TRUE,TRUE))
Param
```
Here there will be many warning messages (not displayed here) in cases where two associations are equally likely. This also explains the variability of the plotted accuracy (see the small illustration after the next chunk).
```r
VBS_test <- CompareSim(Param = Param, priors = priors,
                       D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                       NbSim = 10, Results_Simulations = FALSE)
summary(VBS_test)
autoplot(VBS_test)
```
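To see why ties cause both the warnings and the variability, here is a hypothetical example (invented probabilities, not the package's internal code): when two names are equally likely, a deterministic rule has no unique winner and the tie has to be broken at random.

```r
# Hypothetical tie between two equally likely names
probs <- c(SpeciesA = 0.5, SpeciesB = 0.5)
# a deterministic rule such as which.max() always returns the first name...
names(which.max(probs))
# ...whereas breaking the tie at random varies between simulations,
# which feeds into the variability of the accuracy
sample(names(probs)[probs == max(probs)], 1)
```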
We can plot the variability of the accuracy with an increasing number of simulations used (1, ..., NbSim) with the function stabletest.
```r
Param <- data.frame(priors = c(2),
                    dataAsso = c(2),
                    weights = c(0.5),
                    eps = c(0.01),
                    Determ = c(FALSE))
Param
```
```r
VBS_test <- CompareSim(Param = Param, priors = priors,
                       D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                       NbSim = 10, Results_Simulations = TRUE)
stabletest(VBS_test, 1)
```
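The idea behind such a stability plot can be sketched as follows (invented per-simulation accuracies; this is only one plausible reading of what stabletest displays, not its actual code):

```r
# Invented per-simulation accuracies, only to illustrate accuracy
# computed from the first k simulations (k = 1..NbSim)
acc <- c(0.70, 0.78, 0.74, 0.75, 0.76, 0.75, 0.75, 0.76, 0.75, 0.75)
cum_acc <- cumsum(acc) / seq_along(acc)
plot(cum_acc, type = "b",
     xlab = "Number of simulations used",
     ylab = "Accuracy")
```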
Here we see that we could increase the number of simulations, but the accuracy seems to stabilise after 6 simulations.
As CompareSim uses only one test dataset, we may want to check the potential variability resulting from the sampling of the test dataset. This can be done with the CompareSample function, which samples NbSamples test datasets and performs a comparison between scenarios for each of them. As with CompareSim, the option parallel can be set to TRUE to speed up the loop.
```r
Param <- data.frame(priors = c(2,1,2),
                    dataAsso = c(1,1,2),
                    weights = c(0.5,0.5,0.5),
                    eps = c(0.01,0.01,0.01),
                    Determ = c(FALSE,FALSE,FALSE))

system.time(VBS_test2 <- CompareSample(NbSamples = 3, Param = Param,
                                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                                       NbSim = 10, Results_Simulations = FALSE,
                                       parallel = FALSE))
#    user  system elapsed
#   35.08    0.42   36.48

# system.time(VBS_test2 <- CompareSample(NbSamples = 3, Param = Param,
#                                        priors = priors, D2fill = Data2fill, DAsso = DAsso,
#                                        pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
#                                        NbSim = 10, Results_Simulations = FALSE,
#                                        parallel = TRUE))
#    user  system elapsed
#    0.48    0.61   21.78

# results with the first test dataset
autoplot(VBS_test2[[1]][[1]])
# results with the second test dataset are a bit different
autoplot(VBS_test2[[1]][[2]])

# all results together
library(ggplot2)
ggplot(aes(y = accuracy, x = sample, fill = scenario),
       data = VBS_test2[[2]]) +
  geom_boxplot()
```
Here, the ranking is preserved between the different test datasets: scenario c gives the best results. This confirms that the variability between scenarios is higher than the variability between samplings. We can then choose a scenario according to this ranking.
```r
Param <- data.frame(priors = c(2),
                    dataAsso = c(2),
                    weights = c(0.5),
                    eps = c(0.01),
                    Determ = c(FALSE))
Param
```
We now want to examine the associations tree by tree. The simulations of each scenario (here just one) can be retrieved as a list of data.table objects.
In cases where the original data contained several censuses of the same plot (i.e. several lines per individual tree), the output contained in results keeps only one line per individual, and only a subset of columns from the original dataset (the ones that don't change between censuses).
We can then look at each of the simulations: TestData indicates whether the tree was used in the test subset, and ValidAsso whether the botanical association was correct.
```r
VBS_test <- CompareSim(Param = Param, priors = priors,
                       D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
                       NbSim = 10, Results_Simulations = TRUE)
ResL <- VBS_test@results[[1]]  # here we get all the simulations of scenario 1
str(ResL[[1]])                 # first simulation
```
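As a quick sanity check of the one-line-per-individual structure described above (a sketch; idTree is the tree identifier used in the next section):

```r
# Sketch: each individual should appear on a single line within one
# simulation, so idTree should not be duplicated
anyDuplicated(ResL[[1]]$idTree) == 0  # TRUE expected
```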
We can also calculate the percentage of correct associations for each tested tree, for this scenario:
```r
library(data.table)
Res <- rbindlist(ResL)  # combine them in a single data.table
# calculate the percentage of correct associations for each tested tree
PropGood <- Res[TestData==TRUE & ValidAsso==TRUE,
                list(propOK = .N/length(ResL)),  # number of correct associations
                                                 # divided by number of simulations (NbSim)
                by = idTree]
head(PropGood)
```
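Note that, written this way, trees that were never correctly associated do not appear in PropGood. A sketch to include them with propOK = 0, assuming ValidAsso is a logical column (as its use above suggests):

```r
# Sketch: start from all tested trees so that trees never correctly
# associated appear with propOK = 0 (assumes ValidAsso is logical)
PropGoodAll <- Res[TestData == TRUE,
                   list(propOK = sum(ValidAsso)/length(ResL)),
                   by = idTree]
head(PropGoodAll)
```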