knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(56)

The choice of the settings used for the gap filling is important for the quality of the estimation. It is specific to the study and the dataset of the user, so it is interesting to compare different settings and to choose the one that performs the best:

For this, we use the function CompareSim. This function "masks" the identification of the trees fully identified and test if it can be successfully retrieved with the setting chosen. The proportion of correct association is called accuracy.

Preparing the data

We first prepare the data as in the Introduction vignette.

library(vernabota)
data(Paracou6_2016)
Data2fill <- Paracou6_2016[Paracou6_2016$SubPlot==1,]
Data2fill <- PrepData(Data2fill)
data(PriorAllFG_20220126)
PriorAllFG <- PriorAllFG_20220126
PriorAllFG <- PrepPrior(PriorAllFG)

data(PriorParacouNew_20220126)
PriorParacouNew <- PriorParacouNew_20220126
PriorParacouNew <- PrepPrior(PriorParacouNew)
DataAsso <- Paracou6_2016
DataAsso <- PrepData(DataAsso)

Comparing different settings for the simulations using the function CompareSim

NB: for these examples, a low number of simulations is used. For real tests, a higher number of simulations should be performed.

Comparing different settings

We first create lists for the priors and the observation data that we want to test:

priors <- list(PriorAllFG, PriorParacouNew) # priors
DAsso <- list(NULL, DataAsso)               # observation data

Then we create the Param dataframe to explicit the different scenarios to test:

Param <- data.frame(priors = c(1,1,2,1,1),  # here, we used the first prior 
                    # of the list for scenario 1, 2, 4 and 5 and the second for scenario 3
                    dataAsso = c(2,1,2,2,2), # for the second scenario dataAsso is NULL (the data to gapfill are used)
                    weights = c(0.5,0.5,0.5,0.2,0.8),
                    eps = c(0.01,0.01,0.01,0.01,0.01),
                    Determ = c(FALSE,FALSE,FALSE,FALSE,FALSE))
Param

We can then run CompareSim, visualise the scenarios and their results using summary and plot their accuracy using plot or autoplot. You can use the parameter parallel of the CompareSim function to speed up the loop, in this case, you have to first load the doParallel library.

system.time(VBS_test <- CompareSim(Param = Param ,
                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
                       NbSim = 10, Results_Simulations = FALSE, parallel = FALSE))
# utilisateur     système      écoulé 
#       19.55        0.20       19.86 

# system.time(VBS_test <- CompareSim(Param = Param ,
#                        priors = priors, D2fill = Data2fill, DAsso = DAsso,
#                        pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25,
#                        NbSim = 10, Results_Simulations = FALSE, parallel = TRUE))
# utilisateur     système      écoulé 
#        0.19        0.56        8.96 

summary(VBS_test)
autoplot(VBS_test)

Testing deterministic associations

Param <- data.frame(priors = c(1,1,2,1,1),  
                    dataAsso = c(2,1,2,2,2), 
                    weights = c(0.5,0.5,0.5,0.2,0.8),
                    eps = c(0.01,0.01,0.01,0.01,0.01),
                    Determ = c(TRUE,TRUE,TRUE,TRUE,TRUE))
Param

Here there will be lots of warning messages (not displayed here) in cases when two associations are equality likely. This also explain the variability of the accuracy plotted.

VBS_test <- CompareSim(Param = Param ,
                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
                       NbSim = 10, Results_Simulations = FALSE)
summary(VBS_test)
autoplot(VBS_test)

Checking stability of association accuracy

We can plot the variability of the accuracy with increasing number of simulations used: 1, ..., NbSim, with the function stabletest.

Param <- data.frame(priors = c(2),
                    dataAsso = c(2),
                    weights = c(0.5),
                    eps = c(0.01),
                    Determ = c(FALSE))
Param
VBS_test <- CompareSim(Param = Param ,
                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
                       NbSim = 10, Results_Simulations = TRUE)

stabletest(VBS_test,1)

Here we see that we could increase the number of simulations, but it seems to be stabilized after 6 simulations.

Comparing different sampled datasets

As CompareSim uses only one test dataset, we want to check the potential variability resulting from the sampling of the test dataset. This can be done with the CompareSample function. This function samples NbSamples test datasets and performs a comparison between scenarios for each test dataset. As for the function CompareSim, the option parallel can be set to TRUE to speed up the loop.

Param <- data.frame(priors = c(2,1,2),  
                    dataAsso = c(1,1,2), 
                    weights = c(0.5,0.5,0.5),
                    eps = c(0.01,0.01,0.01),
                    Determ = c(FALSE,FALSE,FALSE))

system.time(VBS_test2 <- CompareSample(NbSamples = 3,Param = Param ,
                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
                       NbSim = 10, Results_Simulations = FALSE, parallel = FALSE))
# utilisateur     système      écoulé 
#      35.08        0.42       36.48 

# system.time(VBS_test2 <- CompareSample(NbSamples = 3,Param = Param ,
#                        priors = priors, D2fill = Data2fill, DAsso = DAsso,
#                        pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
#                        NbSim = 10, Results_Simulations = FALSE, parallel = TRUE))
# utilisateur     système      écoulé 
#       0.48        0.61       21.78 

# results with the first dataset
autoplot(VBS_test2[[1]][[1]])
# results with the second dataset are a bit different
autoplot(VBS_test2[[1]][[2]])

# all results together
library(ggplot2)
ggplot(aes(y = accuracy, x = sample, fill = scenario), data = VBS_test2[[2]]) + geom_boxplot()

Here, the ranking is preserved between different test datasets: scenario c gives better results. This confirms that the variability between scenarios is higher than the one between samplings. We can then choose a scenario according to this ranking.

Examining results associations

Param <- data.frame(priors = c(2), 
                    dataAsso = c(2), 
                    weights = c(0.5),
                    eps = c(0.01),
                    Determ = c(FALSE))
Param

We now want to examine the association tree by tree. The simulations of each scenario (here just one) can be retrieved as a list of data.table.

In cases where the original data contained several censuses of a same plots (i.e. several lines per individual trees), the output contained in results keeps only one line per individual, and only a subset of colums from the original dataset (the ones that don't change between censuses).

We can then look at each of the simulations: TestData indicates if the tree was used for test subset, and ValidAsso if the botanical association was correct.

VBS_test <- CompareSim(Param = Param ,
                       priors = priors, D2fill = Data2fill, DAsso = DAsso,
                       pc2fill = 10, pcFamilyDet = 25, pcGenusDet = 25, 
                       NbSim = 10, Results_Simulations = TRUE)
ResL <- VBS_test@results[[1]] # here we get all the simulation of scenario 1
str(ResL[[1]]) # first simulation

We can also calculate the percentage of good association for each tested tree, for this scenario:

library(data.table)
Res <- rbindlist(ResL) # combine them in a single data.table
# calculate the percentage of good association for each tested tree
PropGood <- Res[TestData==TRUE & ValidAsso==TRUE,
                list(propOK=.N/length(ResL)), # number of correct association divided by number of association (NbSim)
                by=idTree]
head(PropGood)


EcoFoG/vernabota documentation built on Feb. 15, 2023, 6:40 p.m.