knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This document describes the methods and use of the package vernabota to gapfill missing botanical names using vernacular names, in the case of Guyafor census data. The objective is to obtain a chosen number of simulated communities for which individuals only identified with a vernacular name are given a botanical name based on probabilities of association of vernacular and botanical names. It is largely based on the work and codes from @Aubry-Kientz2013 and @Mirabel2018a.
The models are described here.
We set a seed for reproducibility.
set.seed(56)
Some of the examples below require the package data.table.
library(data.table)
This algorithm works on a dataset formatted as it is when obtained using the function EcoFoG::Guyafor2df or from the online data platform of Paracou.
There can be several censuses for a same plot (i.e. several lines per individual trees).
Here, we take the example of data from plot 6, census of 2016, and use subplot 1 as the dataset that we want to gapfill. We call this dataset Data2fill.
In this dataset, the column VernName should not contain any special character such as é, è or œ (data from the Guyafor database should not have these special characters).
We use the function PrepData to prepare the data.
library(vernabota) data(Paracou6_2016) Data2fill <- Paracou6_2016[Paracou6_2016$SubPlot==1,] Data2fill <- PrepData(Data2fill) str(Data2fill)
The prior is a dataframe with vernacular names in columns and botanical names in rows (given in 3 column Family
, Genus
and Species.
For a given vernacular name and a given botanical name, the value is 1 if the association is possible, according to expert knowledge, and 0 if not.
We propose three prior files resulting from the work of Jean-Maurice @Madkaud2012, updated using the code 5_Dev/Prior_Verna_Bota_Name_Cleaning/Prior_Verna_Bota_Name_Cleaning.Rmd in January 2022.
data(PriorAllFG_20220126) PriorAllFG <- PriorAllFG_20220126 str(PriorAllFG[,1:10]) data(PriorParacouNew_20220126) PriorParacouNew <- PriorParacouNew_20220126 # str(PriorParacouNew[,1:10]) # data(PriorParacouOld_20220126) # PriorParacouOld <- PriorParacouOld_20220126 # str(PriorParacouOld[,1:10])
We use the function PrepPrior to prepare the prior. Here we use the default settings because we want to remove the botanical names with non-determined species, and the botanical names not in Guyafor from the prior. The reason is that these names would always lead to incorrect association when using the CompareSim function (see below). However, one may decide to keep them, this would lead to
BotaCodeCor="AssoByGenus"
or ="AssoByFam"
(with RemoveIndetSp==TRUE
) .RemoveNotGuyafor==TRUE
).In these latter cases, only the prior information would be used.
PriorAllFG <- PrepPrior(PriorAllFG) str(PriorAllFG[,1:10]) PriorParacouNew <- PrepPrior(PriorParacouNew) # str(PriorParacouNew[,1:10]) # # PriorParacouOld <- PrepPrior(PriorParacouOld) # str(PriorParacouOld[,1:10])
To build the matrix of association between vernacular and scientific names, we can either use the same dataset than the one for which we want to perform the association or another dataset. The user needs to carefully think this choice through. Using the same dataset can lead to underestimating diversity as it consider that there cannot be any dispersal of species from outside. Using a too wide data set could lead to associating species that are not present in the area.
There can be several censuses for a same plot (i.e. several lines per individual trees).
In this dataset, the column VernName
should not contain any special character such as é, è or œ (data from the Guyafor database should not have these special characters).
Here, we use data from plot 6 (all four subplots), census of 2016. We call this dataset DataAsso.
We use the function PrepData to prepare the data.
DataAsso <- Paracou6_2016 DataAsso <- PrepData(DataAsso) str(DataAsso)
NB: for these examples, a low number of simulations is used. For real tests, a higher number of simulations should be performed.
The SimFullCom function returns the original dataset with two additional columns:
GensSpCor
: The Genus and species after gap filling.BotaCorCode
: the type of correction (see section Possible types of gapfilling
in this vignette, and the help of the SimFullCom function).In cases where the original data contained several censuses of a same plots (i.e. several lines per individual trees), the output keeps the several censuses. For a given simulations, all observations of a same tree have the same botanical names associated.
DataNSim <- SimFullCom(Data2fill, NSim=2, eps=0.01) str(DataNSim, max.level = 1) colnames(DataNSim[[1]]) table(DataNSim[[1]]$BotaCorCode)
Here we have a weight of 0.2 for the prior and of 0.8 for the observations.
DataNSim <- SimFullCom(Data2fill=Data2fill, DataAsso=DataAsso, prior=PriorAllFG, wp=0.2, NSim=2, eps=0.01) #str(DataNSim, max.level = 1) #colnames(DataNSim[[1]]) table(DataNSim[[1]]$BotaCorCode)
Determ=TRUE
)As we want to simulate the more likely associations, we set NSim
to 1.
DataNSim <- SimFullCom(Data2fill=Data2fill, DataAsso=DataAsso, prior=PriorAllFG, wp=0.2, NSim=1, eps=0.01, Determ=TRUE) #str(DataNSim, max.level = 1) #colnames(DataNSim[[1]]) table(DataNSim[[1]]$BotaCorCode)
See the article.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.