In EcoFoG/vernabota: Association between vernacular and botanical names for Guyafor data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This document describes the methods and use of the package vernabota to gapfill missing botanical names using vernacular names, in the case of Guyafor census data. The objective is to obtain a chosen number of simulated communities for which individuals only identified with a vernacular name are given a botanical name based on probabilities of association of vernacular and botanical names. It is largely based on the work and codes from @Aubry-Kientz2013 and @Mirabel2018a.

The models are described here.

We set a seed for reproducibility.

set.seed(56)

Some of the examples below require the package data.table.

library(data.table)

Preparing the data

Data that we want to gapfill

This algorithm works on a dataset formatted as it is when obtained using the function EcoFoG::Guyafor2df or from the online data platform of Paracou.

There can be several censuses for a same plot (i.e. several lines per individual trees).

Here, we take the example of data from plot 6, census of 2016, and use subplot 1 as the dataset that we want to gapfill. We call this dataset Data2fill.

In this dataset, the column VernName should not contain any special character such as é, è or œ (data from the Guyafor database should not have these special characters).

We use the function PrepData to prepare the data.

library(vernabota)
data(Paracou6_2016)
Data2fill <- Paracou6_2016[Paracou6_2016$SubPlot==1,]
Data2fill <- PrepData(Data2fill)
str(Data2fill)

Prior: expert knowledge on possible associations

The prior is a dataframe with vernacular names in columns and botanical names in rows (given in 3 column Family, Genus and Species. For a given vernacular name and a given botanical name, the value is 1 if the association is possible, according to expert knowledge, and 0 if not.

We propose three prior files resulting from the work of Jean-Maurice @Madkaud2012, updated using the code 5_Dev/Prior_Verna_Bota_Name_Cleaning/Prior_Verna_Bota_Name_Cleaning.Rmd in January 2022.

data(PriorAllFG_20220126)
PriorAllFG <- PriorAllFG_20220126
str(PriorAllFG[,1:10])

data(PriorParacouNew_20220126)
PriorParacouNew <- PriorParacouNew_20220126
# str(PriorParacouNew[,1:10])

# data(PriorParacouOld_20220126)
# PriorParacouOld <- PriorParacouOld_20220126
# str(PriorParacouOld[,1:10])

We use the function PrepPrior to prepare the prior. Here we use the default settings because we want to remove the botanical names with non-determined species, and the botanical names not in Guyafor from the prior. The reason is that these names would always lead to incorrect association when using the CompareSim function (see below). However, one may decide to keep them, this would lead to

possible associations with a botanical name of the form Genus-Indet. with a BotaCodeCor="AssoByGenus" or ="AssoByFam" (with RemoveIndetSp==TRUE) .
possible associations with a botanical name that has never been observed in Guyafor (with RemoveNotGuyafor==TRUE).

In these latter cases, only the prior information would be used.

PriorAllFG <- PrepPrior(PriorAllFG)
str(PriorAllFG[,1:10])

PriorParacouNew <- PrepPrior(PriorParacouNew)
# str(PriorParacouNew[,1:10])
# 
# PriorParacouOld <- PrepPrior(PriorParacouOld)
# str(PriorParacouOld[,1:10])

Observation data to update the prior

To build the matrix of association between vernacular and scientific names, we can either use the same dataset than the one for which we want to perform the association or another dataset. The user needs to carefully think this choice through. Using the same dataset can lead to underestimating diversity as it consider that there cannot be any dispersal of species from outside. Using a too wide data set could lead to associating species that are not present in the area.

There can be several censuses for a same plot (i.e. several lines per individual trees).

In this dataset, the column VernName should not contain any special character such as é, è or œ (data from the Guyafor database should not have these special characters).

Here, we use data from plot 6 (all four subplots), census of 2016. We call this dataset DataAsso.

We use the function PrepData to prepare the data.

DataAsso <- Paracou6_2016
DataAsso <- PrepData(DataAsso)
str(DataAsso)

Running some simulations using the function SimFullCom

NB: for these examples, a low number of simulations is used. For real tests, a higher number of simulations should be performed.

The SimFullCom function returns the original dataset with two additional columns:

GensSpCor: The Genus and species after gap filling.
BotaCorCode : the type of correction (see section Possible types of gapfilling in this vignette, and the help of the SimFullCom function).

In cases where the original data contained several censuses of a same plots (i.e. several lines per individual trees), the output keeps the several censuses. For a given simulations, all observations of a same tree have the same botanical names associated.

Example 1: using the same dataset for Data2fill and DataAsso, without prior

DataNSim <- SimFullCom(Data2fill, NSim=2, eps=0.01)
str(DataNSim, max.level = 1)
colnames(DataNSim[[1]])
table(DataNSim[[1]]$BotaCorCode)

Example 2: using different dataset for Data2fill and DataAsso, with a prior (different weighing of the prior and the observations)

Here we have a weight of 0.2 for the prior and of 0.8 for the observations.

DataNSim <- SimFullCom(Data2fill=Data2fill, DataAsso=DataAsso, 
                       prior=PriorAllFG, wp=0.2, NSim=2, eps=0.01)
#str(DataNSim, max.level = 1)
#colnames(DataNSim[[1]])
table(DataNSim[[1]]$BotaCorCode)

Example 3: getting the more likely associations (using `Determ=TRUE`)

As we want to simulate the more likely associations, we set NSim to 1.

DataNSim <- SimFullCom(Data2fill=Data2fill, DataAsso=DataAsso, 
                       prior=PriorAllFG, wp=0.2, NSim=1, eps=0.01, Determ=TRUE)
#str(DataNSim, max.level = 1)
#colnames(DataNSim[[1]])
table(DataNSim[[1]]$BotaCorCode)

Comparing different settings for the simulations using the function CompareSim

See the article.

Bibliography

EcoFoG/vernabota documentation built on Feb. 15, 2023, 6:40 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

EcoFoG/vernabota
Association between vernacular and botanical names for Guyafor data

In EcoFoG/vernabota: Association between vernacular and botanical names for Guyafor data

Preparing the data

Data that we want to gapfill

Prior: expert knowledge on possible associations

Observation data to update the prior

Running some simulations using the function SimFullCom

Example 1: using the same dataset for Data2fill and DataAsso, without prior

Example 2: using different dataset for Data2fill and DataAsso, with a prior (different weighing of the prior and the observations)

Example 3: getting the more likely associations (using `Determ=TRUE`)

Comparing different settings for the simulations using the function CompareSim

Bibliography

R Package Documentation

Browse R Packages

We want your feedback!

EcoFoG/vernabota Association between vernacular and botanical names for Guyafor data

In EcoFoG/vernabota: Association between vernacular and botanical names for Guyafor data

Preparing the data

Data that we want to gapfill

Prior: expert knowledge on possible associations

Observation data to update the prior

Running some simulations using the function SimFullCom

Example 1: using the same dataset for Data2fill and DataAsso, without prior

Example 2: using different dataset for Data2fill and DataAsso, with a prior (different weighing of the prior and the observations)

Example 3: getting the more likely associations (using Determ=TRUE)

Comparing different settings for the simulations using the function CompareSim

Bibliography

R Package Documentation

Browse R Packages

We want your feedback!

EcoFoG/vernabota
Association between vernacular and botanical names for Guyafor data

Example 3: getting the more likely associations (using `Determ=TRUE`)