# Species-richness prediction and diversity estimation with R

### Description

Provides simple functions to compute various biodiversity indices and related (dis)similarity measures based on individual-based (abundance) data or sampling-unit-based (incidence) data taken from one or multiple communities/assemblages.

This package contains six main functions:

1. `ChaoSpecies` (estimating species richness for one community).

2. `Diversity` (estimating a continuous diversity profile and various diversity indices in one community, including species richness, Shannon diversity and Simpson diversity). This function also plots empirical and estimated continuous diversity profiles.

3. `ChaoShared` (estimating the number of shared species between two communities).

4. `SimilarityPair` (estimating various similarity indices between two assemblages). Both richness- and abundance-based two-community similarity indices are included.

5. `SimilarityMult` (estimating various similarity indices among *N* communities). Both richness- and abundance-based *N*-community similarity indices are included.

6. `Genetics` (estimating allelic dissimilarity/differentiation among sub-populations based on multiple-subpopulation genetics data).

Except for the `Genetics` function, each function supports at least three types of data.

### Details

Data are generally classified as either abundance data or incidence data, and there are five input format options (datatype = "abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count", "incidence_raw").

- A. Individual-based abundance data, when a sample of individuals is taken from each community.

**Type (1) abundance data** (datatype = "abundance"): Input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed abundances of a species in *N* communities.
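As a rough illustration of the Type (1) layout, the base-R sketch below builds a small species-by-community matrix. The species names, community names and counts are invented for the example and are not one of the package's data sets.

```r
# Hypothetical counts: 4 species (rows) observed in 2 communities (columns).
abund <- matrix(
  c(12, 0,
     5, 7,
     1, 3,
     0, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("sp", 1:4), c("Community1", "Community2"))
)

# Observed richness per community: number of species with nonzero abundance.
obs_richness <- colSums(abund > 0)

# Sample size per community: total number of individuals.
n_individuals <- colSums(abund)
```

A matrix shaped like `abund` is what the text describes as input for datatype = "abundance"; a zero entry simply means the species was not observed in that community.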

**Type (1A) abundance-frequency counts data** only for a single community (datatype = "abundance_freq_count"): input data are arranged as (*1 f_1 2 f_2 ... r f_r*) (each number needs to be separated by at least one blank space or separated by rows), where *r* denotes the maximum frequency and *f_k* denotes the number of species represented by exactly *k* individuals/times in the sample. Here the data (*f_1, f_2, ..., f_r*) are referred to as "abundance-frequency counts".
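The frequency-count arrangement can be derived from a plain abundance vector. The sketch below is one possible way to do this in base R, using made-up abundances. It lists only those *k* for which *f_k* is positive; whether frequencies with *f_k* = 0 also need to be listed is not stated in the text, so treat that as an assumption.

```r
# Hypothetical abundances for one community (one entry per observed species).
x <- c(1, 1, 1, 2, 2, 5, 9, 9)

# f_k = number of species observed exactly k times (unobserved species dropped).
f <- table(x[x > 0])
k <- as.integer(names(f))

# Interleave k and f_k column by column to get (1 f_1 2 f_2 ... r f_r).
freq_count <- as.vector(rbind(k, as.integer(f)))
freq_count
```

Here three species are singletons, two are doubletons, one has 5 individuals and two have 9, so the vector pairs each observed frequency with its species count.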

- B. Sampling-unit-based incidence data, when a number of sampling units are randomly taken from each community. Only the incidence (detection/non-detection) of species is recorded in each sampling unit. There are three data format options.

**Type (2) incidence-frequency data** (datatype = "incidence_freq"): The first row of the input data must be the number of sampling units in each community. Beginning with the second row, input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed incidence frequencies (the number of detections, i.e. the number of sampling units in which a species is detected) of a species in *N* communities.
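To make the "first row is the number of sampling units" convention concrete, here is a minimal base-R sketch. All numbers and names are hypothetical; the consistency check is an addition for illustration and is not something the text says the package performs.

```r
# Hypothetical number of sampling units taken in each of 2 communities.
units <- c(Community1 = 10, Community2 = 8)

# Hypothetical incidence frequencies: units in which each species was detected.
inc <- matrix(c(10, 3,
                 4, 0,
                 1, 8),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("sp", 1:3), names(units)))

# Sanity check: a species cannot be detected in more units than were sampled.
stopifnot(all(sweep(inc, 2, units, "<=")))

# Type (2) layout: sampling-unit counts in row 1, species rows below.
inc_freq <- rbind(units = units, inc)
```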

**Type (2A) incidence-frequency counts data** only for a single community (datatype = "incidence_freq_count"): input data are arranged as (*T 1 Q_1 2 Q_2 ... r Q_r*) (each number needs to be separated by at least one blank space or separated by rows), where *Q_k* denotes the number of species that were detected in exactly *k* sampling units, while *r* denotes the number of sampling units in which the most frequent species were found. The first entry must be the total number of sampling units, *T*. The data (*Q_1, Q_2, ..., Q_r*) are referred to as "incidence frequency counts".
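A small base-R sketch of this arrangement, with invented incidence frequencies for a single community, might look as follows. As an assumption of this sketch, only values of *k* with positive *Q_k* are listed.

```r
# Hypothetical: T = 6 sampling units; y holds, for each observed species,
# the number of units in which it was detected.
n_units <- 6
y <- c(1, 1, 2, 3, 3, 3, 6)
stopifnot(all(y <= n_units))

# Q_k = number of species detected in exactly k sampling units.
Q <- table(y[y > 0])
k <- as.integer(names(Q))

# Total number of units first, then the pairs (k, Q_k).
inc_freq_count <- c(n_units, as.vector(rbind(k, as.integer(Q))))
inc_freq_count
```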

**Type (2B) incidence-raw data** (datatype = "incidence_raw"): Data consist of a species-by-sampling-unit incidence (detection/non-detection) matrix; typically "1" means a detection and "0" means a non-detection. Each row refers to the detection/non-detection record of a species in *T* sampling units. Users must specify the number of sampling units in each community in the function argument "units". The first *T_1* columns of the input matrix denote species detection/non-detection data based on the *T_1* sampling units from Community 1, the next *T_2* columns denote the detection/non-detection data based on the *T_2* sampling units from Community 2, and so on; the last *T_N* columns denote the detection/non-detection data based on the *T_N* sampling units from Community *N*, so that *T_1 + T_2 + ... + T_N = T*.
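The column layout is easiest to see on a toy example. The base-R sketch below uses a hypothetical 0/1 matrix with *T_1* = 3 and *T_2* = 2, and shows how the per-community column blocks relate to the Type (2) incidence frequencies. The collapsing step is only an illustration of the relationship between the two formats, not a description of what the package does internally.

```r
# Hypothetical detection/non-detection records for 3 species (rows):
# 3 sampling units from Community 1, then 2 units from Community 2.
raw <- matrix(c(1, 0, 1,  0, 1,
                0, 0, 1,  1, 1,
                1, 1, 1,  0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("sp", 1:3), NULL))
units <- c(3, 2)
stopifnot(sum(units) == ncol(raw))   # T_1 + T_2 must equal T

# Assign each column to its community, then count detections per community.
community <- rep(seq_along(units), times = units)
inc <- sapply(split(seq_len(ncol(raw)), community),
              function(cols) rowSums(raw[, cols, drop = FALSE]))

# Equivalent Type (2) layout: sampling-unit counts on top of the frequencies.
inc_freq <- rbind(units, inc)
```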

An Online version of SpadeR is also available for users without an R background:

http://chao.stat.nthu.edu.tw/wordpress/software_download/softwarespader_online/.

In the detailed Online SpadeR User's Guide, we illustrate all the running procedures in an easily
accessible way through numerical examples with proper interpretations of portions of the output.
All the data of those illustrative examples are included in this package.

functions: ChaoSpecies, Diversity, ChaoShared, SimilarityPair, SimilarityMult, Genetics

### Author(s)

Anne Chao, K. H. Ma, T. C. Hsieh and Chun-Huo Chiu

Maintainer: Anne Chao <chao@stat.nthu.edu.tw>