diversity: *Main* function to compute diversity measures


View source: R/diversity.R

Description

Main function of the package. The diversity function computes diversity measures for a dataset with entities, categories and values.

Usage

diversity(data, type = "all", category_row = FALSE, dis = NULL,
  method = "euclidean", q = 0, alpha = 1, beta = 1, base = exp(1))

Arguments

data

A numeric matrix with entities i in the rows and categories j in the columns. Each cell holds the value (abundance) of entity i in category j. The transpose is also accepted, that is, a matrix with categories in the rows and entities in the columns; in that case the argument "category_row" has to be set to TRUE. The matrix must include names for the rows and the columns. The argument "data" also accepts a data frame with three columns in the following order: entity, category and value.
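
As a rough illustration of the accepted layouts, the sketch below builds a small entity-by-category matrix, its transpose, and the equivalent three-column data frame in base R. The entity names, category names and values are invented for illustration; only the layouts follow the description above.

```r
# Hypothetical toy data: 2 entities (rows) x 3 categories (columns)
m <- matrix(c(3, 0, 1,
              2, 2, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("entityA", "entityB"),
                            c("cat1", "cat2", "cat3")))

# Transposed layout: categories in the rows, to be used with category_row = TRUE
m_t <- t(m)

# Same data in long form, columns in the order entity, category, value
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("entity", "category", "value")
```

Either `m` or `long` would then be passed as `data`, and `m_t` would be passed together with `category_row = TRUE`.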

type

A string or a vector of strings naming the diversity measures to compute. The available measures are: "variety", (Shannon) "entropy", "blau", "gini-simpson", "simpson", "hill-numbers", "herfindahl-hirschman", "berger-parker", "renyi", (Pielou) "evenness", "rao", "rao-stirling". Short mnemonics are also accepted: "v", "e", "gs", "s", "td", "hh", "bp", "re", "ev", "r", and "rs". The default for type is "all", which computes all available measures.

category_row

A flag to indicate that categories are in the rows. The analysis assumes that the categories are in the columns of the matrix. If the categories are in the rows and the entities in the columns, then the argument "category_row" has to be set to TRUE. The default value is FALSE.

dis

Optional square matrix of distances or dissimilarities between categories. It allows the user to provide her own matrix of dissimilarities between categories. The category names have to be both in the rows and in the columns, and these must be the exact same names used by the categories in the argument "data". Only the upper triangle will be used. If the argument "dis" is not defined, and the user requires a measure that uses disparities (e.g. Rao), then a matrix of disparities is computed internally using the method defined by the argument 'method'. The default value is NULL.

method

The "rao-stirling" and "rao" diversity indices use a disparity function to measure the distance between categories. If the user does not provide a matrix of disparities through the argument 'dis', one is computed using the method specified in this argument. Possible values are the distance or dissimilarity methods available in the "proxy" package, for example "Euclidean", "Kullback" or "Canberra". This argument also accepts a similarity method available in the "proxy" package, for example "cosine", "correlation" or "Jaccard"; in that case the similarity is transformed into a corresponding dissimilarity measure. The available methods can be listed with pr_DB, e.g. summary(pr_DB). The default value is Euclidean distance.

q

The order parameter q used for the Hill numbers. This argument is also used for the Renyi entropy and HCDT entropy. The default value is 0.

alpha

Exponent α applied to the disparities d_{ij} in the Rao-Stirling diversity. The default value is 1.

beta

Exponent β applied to the products p_i p_j in the Rao-Stirling diversity. The default value is 1.

base

Base of the logarithm. Used in Entropy calculations. The default value is exp(1).

Details

Notation used in the following formulas: N, category count; p_i, proportion of the entity's total value that falls in category i; d_{ij}, disparity between categories i and j; q, α and β, arguments.

The diversity measures included in the package are listed below. The title of each formula gives the mnemonic values that the argument "type" accepts to compute it (e.g. diversity(data, type='variety') or diversity(data, type='v')):

variety, v: Category counts per entity [MacArthur 1965]

∑_i(p_i^0)

entropy, e: Shannon entropy per entity [Shannon 1948]

- ∑_i(p_i \log p_i)
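
The entropy formula can be checked by hand in base R. The sketch below is not the package's implementation; the abundances and the helper name `shannon` are made up for illustration, and the `base` argument simply mirrors the role of the `base` argument described above.

```r
# Hypothetical abundances of one entity across four categories
x <- c(10, 5, 3, 2)
p <- x / sum(x)                     # proportions p_i

# Shannon entropy of a vector of proportions (illustrative helper)
shannon <- function(p, base = exp(1)) {
  p <- p[p > 0]                     # empty categories contribute 0 (0 * log 0 := 0)
  -sum(p * log(p, base = base))
}

shannon(p)                          # natural logarithm
shannon(p, base = 2)                # same quantity expressed in bits
```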

Herfindahl-Hirschman, hh, hhi: The Herfindahl-Hirschman Index, used in economics to measure the concentration of markets.

∑_i(p_i^2)

gini-simpson, gs: Gini-Simpson index per object [Gini 1912]. This measure is also known as the Gibbs-Martin index or the Blau index in sociology, psychology and management studies.

1 - ∑_i(p_i^2)

simpson, s: Simpson index per entity [Simpson 1949].

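D = ∑_i n_i(n_i-1) / (N(N-1))

In this formula n_i is the raw value (abundance) of category i and N is the entity's total abundance ∑_i n_i, not the category count. When this measure is requested, the associated variations are also computed: Simpson's Index of Diversity, 1-D, and the Reciprocal Simpson, 1/D.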
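
A minimal base-R sketch of the Simpson index and its two variations, assuming that n_i are raw counts and that N in this formula is their total. The counts are invented; the names D, I and R are chosen to echo the simpson.D, simpson.I and simpson.R columns of the example output.

```r
# Hypothetical raw counts of one entity over seven categories
n <- c(3, 3, 2, 2, 2, 1, 1)
N <- sum(n)                               # total abundance of the entity

D <- sum(n * (n - 1)) / (N * (N - 1))     # Simpson index
simpson <- c(D = D, I = 1 - D, R = 1 / D) # index of diversity and reciprocal

# For comparison: the Herfindahl-Hirschman sum of squared proportions
hhi <- sum((n / N)^2)
```

D is computed from counts without replacement, so it is smaller than the sum of squared proportions for the same data.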

hill-numbers, td, hn: Hill numbers [Hill 1973]. This measure is parameterized by q. The default q=0 gives the variety (richness). At q=1 the formula is undefined and the measure is taken as its limit, the exponential of the Shannon entropy.

(∑_ip_{i}^q)^{1/(1-q)}
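
The role of q can be seen in a short base-R sketch. The helper `hill` and the proportions are hypothetical, and handling q=1 through the Shannon limit is a choice made for this sketch rather than a statement about the package internals.

```r
# Hill number of order q for a vector of proportions (illustrative helper)
hill <- function(p, q = 0) {
  p <- p[p > 0]                      # ignore empty categories
  if (abs(q - 1) < 1e-8) {
    exp(-sum(p * log(p)))            # limit q -> 1: exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

p <- c(0.5, 0.25, 0.15, 0.10)        # hypothetical proportions
hill(p, q = 0)                       # variety: number of non-empty categories
hill(p, q = 1)                       # exponential of the Shannon entropy
hill(p, q = 2)                       # reciprocal of the sum of squared proportions
```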

berger-parker, bp: The Berger-Parker index equals the maximum p_i value in the entity, i.e. the proportional abundance of the most abundant category. When this measure is requested, its reciprocal is also computed.

renyi, re: Renyi entropy per object. This measure is a generalization of the Shannon entropy parameterized by q. It corresponds to the logarithm of the Hill numbers. The default value for q is 0.

(1-q)^{-1} \log(∑_i p_i^q)
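
The correspondence with the Hill numbers can be verified numerically. The sketch below uses natural logarithms and a hypothetical helper `renyi`; whether the package applies the `base` argument to this measure is not stated here, so no base is assumed.

```r
# Renyi entropy of order q (natural log), illustrative helper
renyi <- function(p, q = 0) {
  p <- p[p > 0]
  if (abs(q - 1) < 1e-8) return(-sum(p * log(p)))  # q -> 1 limit: Shannon entropy
  log(sum(p^q)) / (1 - q)
}

p <- c(0.5, 0.25, 0.15, 0.10)          # hypothetical proportions
renyi(p, q = 0)                        # log of the number of non-empty categories
renyi(p, q = 2)                        # -log of the sum of squared proportions

# Logarithm of the Hill number of the same order, written out for q = 2
log(sum(p^2)^(1 / (1 - 2)))
```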

evenness, ev: Pielou evenness per object across categories [Pielou 1969]. It is the Shannon entropy divided by the logarithm of the variety v.

-∑_i(p_i \log p_i)/\log{v}
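
A small base-R sketch of the evenness formula, with v taken as the number of non-empty categories. The helper `pielou` and the abundances are hypothetical.

```r
# Pielou evenness from raw abundances (illustrative helper)
pielou <- function(x) {
  p <- x[x > 0] / sum(x)             # proportions over non-empty categories
  v <- length(p)                     # variety
  -sum(p * log(p)) / log(v)          # undefined (NaN) when v == 1, since log(1) = 0
}

pielou(c(10, 5, 3, 2))               # uneven distribution, below 1
pielou(c(4, 4, 4, 0))                # perfectly even over 3 non-empty categories
```

Because the same logarithm appears in the numerator and the denominator, the result does not depend on the logarithm base.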

rao, r: Rao diversity.

∑_{ij}d_{ij} p_i p_j

rao-stirling, rs: Rao-Stirling diversity per object across categories [Stirling 2007]. Default values are α=1 and β=1. The pairwise disparities can be based on the Jaccard index, Euclidean distance or cosine similarity, among others.

∑_{ij}{d_{ij}}^α {(p_i p_j )}^β
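
The following base-R sketch evaluates the Rao-Stirling sum for one entity. It assumes the sum runs over unordered pairs i < j, which is consistent with the note that only the upper triangle of "dis" is used and with the example output, where alpha=0 and beta=1 give exactly half of the gini.simpson values (e.g. 0.4304186 versus 0.8608372 for Canada). The helper `rao_stirling`, the proportions and the disparity matrix are all invented for illustration.

```r
# Hypothetical proportions of one entity over three categories
p <- c(a = 0.5, b = 0.3, c = 0.2)

# Hypothetical symmetric disparity matrix between the categories
d <- matrix(c(0.0, 0.4, 0.9,
              0.4, 0.0, 0.6,
              0.9, 0.6, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(names(p), names(p)))

# Rao-Stirling sum over pairs i < j (illustrative helper, not the package code)
rao_stirling <- function(p, d, alpha = 1, beta = 1) {
  pp <- outer(p, p)                  # p_i * p_j for every pair
  ut <- upper.tri(d)                 # each unordered pair counted once
  sum(d[ut]^alpha * pp[ut]^beta)
}

rao_stirling(p, d)                          # alpha = 1, beta = 1: Rao diversity
rao_stirling(p, d, alpha = 0, beta = 1)     # disparities ignored: (1 - sum(p^2)) / 2
```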

Value

A data frame with diversity measures as columns for each entity.

References

Gini, C. (1912). "Variabilità e mutabilità" (Variability and Mutability). Memorie di metodologica statistica.

Hill, M. (1973). "Diversity and evenness: a unifying notation and its consequences". Ecology 54: 427-432.

MacArthur, R. (1965). "Patterns of Species Diversity". Biological Reviews 40: 510-533.

Pielou, E. (1969). "An Introduction to Mathematical Ecology". Wiley.

Shannon, C. (1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.

Simpson, E. H. (1949). "Measurement of Diversity". Nature 163: 688.

Stirling, A. (2007). "A General Framework for Analysing Diversity in Science, Technology and Society". Journal of the Royal Society Interface 4: 707-719.

Rafols, I., & Meyer, M. (2009). Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience. Scientometrics, 82(2), 263-287.

Rafols, I. (2014). Knowledge Integration and Diffusion: Measures and Mapping of Diversity and Coherence. In Y. Ding, R. Rousseau, & D. Wolfram (Eds.), Measuring Scholarly Impact (pp. 169-190). Springer International Publishing.

Chavarro, D., Tang, P., & Rafols, I. (2014). Interdisciplinarity and research on local issues: evidence from a developing country. Research Evaluation, 23(3), 195-209.

Examples

library(diverse)
data(pantheon)
data(geese)
# all measures, then a single measure
diversity(pantheon)
diversity(pantheon, type='variety')
# geese has categories in the rows, hence category_row=TRUE
diversity(geese, type='berger-parker', category_row=TRUE)

# reading a csv data matrix
path_to_file <- system.file("extdata", "PantheonMatrix.csv", package = "diverse")
X <- read_data(path = path_to_file)
# "gini" is not among the mnemonics listed for 'type' and the recorded output below
# is an empty data frame; "gini-simpson" or "gs" selects the Gini-Simpson index
diversity(data=X, type="gini")
diversity(data=X, type="rao-stirling", method="cosine")
diversity(data=X, type="all", method="jaccard")

# reading a csv data frame (entity, category, value)
path_to_file <- system.file("extdata", "PantheonEdges.csv", package = "diverse")
X <- read_data(path = path_to_file)
# Hill numbers with q = 1
diversity(data=X, type="td", q=1)
# Rao-Stirling with different arguments
diversity(data=X, type="rao-stirling", method="euclidean", alpha=0, beta=1)
# more than one diversity measure
diversity(data=X, type=c('e','ev','bp','s'))

Example output

Loading required package: proxy

Attaching package: 'proxy'

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

Loading required package: reshape2
Loading required package: foreign
             variety  entropy       HHI blau.index gini.simpson gini.simpson.C
Canada            27 2.559899 0.1391628  0.8608372    0.8608372      0.1391628
Chile              7 1.626709 0.2396450  0.7603550    0.7603550      0.2396450
China             24 2.340469 0.1831446  0.8168554    0.8168554      0.1831446
Latvia            10 1.889159 0.2345679  0.7654321    0.7654321      0.2345679
New Zealand        9 2.106577 0.1326531  0.8673469    0.8673469      0.1326531
Portugal          11 1.610050 0.2899660  0.7100340    0.7100340      0.2899660
Saudi Arabia       7 1.331425 0.3453061  0.6546939    0.6546939      0.3453061
South Africa      16 2.440072 0.1211073  0.8788927    0.8788927      0.1211073
Uruguay            4 1.135551 0.3739612  0.6260388    0.6260388      0.3739612
Vietnam            4 1.168518 0.3719008  0.6280992    0.6280992      0.3719008
             gini.simpson.R  simpson.D simpson.I simpson.R hill.numbers
Canada             7.185827 0.13174182 0.8682582  7.590604           27
Chile              4.172840 0.20923077 0.7907692  4.779412            7
China              5.460167 0.17480932 0.8251907  5.720519           24
Latvia             4.263158 0.18954248 0.8104575  5.275862           10
New Zealand        7.538462 0.06593407 0.9340659 15.166667            9
Portugal           3.448680 0.28141136 0.7185886  3.553517           11
Saudi Arabia       2.895981 0.32605042 0.6739496  3.067010            7
South Africa       8.257143 0.09447415 0.9055258 10.584906           16
Uruguay            2.674074 0.33918129 0.6608187  2.948276            4
Vietnam            2.688889 0.30909091 0.6909091  3.235294            4
             berger.parker.D berger.parker.I renyi.entropy  evenness
Canada             0.3162393        3.162162      3.295837 0.7767069
Chile              0.3846154        2.600000      1.945910 0.8359632
China              0.3838384        2.605263      3.178054 0.7364472
Latvia             0.4444444        2.250000      2.302585 0.8204514
New Zealand        0.2142857        4.666667      2.197225 0.9587447
Portugal           0.4285714        2.333333      2.397895 0.6714431
Saudi Arabia       0.4857143        2.058824      1.945910 0.6842170
South Africa       0.2647059        3.777778      2.772589 0.8800700
Uruguay            0.5263158        1.900000      1.386294 0.8191267
Vietnam            0.5454545        1.833333      1.386294 0.8429079
             hcdt.entropy rao.stirling
Canada                 27     12.01241
Chile                   7     14.91499
China                  24     15.06282
Latvia                 10     14.96385
New Zealand             9     12.94902
Portugal               11     14.54679
Saudi Arabia            7     14.40618
South Africa           16     15.07070
Uruguay                 4     12.17966
Vietnam                 4     14.55674
             variety
Canada            27
Chile              7
China             24
Latvia            10
New Zealand        9
Portugal          11
Saudi Arabia       7
South Africa      16
Uruguay            4
Vietnam            4
     berger.parker.D berger.parker.I
1996       0.7124871        1.403534
1997       0.7247953        1.379700
1998       0.7451379        1.342033
1999       0.7674295        1.303051
2000       0.7860407        1.272199
2001       0.8026238        1.245914
2002       0.8127443        1.230399
2003       0.8175550        1.223159
2004       0.8382420        1.192973
2005       0.8392823        1.191494
2006       0.8446653        1.183901
Using Country as id variables
data frame with 0 columns and 10 rows
             rao.stirling
Canada          0.1335890
Chile           0.1773750
China           0.1502522
Latvia          0.2057661
New Zealand     0.1894725
Portugal        0.1341566
Saudi Arabia    0.1472416
South Africa    0.2471609
Uruguay         0.1433775
Vietnam         0.2205377
             variety  entropy       HHI blau.index gini.simpson gini.simpson.C
Canada            27 2.559899 0.1391628  0.8608372    0.8608372      0.1391628
Chile              7 1.626709 0.2396450  0.7603550    0.7603550      0.2396450
China             24 2.340469 0.1831446  0.8168554    0.8168554      0.1831446
Latvia            10 1.889159 0.2345679  0.7654321    0.7654321      0.2345679
New Zealand        9 2.106577 0.1326531  0.8673469    0.8673469      0.1326531
Portugal          11 1.610050 0.2899660  0.7100340    0.7100340      0.2899660
Saudi Arabia       7 1.331425 0.3453061  0.6546939    0.6546939      0.3453061
South Africa      16 2.440072 0.1211073  0.8788927    0.8788927      0.1211073
Uruguay            4 1.135551 0.3739612  0.6260388    0.6260388      0.3739612
Vietnam            4 1.168518 0.3719008  0.6280992    0.6280992      0.3719008
             gini.simpson.R  simpson.D simpson.I simpson.R hill.numbers
Canada             7.185827 0.13174182 0.8682582  7.590604           27
Chile              4.172840 0.20923077 0.7907692  4.779412            7
China              5.460167 0.17480932 0.8251907  5.720519           24
Latvia             4.263158 0.18954248 0.8104575  5.275862           10
New Zealand        7.538462 0.06593407 0.9340659 15.166667            9
Portugal           3.448680 0.28141136 0.7185886  3.553517           11
Saudi Arabia       2.895981 0.32605042 0.6739496  3.067010            7
South Africa       8.257143 0.09447415 0.9055258 10.584906           16
Uruguay            2.674074 0.33918129 0.6608187  2.948276            4
Vietnam            2.688889 0.30909091 0.6909091  3.235294            4
             berger.parker.D berger.parker.I renyi.entropy  evenness
Canada             0.3162393        3.162162      3.295837 0.7767069
Chile              0.3846154        2.600000      1.945910 0.8359632
China              0.3838384        2.605263      3.178054 0.7364472
Latvia             0.4444444        2.250000      2.302585 0.8204514
New Zealand        0.2142857        4.666667      2.197225 0.9587447
Portugal           0.4285714        2.333333      2.397895 0.6714431
Saudi Arabia       0.4857143        2.058824      1.945910 0.6842170
South Africa       0.2647059        3.777778      2.772589 0.8800700
Uruguay            0.5263158        1.900000      1.386294 0.8191267
Vietnam            0.5454545        1.833333      1.386294 0.8429079
             hcdt.entropy rao.stirling
Canada                 27    0.2535743
Chile                   7    0.1798306
China                  24    0.2527998
Latvia                 10    0.2331496
New Zealand             9    0.2200316
Portugal               11    0.1956445
Saudi Arabia            7    0.2116735
South Africa           16    0.2618224
Uruguay                 4    0.1273084
Vietnam                 4    0.2341598
             hill.numbers
Canada          12.934513
Chile            5.087107
China           10.386105
Latvia           6.613805
New Zealand      8.220059
Portugal         5.003062
Saudi Arabia     3.786435
South Africa    11.473870
Uruguay          3.112887
Vietnam          3.217222
             rao.stirling
Canada          0.4304186
Chile           0.3801775
China           0.4084277
Latvia          0.3827160
New Zealand     0.4336735
Portugal        0.3550170
Saudi Arabia    0.3273469
South Africa    0.4394464
Uruguay         0.3130194
Vietnam         0.3140496
              entropy  evenness berger.parker.D berger.parker.I  simpson.D
Canada       2.559899 0.7767069       0.3162393        3.162162 0.13174182
Chile        1.626709 0.8359632       0.3846154        2.600000 0.20923077
China        2.340469 0.7364472       0.3838384        2.605263 0.17480932
Latvia       1.889159 0.8204514       0.4444444        2.250000 0.18954248
New Zealand  2.106577 0.9587447       0.2142857        4.666667 0.06593407
Portugal     1.610050 0.6714431       0.4285714        2.333333 0.28141136
Saudi Arabia 1.331425 0.6842170       0.4857143        2.058824 0.32605042
South Africa 2.440072 0.8800700       0.2647059        3.777778 0.09447415
Uruguay      1.135551 0.8191267       0.5263158        1.900000 0.33918129
Vietnam      1.168518 0.8429079       0.5454545        1.833333 0.30909091
             simpson.I simpson.R
Canada       0.8682582  7.590604
Chile        0.7907692  4.779412
China        0.8251907  5.720519
Latvia       0.8104575  5.275862
New Zealand  0.9340659 15.166667
Portugal     0.7185886  3.553517
Saudi Arabia 0.6739496  3.067010
South Africa 0.9055258 10.584906
Uruguay      0.6608187  2.948276
Vietnam      0.6909091  3.235294

diverse documentation built on May 29, 2017, 3:31 p.m.
