BinaryProximities: Proximity Measures for Binary Data

View source: R/BinaryProximities.R

BinaryProximitiesR Documentation

Proximity Measures for Binary Data

Description

Calculation of proxymities among rows or columns of a binary data matrix or a data frame that will be converted into a binary data matrix.

Usage

BinaryProximities(x, y = NULL, coefficient = "Jaccard", transformation =
                 NULL, transpose = FALSE, ...)

Arguments

x

A data frame or a binary data matrix. Proximities among the rows of x will be calculated

y

Supplementary data. The proximities amond the rows of x and the rows of y will be also calculated

coefficient

Similarity coefficient. Use the number or the name (see details)

transformation

Transformation of the similarities. Use the number or the name (see details)

transpose

Logical. If TRUE, proximities among columns are calculated

...

Used to provide additional parameters for the conversion of the dataframe into a binary matrix

Details

A binary data matrix is a matrix with values 0 or 1 coding the absence or presence of several binary characters. When a data frame is provided, every variable in the data frame is converted to a binary variable using the function Dataframe2BinaryMatrix. Factors with two levels are converted directly to binary variables, factors with more than two levels are converted to a matrix with as meny columns as levels and numerical variables are converted to binary variables using a cut point that can be the median, the mean or a value provided by the user.

The following coefficients are calculated

1.- Kulezynski = a/(b + c)

2.- Russell_and_Rao = a/(a + b + c+d)

3.- Jaccard = a/(a + b + c)

4.- Simple_Matching = (a + d)/(a + b + c + d)

5.- Anderberg = a/(a + 2 * (b + c))

6.- Rogers_and_Tanimoto = (a + d)/(a + 2 * (b + c) + d)

7.- Sorensen_Dice_and_Czekanowski = a/(a + 0.5 * (b + c))

8.- Sneath_and_Sokal = (a + d)/(a + 0.5 * (b + c) + d)

9.- Hamman = (a - (b + c) + d)/(a + b + c + d)

10.- Kulezynski = 0.5 * ((a/(a + b)) + (a/(a + c)))

11.- Anderberg2 = 0.25 * (a/(a + b) + a/(a + c) + d/(c + d) + d/(b + d))

12.- Ochiai = a/sqrt((a + b) * (a + c))

13.- S13 = (a * d)/sqrt((a + b) * (a + c) * (d + b) * (d + c))

14.- Pearson_phi = (a * d - b * c)/sqrt((a + b) * (a + c) * (d + b) * (d + c))

15.- Yule = (a * d - b * c)/(a * d + b * c)

The following transformations of the similarity3 are calculated

1.- 'Identity' dis=sim

2.- '1-S' dis=1-sim

3.- 'sqrt(1-S)' dis = sqrt(1 - sim)

4.- '-log(s)' dis=-1*log(sim)

5.- '1/S-1' dis=1/sim -1

6.- 'sqrt(2(1-S))' dis== sqrt(2*(1 - sim))

7.- '1-(S+1)/2' dis=1-(sim+1)/2

8.- '1-abs(S)' dis=1-abs(sim)

9.- '1/(S+1)' dis=1/(sim)+1

Note that, after transformation the similarities are converted to distances except for "Identity". Not all the transformations are suitable for all the coefficients. Use them at your own risk. The default values are admissible combinations.

Value

An object of class proximities.This has components:

TypeData

Binary, Continuous or Mixed. Binary in this case.

Coefficient

Coefficient used to calculate the proximities

Transformation

Transformation used to calculate the proximities

Data

Data used to calculate the proximities

SupData

Supplementary Data, if any

Proximities

Proximities among rows of x. May be similarities or dissimilarities depending on the transformation

SupProximities

Proximities among rows of x and y.

Author(s)

Jose Luis Vicente-Villardon

References

Gower, J. C. (2006) Similarity dissimilarity and Distance, measures of. Encyclopedia of Statistical Sciences. 2nd. ed. Volume 12. Wiley

See Also

BinaryDistances, Dataframe2BinaryMatrix

Examples

data(spiders)
D=BinaryProximities(spiders, coefficient="Jaccard", transformation="sqrt(1-S)")
D2=BinaryProximities(spiders, coefficient=3, transformation=3)

MultBiplotR documentation built on Nov. 21, 2023, 5:08 p.m.