dist_SDA: distance measurement for symbolic data

Description

Calculates distances between symbolic objects described by interval-valued variables, multinominal variables, and multinominal variables with weights.

Usage

dist_SDA(table.Symbolic, type = "U_2", subType = NULL, gamma = 0.5, power = 2, probType = "J",
probAggregation = "P_1", s = 0.5, p = 2, variableSelection = NULL, weights = NULL)

Arguments

table.Symbolic

symbolic data table

type

distance measure for boolean symbolic objects: H, U_2, U_3, U_4, C_1, SO_1, SO_2, SO_3, SO_4, SO_5; for mixed symbolic objects: L_1, L_2

subType

comparison function for C_1 and SO_1: D_1, D_2, D_3, D_4, D_5

gamma

gamma parameter for U_2 and U_3; gamma in [0, 0.5]

power

power parameter for U_2 and U_3; power = 1, 2, 3, ...

probType

distance measure for probabilistic symbolic objects: J, CHI, REN, CHER, LP

probAggregation

aggregation function for J, CHI, REN, CHER, LP: P_1, P_2

s

parameter for the Renyi (REN) and Chernoff (CHER) distances; s in [0, 1)

p

parameter for the Minkowski (LP) metric; p = 1 gives the Manhattan distance, p = 2 the Euclidean distance

variableSelection

numbers of the variables to use in the calculation, or NULL to use all variables

weights

weights of the variables for the Minkowski (LP) metric

Details

Distance measures for boolean symbolic objects:

H: Hausdorff distance, for objects described by interval-valued variables;
U_2, U_3, U_4: Ichino-Yaguchi distance measures, for objects described by interval-valued and/or multinominal variables;
C_1, SO_1, SO_2, SO_3, SO_4, SO_5: de Carvalho distance measures, for objects described by interval-valued and/or multinominal variables.
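To make the per-variable building blocks of H and the Ichino-Yaguchi measures concrete, here is a minimal base-R sketch for a single pair of objects. It assumes that an interval is stored as c(lower, upper) and a multinominal value as a character vector of categories. The phi formula follows Ichino and Yaguchi (1994) as we understand it, and gamma and power mirror the dist_SDA arguments of the same names. The helper names (hausdorff_interval, ichino_phi, ichino_phi_set, minkowski_combine) are hypothetical and are not part of symbolicDA. The sketch does not reproduce how the package normalizes or weights these values for U_3 and U_4 (see dist_SDA.pdf).

```r
# Hypothetical helpers, not part of symbolicDA.
# An interval-valued observation is stored as c(lower, upper);
# a multinominal observation is stored as a character vector of categories.

# Hausdorff distance between intervals [a1, a2] and [b1, b2]
hausdorff_interval <- function(a, b) {
  max(abs(a[1] - b[1]), abs(a[2] - b[2]))
}

# Ichino-Yaguchi dissimilarity for one interval-valued variable:
# phi = |join| - |meet| + gamma * (2 * |meet| - |A| - |B|)
ichino_phi <- function(a, b, gamma = 0.5) {
  stopifnot(gamma >= 0, gamma <= 0.5)
  join_len <- max(a[2], b[2]) - min(a[1], b[1])          # length of the Cartesian join
  meet_len <- max(0, min(a[2], b[2]) - max(a[1], b[1]))  # length of the Cartesian meet, 0 if disjoint
  join_len - meet_len + gamma * (2 * meet_len - (a[2] - a[1]) - (b[2] - b[1]))
}

# Same formula for a multinominal variable, with |.| = number of categories
ichino_phi_set <- function(a, b, gamma = 0.5) {
  stopifnot(gamma >= 0, gamma <= 0.5)
  join_n <- length(union(a, b))
  meet_n <- length(intersect(a, b))
  join_n - meet_n + gamma * (2 * meet_n - length(unique(a)) - length(unique(b)))
}

# Generalized Minkowski combination of the per-variable values
minkowski_combine <- function(values, power = 2) {
  sum(values^power)^(1 / power)
}

# Objects A and B, each described by two interval-valued variables
a <- list(c(1, 3), c(10, 20))
b <- list(c(2, 5), c(15, 18))
phis <- mapply(ichino_phi, a, b, MoreArgs = list(gamma = 0.3))
minkowski_combine(phis, power = 2)
ichino_phi_set(c("red", "blue"), c("blue", "green"), gamma = 0.3)
```

With gamma = 0.5 and disjoint intervals, phi reduces to the join length minus the mean of the two interval lengths. Smaller gamma values put more weight on the non-overlapping part.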

Distance measurement for probabilistic symbolic objects consists of two steps:

1. Componentwise step: the distance between objects is calculated separately for each variable, using one of the componentwise measures J (Kullback-Leibler divergence), CHI (chi-squared divergence), REN (Renyi divergence), CHER (Chernoff distance) or LP (modified Minkowski metric).

2. Objectwise step: the componentwise distances are aggregated into a single distance between objects, using P_1 (Manhattan distance) or P_2 (Euclidean distance).
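The two-step procedure can be sketched in base R for one pair of objects. The sketch rests on several assumptions. Each variable of an object is a probability vector with strictly positive entries. J is implemented in its symmetric (Jeffreys) form of the Kullback-Leibler divergence. REN and CHER use the standard order-s formulas, with s playing the role of the dist_SDA argument s. The package's exact definitions may differ (see dist_SDA.pdf). All helper names are hypothetical and are not part of symbolicDA.

```r
# Hypothetical helpers, not part of symbolicDA.
# x and y are probability vectors over the same categories; all entries
# are assumed to be strictly positive (zeros would produce NaN or Inf).

# Step 1: componentwise measures for a single variable
j_div <- function(x, y) sum((x - y) * log(x / y))  # symmetric KL (Jeffreys)
chi_div <- function(x, y) sum((x - y)^2 / y)       # chi-squared divergence
ren_div <- function(x, y, s = 0.5) {               # Renyi divergence of order s
  stopifnot(s >= 0, s < 1)
  log(sum(x^s * y^(1 - s))) / (s - 1)
}
cher_dist <- function(x, y, s = 0.5) {             # Chernoff distance of order s
  stopifnot(s >= 0, s < 1)
  -log(sum(x^s * y^(1 - s)))
}

# Step 2: objectwise aggregation of the componentwise distances
aggregate_P <- function(d, type = c("P_1", "P_2")) {
  type <- match.arg(type)
  if (type == "P_1") sum(d) else sqrt(sum(d^2))    # Manhattan or Euclidean
}

# Two objects, each described by two multinominal variables with weights
obj1 <- list(c(0.5, 0.3, 0.2), c(0.6, 0.4))
obj2 <- list(c(0.4, 0.4, 0.2), c(0.5, 0.5))
comp <- mapply(j_div, obj1, obj2)  # step 1: one J value per variable
aggregate_P(comp, "P_1")           # step 2: Manhattan aggregation
aggregate_P(comp, "P_2")           # step 2: Euclidean aggregation
```

In this form REN and CHER share the term sum(x^s * y^(1 - s)), so the Renyi divergence equals the Chernoff distance divided by (1 - s).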

Distance measures for mixed symbolic objects are modified Minkowski metrics: L_1 (Manhattan distance) and L_2 (Euclidean distance).
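Both LP and L_1/L_2 are described as (modified) Minkowski metrics. The following sketch shows only the plain weighted Minkowski form, to illustrate how the exponent (the p argument) and per-variable weights (the weights argument) interact. The package's specific modifications are not reproduced here (see dist_SDA.pdf). The helper weighted_minkowski is hypothetical, and equal weights are assumed when none are given.

```r
# Hypothetical helper, not part of symbolicDA.
# Weighted Minkowski distance: (sum_k w_k * |x_k - y_k|^p)^(1/p)
weighted_minkowski <- function(x, y, p = 2, weights = NULL) {
  stopifnot(length(x) == length(y), p >= 1)
  if (is.null(weights)) weights <- rep(1, length(x))  # equal weights by default
  stopifnot(length(weights) == length(x), all(weights >= 0))
  sum(weights * abs(x - y)^p)^(1 / p)
}

x <- c(0.5, 0.3, 0.2)
y <- c(0.2, 0.3, 0.5)
weighted_minkowski(x, y, p = 1)                        # Manhattan (p = 1)
weighted_minkowski(x, y, p = 2)                        # Euclidean (p = 2)
weighted_minkowski(x, y, p = 2, weights = c(2, 1, 1))  # first variable counts twice
```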

See the file ../doc/dist_SDA.pdf for further details.

NOTE: In previous versions of the package this function was called dist.SDA.

Value

distance matrix of symbolic objects
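For readers who want to feed such a result into standard R clustering tools, here is a base-R sketch that assembles a symmetric matrix of pairwise distances and converts it with stats::as.dist for use with stats::hclust. The helpers pairwise_matrix and hausdorff_sum are hypothetical. Summing per-variable Hausdorff distances is an assumption made only for illustration and is not necessarily how dist_SDA aggregates H.

```r
# Hypothetical helper, not part of symbolicDA: builds a symmetric
# n x n distance matrix from a named list of objects and a pairwise function.
pairwise_matrix <- function(objects, pair_dist) {
  n <- length(objects)
  m <- matrix(0, n, n, dimnames = list(names(objects), names(objects)))
  if (n > 1) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        m[i, j] <- m[j, i] <- pair_dist(objects[[i]], objects[[j]])
      }
    }
  }
  m
}

# Assumed pairwise distance: sum of Hausdorff distances over interval variables
hausdorff_sum <- function(a, b) {
  sum(mapply(function(u, v) max(abs(u[1] - v[1]), abs(u[2] - v[2])), a, b))
}

objs <- list(
  A = list(c(1, 3), c(10, 20)),
  B = list(c(2, 5), c(15, 18)),
  C = list(c(0, 1), c(12, 14))
)
m <- pairwise_matrix(objs, hausdorff_sum)
d <- stats::as.dist(m)                     # keeps the lower triangle only
hc <- stats::hclust(d, method = "average")
```

as.dist reads only the lower triangle, so the matrix should be symmetric with a zero diagonal, as it is here by construction.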

Author(s)

Andrzej Dudek (andrzej.dudek@ue.wroc.pl), Justyna Wilk (justyna.wilk@ue.wroc.pl), Department of Econometrics and Computer Science, Wroclaw University of Economics, Poland, http://keii.ue.wroc.pl/symbolicDA/

References

Billard L., Diday E. (eds.) (2006), Symbolic Data Analysis, Conceptual Statistics and Data Mining, John Wiley & Sons, Chichester.

Bock H.H., Diday E. (eds.) (2000), Analysis of Symbolic Data. Explanatory methods for extracting statistical information from complex data, Springer-Verlag, Berlin.

Diday E., Noirhomme-Fraiture M. (eds.) (2008), Symbolic Data Analysis with SODAS Software, John Wiley & Sons, Chichester.

Ichino M., Yaguchi H. (1994), Generalized Minkowski metrics for mixed feature-type data analysis, IEEE Transactions on Systems, Man, and Cybernetics, 24(4), pp. 698-708, doi:10.1109/21.286391.

Malerba D., Esposito F., Giovalle V., Tamma V. (2001), Comparing Dissimilarity Measures for Symbolic Data Analysis, "New Techniques and Technologies for Statistics" (ETK NTTS'01), pp. 473-481.

Malerba D., Esposito F., Monopoli M. (2002), Comparing dissimilarity measures for probabilistic symbolic objects, In: A. Zanasi, C.A. Brebbia, N.F.F. Ebecken, P. Melli (eds.), Data Mining III, "Series Management Information Systems", Vol. 6, WIT Press, Southampton, pp. 31-40.

See Also

DClust, index.G1d; dist.Symbolic in the clusterSim package

Examples

# Long-running: uncomment to run
# data("cars", package = "symbolicDA")
# d <- dist_SDA(cars, type = "U_3", gamma = 0.3, power = 2)  # 'd' avoids masking stats::dist
# print(d)

symbolicDA documentation built on May 28, 2022, 1:08 a.m.