NC.dist: Normalized Compression Distance
In shipunov: Miscellaneous Functions from Alexey Shipunov

NC.dist

R Documentation

Normalized Compression Distance

Description

Calculates the normalized compression distance between the rows of a matrix or data frame.

Usage

NC.dist(data, method="gzip", character=TRUE)

Arguments

`data`	Matrix (or data frame) with variables that should be used in the computation of the distance between rows.
`method`	Compression type passed to memCompress(): "gzip" (default), "bzip2", or "xz"; the last is very slow.
`character`	Convert to character mode (default), or use as raw?

Details

NC.dist() computes the distance based on the sizes of the compressed vectors. It is calculated as

dissimilarity(x, y) = (B(x, y) - max(B(x), B(y))) / min(B(x), B(y))

where B(x) and B(y) are the byte sizes of the compressed 'x' and 'y', and B(x, y) is the compressed byte size of 'x' and 'y' concatenated. The algorithm uses the base R memCompress() function.
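The calculation for a single pair of vectors can be sketched in base R as follows. This is a minimal illustration, not the package source: the helper names 'bsize' and 'nc_pair' are hypothetical, the vectors are converted to character mode, and the formula above is read as (B(x, y) - max(B(x), B(y))) / min(B(x), B(y)).

```r
## Hypothetical helper (not part of shipunov): compressed size in bytes
bsize <- function(v, method = "gzip") {
  length(memCompress(v, type = method))
}

## Hypothetical helper: distance between two vectors, following the
## formula as stated on this page
nc_pair <- function(x, y, method = "gzip") {
  x <- as.character(x)
  y <- as.character(y)
  bx <- bsize(x, method)          # B(x)
  by <- bsize(y, method)          # B(y)
  bxy <- bsize(c(x, y), method)   # B(x, y): size of the concatenation
  (bxy - max(bx, by)) / min(bx, by)
}

a <- rep(c(1.5, 2.5, 3.5), 20)    # repetitive vector
z <- seq(0.01, 0.6, by = 0.01)    # vector with different content

nc_pair(a, a)   # a vector against an identical copy
nc_pair(a, z)   # two unrelated vectors
```

The idea is that concatenating similar vectors adds little to the compressed size, so B(x, y) stays close to the larger of B(x) and B(y). Note that the cited paper by Cilibrasi and Vitanyi writes its distance with min in the numerator and max in the denominator; the sketch keeps the form given on this page.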

If 'data' is a data frame, NC.dist() internally converts it into a matrix. By default, all columns are converted into character mode (or into raw if 'character=FALSE'). This default makes NC.dist() a universal distance that also tolerates NAs and zeroes.
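The difference between the two modes can be seen on a single vector. The sketch below uses only base R; whether NC.dist() performs exactly these conversions internally is an assumption, and the variable names are illustrative.

```r
vals <- c(5.1, 3.5, NA, 0)

## Character mode: every value, including NA and 0, becomes text
## (memCompress() joins the strings with newlines before compressing)
as_chr <- as.character(vals)
size_chr <- length(memCompress(as_chr, type = "gzip"))

## Raw mode: as.raw() keeps only whole bytes in 0..255; fractions are
## truncated and NA becomes 00 (with a warning, suppressed here)
as_byte <- suppressWarnings(as.raw(vals))
size_raw <- length(memCompress(as_byte, type = "gzip"))

as_chr
as_byte
c(character = size_chr, raw = size_raw)
```

Character mode preserves the printed form of each value, whereas raw mode discards fractional parts, which is one reason raw conversion suits data that are already small whole numbers on a common scale.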

Value

Object of class "dist" holding the distances among the rows of 'data'.

Author(s)

Alexey Shipunov

References

Cilibrasi, R., & Vitanyi, P. M. B. (2005). Clustering by compression. IEEE Transactions on Information Theory, 51(4), 1523-1545.

Examples


library(shipunov)

## converts variables into character mode (the default, universal method)
iris.nc <- NC.dist(iris[, -5])
iris.hnc <- hclust(iris.nc, method="ward.D2")
## amazingly, it works even for vectors with length=4 (iris data rows)
plot(prcomp(iris[, -5])$x, col=cutree(iris.hnc, 3))

## using variables as raw, it is good when they are uniform
iris.nc2 <- NC.dist(iris[, -5], character=FALSE)
iris.hnc2 <- hclust(iris.nc2, method="ward.D2")
plot(prcomp(iris[, -5])$x, col=cutree(iris.hnc2, 3))

## bzip2 uses Burrows-Wheeler transform
NC.dist(matrix(runif(100), ncol=10), method="bzip2")

shipunov documentation built on Feb. 16, 2023, 9:05 p.m.
