The fasthplus R package provides fast approximations for metrics of discordance or dissimilarity.
The metric G+ was introduced by W. T. Williams in 1971 as a way to measure the discordance or dissimilarity between two different classifications (where each classification consists of a distance matrix and a set of predicted labels, one for each observation).
Here, we introduce H+, a discordance metric modified from G+. This metric can be used (1) to evaluate the discordance between two arbitrary sets or (2) to evaluate label fitness (clustering) for a generalized dissimilarity matrix.
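To make the two-set case concrete, here is a minimal base-R sketch. It rests on an assumption that is not spelled out in this document: that the discordance between sets A and B is the fraction of pairs (a, b) with a > b. The helper name h_exact is hypothetical and is not part of fasthplus.

```r
# Hypothetical helper (not part of fasthplus): brute-force discordance
# between two numeric sets, ASSUMING it is the fraction of pairs (a, b)
# with a > b. Cost is O(|A| * |B|), so this only suits small inputs.
h_exact <- function(A, B) {
  mean(outer(A, B, ">"))
}

set.seed(1)
a <- rnorm(n = 200, mean = 0)
b <- rnorm(n = 200, mean = 1)
h_small <- h_exact(a, b)  # below 0.5 because values in b tend to exceed those in a
```

Under this assumption a value near 0 means almost every element of B exceeds almost every element of A, and a value near 0.5 means the two sets are hard to tell apart.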
At present, our package is available only via GitHub installation using the devtools package.
library(devtools)
install_github(repo="ntdyjack/fasthplus", ref = "main")
After installation, the package can be loaded into R:
library(fasthplus)
The main functions in the fasthplus package are hpe() and hpb().
The hpe() function accepts either (1) two sets (A and B) or (2) a distance matrix (D) and a set of labels (L), along with the additional arguments alg (algorithm choice), alpha, and gammas (see vignette).
The hpb() function accepts a data matrix (D) and a set of labels (L).
To run the hpe() function with two sets (A and B) and the number of p + 1 percentiles:
a <- rnorm(n=500,mean=0)
b <- rnorm(n=500,mean=1)
h <- hpe(A=a,B=b,p=101,alg=1)
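The role of the percentile count can be illustrated with a rough base-R sketch. This is not the algorithm that hpe() implements. It only assumes that discordance means the fraction of pairs (a, b) with a > b, and it shows that comparing a grid of percentiles from each set can land close to the all-pairs value. The names h_exact and h_percentile are hypothetical, as is the way p is used to size the grid.

```r
# Hypothetical helpers (not part of fasthplus).
# ASSUMPTION: discordance = fraction of pairs (a, b) with a > b.
h_exact <- function(A, B) mean(outer(A, B, ">"))

# Approximate the same quantity from p evenly spaced percentiles of each
# set instead of all |A| * |B| pairs.
h_percentile <- function(A, B, p = 101) {
  probs <- seq(0, 1, length.out = p)
  qa <- quantile(A, probs = probs, names = FALSE)
  qb <- quantile(B, probs = probs, names = FALSE)
  mean(outer(qa, qb, ">"))
}

set.seed(1)
a <- rnorm(n = 500, mean = 0)
b <- rnorm(n = 500, mean = 1)
exact  <- h_exact(a, b)
approx <- h_percentile(a, b, p = 101)
abs(exact - approx)  # gap between the all-pairs value and the percentile grid
```

The percentile version compares p * p values regardless of how large the sets are, which is the kind of saving that makes an approximation attractive for large inputs.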
To run hpe() and hpb() with D (a dissimilarity matrix or a data matrix, respectively) and a set of labels (L):
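# Simulate two groups of 100 observations with 50 features each
a <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
b <- sapply(1:100, function(i) rnorm(n=50,mean=0.0,sd=1))
# Stack the groups so that rows are observations (200 x 50)
x <- t(cbind(a,b))
# Create a set of labels, one per row of x
l <- c(rep(0,100),rep(1,100))
# hpb estimate from the data matrix
hpb(D=x,L=l,t=10,r=10)
# Calculate dissimilarity matrix
d <- dist(x)
# hpe estimate from the dissimilarity matrix
hpe(D=d,L=l,p=251)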
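For the label-based case, the following base-R sketch shows one way to compute an exact reference value on a small data set. It assumes that A is the set of within-label dissimilarities, that B is the set of between-label dissimilarities, and that discordance is the fraction of pairs (a, b) with a > b. Neither assumption is stated in this document, and the helper name h_labels is hypothetical.

```r
# Hypothetical helper (not part of fasthplus).
# ASSUMPTIONS: A = dissimilarities between observations sharing a label,
# B = dissimilarities between observations with different labels, and
# discordance = fraction of pairs (a, b) with a > b.
h_labels <- function(d, labels) {
  m <- as.matrix(d)
  upper <- upper.tri(m)                 # count each pair of observations once
  same <- outer(labels, labels, "==")
  within  <- m[upper & same]
  between <- sort(m[upper & !same])
  # For each within value, count the between values strictly below it.
  below <- findInterval(within, between, left.open = TRUE)
  sum(as.numeric(below)) / (as.numeric(length(within)) * length(between))
}

set.seed(1)
x <- rbind(matrix(rnorm(20 * 5, mean = 0), nrow = 20),
           matrix(rnorm(20 * 5, mean = 3), nrow = 20))
l <- c(rep(0, 20), rep(1, 20))
h_labels(dist(x), l)  # near 0: these labels separate the two groups well
```

Sorting the between-label values and using findInterval() avoids materializing every (a, b) pair, which keeps memory use modest even when the number of pairs is large.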
Please use https://github.com/ntdyjack/fasthplus/issues to submit issues, bug reports, and comments.