QuadratiK-package | R Documentation
Collection of Methods Constructed using the Kernel-Based Quadratic Distances
QuadratiK provides the first implementation, in R and Python, of a comprehensive set of goodness-of-fit tests and a clustering technique for $d$-dimensional spherical data ($d \ge 2$) using kernel-based quadratic distances. It includes:
Goodness-of-Fit Tests: The software implements one-, two-, and k-sample tests for goodness of fit, offering an efficient and mathematically sound way to assess the fit of probability distributions. Our tests are particularly useful for large, high-dimensional data sets where the fit of probability models must be assessed. Specifically, we offer tests for normality, as well as two- and k-sample tests of the equality of two or more distributions, that is, $H_0: F_1 = F_2$ and $H_0: F_1 = \ldots = F_k$, respectively.
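A kernel-based quadratic distance between two distributions can be estimated from samples by averaging kernel evaluations within each sample and between the two samples. The following is a minimal sketch in plain Python (standard library only). It assumes a Gaussian kernel with a fixed, hypothetical bandwidth `h` and calibrates the test with a permutation p-value. The function names are illustrative and are not the QuadratiK API. The package's own statistics, kernel centering, bandwidth selection, and critical values are not reproduced here.

```python
import math
import random


def gaussian_kernel(x, y, h):
    """Gaussian kernel exp(-||x - y||^2 / (2 h^2)) for equal-length sequences."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * h * h))


def quadratic_distance(xs, ys, h=1.0):
    """Unbiased estimate of the kernel quadratic distance between the laws of xs and ys.

    With a Gaussian kernel this is the unbiased squared maximum mean discrepancy:
    the within-sample averages exclude the diagonal (i == j) terms.
    """
    n, m = len(xs), len(ys)
    kxx = sum(gaussian_kernel(xs[i], xs[j], h)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], h)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kxy = sum(gaussian_kernel(x, y, h) for x in xs for y in ys) / (n * m)
    return kxx + kyy - 2.0 * kxy


def permutation_test(xs, ys, h=1.0, n_perm=199, seed=0):
    """Return (statistic, permutation p-value) for H0: F_1 = F_2 (hypothetical helper)."""
    rng = random.Random(seed)
    observed = quadratic_distance(xs, ys, h)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling, exchangeable under H0
        if quadratic_distance(pooled[:n], pooled[n:], h) >= observed:
            exceed += 1
    # the +1 terms count the observed labelling, keeping the p-value valid and > 0
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    a = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
    b = [[rng.gauss(1.5, 1.0) for _ in range(3)] for _ in range(30)]  # mean-shifted
    print(permutation_test(a, b))
```

A clear mean shift, as in the demo, should give a small p-value. Two samples from the same law should not, apart from the usual type I error rate. The double loop is $O((n+m)^2)$ per evaluation, so this direct form suits only moderate sample sizes.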
The proposed tests perform well in terms of level and power against contiguous alternatives, for heavy-tailed distributions, and in higher dimensions. The package also provides tests for uniformity on the $d$-dimensional sphere based on the Poisson kernel, which perform especially well when the underlying distribution is multimodal.
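For unit vectors, the Poisson kernel $K_\rho(\mathbf{x}, \mathbf{y}) = (1-\rho^2)/\|\mathbf{x} - \rho\mathbf{y}\|^d$ averages to 1 when $\mathbf{y}$ is uniformly distributed on the sphere. As a result, $K_\rho - 1$ is centered at the uniform law, and a U-statistic of $K_\rho - 1$ is close to zero under uniformity. The following is a minimal sketch in plain Python of such a statistic for a fixed, hypothetical $\rho$. It is an illustration under that assumption, not the package's implementation. It also omits how $\rho$ is chosen and how critical values are obtained; see Ding, Markatou and Saraceno (2023).

```python
import math
import random


def poisson_kernel(x, y, rho):
    """Poisson kernel (1 - rho^2) / ||x - rho y||^d for unit vectors x, y.

    For unit vectors, ||x - rho y||^2 = 1 + rho^2 - 2 rho <x, y>.
    """
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (1.0 - rho ** 2) / (1.0 + rho ** 2 - 2.0 * rho * dot) ** (d / 2.0)


def uniformity_statistic(points, rho=0.5):
    """Average of the centered kernel K_rho - 1 over all distinct pairs.

    Its expectation is 0 when the points are uniform on the sphere; large
    positive values point to a departure from uniformity.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += poisson_kernel(points[i], points[j], rho) - 1.0
    return total / (n * (n - 1) / 2)


def random_unit_vector(d, rng):
    """Uniform draw on the unit sphere: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


if __name__ == "__main__":
    rng = random.Random(3)
    uniform_pts = [random_unit_vector(3, rng) for _ in range(100)]
    print(uniformity_statistic(uniform_pts))  # close to 0 under uniformity
```

Larger $\rho$ makes the kernel more local, so the statistic reacts more strongly to tight concentrations of points. That trade-off is one reason the choice of $\rho$ matters in practice.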
Poisson kernel-based distribution (PKBD): the package offers functions for computing the density and for generating random samples from a PKBD. The Poisson kernel-based densities are based on the normalized Poisson kernel and are defined on the $d$-dimensional unit sphere. Given a vector $\boldsymbol{\mu} \in \mathcal{S}^{d-1}$ and a parameter $\rho$ such that $0 < \rho < 1$, the probability density function of a $d$-variate Poisson kernel-based density is defined by

$$f(\mathbf{x} \mid \rho, \boldsymbol{\mu}) = \frac{1-\rho^2}{\omega_d \, \|\mathbf{x} - \rho \boldsymbol{\mu}\|^d},$$

where $\boldsymbol{\mu}$ is a vector orienting the center of the distribution and $\rho$ is a parameter that controls the concentration of the distribution around $\boldsymbol{\mu}$; $\rho$ is related to the variance of the distribution. Furthermore, $\omega_d = 2\pi^{d/2} [\Gamma(d/2)]^{-1}$ is the surface area of the unit sphere in $\mathbb{R}^d$ (see Golzy and Markatou, 2020).
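The PKBD density can be evaluated directly from its formula. The following is a minimal sketch in plain Python (standard library only), assuming $\boldsymbol{\mu}$ is already a unit vector. The names `pkbd_density` and `sample_pkbd` are hypothetical and are not the QuadratiK functions. The sampler uses plain rejection from the uniform distribution on the sphere. That technique was chosen for brevity; it is not the sampling method of Golzy and Markatou (2020).

```python
import math
import random


def omega(d):
    """Surface area of the unit sphere in R^d: 2 pi^(d/2) / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def pkbd_density(x, mu, rho):
    """PKBD density f(x | rho, mu) = (1 - rho^2) / (omega_d ||x - rho mu||^d)."""
    d = len(x)
    dist = math.sqrt(sum((a - rho * b) ** 2 for a, b in zip(x, mu)))
    return (1.0 - rho ** 2) / (omega(d) * dist ** d)


def random_unit_vector(d, rng):
    """Uniform draw on the unit sphere: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sample_pkbd(n, mu, rho, seed=0):
    """Draw n PKBD samples by rejection from the uniform distribution (mu must be a unit vector).

    Because ||x - rho mu|| >= 1 - rho, the ratio f(x) / f(mu) equals
    ((1 - rho) / ||x - rho mu||)^d <= 1 and serves as the acceptance
    probability. The expected number of proposals per draw is
    (1 + rho) / (1 - rho)^(d - 1), so this gets slow as rho -> 1 or d grows.
    """
    rng = random.Random(seed)
    d = len(mu)
    draws = []
    while len(draws) < n:
        x = random_unit_vector(d, rng)
        dist = math.sqrt(sum((a - rho * b) ** 2 for a, b in zip(x, mu)))
        if rng.random() < ((1.0 - rho) / dist) ** d:
            draws.append(x)
    return draws


if __name__ == "__main__":
    mu = [0.0, 0.0, 1.0]
    print(pkbd_density(mu, mu, 0.5))  # density at the mode x = mu
    draws = sample_pkbd(1000, mu, 0.5, seed=1)
    print([sum(x[c] for x in draws) / len(draws) for c in range(3)])
```

The coordinate functions are harmonic, so the mean vector of a PKBD is $\rho\boldsymbol{\mu}$. The sample mean printed by the demo therefore gives a quick sanity check on the sampler.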
Clustering Algorithm for Spherical Data: the package incorporates a clustering algorithm specifically tailored to $d$-dimensional spherical data. It is especially useful when the data are noisy or when the clusters overlap non-negligibly. The algorithm is based on a mixture of Poisson kernel-based densities on the sphere, enabling effective clustering of spherical data or of data that have been spherically transformed.
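To make the mixture idea concrete, here is a minimal EM-style sketch in plain Python (standard library only). Several simplifying assumptions apply. It uses a common concentration $\rho$ that is held fixed rather than estimated, and it starts from a farthest-point initialization. Each mean direction is updated with a fixed-point step derived from the stationarity condition of the weighted log-likelihood. The name `pkbd_mixture_em` is hypothetical and does not refer to the package's clustering function. Rows are first divided by their norm, which is one way to obtain spherically transformed data.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pkbd_logdensity(x, mu, rho):
    """log f(x | rho, mu), using ||x - rho mu||^2 = 1 + rho^2 - 2 rho <x, mu>."""
    d = len(x)
    log_omega = math.log(2.0) + (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0)
    return (math.log(1.0 - rho ** 2) - log_omega
            - (d / 2.0) * math.log(1.0 + rho ** 2 - 2.0 * rho * dot(x, mu)))


def pkbd_mixture_em(data, k, rho=0.7, n_iter=50):
    """Simplified EM for a k-component PKBD mixture with a common, fixed rho.

    Returns (labels, mean directions, mixing weights). Hypothetical helper:
    rho is not estimated, unlike a full mixture fit.
    """
    xs = [normalize(row) for row in data]  # spherical transformation x / ||x||
    d = len(xs[0])
    # Farthest-point initialization: each new centre is the point least
    # similar (largest angle) to the centres already chosen.
    mus = [xs[0]]
    while len(mus) < k:
        mus.append(min(xs, key=lambda x: max(dot(x, m) for m in mus)))
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed via log-sum-exp.
        resp = []
        for x in xs:
            logs = [math.log(max(w, 1e-300)) + pkbd_logdensity(x, m, rho)
                    for w, m in zip(weights, mus)]
            top = max(logs)
            probs = [math.exp(lv - top) for lv in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: mixing weights are the average responsibilities.
        weights = [sum(r[j] for r in resp) / len(xs) for j in range(k)]
        # Direction update: mu_j proportional to
        # sum_i r_ij x_i / (1 + rho^2 - 2 rho <x_i, mu_j>), evaluated at the current mu_j.
        new_mus = []
        for j in range(k):
            acc = [0.0] * d
            for x, r in zip(xs, resp):
                scale = r[j] / (1.0 + rho ** 2 - 2.0 * rho * dot(x, mus[j]))
                for c in range(d):
                    acc[c] += scale * x[c]
            new_mus.append(normalize(acc) if any(acc) else mus[j])
        mus = new_mus
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, mus, weights


if __name__ == "__main__":
    rng = random.Random(7)
    data = []
    for centre in ([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]):
        for _ in range(50):
            radius = rng.uniform(0.5, 3.0)  # arbitrary norms; only directions matter
            data.append([radius * (c + 0.3 * rng.gauss(0.0, 1.0)) for c in centre])
    labels, mus, weights = pkbd_mixture_em(data, k=2)
    print(labels[:5], labels[-5:], [round(w, 2) for w in weights])
```

Holding $\rho$ fixed keeps the M-step to a single direction update per component. A fuller fit would also update the concentration, and possibly let it differ across components.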
Additional Features: The software also includes graphical functions that help users validate and display clustering results, improving the interpretability and usability of the analysis.
For an introduction to QuadratiK, see the vignette "Introduction to the QuadratiK Package".
The work has been supported by the Kaleida Health Foundation and the National Science Foundation.
The QuadratiK package is also available in Python on PyPI (https://pypi.org/project/QuadratiK/) and as a Dashboard application. Usage instructions for the Dashboard can be found at https://quadratik.readthedocs.io/en/latest/user_guide/dashboard_application_usage.html.
Authors: Giovanni Saraceno, Marianthi Markatou, Raktim Mukhopadhyay, Mojgan Golzy
Maintainer: Giovanni Saraceno <gsaracen@buffalo.edu>
Saraceno, G., Markatou, M., Mukhopadhyay, R. and Golzy, M. (2024). Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package in R and Python. arXiv preprint arXiv:2402.02290.
Ding, Y., Markatou, M. and Saraceno, G. (2023). Poisson Kernel-Based Tests for Uniformity on the d-Dimensional Sphere. Statistica Sinica. doi:10.5705/ss.202022.0347.
Golzy, M. and Markatou, M. (2020). Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling. Journal of Computational and Graphical Statistics, 29(4), 758-770. doi:10.1080/10618600.2020.1740713.
Markatou, M. and Saraceno, G. (2024). A Unified Framework for Multivariate Two- and k-Sample Kernel-based Quadratic Distance Goodness-of-Fit Tests. https://doi.org/10.48550/arXiv.2407.16374