scorematchingtheory: Introduction to Score Matching


Introduction to Score Matching

Description

This package includes score matching estimators for particular distributions and a general capacity to implement additional score matching estimators. Score matching is a popular estimation technique when normalising constants for the proposed model are difficult to calculate or compute. Score matching was first developed by \insertCite{hyvarinen2005es;textual}{scorematchingad} and was further developed for subsets of Euclidean space \insertCite{hyvarinen2007ex,yu2019ge,yu2020ge,liu2019es}{scorematchingad}, Riemannian manifolds \insertCite{mardia2016sc,mardia2018ne}{scorematchingad}, and Riemannian manifolds with boundary \insertCite{scealy2023sc}{scorematchingad}. In this help entry we briefly describe score matching estimation.

Score Matching in General

In the most general form (Riemannian manifolds with boundary) score matching minimises the weighted Hyvärinen divergence \insertCite{@Equation 7, @scealy2023sc}{scorematchingad}

\phi(f, f_0) = \frac{1}{2} \int_M f_0(z)h(z)^2 \left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2 dM(z),

where

  • M is the manifold, isometrically embedded in Euclidean space, and dM(z) is the unnormalised uniform measure on M.

  • P(z) is the matrix that projects points onto the tangent space of the manifold at z, which is closely related to the Riemannian metric of M.

  • f_0 is the density of the data-generating process, defined with respect to dM(z).

  • f is the density of a posited model, again defined with respect to dM(z).

  • h(z) is a function, termed the boundary weight function, that is zero on the boundary of M and smooth \insertCite{@Section 3.2, @scealy2023sc}{scorematchingad} or potentially piecewise smooth.

  • \nabla_z is the Euclidean gradient operator that differentiates with respect to z.

  • \lVert \cdot \rVert is the Euclidean norm.

Note that, because P(z) is the projection matrix, \left\lVert P(z)\Big(\nabla_z \log(f) - \nabla_z \log(f_0)\Big)\right\rVert^2 is the natural inner product of the gradient of the log ratio of f and f_0.

When the density functions f and f_0 are smooth and positive inside M, and the boundary weight function is smooth or of particular forms for specific manifolds \insertCite{@Section 3.2, @scealy2023sc}{scorematchingad}, then minimising the weighted Hyvärinen divergence \phi(f, f_0) is equivalent to minimising the score matching discrepancy \insertCite{@Theorem 1, @scealy2023sc}{scorematchingad}

\psi(f, f_0) = \int f_0(z)\big(A(z) + B(z) + C(z)\big)dM(z),

where

A(z) = \frac{1}{2} h(z)^2 \left(\nabla_z \log(f)\right)^T P(z) \left(\nabla_z \log(f)\right),

B(z) = h(z)^2\Delta_z\log(f),

C(z) = \left(\nabla_z h(z)^2\right)^T P(z) \left(\nabla_z \log(f)\right),

and \Delta is the Laplacian operator. We term

A(z) + B(z) + C(z)

the score matching discrepancy function.
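The terms above can be evaluated numerically. The following sketch (our own illustration, not the package's CppAD implementation) computes A(z) + B(z) + C(z) by central finite differences in the simplest Euclidean setting, where M = R^d, P(z) is the identity and h(z) = 1, so that C(z) = 0. The names `discrepancy_fn` and `logf` are ours, not package API.

```r
# Evaluate the score matching discrepancy function A(z) + B(z) + C(z)
# by central finite differences, for M = R^d with P(z) = I and h(z) = 1.
discrepancy_fn <- function(logf, z, eps = 1e-5) {
  d <- length(z)
  grad <- numeric(d)   # will hold \nabla_z log f
  lap  <- 0            # will hold \Delta_z log f
  f0 <- logf(z)
  for (j in seq_len(d)) {
    zp <- z; zp[j] <- z[j] + eps
    zm <- z; zm[j] <- z[j] - eps
    grad[j] <- (logf(zp) - logf(zm)) / (2 * eps)
    lap     <- lap + (logf(zp) - 2 * f0 + logf(zm)) / eps^2
  }
  A <- 0.5 * sum(grad^2)  # h(z)^2 = 1 and P(z) = I
  B <- lap
  A + B                   # C(z) = 0 because h is constant
}

# For a standard normal, log f(z) = -z^2/2 + const, so
# A(z) = z^2/2 and B(z) = -1; at z = 2 the value is near 2/2 - 1 = 1.
discrepancy_fn(function(z) -sum(z^2) / 2, c(2))
```

This is only a sanity-check tool: the package computes the same quantities exactly via automatic differentiation rather than finite differences.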

We suspect that \insertCite{@Theorem 1, @scealy2023sc}{scorematchingad} holds more generally for nearly all realistic continuous and piecewise-smooth boundary weight functions, although no proof exists to our knowledge.

When n independent observations from f_0 are available, the integration in \psi(f, f_0) can be approximated by an average over the observations,

\psi(f, f_0) \approx \hat\psi(f, f_0) = \frac{1}{n} \sum_{i = 1}^n \big(A(z_i) + B(z_i) + C(z_i)\big).

If we parameterise a family of models f_\theta by a vector of parameters \theta, then the score matching estimate is the \theta that minimises \hat\psi(f_\theta, f_0). In general, the score matching estimate must be found via numerical optimisation techniques, such as in the function cppad_search(). However, when the family of models is a canonical exponential family, \hat\psi(f_\theta, f_0) and the score matching discrepancy function are often quadratic functions of \theta \insertCite{mardia2018ne}{scorematchingad}, and the minimum has a closed-form solution found by cppad_closed().
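As a worked illustration of this estimation step (using base R's optim() rather than the package's cppad_search(); the names z, psi_hat, mu_hat and s2_hat are ours), consider a univariate Gaussian on M = R with h = 1 and P = I. Then A(z) = (z - \mu)^2 / (2\sigma^4) and B(z) = -1/\sigma^2, and the minimiser of \hat\psi has the closed form \mu = sample mean, \sigma^2 = uncentred sample variance.

```r
# Score matching for a univariate Gaussian, by numerically minimising
# hat-psi over theta = (mu, log sigma^2). Illustrative only.
set.seed(1)
z <- rnorm(500, mean = 3, sd = 2)

psi_hat <- function(theta) {
  mu <- theta[1]
  s2 <- exp(theta[2])  # log-parameterise so sigma^2 > 0
  # empirical average of A(z_i) + B(z_i); C = 0 since h is constant
  mean((z - mu)^2 / (2 * s2^2) - 1 / s2)
}

fit <- optim(c(0, 0), psi_hat)
mu_hat <- fit$par[1]
s2_hat <- exp(fit$par[2])

# The numerical minimiser recovers the closed-form solution:
c(mu_hat, mean(z))                 # approximately equal
c(s2_hat, mean((z - mean(z))^2))   # approximately equal
```

For this Gaussian family the score matching estimates coincide with the familiar moment estimates, which is the kind of closed-form quadratic structure cppad_closed() exploits for canonical exponential families.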

Note that even when M has only a few dimensions, deriving A(z), B(z) and C(z) by hand can become cumbersome. This package uses CppAD to compute A(z), B(z) and C(z) automatically, along with the quadratic simplification when it exists.

Transformations

Hyvärinen divergence \phi(f, f_0) is sensitive to transformations of the measure dM(z), including transformations of the manifold itself. That is, transforming the manifold M changes the divergence between distributions and hence the minimiser of \hat\psi(f_\theta, f_0). The transformation changes the measure dM(z), the divergence and the estimator, but the data themselves are not transformed.

For example, many different transformations of the simplex (i.e. of compositional data) are possible \insertCite{@Appendix A.3, @scealy2024ro}{scorematchingad}. Hyvärinen divergences on the sphere, obtained from the simplex by a square root transformation, behave differently from Hyvärinen divergences on Euclidean space, obtained from the simplex using logarithms \insertCite{scealy2024ro}{scorematchingad}. The estimator for the latter does not apply logarithms to the observations; in fact, it involves only polynomials of the observed compositions \insertCite{scealy2024ro}{scorematchingad}.
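To make the two transformations concrete, here is a small illustration (our own; the additive log-ratio below is one standard logarithm-based map, not necessarily the exact transformation used in the cited work). A composition x, with positive parts summing to 1, maps to the positive orthant of the unit sphere via the componentwise square root, and to Euclidean space via log-ratios.

```r
# A point on the 2-simplex (a composition of three parts)
x <- c(0.2, 0.3, 0.5)

# Square root transformation: the simplex maps onto the unit sphere
u <- sqrt(x)
sum(u^2)   # equals 1, so u lies on the sphere

# Additive log-ratio transformation: the simplex maps onto R^2
y <- log(x[-3] / x[3])

# The transformation is invertible, recovering the composition
x_back <- c(exp(y), 1) / sum(c(exp(y), 1))
```

An estimator built on the Euclidean transformation need not apply these logarithms to each observation; as noted above, for the divergence studied in the cited work the resulting estimator involves only polynomials of the compositions.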

The variety of estimator behaviour available through different transformations was a major motivator for this package: each transformation has different A(z), B(z) and C(z), and without automatic differentiation, implementing the score matching estimator for each case would require a huge programming effort.

References

\insertAllCited{}

scorematchingad documentation built on April 4, 2025, 12:15 a.m.