An R Package for Density Ratio Estimation


1. Overview

Density ratio estimation is the following problem: given two data samples $x$ and $y$ drawn from unknown distributions $p(x)$ and $q(y)$, respectively, estimate

$$ w(x) = \frac{p(x)}{q(x)} $$

where $x$ and $y$ are $d$-dimensional real vectors ($x, y \in \mathbb{R}^d$).

Once estimated, the density ratio function $w(x)$ can be used in many applications, such as anomaly detection [1] and covariate shift adaptation [2]. Other applications of density ratio estimation are surveyed by Sugiyama et al. (2012) [3].
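The weighting property behind covariate shift adaptation is worth seeing concretely: for any function $f$, $\mathbb{E}_{q}[w(X) f(X)] = \int q(x) \frac{p(x)}{q(x)} f(x)\,dx = \mathbb{E}_{p}[f(X)]$, provided $q(x) > 0$ wherever $p(x) > 0$. The sketch below checks this by Monte Carlo under the assumption that the true $w$ is known, using the same two normal distributions as the example that follows; $f(x) = x^2$, the seed and the sample size are arbitrary illustrative choices.

```r
set.seed(1)

# Assumed setting: p = Norm(1, 1/8) and q = Norm(1, 1/2) (mean, sd), true ratio known
w <- function(x) dnorm(x, mean = 1, sd = 1/8) / dnorm(x, mean = 1, sd = 1/2)
f <- function(x) x^2

# Sample only from q, then reweight each draw by w to estimate E_p[f(X)]
y_q <- rnorm(1e5, mean = 1, sd = 1/2)
weighted_estimate <- mean(w(y_q) * f(y_q))

# Exact value under p: E_p[X^2] = mean^2 + sd^2
exact <- 1^2 + (1/8)^2
c(weighted = weighted_estimate, exact = exact)
```

The weighted average should agree with exact (1.015625) up to Monte Carlo error. In practice $w$ is unknown, which is where an estimate $\hat{w}$ such as the one returned by densratio() comes in.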

The densratio package provides the function densratio(), which returns an object containing a function that computes the estimated density ratio.

For example,

set.seed(3)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)

library(densratio)
result <- densratio(x, y)

The function densratio() estimates the density ratio of $p(x)$ to $q(x)$, $$ w(x) = \frac{p(x)}{q(x)} = \frac{\rm{Norm}(1, 1/8)}{\rm{Norm}(1, 1/2)}, $$ where $\rm{Norm}(\mu, s)$ denotes the normal density with mean $\mu$ and standard deviation $s$, and returns an object with a function for computing the estimated ratio. This function, compute_density_ratio(), evaluates the estimate $\hat{w}(x) \simeq p(x)/q(x)$ at any $d$-dimensional input $x$ (here $d = 1$).

new_x <- seq(0, 2, by = 0.05)
w_hat <- result$compute_density_ratio(new_x)

plot(new_x, w_hat, pch=19)

In this case, the true density ratio $w(x) = p(x)/q(x) = \rm{Norm}(1, 1/8) / \rm{Norm}(1, 1/2)$ can be computed exactly, so we can compare $w(x)$ with the estimated density ratio $\hat{w}(x)$.

true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)

plot(true_density_ratio, xlim=c(-1, 3), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(result$compute_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)
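Because both densities are normal with the same mean, the true ratio in this example also has a closed form. Since $\rm{Norm}(1, s)$ has density $\frac{1}{s\sqrt{2\pi}}\exp\left(-\frac{(x-1)^2}{2s^2}\right)$, dividing the $s = 1/8$ density by the $s = 1/2$ density gives $w(x) = \frac{1/2}{1/8}\exp\left(-\frac{(x-1)^2}{2}(64 - 4)\right) = 4\exp\left(-30(x-1)^2\right)$, so the true ratio peaks at $w(1) = 4$. The check below repeats the true_density_ratio() definition from the text and compares it with this expression; w_closed() and grid_x are names introduced here for illustration only.

```r
# True ratio as defined in the text (second argument of dnorm() is the sd)
true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)

# Closed form derived above (hypothetical helper name)
w_closed <- function(x) 4 * exp(-30 * (x - 1)^2)

grid_x <- seq(-1, 3, by = 0.01)
isTRUE(all.equal(true_density_ratio(grid_x), w_closed(grid_x)))  # TRUE
w_closed(1)  # 4, the peak of the true ratio
```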

2. How to Install

You can install the densratio package from CRAN.

install.packages("densratio")

You can also install the package from GitHub.

install.packages("devtools") # needed only if devtools is not already installed
devtools::install_github("hoxo-m/densratio")

The source code for the densratio package is available on GitHub in the hoxo-m/densratio repository.

3. Details

3.1. Basics

The densratio package provides the function densratio(), which returns an object containing a function that computes the estimated density ratio.

For data samples x and y,

library(densratio)

x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)

result <- densratio(x, y)

Here, result$compute_density_ratio() is the function that computes the estimated density ratio.

new_x <- seq(0, 2, by = 0.05)
w_hat <- result$compute_density_ratio(new_x)

plot(new_x, w_hat, pch=19)

3.2. Methods

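densratio() has a method argument that accepts either "uLSIF" (unconstrained Least-Squares Importance Fitting) or "KLIEP" (Kullback-Leibler Importance Estimation Procedure).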

Both methods assume that the density ratio is represented by the linear model

$$ w(x) = \alpha_1 K(x, c_1) + \alpha_2 K(x, c_2) + ... + \alpha_b K(x, c_b) $$

where

$$ K(x, c) = \exp\left(-\frac{\|x - c\|^2}{2 \sigma ^ 2}\right) $$

is the Gaussian radial basis function (RBF) kernel with width $\sigma$ and center $c$.
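To make the model concrete, the sketch below evaluates $\hat{w}(x) = \sum_{l=1}^{b} \alpha_l K(x, c_l)$ for inputs stored one observation per row, which covers both the one-dimensional case and the matrix case used later. The centers, $\sigma$ and weights are supplied by hand here; gaussian_kernel_matrix() and kernel_model() are hypothetical helper names, not part of the densratio API.

```r
# Gaussian RBF kernel matrix: entry [i, l] = exp(-||x_i - c_l||^2 / (2 * sigma^2)).
# x: n x d matrix (a numeric vector is treated as n x 1); centers: b x d matrix.
gaussian_kernel_matrix <- function(x, centers, sigma) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  stopifnot(ncol(x) == ncol(centers))
  # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c, computed for all pairs at once
  sq_dist <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  exp(-pmax(sq_dist, 0) / (2 * sigma^2))  # pmax guards against tiny negative round-off
}

# Linear model w(x) = alpha_1 K(x, c_1) + ... + alpha_b K(x, c_b), one value per row of x
kernel_model <- function(x, centers, sigma, alpha) {
  drop(gaussian_kernel_matrix(x, centers, sigma) %*% alpha)
}

# Usage: three 1-D centers at 0, 1, 2, with all weight on the center at 1
centers <- matrix(c(0, 1, 2), ncol = 1)
kernel_model(c(0, 1), centers, sigma = 1, alpha = c(0, 1, 0))
# K(0, 1) = exp(-1/2), approx. 0.6065, and K(1, 1) = 1
```

Computing all pairwise squared distances in one matrix expression avoids an explicit loop over centers, which matters when $b$ and the sample sizes grow.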

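densratio() performs two main jobs: first, it chooses the kernel width $\sigma$ by cross-validation; second, it fits the kernel weights $\alpha_1, \dots, \alpha_b$.

The selected $\sigma$ and the fitted $\alpha_i$ are stored in the object returned by densratio() and are used by compute_density_ratio() to evaluate the estimated density ratio.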
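The two jobs can be illustrated with a simplified uLSIF written in base R. This is a sketch, not densratio's implementation, and it rests on several assumptions: kernel centers are a random subset of x, the ridge parameter lambda is fixed at 0.1, and $\sigma$ is chosen by a single hold-out split (every fifth observation) rather than full cross-validation. For fixed $\sigma$, uLSIF minimizes $\frac{1}{2}\alpha^\top H \alpha - h^\top \alpha + \frac{\lambda}{2}\alpha^\top\alpha$ with $H = \frac{1}{n_y}\sum_j \phi(y_j)\phi(y_j)^\top$, $h = \frac{1}{n_x}\sum_i \phi(x_i)$ and $\phi(\cdot) = (K(\cdot, c_1), \dots, K(\cdot, c_b))^\top$; the minimizer is $(H + \lambda I)^{-1} h$, after which negative weights are set to zero. The hold-out score $\frac{1}{2}\,\mathrm{mean}(\hat{w}(y)^2) - \mathrm{mean}(\hat{w}(x))$ estimates the squared error of $\hat{w}$ up to a constant. The names rbf(), ulsif_fit() and select_sigma() are hypothetical.

```r
# Assumptions (not densratio's actual code): random centers from x,
# fixed lambda, one hold-out split instead of full cross-validation.

# Gaussian kernel matrix between rows of a and rows of centers
rbf <- function(a, centers, sigma) {
  sq <- outer(rowSums(a^2), rowSums(centers^2), "+") - 2 * a %*% t(centers)
  exp(-pmax(sq, 0) / (2 * sigma^2))
}

# Job 2: for fixed sigma, fit the weights alpha in closed form (simplified uLSIF)
ulsif_fit <- function(x, y, centers, sigma, lambda = 0.1) {
  phi_x <- rbf(x, centers, sigma)                 # n_x x b
  phi_y <- rbf(y, centers, sigma)                 # n_y x b
  H <- crossprod(phi_y) / nrow(y)                 # (1/n_y) sum_j phi(y_j) phi(y_j)^T
  h <- colMeans(phi_x)                            # (1/n_x) sum_i phi(x_i)
  alpha <- solve(H + lambda * diag(ncol(H)), h)
  alpha <- pmax(alpha, 0)                         # negative weights clipped to zero
  function(new_x) drop(rbf(as.matrix(new_x), centers, sigma) %*% alpha)
}

# Job 1: pick sigma from a candidate grid by the hold-out score (smaller is better)
select_sigma <- function(x, y, sigmas, n_centers = 100, lambda = 0.1) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(ncol(x) == ncol(y))
  test_x <- seq_len(nrow(x)) %% 5 == 0            # every 5th row held out
  test_y <- seq_len(nrow(y)) %% 5 == 0
  x_tr <- x[!test_x, , drop = FALSE]
  y_tr <- y[!test_y, , drop = FALSE]
  centers <- x_tr[sample(nrow(x_tr), min(n_centers, nrow(x_tr))), , drop = FALSE]
  scores <- sapply(sigmas, function(s) {
    w_hat <- ulsif_fit(x_tr, y_tr, centers, s, lambda)
    0.5 * mean(w_hat(y[test_y, , drop = FALSE])^2) - mean(w_hat(x[test_x, , drop = FALSE]))
  })
  best <- sigmas[which.min(scores)]
  # Refit on all data with the selected sigma
  list(sigma = best, scores = scores,
       compute_density_ratio = ulsif_fit(x, y, centers, best, lambda))
}

# Usage on the one-dimensional example from the overview
set.seed(3)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)
fit <- select_sigma(x, y, sigmas = c(0.05, 0.1, 0.3, 1))
fit$sigma
fit$compute_density_ratio(c(-1, 1, 3))
```

The closed-form solve is what makes uLSIF cheap: each candidate $\sigma$ costs one $b \times b$ linear system, so scanning a grid of candidates stays fast.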

3.3. Result and Arguments

Calling print() on the object returned by densratio() shows information about the fit. You can also control how the estimation is done by setting the arguments of densratio().

print(result)

4. Multi Dimensional Data Samples

So far, the input samples x and y have been one-dimensional. densratio() also accepts multidimensional samples given as matrices, with one observation per row.

For example,

library(densratio)
library(mvtnorm)

set.seed(71)
x <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
y <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))

result <- densratio(x, y)
result

In this case too, we can compare the true density ratio with the estimated one.

true_density_ratio <- function(x) {
  dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
    dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}

N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
w_true <- matrix(true_density_ratio(input), nrow = N)
w_hat <- matrix(result$compute_density_ratio(input), nrow = N)

par(mfrow = c(1, 2))
contour(range, range, w_true, main = "True Density Ratio")
contour(range, range, w_hat, main = "Estimated Density Ratio")
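The true ratio in this two-dimensional example also has a closed form. The sigma argument of rmvnorm() and dmvnorm() is a covariance matrix, so here $\Sigma_p = \frac{1}{8}I$ and $\Sigma_q = \frac{1}{2}I$, and with $\mu = (1, 1)^\top$, $w(x) = \sqrt{\det\Sigma_q/\det\Sigma_p}\,\exp\left(-\frac{1}{2}(x-\mu)^\top(\Sigma_p^{-1} - \Sigma_q^{-1})(x-\mu)\right) = 4\exp\left(-3\|x - \mu\|^2\right)$. The check below uses only base R: because both covariance matrices are diagonal, each density factors into a product of univariate dnorm() terms with sd equal to the square root of the variance. The helper names are illustrative.

```r
mu <- c(1, 1)

# p = N(mu, I/8) and q = N(mu, I/2): diagonal covariances, so each density
# is a product of univariate normals with sd = sqrt(variance)
density_diag <- function(x, variance) {
  dnorm(x[, 1], mu[1], sqrt(variance)) * dnorm(x[, 2], mu[2], sqrt(variance))
}
ratio_from_densities <- function(x) density_diag(x, 1/8) / density_diag(x, 1/2)

# Closed form derived above: 4 * exp(-3 * ||x - mu||^2), one value per row
ratio_closed_form <- function(x) 4 * exp(-3 * rowSums(sweep(x, 2, mu)^2))

# Same 20 x 20 grid over [0, 2]^2 as in the contour plots, as a matrix
grid_1d <- seq(0, 2, length.out = 20)
grid_input <- as.matrix(expand.grid(grid_1d, grid_1d))
isTRUE(all.equal(ratio_from_densities(grid_input), ratio_closed_form(grid_input),
                 check.attributes = FALSE))  # TRUE
```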

x and y must have the same dimension $d$, that is, the same number of columns.

5. References

[1] Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems 2011.

[2] Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.

[3] Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012.




densratio documentation built on May 9, 2019, 1:03 a.m.