# An R Package for Density Ratio Estimation In densratio: Density Ratio Estimation

knitr::opts_chunk$set(echo = TRUE, message = FALSE) library(mvtnorm)  ## 1. Overview Density ratio estimation is described as follows: for given two data samples$x$and$y$from unknown distributions$p(x)$and$q(y)$respectively, estimate $$w(x) = \frac{p(x)}{q(x)}$$ where$x$and$y$are$d$-dimensional real numbers. The estimated density ratio function$w(x)$can be used in many applications such as anomaly detection [1] and covariate shift adaptation [2]. Other useful applications about density ratio estimation were summarized by Sugiyama et al. (2012) [3]. The package densratio provides a function densratio(). The function outputs an object that has a function to estimate density ratio. For example, set.seed(3) x <- rnorm(200, mean = 1, sd = 1/8) y <- rnorm(200, mean = 1, sd = 1/2) library(densratio) result <- densratio(x, y)  The function densratio() estimates the density ratio of$p(x)$to$q(y)$, $$w(x) = \frac{p(x)}{q(y)} = \frac{\rm{Norm}(1, 1/8)}{\rm{Norm}(1, 1/2)}$$ and provides a function to compute estimated density ratio. The result object has a function compute_density_ratio() that can compute the estimated density ratio$\hat{w}(x) \simeq p(x)/q(y)$for any$d$-dimensional input$x$(now$d=1$). new_x <- seq(0, 2, by = 0.05) w_hat <- result$compute_density_ratio(new_x)

plot(new_x, w_hat, pch=19)


In this case, the true density ratio $w(x) = p(x)/q(y) = \rm{Norm}(1, 1/8) / \rm{Norm}(1, 1/2)$ can be computed precisely. So we can compare $w(x)$ with the estimated density ratio $\hat{w}(x)$.

true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)

plot(true_density_ratio, xlim=c(-1, 3), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(result$compute_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE) legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)  ## 2. How to Install You can install the densratio package from CRAN. install.packages("densratio")  You can also install the package from GitHub. install.packages("devtools") # If you have not installed "devtools" package devtools::install_github("hoxo-m/densratio")  The source code for densratio package is available on GitHub at • https://github.com/hoxo-m/densratio. ## 3. Details ### 3.1. Basics The package densratio provides a function densratio(). The function outputs an object that has a function to estimate density ratio. For data samples x and y, library(densratio) x <- rnorm(200, mean = 1, sd = 1/8) y <- rnorm(200, mean = 1, sd = 1/2) result <- densratio(x, y)  Here, result$compute_density_ratio() is the function to compute estimated density ratio.

new_x <- seq(0, 2, by = 0.05)
w_hat <- result$compute_density_ratio(new_x) plot(new_x, w_hat, pch=19)  ### 3.2. Methods densratio() has method argument that you can pass "uLSIF" or "KLIEP". • uLSIF (unconstrained Least-Squares Importance Fitting) is the default method. This algorithm estimates density ratio by minimizing the squared loss. You can find more information in Hido et al. (2011) [1]. • KLIEP (Kullback-Leibler Importance Estimation Procedure) is the anothor method. This algorithm estimates density ratio by minimizing Kullback-Leibler divergence. You can find more information in Sugiyama et al. (2007) [2]. The both methods assume that density ratio are represented by linear model $$w(x) = \alpha_1 K(x, c_1) + \alpha_2 K(x, c_2) + ... + \alpha_b K(x, c_b)$$ where $$K(x, c) = \exp\left(-\frac{\|x - c\|^2}{2 \sigma ^ 2}\right)$$ is the Gaussian RBF. densratio() performs two main jobs: • First, decide kernel parameter$\sigma$by cross validation. • Second, find the optimal kernel weights$\alpha_i$(in other words, find the optimal coefficients of the linear model).$\sigma$and$\alpha_i$are saved into result objects of densratio(), and used to compute estimated density ratio in compute_density_ratio(). ### 3.3. Result and Arguments You can print() result objects of densratio() to see information. Moreover, you can change some conditions to specify arguments of densratio(). print(result)  • Kernel type is fixed by Gaussian RBF. • Number of kernels is the number of kernels in the linear model. You can change by setting kernel_num argument. In default, kernel_num = 100. • Bandwidth(sigma) is the Gaussian kernel bandwidth. In default, sigma = "auto", the algorithms automatically select the optimal value by cross validation. If you set sigma a single number, it will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation. • Centers are centers of Gaussian kernels in the linear model. These are selected at random from the data sample x underlying a numerator distribution$p(x)$. You can find the whole values in result$kernel_info$centers. • Kernel weights are the alpha parameters in the linear model. They have been optimaized by the algorithms. You can find the whole values in result$alpha.
• The funtion to estimate density ratio is named compute_density_ratio().

## 4. Multi Dimensional Data Samples

So far, the input data samples x and y were one dimensional. densratio() allows to input multidimensional data samples as matrix.

For example,

library(densratio)
library(mvtnorm)

set.seed(71)
x <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
y <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))

result <- densratio(x, y)
result


Also in this case, we can compare the true density ratio with the estimated density ratio.

true_density_ratio <- function(x) {
dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}

N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
w_true <- matrix(true_density_ratio(input), nrow = N)
w_hat <- matrix(result\$compute_density_ratio(input), nrow = N)

par(mfrow = c(1, 2))
contour(range, range, w_true, main = "True Density Ratio")
contour(range, range, w_hat, main = "Estimated Density Ratio")


The dimensions of x and y must be same.

## 5. References

[1] Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems 2011.

[2] Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. & Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.

[3] Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012.

## Try the densratio package in your browser

Any scripts or data that you put into this service are public.

densratio documentation built on May 9, 2019, 1:03 a.m.