knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
LogConcaveDESM
is an \code{R} package to compute and visualize the (penalized) log-concave score matching density estimate.
It also contains functions to plot the first and second derivatives of the log-density estimate.
Furthermore, functions to compute various distances between two probability distributions
are provided to assess the quality of the density estimate.
The development version of the LogConcaveDESM
can be installed from the GitHub using devtools
:
# install.packages("devtools") devtools::install_github("zhoucx1119/LogConcaveDESM")
library(LogConcaveDESM, warn.conflicts = FALSE) library(patchwork) library(ggplot2)
We show how to use functions in LogConcaveDESM
to compute and visualize the log-concave
score matching density estimate over a bounded domain.
We first simulate data from a truncated normal distribution over the interval $[-5, 5]$.
set.seed(2021) N <- 200 parent_mean <- 0 parent_sd <- 1 domain <- c(-5, 5) t_normal <- truncated_normal(parent_mean, parent_sd, domain) data <- t_normal$sampling(N)
We plot the histogram of the data below.
binwidth <- 2 * stats::IQR(data) / N ** (1/3) ggplot() + geom_histogram( data = data.frame(x = data), aes(x = x, y = ..density..), binwidth = binwidth, color = "darkblue", fill = "lightblue") + coord_cartesian(xlim = domain, ylim = c(-0.01, 0.5)) + theme_bw()
We assume that the second derivative of the logarithm of the log-concave score matching density estimate is a piecewise linear function with knots at the data points. With no penalty, the resulting optimal second derivative values at data points can be computed as below:
result <- lcd_scorematching( data = data, domain = domain, penalty_param = 0, verbose = TRUE )
Then, the second derivative of the logarithm of the log-concave score matching density estimate
at arbitrary points can be computed using the evaluate_logdensity_deriv2
function, and
can be visualized using the plot_logdensity_deriv2
function.
logden_deriv2_vals <- evaluate_logdensity_deriv2( scorematching_logconcave = result, newx = seq(domain[1], domain[2], length.out = 11) ) logden_deriv2_vals plot_deriv2 <- plot_logdensity_deriv2( scorematching_logconcave = result, plot_domain = domain, plot_points_cnt = 500) plot_deriv2
Subsequently, the first derivative of the logarithm of the log-concave score matching density estimate
at arbitrary points can be computed using the evaluate_logdensity_deriv1
function, and
can be visualized using the plot_logdensity_deriv1
function.
logden_deriv1_vals <- evaluate_logdensity_deriv1( scorematching_logconcave = result, newx = seq(domain[1], domain[2], length.out = 11) ) logden_deriv1_vals plot_deriv1 <- plot_logdensity_deriv1( scorematching_logconcave = result, plot_domain = domain, plot_points_cnt = 500) plot_deriv1
Then, the (un-normalized) log-density estimate at arbitrary point can be computed
using the evaluate_logdensity
function, and can be visualized using the plot_logdensity
function.
logden_vals <- evaluate_logdensity( scorematching_logconcave = result, newx = seq(domain[1], domain[2], length.out = 11) ) logden_vals plot_logden <- plot_logdensity( scorematching_logconcave = result, plot_domain = domain, plot_points_cnt = 500) plot_logden
Finally, to evaluate and visualize the density estimate itself, we use evaluate_density
and plot_density
functions, respectively.
den_vals <- evaluate_density( scorematching_logconcave = result, newx = seq(domain[1], domain[2], length.out = 11) ) den_vals plot_den <- plot_density( scorematching_logconcave = result, plot_domain = domain, plot_points_cnt = 500, plot_hist = TRUE ) plot_den
As we can see above, the un-penalized log-concave score matching density estimate is
too concentrated at the place where data are abundant. To remedy this, we consider
penalized log-concave score matching density estimate, where the optimal penalty parameter
can be chosen using the cv_optimal_density_estimate
function.
lambda_cand <- exp(seq(-5, 1, by = 0.5)) opt_den <- cv_optimal_density_estimate( data = data, domain = domain, penalty_param_candidates = lambda_cand, fold_number = 5 )
The second and first derivatives of the logarithm of the optimal density estimate, the logarithm of the un-normalized optimal density estimate, and the optimal density estimate itself are plotted as below.
plot_ld2 <- plot_logdensity_deriv2( scorematching_logconcave = opt_den, plot_domain = domain ) plot_ld1 <- plot_logdensity_deriv1( scorematching_logconcave = opt_den, plot_domain = domain ) plot_ld <- plot_logdensity( scorematching_logconcave = opt_den, plot_domain = domain ) plot_den <- plot_density( scorematching_logconcave = opt_den, plot_domain = domain, plot_points_cnt = 500, plot_hist = TRUE ) plot_ld2 + plot_ld1 + plot_ld + plot_den
We can also use the function plot_mle_scorematching
to view the log-concave maximum likelihood and score matching
density estimates together with the histogram.
plot_mle_scorematching( scorematching_logconcave = opt_den, plot_domain = domain, plot_points_cnt = 500, plot_hist = TRUE)
We could assess the quality of this density estimate using various metrics between two probability distributions, for example, the Kullback-Leibler divergence, the Hyvarinen divergence, the $L1$ distance, and the Hellinger distance.
kl <- kl_div( true_density = t_normal, density_estimate = opt_den) hyva <- hyvarinen_div( true_density = t_normal, density_estimate = opt_den) l1 <- L1_dist( true_density = t_normal, density_estimate = opt_den) he <- hellinger_dist( true_density = t_normal, density_estimate = opt_den) metric_table <- cbind( Metrics = c("Kullback-Leibler Divergence", "Hyvarinen divergence", "L1 Distance", "Hellinger Distance"), Values = round(c(kl, hyva, l1, he), 5) ) knitr::kable(metric_table)
Besides the bounded domain demonstrated above, LogConcaveDESM
can also compute and visualize log-concave score matching density estimate
over other types of domain, including the entire real line, an interval of the form $(-\infty, b)$ for some $b < \infty$, and an interval
of the form $(a, \infty)$ for some $a > -\infty$. Functions used are the same as those used in the bounded interval case. Demonstrations are omitted.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.