README.md
In zhoucx1119/LogConcaveDESM: Log-Concave Density Estimation Using the Score Matching Loss Function

Log-Concave Density Estimation Using the Score Matching Loss Function

LogConcaveDESM is a package to compute and visualize the (penalized) log-concave score matching density estimate. It also contains functions to plot the first and second derivatives of the log-density estimate. Furthermore, functions to compute various distances between two probability distributions are provided to assess the quality of the density estimate.

The development version of the LogConcaveDESM can be installed from the GitHub using devtools:

# install.packages("devtools")
devtools::install_github("zhoucx1119/LogConcaveDESM")

library(LogConcaveDESM, warn.conflicts = FALSE)
library(patchwork)
library(ggplot2)

We show how to use functions in LogConcaveDESM to compute and visualize the log-concave score matching density estimate over a bounded domain.

We first simulate data from a truncated normal distribution over the interval [−5,5].

set.seed(2021)
N <- 200
parent_mean <- 0
parent_sd <- 1
domain <- c(-5, 5)
t_normal <- truncated_normal(parent_mean, parent_sd, domain)
data <- t_normal$sampling(N)

We plot the histogram of the data below.

binwidth <- 2 * stats::IQR(data) / N ** (1/3)
ggplot() + geom_histogram(
  data = data.frame(x = data), 
  aes(x = x, y = ..density..), 
  binwidth = binwidth,
            color = "darkblue",
            fill = "lightblue") + 
  coord_cartesian(xlim = domain, ylim = c(-0.01, 0.5)) + 
  theme_bw()

We assume that the second derivative of the logarithm of the log-concave score matching density estimate is a piecewise linear function with knots at the data points. With no penalty, the resulting optimal second derivative values at data points can be computed as below:

result <- lcd_scorematching(
  data = data, 
  domain = domain, 
  penalty_param = 0, 
  verbose = TRUE
)
#> -----------------------------------------------------------------
#>            OSQP v0.6.0  -  Operator Splitting QP Solver
#>               (c) Bartolomeo Stellato,  Goran Banjac
#>         University of Oxford  -  Stanford University 2019
#> -----------------------------------------------------------------
#> problem:  variables n = 202, constraints m = 203
#>           nnz(P) + nnz(A) = 20504
#> settings: linear system solver = qdldl,
#>           eps_abs = 1.0e-05, eps_rel = 1.0e-05,
#>           eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
#>           rho = 1.00e-01 (adaptive),
#>           sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
#>           check_termination: on (interval 25),
#>           scaling: on, scaled_termination: off
#>           warm start: on, polish: on, time_limit: off
#> 
#> iter   objective    pri res    dua res    rho        time
#>    1  -4.3429e+00   1.35e+01   4.63e-03   1.00e-01   2.43e-03s
#>  200  -6.7500e+01   2.79e+00   1.88e-03   1.98e-02   1.06e-02s
#>  400  -7.1064e+01   4.32e+00   4.59e-04   3.81e-03   1.85e-02s
#>  600  -6.5517e+01   1.60e+00   1.22e-04   3.81e-03   2.57e-02s
#>  800  -6.7135e+01   4.42e-01   3.10e-05   3.81e-03   3.31e-02s
#> 1000  -6.6824e+01   1.01e-01   6.51e-06   3.81e-03   4.00e-02s
#> plsh  -6.6874e+01   4.27e-10   3.14e-09   --------   4.16e-02s
#> 
#> status:               solved
#> solution polish:      successful
#> number of iterations: 1000
#> optimal objective:    -66.8737
#> run time:             4.16e-02s
#> optimal rho estimate: 1.93e-03
#> The status of solving the constrained quadratic optimization problem is: optimal.

Then, the second derivative of the logarithm of the log-concave score matching density estimate at arbitrary points can be computed using the evaluate_logdensity_deriv2 function, and can be visualized using the plot_logdensity_deriv2 function.

logden_deriv2_vals <- evaluate_logdensity_deriv2(
  scorematching_logconcave = result, 
  newx = seq(domain[1], domain[2], length.out = 11)
)
logden_deriv2_vals
#>    newx_sorted logderiv2_vals
#> 1           -5  -1.488096e-23
#> 2           -4  -7.214839e+01
#> 3           -3  -1.442968e+02
#> 4           -2   2.368584e-13
#> 5           -1   2.606746e-12
#> 6            0  -1.251196e-10
#> 7            1   1.092312e-10
#> 8            2  -5.711015e-11
#> 9            3  -7.400681e-12
#> 10           4  -3.700341e-12
#> 11           5   0.000000e+00

plot_deriv2 <- plot_logdensity_deriv2(
  scorematching_logconcave = result, 
  plot_domain = domain, 
  plot_points_cnt = 500)
plot_deriv2

Subsequently, the first derivative of the logarithm of the log-concave score matching density estimate at arbitrary points can be computed using the evaluate_logdensity_deriv1 function, and can be visualized using the plot_logdensity_deriv1 function.

logden_deriv1_vals <- evaluate_logdensity_deriv1(
  scorematching_logconcave = result, 
  newx = seq(domain[1], domain[2], length.out = 11)
)
logden_deriv1_vals
#>    newx_sorted logderiv1_vals
#> 1           -5     294.125904
#> 2           -4     258.051708
#> 3           -3     149.829119
#> 4           -2      13.667716
#> 5           -1       9.167263
#> 6            0       5.598630
#> 7            1     -13.979790
#> 8            2     -33.035110
#> 9            3     -33.035110
#> 10           4     -33.035110
#> 11           5     -33.035110

plot_deriv1 <- plot_logdensity_deriv1(
  scorematching_logconcave = result, 
  plot_domain = domain, 
  plot_points_cnt = 500)
plot_deriv1

Then, the (un-normalized) log-density estimate at arbitrary point can be computed using the evaluate_logdensity function, and can be visualized using the plot_logdensity function.

logden_vals <- evaluate_logdensity(
  scorematching_logconcave = result, 
  newx = seq(domain[1], domain[2], length.out = 11)
)
logden_vals
#>    newx_sorted logdensity_vals
#> 1           -5          0.0000
#> 2           -4        282.1012
#> 3           -3        492.0540
#> 4           -2        562.3994
#> 5           -1        574.0298
#> 6            0        580.8876
#> 7            1        575.0199
#> 8            2        553.7940
#> 9            3        520.7589
#> 10           4        487.7238
#> 11           5        454.6887

plot_logden <- plot_logdensity(
  scorematching_logconcave = result, 
  plot_domain = domain, 
  plot_points_cnt = 500)
plot_logden

Finally, to evaluate and visualize the density estimate itself, we use evaluate_density and plot_density functions, respectively.

den_vals <- evaluate_density(
  scorematching_logconcave = result, 
  newx = seq(domain[1], domain[2], length.out = 11)
)
den_vals
#>    newx_sorted  density_vals
#> 1           -5 5.718881e-253
#> 2           -4 1.871947e-130
#> 3           -3  2.842021e-39
#> 4           -2  1.009840e-08
#> 5           -1  1.135730e-03
#> 6            0  1.080366e+00
#> 7            1  3.056907e-03
#> 8            2  1.849261e-12
#> 9            3  8.318257e-27
#> 10           4  3.741679e-41
#> 11           5  1.683064e-55

plot_den <- plot_density(
  scorematching_logconcave = result,
  plot_domain = domain,
  plot_points_cnt = 500, 
  plot_hist = TRUE
  )
plot_den

As we can see above, the un-penalized log-concave score matching density estimate is too concentrated at the place where data are abundant. To remedy this, we consider penalized log-concave score matching density estimate, where the optimal penalty parameter can be chosen using the cv_optimal_density_estimate function.

lambda_cand <- exp(seq(-5, 1, by = 0.5))
opt_den <- cv_optimal_density_estimate(
  data = data, 
  domain = domain, 
  penalty_param_candidates = lambda_cand, 
  fold_number = 5
)
#> Penalty parameter value: 0.00673794699908547
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.0111089965382423
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.0183156388887342
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.0301973834223185
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.0497870683678639
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.0820849986238988
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.135335283236613
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.22313016014843
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.367879441171442
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 0.606530659712633
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 1
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 1.64872127070013
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Penalty parameter value: 2.71828182845905
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal. 
#> The status of solving the constrained quadratic optimization problem is: optimal.
#> Optimal penality parameter is 0.22313016014843.
#> The status of solving the constrained quadratic optimization problem is: optimal.

The second and first derivatives of the logarithm of the optimal density estimate, the logarithm of the un-normalized optimal density estimate, and the optimal density estimate itself are plotted as below.

plot_ld2 <- plot_logdensity_deriv2(
  scorematching_logconcave = opt_den,
  plot_domain = domain
  )

plot_ld1 <- plot_logdensity_deriv1(
  scorematching_logconcave = opt_den,
  plot_domain = domain
  )

plot_ld <- plot_logdensity(
  scorematching_logconcave = opt_den,
  plot_domain = domain
  )

plot_den <- plot_density(
  scorematching_logconcave = opt_den,
  plot_domain = domain,
  plot_points_cnt = 500, 
  plot_hist = TRUE
  )

plot_ld2 + plot_ld1 + plot_ld + plot_den

We can also use the function plot_mle_scorematching to view the log-concave maximum likelihood and score matching density estimates together with the histogram.

plot_mle_scorematching(
  scorematching_logconcave = opt_den, 
  plot_domain = domain, 
  plot_points_cnt = 500, 
  plot_hist = TRUE)

We could assess the quality of this density estimate using various metrics between two probability distributions, for example, the Kullback-Leibler divergence, the Hyvarinen divergence, the L1 distance, and the Hellinger distance.

kl <- kl_div(
  true_density = t_normal, 
  density_estimate = opt_den) 

hyva <- hyvarinen_div(
  true_density = t_normal, 
  density_estimate = opt_den) 

l1 <- L1_dist(
  true_density = t_normal, 
  density_estimate = opt_den) 

he <- hellinger_dist(
  true_density = t_normal, 
  density_estimate = opt_den) 

metric_table <- cbind(
  Metrics = c("Kullback-Leibler Divergence", "Hyvarinen divergence", 
              "L1 Distance", "Hellinger Distance"), 
  Values = round(c(kl, hyva, l1, he), 5)
)

knitr::kable(metric_table)

| Metrics | Values | |:----------------------------|:--------| | Kullback-Leibler Divergence | 0.09961 | | Hyvarinen divergence | 0.10237 | | L1 Distance | 0.29509 | | Hellinger Distance | 0.1774 |

Besides the bounded domain demonstrated above, LogConcaveDESM can also compute and visualize log-concave score matching density estimate over other types of domain, including the entire real line, an interval of the form (−∞,b) for some b \< ∞, and an interval of the form (a,∞) for some a > − ∞. Functions used are the same as those used in the bounded interval case. Demonstrations are omitted.

zhoucx1119/LogConcaveDESM documentation built on Aug. 28, 2024, 3:25 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

zhoucx1119/LogConcaveDESM
Log-Concave Density Estimation Using the Score Matching Loss Function

README.md
In zhoucx1119/LogConcaveDESM: Log-Concave Density Estimation Using the Score Matching Loss Function

Log-Concave Density Estimation Using the Score Matching Loss Function

Installation

Load the Package

Computing and Visualizing Unpenalized Log-concave Score Matching Density Estimate over a Bounded Domain

Computing and Visualizing Penalized Log-concave Score Matching Density Estimate over a Bounded Domain

Computing and Visualizing Penalized Log-concave Score Matching Density Estimate over Other Types of Domain

R Package Documentation

Browse R Packages

We want your feedback!

zhoucx1119/LogConcaveDESM Log-Concave Density Estimation Using the Score Matching Loss Function

README.md In zhoucx1119/LogConcaveDESM: Log-Concave Density Estimation Using the Score Matching Loss Function

Log-Concave Density Estimation Using the Score Matching Loss Function

Installation

Load the Package

Computing and Visualizing Unpenalized Log-concave Score Matching Density Estimate over a Bounded Domain

Computing and Visualizing Penalized Log-concave Score Matching Density Estimate over a Bounded Domain

Computing and Visualizing Penalized Log-concave Score Matching Density Estimate over Other Types of Domain

R Package Documentation

Browse R Packages

We want your feedback!

zhoucx1119/LogConcaveDESM
Log-Concave Density Estimation Using the Score Matching Loss Function

README.md
In zhoucx1119/LogConcaveDESM: Log-Concave Density Estimation Using the Score Matching Loss Function