lg_main: Create an 'lg' object


View source: R/main_function.R

Description

Create an lg-object that can be used to estimate local Gaussian correlations, unconditional and conditional densities, and local partial correlations, as well as for testing purposes.

Usage

lg_main(
  x,
  bw_method = "plugin",
  est_method = "1par",
  transform_to_marginal_normality = TRUE,
  bw = NULL,
  plugin_constant_marginal = 1.75,
  plugin_constant_joint = 1.75,
  plugin_exponent_marginal = -1/5,
  plugin_exponent_joint = -1/6,
  tol_marginal = 10^(-3),
  tol_joint = 10^(-3)
)

Arguments

x

A matrix or data frame with data, one column per variable, one row per observation.

bw_method

The method used for bandwidth selection. Must be either "cv" (cross-validation, slow, but accurate) or "plugin" (fast, but crude).

est_method

The estimation method, must be either "1par", "5par", "5par_marginals_fixed" or "trivariate". (see details).

transform_to_marginal_normality

Logical, TRUE if we want to transform our data to marginal standard normality. This is assumed by method "1par", but can of course be skipped using this argument if it has been done already.

bw

Bandwidth object if it has already been calculated.

plugin_constant_marginal

The constant c in cn^a used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".

plugin_constant_joint

The constant c in cn^a used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.

plugin_exponent_marginal

The constant a in cn^a used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".

plugin_exponent_joint

The constant a in cn^a used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.

tol_marginal

The absolute tolerance in the optimization for finding the marginal bandwidths, passed on to the optim-function.

tol_joint

The absolute tolerance in the optimization for finding the joint bandwidths. Passed on to the optim-function.
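The plugin arguments above describe a bandwidth of the form c * n^a, where n is the number of observations. As a minimal sketch of that arithmetic with the default constants (the helper name plugin_bandwidth is hypothetical and not part of the package, and whether lg_main applies any further scaling is not stated here):

```r
# Hypothetical helper illustrating the plugin rule bw = c * n^a.
# It is not a function from the lg package.
plugin_bandwidth <- function(n, constant, exponent) {
  constant * n^exponent
}

n <- 100  # number of observations

# Defaults from the Usage section: c = 1.75, a = -1/5 (marginal), a = -1/6 (joint)
bw_marginal <- plugin_bandwidth(n, constant = 1.75, exponent = -1/5)
bw_joint    <- plugin_bandwidth(n, constant = 1.75, exponent = -1/6)

print(c(marginal = bw_marginal, joint = bw_joint))
```

Because the joint exponent is closer to zero, the joint bandwidth shrinks more slowly with n than the marginal one.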

Details

This is the main function in the package. It lets the user supply a data set and set a number of options, which are then used to prepare an lg object that can be supplied to other functions in the package, such as dlg (density estimation) and clg (conditional density estimation). The details are laid out in Otneim & Tjøstheim (2017) and Otneim & Tjøstheim (2018).

The papers mentioned above deal with the estimation of multivariate density functions and conditional density functions. The idea is to fit a multivariate normal distribution locally to the unknown density function by first transforming the data to marginal standard normality and then estimating the local correlations pairwise. The local means and local standard deviations are held fixed at 0 and 1, respectively, to reflect the knowledge that the marginals are approximately standard normal. Use est_method = "1par" for this strategy, which means that only one local parameter (the correlation) is estimated for each pair. Note that this method requires marginally standard normal data: if est_method = "1par" and transform_to_marginal_normality = FALSE, the function will throw a warning. That can be acceptable if you know that the data are already marginally standard normal.
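To illustrate what "transforming the data to marginal standard normality" means, here is a minimal base R sketch using rank-based normal scores. This is an assumption chosen for illustration only: the transformation that lg_main applies internally may differ, and the helper name to_normal_scores is hypothetical.

```r
# Hypothetical helper: rank-based normal scores, applied column by column.
# This is an illustration only, not the transformation used inside lg_main.
to_normal_scores <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # rank / (n + 1) lies strictly inside (0, 1), so qnorm() stays finite
  apply(x, 2, function(column) qnorm(rank(column) / (n + 1)))
}

set.seed(1)
x_raw <- cbind(rexp(200), runif(200))  # clearly non-normal marginals
z <- to_normal_scores(x_raw)

# Each column of z now has approximately standard normal marginals
print(round(colMeans(z), 3))
print(round(apply(z, 2, sd), 3))
```

Data that have already been transformed in some such way are the case where transform_to_marginal_normality = FALSE can reasonably be combined with est_method = "1par".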

The second option is est_method = "5par_marginals_fixed", which is more flexible than "1par". This method fits univariate local Gaussian models to each marginal, producing local estimates of the means μ_i(x_i) and standard deviations σ_i(x_i), which are then held fixed in the next step when the pairwise local correlations are estimated. In many situations this method can provide a better fit, even if the marginals are standard normal. It also makes it possible to create a multivariate locally Gaussian fit to any density without transforming the marginals, if you for some reason want to avoid that.
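A univariate local Gaussian fit at a single point can be sketched in base R as follows. The sketch assumes a local likelihood of the Hjort and Jones type with a Gaussian kernel; the function name local_negloglik, the bandwidth and the optimizer settings are illustrative assumptions and are not taken from the package source.

```r
# Negative local log-likelihood of a univariate Gaussian at the point x0.
# Assumed form: mean(K_h(X_i - x0) * log f(X_i)) - integral of K_h(y - x0) f(y) dy,
# where K_h is a Gaussian kernel with standard deviation bw.
local_negloglik <- function(par, data, x0, bw) {
  mu <- par[1]
  sigma <- exp(par[2])                   # optimise log(sigma) so sigma stays positive
  w <- dnorm(data - x0, sd = bw)         # kernel weights K_h(X_i - x0)
  fit_term <- mean(w * dnorm(data, mean = mu, sd = sigma, log = TRUE))
  # With a Gaussian kernel the integral term has a closed form
  penalty <- dnorm(x0, mean = mu, sd = sqrt(sigma^2 + bw^2))
  -(fit_term - penalty)
}

set.seed(1)
obs <- rnorm(5000)                       # data that really are standard normal

fit <- optim(c(0, 0), function(p) local_negloglik(p, obs, x0 = 0.5, bw = 1))
local_mean <- fit$par[1]
local_sd <- exp(fit$par[2])

print(c(local_mean = local_mean, local_sd = local_sd))
```

For data that are truly Gaussian, the local estimates should be close to the global mean and standard deviation at every point; for other densities they vary with x0, which is what the "5par_marginals_fixed" strategy exploits.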

The third option is est_method = "5par", which is a full nonparametric locally Gaussian fit of a bivariate density as laid out and used by Tjøstheim & Hufthammer (2013) and others. This is simply a wrapper for the localgauss package by Berentsen et al. (2014).

A more recent option, est_method = "trivariate", is described by Otneim and Tjøstheim (2019), who allow a full trivariate fit to a three-dimensional data set that has been transformed to marginal standard normality. It was developed in the context of their test for conditional independence (see ?ci_test for details), but it can also be used to estimate trivariate density functions.

References

Berentsen, Geir Drage, Tore Selland Kleppe, and Dag Tjøstheim. "Introducing localgauss, an R package for estimating and visualizing local Gaussian correlation." Journal of Statistical Software 56, no. 1 (2014): 1-18.

Hufthammer, Karl Ove, and Dag Tjøstheim. "Local Gaussian Likelihood and Local Gaussian Correlation." PhD Thesis of Karl Ove Hufthammer, University of Bergen, 2009.

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Otneim, Håkon, and Dag Tjøstheim. "Conditional density estimation using the local Gaussian correlation." Statistics and Computing 28, no. 2 (2018): 303-321.

Otneim, Håkon, and Dag Tjøstheim. "The local Gaussian partial correlation." Working paper (2019).

Tjøstheim, Dag, and Karl Ove Hufthammer. "Local Gaussian correlation: a new measure of dependence." Journal of Econometrics 172, no. 1 (2013): 33-48.

Examples

  x <- cbind(rnorm(100), rnorm(100), rnorm(100))

  # Quick example
  lg_object1 <- lg_main(x, bw_method = "plugin", est_method = "1par")

  # In the simulation experiments in Otneim & Tjøstheim (2017),
  # the cross-validation bandwidth selection is used:
  ## Not run: 
  lg_object2 <- lg_main(x, bw_method = "cv", est_method = "1par")
  
## End(Not run)

  # If you do not wish to transform the data to standard normality,
  # use the five parameter fit:
  lg_object3 <- lg_main(x, est_method = "5par_marginals_fixed",
                  transform_to_marginal_normality = FALSE)

  # In the bivariate case, you can use the full nonparametric fit:
  x_biv <- cbind(rnorm(100), rnorm(100))
  lg_object4 <- lg_main(x_biv, est_method = "5par",
                  transform_to_marginal_normality = FALSE)

  # Whichever method you choose, the lg-object can now be passed on
  # to the dlg- or clg-functions for evaluation of the density or
  # conditional density estimate. Control the grid with the grid
  # argument.
  grid1 <- x[1:10,]
  dens_est <- dlg(lg_object1, grid = grid1)

  # The conditional density of X1 given X2 = 1 and X3 = 0:
  grid2 <- matrix(-3:3, ncol = 1)
  c_dens_est <- clg(lg_object1, grid = grid2, condition = c(1, 0))

hotneim/lg documentation built on May 9, 2020, 7:35 a.m.