View source: R/main_function.R
gKRLS (R Documentation)
This page documents how to use gKRLS as part of a model estimated with
mgcv. Post-estimation functions to calculate marginal effects are
documented elsewhere, e.g. calculate_effects.
Usage:
gKRLS(
sketch_method = "subsampling",
standardize = "Mahalanobis",
bandwidth = NULL,
sketch_multiplier = 5,
sketch_size_raw = NULL,
sketch_prob = NULL,
rescale_penalty = TRUE,
truncate.eigen.tol = sqrt(.Machine$double.eps),
demean_kernel = FALSE,
remove_instability = TRUE
)
get_calibration_information(object)
Arguments:
sketch_method: A string that specifies which kernel sketching method should be used (default of "subsampling"; the examples also illustrate "gaussian"). To force specific observations to be used for subsampling sketching, provide a vector of observation indices instead of a string (see the examples).
standardize: A string that specifies how the data is standardized before calculating the distance between observations. The default is "Mahalanobis"; the examples also illustrate "scaled".
bandwidth: A bandwidth for the kernel. The default is NULL; if bandwidth = "calibrate", the bandwidth is estimated from the data (see get_calibration_information below).
sketch_multiplier: A number that sets the size of the sketching dimension: sketch_multiplier * ceiling(N^(1/3)), where N is the number of observations. The default is 5.
sketch_size_raw: A number to set the exact size of the sketching dimension. The default, NULL, uses sketch_multiplier to set the size instead.
sketch_prob: A probability for an element of the sketching matrix to equal a non-zero value; only used by certain sketching methods. The default is NULL.
rescale_penalty: A logical value for whether the penalty should be rescaled for numerical stability. The default is TRUE.
truncate.eigen.tol: A threshold for removing columns of the penalty associated with very small eigenvalues. The default is sqrt(.Machine$double.eps).
demean_kernel: A logical value that indicates whether columns of the (sketched) kernel should be demeaned before estimation. The default is FALSE.
remove_instability: A logical value that indicates whether numerical zeros (set via truncate.eigen.tol) should be removed for numerical stability. The default is TRUE.
object: A model estimated with bandwidth = "calibrate"; only used by get_calibration_information.
Overview: The gKRLS function should not be called directly. Its options, described above, control how gKRLS is estimated. It should be passed to mgcv as follows: s(x1, x2, x3, bs = "gKRLS", xt = gKRLS(...)). Multiple kernels can be specified, each with its own gKRLS arguments, and the smooth can be combined with the existing options for s() in mgcv.
If bandwidth="calibrate", the function
get_calibration_information reports the estimated bandwidth and time
(in minutes) needed to do so.
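A minimal sketch of that workflow on simulated data (not part of the package's shipped examples; the reported values will vary by data and machine):
library(gKRLS)
set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
# Request a calibrated bandwidth for the kernel
fit_cal <- mgcv::gam(
  y ~ s(x1, x2, bs = "gKRLS", xt = gKRLS(bandwidth = "calibrate")),
  data = dat
)
# Reports the calibrated bandwidth and the time (in minutes) used to find it
get_calibration_information(fit_cal)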
Default Settings: By default, bs = "gKRLS" uses the Mahalanobis distance between observations, subsampling sketching (i.e., the kernel is constructed using a random sample of the observations; Yang et al. 2017), and a sketching dimension of 5 * ceiling(N^(1/3)), where N is the number of observations. Chang and Goplerud (2024) explore alternative options.
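For example, with N = 1000 observations the default sketching dimension is:
N <- 1000
5 * ceiling(N^(1/3))  # 5 * 10 = 50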
Notes: Please note that variables must be separated by commas inside s(...) and that character variables should usually be passed as factors to work smoothly with mgcv. When this smooth is used with bam, the default sketching dimension is computed using chunk.size in place of N; thus, either chunk.size or sketch_size_raw must be increased for the sketching dimension to grow with N.
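A minimal sketch of the bam() case on simulated data (sketch_size_raw = 50 is an arbitrary illustrative value):
library(gKRLS)
set.seed(2)
dat <- data.frame(y = rnorm(500), x1 = rnorm(500), x2 = rnorm(500))
# Fix the sketching dimension explicitly instead of relying on chunk.size
fit_bam <- mgcv::bam(
  y ~ s(x1, x2, bs = "gKRLS", xt = gKRLS(sketch_size_raw = 50)),
  data = dat
)
summary(fit_bam)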
Value: gKRLS returns a named list with the elements described in "Arguments".
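Although gKRLS() is normally only supplied to the xt argument of s(), calling it directly shows this list of settings:
library(gKRLS)
# gKRLS() records its arguments; the resulting named list is what xt receives
settings <- gKRLS(sketch_method = "gaussian", sketch_multiplier = 2)
names(settings)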
References:
Chang, Qing, and Max Goplerud. 2024. "Generalized Kernel Regularized Least Squares." Political Analysis 32(2):157-171.
Hartman, Erin, Chad Hazlett, and Ciara Sterbenz. 2024. "kpop: A Kernel Balancing Approach for Reducing Specification Assumptions in Survey Weighting." Journal of the Royal Statistical Society Series A: Statistics in Society. doi:10.1093/jrsssa/qnae082.
Drineas, Petros, Michael W. Mahoney, and Nello Cristianini. 2005. "On the Nyström Method for Approximating a Gram Matrix For Improved Kernel-Based Learning." Journal of Machine Learning Research 6(12):2153-2175.
Yang, Yun, Mert Pilanci, and Martin J. Wainwright. 2017. "Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression." Annals of Statistics 45(3):991-1023.
Examples:
library(gKRLS)
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
state <- sample(letters[1:5], n, replace = TRUE)
y <- 0.3 * x1 + 0.4 * x2 + 0.5 * x3 + rnorm(n)
data <- data.frame(y, x1, x2, x3, state)
data$state <- factor(data$state)
# A gKRLS model without fixed effects
fit_gKRLS <- mgcv::gam(y ~ s(x1, x2, x3, bs = "gKRLS"), data = data)
summary(fit_gKRLS)
# A gKRLS model with fixed effects outside of the kernel
fit_gKRLS_FE <- mgcv::gam(y ~ state + s(x1, x2, x3, bs = "gKRLS"), data = data)
# HC3 is not available for mgcv; this uses the effective degrees of freedom
# instead of the number of columns; see ?estfun.gam for details
robust <- sandwich::vcovHC(fit_gKRLS, type = 'HC1')
cluster <- sandwich::vcovCL(fit_gKRLS, cluster = data$state)
# Change default standardization to "scaled", sketch method to Gaussian,
# and alter sketching multiplier
fit_gKRLS_alt <- mgcv::gam(y ~ s(x1, x2, x3,
bs = "gKRLS",
xt = gKRLS(
standardize = "scaled",
sketch_method = "gaussian",
sketch_multiplier = 2
)
),
data = data
)
# A model with multiple kernels
fit_gKRLS_2 <- mgcv::gam(y ~ s(x1, x2, bs = 'gKRLS') + s(x1, x3, bs = 'gKRLS'), data = data)
# A model with a custom set of ids for sketching
id <- sample(1:n, 5)
fit_gKRLS_custom <- mgcv::gam(y ~ s(x1, bs = 'gKRLS', xt = gKRLS(sketch_method = id)), data = data)
# Note that the ids of the sampled observations can be extracted
# from the fitted mgcv object
stopifnot(identical(id, fit_gKRLS_custom$smooth[[1]]$subsampling_id))
# calculate marginal effect (see ?calculate_effects for more examples)
calculate_effects(fit_gKRLS, variables = "x1")