Description
This is the primary fitting function. By default it uses squared loss (loss = "leastsquares"), as one would for a continuous outcome, but it now also implements logistic regression via the loss = "logistic" option. It also allows faster computation and larger training sets than prior versions by optionally approximating the kernel matrix with a lower-dimensional representation via the truncate argument.
The workflow for using krls mimics that of lm and similar functions: a krls object of class KRLS2 is fitted in one step and can later be examined using summary(). The krls object contains all the information that may be needed at summary time, including what is required to estimate pointwise partial derivatives, their averages for each covariate, standard errors, and so on via the summary() function. See summary.krls2().
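A minimal usage sketch (the data here are simulated for illustration only, and the package name KRLS2 is assumed from the class of the returned object):

## Hypothetical example: fit by squared loss, then inspect with summary()
library(KRLS2)                      # assumed package name
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)   # N = 100 observations, D = 2 predictors
y <- sin(X[, 1]) + X[, 2]^2 + rnorm(100, sd = 0.1)
fit <- krls(X, y)                   # loss = "leastsquares" by default
summary(fit)                        # average partial derivatives, SEs, etc.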
Usage

krls(X, y, w = NULL, loss = "leastsquares", whichkernel = "gaussian",
     b = NULL, bstart = NULL, binterval = c(10^-8, 500 * ncol(X)),
     lambda = NULL, hyperfolds = 5, lambdastart = 10^(-6)/length(y),
     lambdainterval = c(10^-8, 25), L = NULL, U = NULL, tol = NULL,
     truncate = FALSE, epsilon = NULL, lastkeeper = NULL,
     con = list(maxit = 500), returnopt = FALSE, printlevel = 0, warn = 1,
     sigma = NULL, ...)
Arguments

X: N by D data numeric matrix that contains the values of D predictor variables for i = 1, …, N observations. The matrix may not contain missing values or constants. Note that no intercept is required for the least squares or logistic loss. In the case of least squares, the function operates on demeaned data, and subtracting the mean of y is equivalent to including an (unpenalized) intercept in the model. In the case of logistic loss, we automatically estimate an unpenalized intercept in the linear component of the model.
y: N by 1 data numeric matrix or vector that contains the values of the response variable for all observations. This vector may not contain missing values and, in the case of logistic loss, should be a vector of 0s and 1s.
w: N by 1 data numeric matrix or vector that contains the weights that should be applied to each observation. These need not sum to one.
loss: String that specifies the loss function. Use loss = "leastsquares" (the default) for continuous outcomes and loss = "logistic" for binary outcomes.
whichkernel: String that specifies which kernel should be used. Must be one of "gaussian" (the default), "linear", "poly1", "poly2", "poly3", or "poly4" (see details).
b: A positive scalar (formerly sigma) that sets the bandwidth of the Gaussian kernel, i.e. sigma^2 in the kernel formula (see details). If NULL, the bandwidth is set or estimated internally; see bstart and binterval.
bstart: A positive scalar that is the starting value for a numerical estimation of the bandwidth b.
binterval: A numeric vector of length two that specifies the minimum and maximum of the search window used when numerically estimating b.
lambda: A positive scalar that specifies the lambda parameter for the regularizer (see details). It governs the tradeoff between model fit and complexity. By default, this parameter is chosen by minimizing the sum of the squared leave-one-out errors for KRLS and by minimizing the cross-validated negative log-likelihood for KRLogit, with the number of folds set by hyperfolds.
hyperfolds: A positive scalar that sets the number of cross-validation folds used in selecting lambda (see lambda).
lambdastart: A positive scalar that specifies the starting value for a numerical optimization of lambda.
lambdainterval: A numeric vector of length two that specifies the minimum and maximum of the search window used when numerically optimizing lambda.
L: Non-negative scalar that determines the lower bound of the search window for the leave-one-out optimization to find lambda with least squares loss. Default is NULL, in which case the bound is chosen internally.
U: Positive scalar that determines the upper bound of the search window for the leave-one-out optimization to find lambda with least squares loss. Default is NULL, in which case the bound is chosen internally.
tol: Positive scalar that determines the tolerance used in the optimization routine used to find lambda with least squares loss. Default is NULL, in which case a default tolerance is chosen internally.
truncate: A boolean that defaults to FALSE. If TRUE, the kernel matrix is replaced with a lower-dimensional approximation, which speeds computation and permits larger training sets (see epsilon and lastkeeper).
epsilon: Scalar between 0 and 1 that determines the total variance that can be lost in truncation. If not NULL, truncate is automatically set to TRUE.
lastkeeper: Number of columns of the truncated kernel approximation to keep when truncate = TRUE; provides direct control over the degree of truncation.
con: A list of control arguments passed to the numerical optimizer used to minimize the kernel regularized logistic loss function.
returnopt: A boolean that defaults to FALSE; if TRUE, the output of the numerical optimizer is also returned.
printlevel: A number that is either 0 (default), 1, or 2. 0 has minimal printing, 1 prints most diagnostics, and 2 prints most diagnostics including output from the numerical optimizer.
warn: A number that sets the warn option (see options) while the function runs, controlling how warnings are handled.
sigma: DEPRECATED. Users should now use b instead.
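To tie the arguments together, a hedged example combining several of them (the data are simulated and the particular values are arbitrary):

## Hypothetical call: logistic loss with a truncated kernel approximation
library(KRLS2)                                 # assumed package name
set.seed(2)
X <- matrix(rnorm(400), ncol = 4)
y <- rbinom(100, 1, plogis(X[, 1] - X[, 2]))   # 0/1 outcome for logistic loss
fit_logit <- krls(X, y,
                  loss       = "logistic",
                  truncate   = TRUE,   # lower-dimensional kernel approximation
                  epsilon    = 0.01,   # variance allowed to be lost in truncation
                  printlevel = 1)      # print most diagnostics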
Details

krls implements the Kernel-based Regularized Least Squares (KRLS) estimator as described in Hainmueller and Hazlett (2014). Please consult that reference for details.
Kernel-based Regularized Least Squares (KRLS) arises as a Tikhonov minimization problem with a squared loss. Assume we have data of the form (y_i, x_i), where i indexes observations, y_i in R is the outcome, and x_i in R^D is a D-dimensional vector of predictor values. KRLS then searches over a space of functions H and chooses the best-fitting function f according to the rule:
argmin_{f in H} sum_i^N (y_i - f(x_i))^2 + lambda || f ||_H^2
where (y_i - f(x_i))^2 is a loss function that computes how 'wrong' the function is at each observation i, and || f ||_H^2 is the regularizer that measures the complexity of the function according to the L_2 norm ||f||^2 = int f(x)^2 dx. lambda is the scalar regularization parameter that governs the tradeoff between model fit and complexity. By default, lambda is chosen by minimizing the sum of the squared leave-one-out errors, but it can also be specified by the user via the lambda argument to implement other approaches.
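For instance, a minimal sketch of the latter (assuming the KRLS2 package; the value 0.1 is arbitrary):

## User-specified regularization instead of leave-one-out selection
library(KRLS2)                      # assumed package name
set.seed(3)
X <- matrix(rnorm(100), ncol = 2)   # 50 observations, 2 predictors
y <- rnorm(50)
fit_fixed <- krls(X, y, lambda = 0.1)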
Under fairly general conditions, the function that minimizes the regularized loss within the hypothesis space established by the choice of a (positive semidefinite) kernel function k(x_i,x_j) is of the form
f(x_j)= sum_i^N c_i k(x_i,x_j)
where the kernel function k(x_i,x_j) measures the distance between two observations x_i and x_j, and c_i is the choice coefficient for each observation i. Let K be the N by N kernel matrix with all pairwise distances K_ij = k(x_i,x_j), and let c be the N by 1 vector of choice coefficients for all observations; then, in matrix notation, the fitted values are y = Kc.
Accordingly, the krls function solves the following minimization problem:
argmin_c (y - Kc)'(y - Kc) + lambda c'Kc
which is convex in c and solved by c = (K + lambda I)^-1 y, where I is the identity matrix (a worked sketch of this solution appears after the kernel definition below). Note that this linear solution provides a flexible fitted response surface that typically reduces misspecification bias, because it can learn a wide range of nonlinear and/or nonadditive functions of the predictors. In an extension, Hazlett and Sonnet consider a logistic loss function, details of which are forthcoming.
If vcov = TRUE is specified, krls also computes the variance-covariance matrix for the choice coefficients c and fitted values y = Kc, based on a variance estimator developed in Hainmueller and Hazlett (2014). Note that both matrices are N by N, and therefore this results in increased memory use and computing time.
By default, krls uses the Gaussian kernel (whichkernel = "gaussian") given by

k(x_i,x_j) = exp(-|| x_i - x_j ||^2 / sigma^2)

where ||x_i - x_j|| is the Euclidean distance. The kernel bandwidth sigma^2 is set to D, the number of dimensions, by default, but the user can also specify other values using the b argument (formerly sigma) to implement other approaches.
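The closed-form solution and the Gaussian kernel above are easy to verify directly in base R. The following is an illustrative sketch of the math only, not the package internals (which additionally demean the data, select lambda, and support truncation):

## Build the Gaussian kernel with bandwidth sigma^2 = D, then solve
## c = (K + lambda I)^{-1} y, as in the details above (illustration only)
set.seed(4)
N <- 50; D <- 2; lambda <- 0.5
X <- matrix(rnorm(N * D), ncol = D)
y <- sin(X[, 1]) + rnorm(N, sd = 0.1)
K <- exp(-as.matrix(dist(X))^2 / D)       # K_ij = exp(-||x_i - x_j||^2 / D)
c_hat <- solve(K + lambda * diag(N), y)   # choice coefficients c
y_hat <- K %*% c_hat                      # fitted values y = Kc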
If binary = TRUE is also specified, the function will identify binary predictors and return first differences for these predictors instead of partial derivatives. First differences are computed going from the minimum to the maximum value of each binary predictor. Note that first differences are more appropriate for summarizing the effects of binary predictors (see Hainmueller and Hazlett (2014) for details).
A few other kernels are also implemented, but derivatives are currently not supported for them: "linear" uses k(x_i,x_j) = x_i'x_j, while "poly1", "poly2", "poly3", and "poly4" are polynomial kernels based on k(x_i,x_j) = (x_i'x_j + 1)^p, where p is the order.
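For example, the second-order polynomial kernel matrix can be formed directly from the formula above (a sketch):

## "poly2" kernel: k(x_i, x_j) = (x_i'x_j + 1)^2
X <- matrix(rnorm(20), ncol = 2)
Kpoly2 <- (tcrossprod(X) + 1)^2   # tcrossprod(X) gives all pairwise x_i'x_j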