estim.sur | R Documentation |
Use this function to estimate a Seemingly Unrelated Regression model.
estim.sur(
data,
searchSigMaxIter = 0,
searchSigMaxProb = 0.1,
restriction = NULL,
pcaOptionsY = NULL,
pcaOptionsX = NULL,
simFixSize = 0,
simTrainFixSize = 0,
simTrainRatio = 0.75,
simSeed = 0,
simMaxConditionNumber = Inf
)
data |
A list that determines data and other required information for the search process.
Use |
searchSigMaxIter |
An integer for the maximum number of iterations in searching for significant coefficients. Use 0 to disable the search. |
searchSigMaxProb |
A number for the maximum value of type I error to be used in searching for significant coefficients. If p-value is less than this, it is interpreted as significant. |
restriction |
A |
pcaOptionsY |
A list of options to use principal components of the endogenous data, instead of the actual values. Set |
pcaOptionsX |
A list of options to use principal components of the exogenous data, instead of the actual values. Set |
simFixSize |
An integer that determines the number of out-of-sample simulations. Use zero to disable the simulation. |
simTrainFixSize |
An integer representing the number of data points in the training sample in the out-of-sample simulation. If zero, |
simTrainRatio |
A number representing the size of the training sample relative to the available size, in the out-of-sample simulation. It is effective if |
simSeed |
A seed for the random number generator. Use zero for a random value. |
simMaxConditionNumber |
A number for the maximum value for the condition number in the simulation. |
As described in section 10.2 in \insertCitegreene2020econometric;textualldt, this type of statistical model consists of multiple regression equations, where each equation may have a different set of exogenous variables and the disturbances between the equations are assumed to be correlated. The general form with m
equations can be written as y_i=z_i'\gamma_i+v_i
and E(v_i v_j)=\sigma_{ij}^2
for i=1,\ldots m
. Assuming that a sample of N
independent observations is available, we can stack the observations and use the following system for estimation:
Y = X B + V, \quad \mathrm{vec}B = R\gamma,
where the columns of Y:N \times m
contain the endogenous variables for each equation and the columns of X: N\times k
contain the explanatory variables, with k
being the number of unique explanatory variables in all equations. Note that X
combines the z_i
variables, and the restrictions imposed by R:mk\times q
and \gamma:q\times 1
determine a set of zero constraints on B: k \times m
, resulting in a system of equations with different sets of exogenous variables.
Imposing restrictions on the model using the R
matrix is not user-friendly, but it is suitable for use in this package, as users are not expected to specify such restrictions, but only to provide a list of potential regressors. Note that in this package, most procedures, including significance search, are supposed to be automated.
The unrestricted estimators (i.e., \hat{B}=(X'X)^{-1}X'Y
, and \hat{\Sigma}=(\hat{V}'\hat{V})/N
where \hat{V}=Y-X\hat{B}
) are used to initialize the feasible GLS estimators:
\tilde{B} = RW^{-1}R'[\hat{V}-1 \otimes x']\mathrm{vec}Y, \quad \tilde{\Sigma}=(\tilde{V}'\tilde{V})/N,
where W = R'[\hat{V}^{-1} \otimes X'X]R
and \tilde{V}=Y-X\tilde{B}
. The properties of these estimators are discussed in proposition 5.3 in \insertCitelutkepohl2005new;textualldt. See also section 10.2 in \insertCitegreene2020econometric;textualldt. The maximum likelihood value is calculated by -\frac{N}{2}(m(\ln 2\pi+1)+\ln|\tilde{\Sigma}|)
. The condition number is calculated by multiplying 1-norm of W
and its inverse (e.g., see page 94 in \insertCitetrefethen1997numerical;textualldt). Furthermore, given an out-of-sample observation such as x:k\times 1
, the prediction is y^f = \tilde{B}'x
, and its variance is estimated by the following formula:
\mathrm{var}y^f = \tilde{V} + (x' \otimes I_m)R W^{-1}R'(x \otimes I_m).
Note that the focus in ldt
is model uncertainty and for more sophisticated implementations of the FGLS estimator, you may consider using other packages such as systemfit
.
Finally, note that the main purpose of exporting this method is to show the inner calculations of the search process in search.sur function.
search.sur
# Example 1 (simulation, small model):
set.seed(123)
sample <- sim.sur(sigma = 2L, coef = 3L, nObs = 100)
print(sample$coef)
print(sample$sigma)
data <- data.frame(sample$y, sample$x)
# Use systemfit to estimate:
exp_names <- paste0(colnames(sample$x), collapse = " + ")
fmla <- lapply(1:ncol(sample$y), function(i) as.formula(paste0("Y", i, " ~ -1 +", exp_names)))
fit <- systemfit::systemfit(fmla, data = data, method = "SUR")
print(fit)
# Use 'ldt::estim.sur' function
fit <- estim.sur(data = get.data(cbind(sample$y, sample$x),
endogenous = ncol(sample$y),
addIntercept = FALSE))
# or, by using formula list:
fit <- estim.sur(data = get.data(data = data,
equations = fmla,
addIntercept = FALSE))
print(fit)
print(fit$estimations$sigma)
plot_data <- plot(fit, equation = 1)
# Example 2 (simulation, large model with significancy search):
num_obs <- 100
sample <- sim.sur(sigma = 2L, coef = 3L, nObs = num_obs)
print(sample$coef)
# create irrelevant data:
num_x_ir <- 20
x_ir <- matrix(rnorm(num_obs * num_x_ir), ncol = num_x_ir)
data_x <- data.frame(sample$x, x_ir)
colnames(data_x) <- c(colnames(sample$x), paste0("z", 1:num_x_ir))
fit <- estim.sur(data = get.data(cbind(sample$y, data_x),
endogenous = ncol(sample$y),
addIntercept = FALSE),
searchSigMaxIter = 100,
searchSigMaxProb = 0.05)
print(fit$estimations$coefs)
# coefficient matrix, with lots of zero restrictions
# Example 3 (simulation, large model with PCA):
# by using data of the previous example
fit <- estim.sur(data = get.data(cbind(sample$y, data_x),
endogenous = ncol(sample$y),
addIntercept = FALSE),
pcaOptionsX = get.options.pca(2,4))
print(fit$estimations$coefs)
# coefficients are: intercept and the first exogenous variable and 4 PCs
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.