fitstat: Computes fit statistics of fixest objects

View source: R/fitstats.R

fitstatR Documentation

Computes fit statistics of fixest objects

Description

Computes various fit statistics for fixest estimations.

Usage

fitstat(
  x,
  type,
  simplify = FALSE,
  verbose = TRUE,
  show_types = FALSE,
  frame = parent.frame(),
  ...
)

Arguments

x

A fixest estimation.

type

Character vector or one sided formula. The type of fit statistic to be computed. The classic ones are: n, rmse, r2, pr2, f, wald, ivf, ivwald. You have the full list in the details section or use show_types = TRUE. Further, you can register your own types with fitstat_register.

simplify

Logical, default is FALSE. By default a list is returned whose names are the selected types. If simplify = TRUE and only one type is selected, then the element is directly returned (ie will not be nested in a list).

verbose

Logical, default is TRUE. If TRUE, an object of class fixest_fitstat is returned (so its associated print method will be triggered). If FALSE a simple list is returned instead.

show_types

Logical, default is FALSE. If TRUE, only prompts all available types.

frame

An environment in which to evaluate variables, default is parent.frame(). Only used if the argument type is a formula and some values in the formula have to be extended with the dot square bracket operator. Mostly for internal use.

...

Other elements to be passed to other methods and may be used to compute the statistics (for example you can pass on arguments to compute the VCOV when using type = "g" or type = "wald".).

Value

By default an object of class fixest_fitstat is returned. Using verbose = FALSE returns a simple a list. Finally, if only one type is selected, simplify = TRUE leads to the selected type to be returned.

Registering your own types

You can register custom fit statistics with the function fitstat_register.

Available types

The types are case sensitive, please use lower case only. The types available are:

n, ll, aic, bic, rmse:

The number of observations, the log-likelihood, the AIC, the BIC and the root mean squared error, respectively.

my:

Mean of the dependent variable.

g:

The degrees of freedom used to compute the t-test (it influences the p-values of the coefficients). When the VCOV is clustered, this value is equal to the minimum cluster size, otherwise, it is equal to the sample size minus the number of variables.

r2, ar2, wr2, awr2, pr2, apr2, wpr2, awpr2:

All r2 that can be obtained with the function r2. The a stands for 'adjusted', the w for 'within' and the p for 'pseudo'. Note that the order of the letters a, w and p does not matter. The pseudo R2s are McFadden's R2s (ratios of log-likelihoods).

theta:

The over-dispersion parameter in Negative Binomial models. Low values mean high overdispersion.

f, wf:

The F-tests of nullity of the coefficients. The w stands for 'within'. These types return the following values: stat, p, df1 and df2. If you want to display only one of these, use their name after a dot: e.g. f.stat will give the statistic of the F-test, or wf.p will give the p-values of the F-test on the projected model (i.e. projected onto the fixed-effects).

wald:

Wald test of joint nullity of the coefficients. This test always excludes the intercept and the fixed-effects. These type returns the following values: stat, p, df1, df2 and vcov. The element vcov reports the way the VCOV matrix was computed since it directly influences this statistic.

ivf, ivf1, ivf2, ivfall:

These statistics are specific to IV estimations. They report either the IV F-test (namely the Cragg-Donald F statistic in the presence of only one endogenous regressor) of the first stage (ivf or ivf1), of the second stage (ivf2) or of both (ivfall). The F-test of the first stage is commonly named weak instrument test. The value of ivfall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage F-test(s) to be displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation – otherwise, ivf1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1 and df2.

ivwald, ivwald1, ivwald2, ivwaldall:

These statistics are specific to IV estimations. They report either the IV Wald-test of the first stage (ivwald or ivwald1), of the second stage (ivwald2) or of both (ivwaldall). The Wald-test of the first stage is commonly named weak instrument test. Note that if the estimation was done with a robust VCOV and there is only one endogenous regressor, this is equivalent to the Kleibergen-Paap statistic. The value of ivwaldall is only useful in etable when both the 1st and 2nd stages are displayed (it leads to the 1st stage Wald-test(s) to be displayed on the 1st stage estimation(s), and the 2nd stage one on the 2nd stage estimation – otherwise, ivwald1 would also be displayed on the 2nd stage estimation). These types return the following values: stat, p, df1, df2, and vcov.

cd:

The Cragg-Donald test for weak instruments.

kpr:

The Kleibergen-Paap test for weak instruments.

wh:

This statistic is specific to IV estimations. Wu-Hausman endogeneity test. H0 is the absence of endogeneity of the instrumented variables. It returns the following values: stat, p, df1, df2.

sargan:

Sargan test of overidentifying restrictions. H0: the instruments are not correlated with the second stage residuals. It returns the following values: stat, p, df.

lr, wlr:

Likelihood ratio and within likelihood ratio tests. It returns the following elements: stat, p, df. Concerning the within-LR test, note that, contrary to estimations with femlm or feNmlm, estimations with feglm/fepois need to estimate the model with fixed-effects only which may prove time-consuming (depending on your model). Bottom line, if you really need the within-LR and estimate a Poisson model, use femlm instead of fepois (the former uses direct ML maximization for which the only FEs model is a by product).

Examples


data(trade)
gravity = feols(log(Euros) ~ log(dist_km) | Destination + Origin, trade)

# Extracting the 'working' number of observations used to compute the pvalues
fitstat(gravity, "g", simplify = TRUE)

# Some fit statistics
fitstat(gravity, ~ rmse + r2 + wald + wf)

# You can use them in etable
etable(gravity, fitstat = ~ rmse + r2 + wald + wf)

# For wald and wf, you could show the pvalue instead:
etable(gravity, fitstat = ~ rmse + r2 + wald.p + wf.p)

# Now let's display some statistics that are not built-in
# => we use fitstat_register to create them

# We need: a) type name, b) the function to be applied
#          c) (optional) an alias

fitstat_register("tstand", function(x) tstat(x, se = "stand")[1], "t-stat (regular)")
fitstat_register("thc", function(x) tstat(x, se = "heter")[1], "t-stat (HC1)")
fitstat_register("t1w", function(x) tstat(x, se = "clus")[1], "t-stat (clustered)")
fitstat_register("t2w", function(x) tstat(x, se = "twow")[1], "t-stat (2-way)")

# Now we can use these keywords in fitstat:
etable(gravity, fitstat = ~ . + tstand + thc + t1w + t2w)

# Note that the custom stats we created are can easily lead
# to errors, but that's another story!



fixest documentation built on Nov. 24, 2023, 5:11 p.m.