Small helper functions that help users detect linearly dependent columns in a two-dimensional data structure, especially in a (transformed) model matrix. They are typically useful in interactive mode during the model-building phase.
detect_lin_dep(object, ...)

## S3 method for class 'matrix'
detect_lin_dep(object, suppressPrint = FALSE, ...)

## S3 method for class 'data.frame'
detect_lin_dep(object, suppressPrint = FALSE, ...)

## S3 method for class 'plm'
detect_lin_dep(object, suppressPrint = FALSE, ...)

## S3 method for class 'plm'
alias(object, ...)

## S3 method for class 'pFormula'
alias(object, data,
      model = c("pooling", "within", "Between", "between", "mean", "random", "fd"),
      effect = c("individual", "time", "twoways"), ...)
Linear dependence of columns/variables is usually easy to avoid when building one's model.
However, linear dependence is sometimes not obvious and is harder to detect for
less experienced applied statisticians. The so-called "dummy variable trap" is a common and probably
the best-known fallacy of this kind (see e.g. Wooldridge (2016), sec. 7-2). When building a linear
pooling model, linear dependence in one's model is
easily detected, at times post hoc.
However, linear dependence might also occur after some transformations of the data, even though it
is not present in the untransformed data. The within transformation (also called the fixed effects
transformation) used in the
"within" model can result in such linear dependence, and this
is less likely to come to mind when building a model. See Examples for two cases of linearly
dependent columns after the within transformation: ex. 1) the transformed variables have the
opposite sign of one another; ex. 2) the transformed variables are identical.
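The effect described above can be reproduced without plm. The following is a minimal base R sketch, assuming a small made-up panel; the names `id`, `fact1`, `fact2` and the helper `demean` are illustrative only, and `demean` is a plain group-wise demeaning stand-in for the within transformation, not plm's own code.

```r
# Hypothetical toy panel: 3 individuals observed over 4 periods each
id    <- rep(1:3, each = 4)
fact1 <- rep(c(0, 1), times = 6)  # alternates 0, 1 within each individual
fact2 <- 1 - fact1                # alternates 1, 0 within each individual

# Stand-in for the within transformation: subtract each individual's mean
demean <- function(x, g) x - ave(x, g)
fact1_w <- demean(fact1, id)
fact2_w <- demean(fact2, id)

# Column rank before and after the transformation
rank_raw    <- qr(cbind(fact1, fact2))$rank      # untransformed columns
rank_within <- qr(cbind(fact1_w, fact2_w))$rank  # demeaned columns

# After demeaning, one column is the exact negative of the other
same_up_to_sign <- isTRUE(all.equal(fact1_w, -fact2_w))
```

Comparing `rank_raw` with `rank_within` shows that the two columns only become linearly dependent once the individual means are removed.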
During plm's model estimation, linearly dependent columns and their corresponding coefficients
in the resulting object are silently dropped, while the corresponding model frame and model matrix
still contain the affected columns.
The plm object contains an element
aliased, a named logical vector that indicates any such aliased coefficients.
Both functions, detect_lin_dep and alias, help to detect linear dependence and accomplish almost the same:
detect_lin_dep is a stand-alone implementation, while
alias is a wrapper around
alias.lm, extending the
alias generic to classes "plm" and "pFormula".
alias hinges on the availability of the package MASS on the system. Not all arguments of alias.lm are supported.
The output of alias is more informative,
as it gives the linear combination of dependent columns (after data transformations, i.e. after (quasi-)demeaning),
while detect_lin_dep only gives the columns involved in the linear dependence in a simple format (making it better
suited for automatic post-processing of the information).
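To illustrate what "columns involved in the linear dependence" means, here is a minimal base R sketch built on `qr()`. The helper `find_dependent_cols` is hypothetical and is not claimed to mirror how detect_lin_dep is implemented; it only imitates the documented shape of the result (a named vector of column numbers, or NULL).

```r
# Hypothetical helper (not part of plm): returns the positions of all columns
# that take part in some linear dependence, or NULL if there is none.
find_dependent_cols <- function(X) {
  X <- as.matrix(X)
  full_rank <- qr(X)$rank
  if (full_rank == ncol(X)) return(NULL)  # full column rank: nothing to report
  # A column is involved in a dependence if dropping it leaves the rank unchanged
  involved <- vapply(seq_len(ncol(X)), function(j) {
    qr(X[, -j, drop = FALSE])$rank == full_rank
  }, logical(1))
  idx <- which(involved)
  names(idx) <- colnames(X)[idx]
  idx
}

# Made-up matrix in which column "c" is exactly 2 * column "a"
X <- cbind(a = c(1, 2, 3, 4),
           b = c(2, 1, 0, 3),
           c = c(2, 4, 6, 8),
           d = c(1, 0, 0, 1))

res_dep  <- find_dependent_cols(X)                    # columns a and c
res_none <- find_dependent_cols(X[, c("a", "b", "d")])  # NULL
```

A result in this simple form can be fed directly into further code, for example `X[, -res_dep[-1]]` to drop all but the first of the involved columns.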
detect_lin_dep: A named numeric vector containing the column numbers of the linearly dependent columns in the object after data transformation,
if any are present;
NULL if no linearly dependent columns are detected.
alias: The return value of
alias.lm run on the (quasi-)demeaned model, i.e. the information output applies to the
transformed model matrix, not the original data.
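For a rough idea of the kind of information alias.lm provides, the sketch below calls `stats::alias()` on an ordinary `lm` fit of hand-demeaned data. Everything here is an assumption made for illustration: the toy data, the variable names, and the `demean` helper, which stands in for the transformation that plm applies internally.

```r
# Hypothetical toy panel: 3 individuals, 4 periods each
set.seed(1)
id <- rep(1:3, each = 4)
x1 <- rep(c(0, 1), times = 6)
x2 <- 1 - x1                       # x2 and x1 differ only by a constant shift and sign
z  <- rnorm(12)
y  <- 1 + 2 * x1 + 0.5 * z + rnorm(12)

# Stand-in for the within transformation: subtract each individual's mean
demean <- function(v, g) v - ave(v, g)
d <- data.frame(y  = demean(y, id),
                x1 = demean(x1, id),
                x2 = demean(x2, id),
                z  = demean(z, id))

# Fit on the transformed data; lm drops the aliased column and reports NA for it
fit <- lm(y ~ 0 + z + x1 + x2, data = d)
cf  <- coef(fit)

# alias() reports the complete aliasing pattern among the transformed columns
al <- alias(fit)
al$Complete
```

As with the plm methods, the reported dependence refers to the demeaned columns in `d`, not to the raw variables `x1` and `x2`.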
Wooldridge, J.M. (2016) Introductory Econometrics: A Modern Approach, 6th ed., Cengage Learning, Boston, sec. 7-2, pp. 206–211.
### Example 1 ###
# prepare the data
data("Cigar", package = "plm")
Cigar[ , "fact1"] <- c(0,1)
Cigar[ , "fact2"] <- c(1,0)
Cigar.p <- pdata.frame(Cigar)

# setup a pFormula and a model frame
pform <- pFormula(price ~ 0 + cpi + fact1 + fact2)
mf <- model.frame(pform, data = Cigar.p)

# no linear dependence in the pooling model's model matrix
# (with intercept in the formula, there would be linear dependence)
detect_lin_dep(model.matrix(pform, data = mf, model = "pooling"))

# linear dependence present in the FE transformed model matrix
modmat_FE <- model.matrix(pform, data = mf, model = "within")
detect_lin_dep(modmat_FE)
mod_FE <- plm(pform, data = Cigar.p, model = "within")
detect_lin_dep(mod_FE)
alias(mod_FE) # => fact1 == -1*fact2
plm(pform, data = mf, model = "within")$aliased # "fact2" indicated as aliased

# look at the data: after FE transformation fact1 == -1*fact2
head(modmat_FE)
all.equal(modmat_FE[ , "fact1"], -1*modmat_FE[ , "fact2"])

### Example 2 ###
# Setup the data:
# Assume CEOs stay with the firms of the Grunfeld data
# for the firm's entire lifetime and assume some fictional
# data about CEO tenure and age in year 1935 (first observation
# in the data set) to be at 1 to 10 years and 38 to 65 years, respectively.
# => CEO tenure and CEO age increase by same value (+1 year per year).
data(Grunfeld, package = "plm")
set.seed(42)

# add fictional data
Grunfeld$CEOtenure <- c(replicate(10, seq(from=s<-sample(1:10, 1), to=s+19, by=1)))
Grunfeld$CEOage <- c(replicate(10, seq(from=s<-sample(38:65, 1), to=s+19, by=1)))

# look at the data
head(Grunfeld, 50)

pform <- pFormula(inv ~ value + capital + CEOtenure + CEOage)
mf <- model.frame(pform, data=pdata.frame(Grunfeld))

# no linear dependent columns in original data/pooling model
modmat_pool <- model.matrix(pform, data = mf, model="pooling")
detect_lin_dep(modmat_pool)
mod_pool <- plm(pform, data = Grunfeld, model = "pooling")
alias(mod_pool)

# CEOtenure and CEOage are linear dependent after FE transformation
# (demeaning per individual)
modmat_FE <- model.matrix(pform, data = mf, model="within")
detect_lin_dep(modmat_FE)
mod_FE <- plm(pform, data = Grunfeld, model = "within")
detect_lin_dep(mod_FE)
alias(mod_FE)

# look at the transformed data: after FE transformation CEOtenure == 1*CEOage
head(modmat_FE, 50)
all.equal(modmat_FE[ , "CEOtenure"], modmat_FE[ , "CEOage"])