library("knitr") opts_chunk$set(message = FALSE, warning = FALSE)

plm tries to follow as close as possible the way models are fitted
using `lm`

. This relies on the following steps, using the
`formula`

-`data`

with some modifications:

- compute internally the
`model.frame`

by getting the relevant arguments (`formula`

,`data`

,`subset`

,`weights`

,`na.action`

and`offset`

) and the supplementary argument, - extract from the
`model.frame`

the response`y`

(with`pmodel.response`

) and the model matrix`X`

(with`model.matrix`

), - call the (non-exported) estimation function
`plm.fit`

with`X`

and`y`

as arguments.

Panel data has a special structure which is described by an
`index`

argument. This argument can be used in the `pdata.frame`

function which
returns a `pdata.frame`

object. A `pdata.frame`

can be used as input to the `data`

argument of `plm`

. If the `data`

argument of `plm`

is an ordinary
`data.frame`

, the `index`

argument can also be supplied as an argument of
`plm`

. In this case, the `pdata.frame`

function is called internally to
transform the data.

Next, the `formula`

, which is the first and mandatory argument of
`plm`

is coerced to a `Formula`

object.

`model.frame`

is then called, but with the `data`

argument in the
first position (a `pdata.frame`

object) and the `formula`

in the
second position. This unusual order of the arguments enables to use a
specific `model.frame.pdata.frame`

method defined in `plm`

.

As for the `model.frame.formula`

method, a `data.frame`

is returned,
with a `terms`

attribute.

Next, the `X`

matrix is extracted using `model.matrix`

. The usual way
to do so is to feed the function with two arguments, a `formula`

or a
`terms`

object and a `data.frame`

created with `model.frame`

. `lm`

uses
something like `model.matrix(terms(mf), mf)`

where `mf`

is a
`data.frame`

created with `model.frame`

. Therefore, `model.matrix`

needs actually one argument and not two and we therefore wrote a
`model.matrix.pdata.frame`

which does the job ; the method first checks
that the argument has a `term`

attribute, extracts the `terms`

(actually the `formula`

) and then computes the model's matrix `X`

.

The response `y`

is usually extracted using `model.response`

, with a
`data.frame`

created with `model.frame`

as first argument, but it is
not generic. We therefore created a generic called `pmodel.response`

and provide a `pmodel.response.pdata.frame`

method. We illustrate
these features using a simplified (in terms of covariates) example
with the `SeatBelt`

data set:

library("plm") data("SeatBelt", package = "pder") SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban))) pSB <- pdata.frame(SeatBelt)

We start with an OLS (pooling) specification:

formols <- occfat ~ log(usage) + log(percapin) mfols <- model.frame(pSB, formols) Xols <- model.matrix(mfols) y <- pmodel.response(mfols) coef(lm.fit(Xols, y))

which is equivalent to:

coef(plm(formols, SeatBelt, model = "pooling"))

Next, we use an instrumental variables specification. Variable `usage`

is
endogenous and instrumented by three variables indicating the law
context: `ds`

, `dp`

, and `dsp`

.

The model is described using a two-parts formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used:

formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp

The second formulation has two advantages:

- in the common case when a lot of covariates are instruments, these covariates don't need to be indicated in the second RHS part of the formula,
- the endogenous variables clearly appear as they are proceeded by a
`-`

sign in the second RHS part of the formula.

The formula is coerced to a `Formula`

, using the `Formula`

package. `model.matrix.pdata.frame`

then internally calls
`model.matrix.Formula`

in order to extract the covariates and
instruments model matrices:

mfSB1 <- model.frame(pSB, formiv1) X1 <- model.matrix(mfSB1, rhs = 1) W1 <- model.matrix(mfSB1, rhs = 2) head(X1, 3) ; head(W1, 3)

For the second (and preferred formulation), the `dot`

argument should
be set and is passed to the `Formula`

methods. `.`

has actually two
meanings:

- all available covariates,
- the previous covariates used while updating a formula.

which correspond respectively to `dot = "seperate"`

(the default) and
`dot = "previous"`

. See the difference between the following two examples:

library("Formula") head(model.frame(Formula(formiv2), SeatBelt), 3) head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)

In the first case, all the covariates are returned by `model.frame`

as
the `.`

is understood by default as "everything".

In `plm`

, the `dot`

argument is internally set to
`previous`

so that the end-user doesn't have to worry about these
subtleties.

mfSB2 <- model.frame(pSB, formiv2) X2 <- model.matrix(mfSB2, rhs = 1) W2 <- model.matrix(mfSB2, rhs = 2) head(X2, 3) ; head(W2, 3)

The IV estimator can then be obtained as a 2SLS estimator: First, regress the covariates on the instruments and get the fitted values:

HX1 <- lm.fit(W1, X1)$fitted.values head(HX1, 3)

Next, regress the response on these fitted values:

coef(lm.fit(HX1, y))

The same can be achieved in one command by using the `formula`

-`data`

interface
with `plm`

:

coef(plm(formiv1, SeatBelt, model = "pooling"))

or with the `ivreg`

function from package `AER`

(or with the newer function `ivreg`

in package `ivreg`

superseding `AER::ivreg()`

):

coef(AER::ivreg(formiv1, data = SeatBelt))

X2 <- model.matrix(Formula(form1), mfSB, rhs = 2, dot = "previous") formols <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp form1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + log(precentb) + log(precenth) + log(densrur) + log(densurb) + log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + log(fueltax) + lim65 + lim70p + mlda21 + bac08 form2 <- . ~ . | . - log(usage) + ds + dp +dsp jorm1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + log(precentb) + log(precenth) + log(densrur) + log(densurb) + log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + log(fueltax) + lim65 + lim70p + mlda21 + bac08 | . - log(usage) + ds + dp + dsp jorm2 <- noccfat ~ . | .

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.