plm tries to follow as closely as possible the way models are fitted using `lm`. This relies on the following steps, using the `formula`-`data` interface, with some modifications:

- compute internally the `model.frame` by getting the relevant arguments (`formula`, `data`, `subset`, `weights`, `na.action` and `offset`) and the supplementary argument,
- extract from the `model.frame` the response (with `model.response`) and the model matrix (with `model.matrix`),
- call the estimation function `plm.fit` with `X` and `y` as arguments.
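For reference, the `lm` steps listed above can be reproduced by hand in base R. This is only a sketch: the `mtcars` data set and the formula are arbitrary choices made for illustration, and `lm.fit` stands in for the estimation function.

```r
# Base R only: reproduce by hand the steps that lm() goes through.
# mtcars ships with R; the formula is an arbitrary illustration.
form <- mpg ~ wt + hp

# 1. build the model frame (subset, weights, na.action would be passed here too)
mf <- model.frame(form, data = mtcars)

# 2. extract the response and the model matrix from the model frame
y <- model.response(mf)
X <- model.matrix(attr(mf, "terms"), mf)

# 3. call the low-level fitting function with X and y
fit_by_hand <- lm.fit(X, y)

# the coefficients agree with those of the formula-data interface
all.equal(coef(fit_by_hand), coef(lm(form, data = mtcars)))
```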

Firstly, panel data has a special structure which is described by an `index` argument. This can be used in the `pdata.frame` function, which returns a `pdata.frame` object that can be used as the `data` argument of `plm`. If the `data` argument of `plm` is an ordinary `data.frame`, the `index` argument can also be used as an argument of `plm`. In this case, the `pdata.frame` function is used internally to transform the data.
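The two ways of declaring the panel structure can be compared on a small example. The sketch below assumes the `Grunfeld` data set shipped with `plm` (individual index `firm`, time index `year`); the data set and the formula are chosen only for illustration.

```r
library("plm")
data("Grunfeld", package = "plm")

# Option 1: build the pdata.frame explicitly, naming the individual and time indexes
pG <- pdata.frame(Grunfeld, index = c("firm", "year"))
fit1 <- plm(inv ~ value + capital, data = pG, model = "pooling")

# Option 2: pass an ordinary data.frame and let plm() call pdata.frame() internally
fit2 <- plm(inv ~ value + capital, data = Grunfeld,
            index = c("firm", "year"), model = "pooling")

# both routes should lead to the same estimates
all.equal(coef(fit1), coef(fit2))
```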

Next, the `formula`, which is the first and mandatory argument of `plm`, is coerced to a `Formula` object.
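As a rough illustration of what this coercion provides, the sketch below uses `as.Formula` from the `Formula` package on a made-up two-part formula (the variable names `y`, `x1`, `x2` and `z` are hypothetical). A `Formula` object keeps track of the parts separated by `|`, which can then be extracted one at a time.

```r
library("Formula")

# hypothetical formula with two right-hand side parts separated by "|"
f <- y ~ x1 + x2 | x2 + z
Fm <- as.Formula(f)

class(Fm)              # the object inherits from both "Formula" and "formula"
length(Fm)             # number of parts on the LHS and on the RHS
formula(Fm, rhs = 1)   # formula restricted to the first RHS part
formula(Fm, rhs = 2)   # formula restricted to the second RHS part
```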

`model.frame` is then called, but with the `data` argument in the first position (a `pdata.frame` object) and the `formula` in the second position. This unusual order of the arguments makes it possible to use a specific `model.frame.pdata.frame` method defined in `plm`.

As for the `model.frame.formula` method, a `data.frame` is returned, with a `terms` attribute.

Next, the `X` matrix is extracted using `model.matrix`. The usual way to do so is to feed the function with two arguments, a `formula` or a `terms` object and a `data.frame` created with `model.frame`. `lm` uses something like `model.matrix(terms(mf), mf)` where `mf` is a `data.frame` created with `model.frame`. Therefore, `model.matrix` actually needs one argument and not two, and we therefore wrote a `model.matrix.pdata.frame` method which does the job: the method first checks that the argument has a `terms` attribute, extracts the `terms` (actually the `formula`) and then computes the `model.matrix`.
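The point that a model frame already carries everything `model.matrix` needs can be checked in base R. In the sketch below, `model_matrix_from_frame` is a hypothetical helper written for illustration; it mimics the idea described above and is not the actual `model.matrix.pdata.frame` code.

```r
mf <- model.frame(mpg ~ wt + factor(cyl), data = mtcars)

# the model frame carries its own terms object ...
mt <- attr(mf, "terms")
inherits(mt, "terms")

# ... so the two-argument call needs nothing beyond the model frame itself
X_two_args <- model.matrix(mt, mf)

# hypothetical one-argument helper (not plm's method): check for the
# "terms" attribute, extract it, then compute the model matrix
model_matrix_from_frame <- function(mf) {
  mt <- attr(mf, "terms")
  if (is.null(mt)) stop("the argument has no 'terms' attribute")
  model.matrix(mt, mf)
}

identical(model_matrix_from_frame(mf), X_two_args)
```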

The response `y` is usually extracted using `model.response`, with a `data.frame` created with `model.frame` as first argument, but it is not generic. We therefore created a generic called `pmodel.response` and provide a `pmodel.response.pdata.frame` method. We illustrate these features using a simplified (in terms of covariates) example with the `SeatBelt` data set:

```
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)
```

We start with an OLS (`pooling`) specification.

```
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))
```

```
## (Intercept) log(usage) log(percapin)
## 7.4193570 0.1657293 -1.1583712
```

which is equivalent to:

`coef(plm(formols, SeatBelt, model = "pooling"))`

```
## (Intercept) log(usage) log(percapin)
## 7.4193570 0.1657293 -1.1583712
```

Next we use an instrumental variables specification. `usage` is endogenous and instrumented by three variables indicating the law context: `ds`, `dp` and `dsp`.

The model is described using a two-part formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used:

```
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
```

The second formulation has two advantages:

- in the common case when a lot of covariates are instruments, these covariates don’t need to be indicated in the second RHS part of the formula,
- the endogenous variables clearly appear as they are preceded by a `-` sign in the second RHS part of the formula.

The formula is coerced to a `Formula`, using the `Formula` package. `model.matrix.pdata.frame` then internally calls the `model.matrix.Formula` method in order to extract the covariates and instruments model matrices:

```
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)
```

```
## (Intercept) log(usage) log(percapin)
## 8 1 -0.7985077 9.955748
## 9 1 -0.4155154 9.975622
## 10 1 -0.4155154 10.002110
```

```
## (Intercept) log(percapin) ds dp dsp
## 8 1 9.955748 0 0 0
## 9 1 9.975622 1 0 0
## 10 1 10.002110 1 0 0
```

For the second (and preferred) formulation, the `dot` argument should be set and is passed to the `Formula` methods. `.` actually has two meanings:

- all the available covariates,
- the previous covariates used while updating a formula.

These correspond respectively to `dot = "separate"` (the default) and `dot = "previous"`. See the difference between:

```
library("Formula")
head(model.frame(Formula(formiv2), SeatBelt), 3)
```

```
## occfat log(usage) log(percapin) state year farsocc farsnocc usage
## 8 -3.788976 -0.7985077 9.955748 AK 1990 90 8 0.45
## 9 -3.904837 -0.4155154 9.975622 AK 1991 81 20 0.66
## 10 -3.699611 -0.4155154 10.002110 AK 1992 95 13 0.66
## percapin unemp meanage precentb precenth densurb densrur
## 8 21073 7.05 29.58628 0.04157167 0.03252657 1.099419 0.1906836
## 9 21496 8.75 29.82771 0.04077293 0.03280357 1.114670 0.1906712
## 10 22073 9.24 30.21070 0.04192957 0.03331731 1.114078 0.1672785
## viopcap proppcap vmtrural vmturban fueltax lim65 lim70p mlda21 bac08
## 8 0.0009482704 0.008367458 2276 1703 8 0 0 1 0
## 9 0.0010787370 0.008940661 2281 1740 8 0 0 1 0
## 10 0.0011257068 0.008366873 2005 1836 8 1 0 1 0
## ds dp dsp
## 8 0 0 0
## 9 1 0 0
## 10 1 0 0
```

`head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)`

```
## occfat log(usage) log(percapin) ds dp dsp
## 8 -3.788976 -0.7985077 9.955748 0 0 0
## 9 -3.904837 -0.4155154 9.975622 1 0 0
## 10 -3.699611 -0.4155154 10.002110 1 0 0
```

In the first case, all the covariates are returned by `model.frame` as the `.` is understood by default as “everything”.
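The two meanings of `.` also exist in base R, which may help to see the distinction. The sketch below uses `mtcars` and arbitrary variable names: in `model.frame`, `.` expands to all the other columns of the data, whereas in `update`, `.` stands for the corresponding part of the formula being updated.

```r
# Meaning 1: "." stands for every other column of the data
mf_all <- model.frame(mpg ~ ., data = mtcars)
ncol(mf_all) == ncol(mtcars)

# Meaning 2: in update(), "." stands for the corresponding part of the old formula
f_old <- mpg ~ wt + hp
f_new <- update(f_old, . ~ . - hp + disp)
f_new
```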

In `plm`, the `dot` argument is internally set to `previous` so that the end-user doesn’t have to worry about these subtleties.

```
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)
```

```
## (Intercept) log(usage) log(percapin)
## 8 1 -0.7985077 9.955748
## 9 1 -0.4155154 9.975622
## 10 1 -0.4155154 10.002110
```

```
## (Intercept) log(percapin) ds dp dsp
## 8 1 9.955748 0 0 0
## 9 1 9.975622 1 0 0
## 10 1 10.002110 1 0 0
```

The IV estimator can then be obtained as a 2SLS estimator: first regress the covariates on the instruments and get the fitted values:

```
HX1 <- lm.fit(W1, X1)$fitted.values
head(HX1, 3)
```

```
## (Intercept) log(usage) log(percapin)
## 8 1 -1.0224257 9.955748
## 9 1 -0.5435055 9.975622
## 10 1 -0.5213364 10.002110
```

Next regress the response on these fitted values:

`coef(lm.fit(HX1, y))`

```
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590
```

Or using the `formula`-`data` interface with the `ivreg` function:

`coef(AER::ivreg(formiv1, data = SeatBelt))`

```
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590
```

or `plm`:

`coef(plm(formiv1, SeatBelt, model = "pooling"))`

```
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590
```