Introduction to Panel Data Models
The panel data approach pools time series data with
cross-sectional data. Depending on the application, it can comprise a sample of
individuals, firms, countries, or regions over a specific time period. The
general structure of such a model could be expressed as follows:
$$Y_{it} = a_0 + bX_{it} + u_{it}$$
where $u_{it} \sim \text{IID}(0, \sigma^2)$, $i = 1, 2, \ldots, N$ indexes the individual-level observations, and $t = 1, 2, \ldots, T$ indexes the time series observations.
In this application, it is assumed that $Y_{it}$ is a continuous variable. In this model, the observations for each individual, firm or country are simply stacked over time, one on top of another. This is the standard pooled model, where the intercept and slope coefficients are homogeneous across all N cross-sections and through all T time periods. Applying OLS to this model ignores the panel structure of the data, that is, its temporal and spatial dimensions, and thus throws away useful information. It is important to note that the temporal dimension captures the ‘within’ variation in the data while the spatial dimension captures the ‘between’ variation. The pooled OLS estimator uses both the ‘between’ and ‘within’ variation but does not do so efficiently, because each observation is given equal weight in estimation. In addition, the unbiasedness and consistency of the estimator require that the explanatory variables are uncorrelated with any omitted factors. These limitations of OLS prompted interest in alternative procedures. There are a number of different panel estimators, but the most popular is the fixed effects (or ‘within’) estimator.
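To make the contrast between pooled OLS and the ‘within’ estimator concrete, here is a minimal sketch on simulated data, using only the Python standard library. Everything in it is an assumption made for illustration: the sample sizes, the true slope of 2, the helper names `ols_slope` and `demean_by_unit`, and the way X is made to depend on an omitted unit-specific effect. It is not how EViews computes its estimates.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def demean_by_unit(values, units):
    """Subtract each unit's time average from its own observations."""
    totals, counts = {}, {}
    for v, i in zip(values, units):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, units)]

# Simulated panel (illustrative assumption): N units observed over T periods.
rng = random.Random(0)
N, T, b_true = 500, 5, 2.0
units, x, y = [], [], []
for i in range(N):
    a_i = rng.gauss(0, 1)               # omitted unit-specific effect
    for t in range(T):
        x_it = a_i + rng.gauss(0, 1)    # X is correlated with the omitted effect
        u_it = rng.gauss(0, 1)
        units.append(i)
        x.append(x_it)
        y.append(a_i + b_true * x_it + u_it)

# Pooled OLS: stack all N*T observations and ignore the panel structure.
b_pooled = ols_slope(x, y)

# Within estimator: OLS on deviations from each unit's time average.
b_within = ols_slope(demean_by_unit(x, units), demean_by_unit(y, units))

print("true slope:", b_true)
print("pooled OLS slope:", round(b_pooled, 3))
print("within slope:", round(b_within, 3))
```

Because X is built to be correlated with the omitted effect, the pooled slope drifts away from the true value, while the within slope stays close to it.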
Fixed Effects or Random Effects?
A common question is which econometric model an investigator should use when modelling with panel data. The different models can generate considerably different results, and this has been documented in many empirical studies. With time effects assumed absent for simplicity, the model to be estimated may be given by:
$$Y_{it} = a_i + bX_{it} + u_{it}$$
The question, therefore, is whether to treat $a_i$ as fixed or random. The following points are worth noting.
1) The estimation of the fixed effects model is costly in terms of degrees of freedom. This is a statistical and not a computing cost. It is particularly problematic when N is large and T is small, which is the case in most panel data applications currently encountered.
2) The $a_i$ terms are taken to characterize (for want of a better expression) investigator ignorance. In the fixed effects model, does it make sense to treat one type of investigator ignorance ($a_i$) as fixed but another ($u_{it}$) as random?
3) The fixed effects formulation is viewed as one where investigators make inferences conditional on the fixed effects in the sample.
4) The random effects formulation is viewed as one where investigators make unconditional inferences with respect to the population of all effects.
5) The random effects formulation treats the random effects as independent of the explanatory variables (i.e. $E(a_i X_{it}) = 0$). Violation of this assumption leads to bias and inconsistency in the estimate of the b vector.
Advantage and disadvantage of the fixed effects model
The main advantage of the fixed effects model is its relative ease of estimation and the fact that it does not require independence of the fixed effects from the other included explanatory variables. The main disadvantage is that it requires estimation of N separate intercepts. This causes problems because much of the variation that exists in the data may be used up in estimating these different intercept terms. As a consequence, the effects of the other explanatory variables in the regression model (the bs) may be imprecisely estimated, even though these might be the more important parameters of interest from the perspective of policy. The fixed effects estimator is derived using the deviations of each cross-sectional unit's observations from that unit's average value over time. The problem is therefore most acute when there is little variation or movement in the characteristics over time, that is, when the variables are rarely changing or time-invariant. In essence, the effects of time-invariant variables are eliminated from the analysis, and the effects of rarely-changing variables are poorly determined.
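The elimination of time-invariant variables can be seen directly by applying the within transformation to a tiny made-up panel. This is only a sketch: the two units, the variable names `region_code` and `income`, and the numbers are all hypothetical.

```python
def demean_by_unit(values, units):
    """Subtract each unit's time average from its own observations."""
    totals, counts = {}, {}
    for v, i in zip(values, units):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, units)]

# Two hypothetical units, each observed for three periods.
units = [1, 1, 1, 2, 2, 2]
region_code = [3.0, 3.0, 3.0, 7.0, 7.0, 7.0]    # never changes within a unit
income = [10.0, 12.0, 14.0, 20.0, 20.0, 23.0]   # changes over time

region_within = demean_by_unit(region_code, units)
income_within = demean_by_unit(income, units)

print("region_code after demeaning:", region_within)
print("income after demeaning:", income_within)
```

After the transformation the time-invariant column contains only zeros, so it has no variation left from which a coefficient could be estimated. The time-varying column keeps its within-unit movement.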
Advantage and disadvantage of the random effects model
The main advantage of the random
effects estimator is that it uses up fewer degrees of freedom in estimation
and allows for the inclusion of time invariant covariates. The main
disadvantage of the model is the assumption that the random effects are independent
of the included explanatory variables. It is fairly plausible that there may be
unobservable attributes not included in the regression model that are
correlated with the observable characteristics. This procedure, unlike fixed
effects, does not allow for the elimination of the omitted heterogeneous
effects.
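One textbook way to see how the random effects estimator works is as OLS on 'quasi-demeaned' data, where only a fraction theta of each unit's time average is subtracted. The sketch below is an illustration under strong assumptions that are not part of this tutorial: the variance components `sigma_a` and `sigma_u` are treated as known (in practice they are estimated), the random effects are simulated as independent of X, and all names and numbers are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
N, T, b_true = 500, 5, 2.0
sigma_a, sigma_u = 1.0, 1.0   # assumed known here, for illustration only

# Share of each unit's time average that is removed (0 = pooled OLS, 1 = within).
theta = 1.0 - math.sqrt(sigma_u ** 2 / (sigma_u ** 2 + T * sigma_a ** 2))

x_star, y_star = [], []
for i in range(N):
    a_i = rng.gauss(0, sigma_a)                    # random effect, independent of X
    x_i = [rng.gauss(0, 1) for _ in range(T)]
    y_i = [a_i + b_true * x + rng.gauss(0, sigma_u) for x in x_i]
    x_bar, y_bar = sum(x_i) / T, sum(y_i) / T
    x_star += [x - theta * x_bar for x in x_i]     # quasi-demeaned X
    y_star += [y - theta * y_bar for y in y_i]     # quasi-demeaned Y

b_re = ols_slope(x_star, y_star)
print("theta:", round(theta, 3))
print("random effects slope:", round(b_re, 3))
```

Because theta is below one, a time-invariant covariate is not wiped out by this transformation, which is why the random effects model can include such variables.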
The Hausman Test
In determining which model is the more
appropriate to use, a statistical test can be implemented. The Hausman test
compares the random effects estimator to the ‘within’ estimator. If the null is
rejected, this favours the ‘within’ estimator’s treatment of the omitted
effects (i.e., it favours the fixed effects but only relative to the random
effects). The use of the test in this case is to discriminate between a model
where the omitted heterogeneity is treated as fixed and correlated with the
explanatory variables, and a model where the omitted heterogeneity is treated
as random and independent of the explanatory variables.
· If the omitted effects are uncorrelated with the explanatory variables, the random effects estimator is consistent and efficient. The fixed effects estimator is also consistent but is not efficient, given the estimation of a large number of additional parameters (i.e., the fixed effects).
· If the effects are correlated with the explanatory variables, the fixed effects estimator is consistent but the random effects estimator is inconsistent.
The Hausman test provides the basis for discriminating between these two models, and the matrix version of the Hausman test is expressed as:
$$[b_{RE} - b_{FE}]\,[V(b_{FE}) - V(b_{RE})]^{-1}\,[b_{RE} - b_{FE}]' \sim \chi^2_k$$
where k is the number
of covariates (excluding the constant) in the specification. If the random
effects are correlated with the explanatory variables, then there will be a
statistically significant difference between the random effects and the fixed
effects estimates. Thus, the null and alternative hypotheses are expressed as:
H0: Random effects are independent of explanatory variables
H1: H0 is not true.
The null hypothesis is the random effects model, and if the test statistic exceeds the relevant critical value, the random effects model is rejected in favour of the fixed effects model. In finite samples, the difference between the two variance-covariance matrices, which has to be inverted, may be negative definite (or negative semi-definite), thus yielding non-interpretable values for the chi-squared statistic.
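For a model with a single covariate (k = 1), the matrix expression collapses to a ratio of scalars, which makes the mechanics of the test easy to follow. The sketch below uses only the Python standard library. The function name `hausman_single_covariate` and the input numbers are hypothetical and are not EViews output. For one degree of freedom, the chi-squared upper-tail probability can be written with the complementary error function.

```python
import math

def hausman_single_covariate(b_re, b_fe, var_re, var_fe):
    """Hausman statistic and prob-value for one covariate (k = 1).

    Returns None when V(b_FE) - V(b_RE) is not positive, which is the
    finite-sample case where the statistic cannot be interpreted.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        return None
    stat = (b_re - b_fe) ** 2 / var_diff
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical estimates, made up purely for illustration.
result = hausman_single_covariate(b_re=2.30, b_fe=2.00, var_re=0.004, var_fe=0.010)
stat, p_value = result
print("Hausman statistic:", round(stat, 3))
print("prob-value:", p_value)
print("reject random effects at 5%:", p_value < 0.05)

# If the variance difference has the wrong sign, no usable statistic exists.
print(hausman_single_covariate(b_re=2.30, b_fe=2.00, var_re=0.010, var_fe=0.004))
```

With more than one covariate, the same calculation needs the full variance-covariance matrices and a matrix inverse, as in the expression above.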
The selection of one model over the other might be dictated by the nature of the application. For example, if the cross-sectional units are countries or states, it may be plausible to assume that the omitted effects are fixed in nature and not the outcome of a random draw. If, however, we are dealing with a sample of individuals or firms drawn from a population, the random effects model has greater appeal. Ultimately, though, the choice of model is dictated empirically. If it does not prove possible to discriminate between the two models on the basis of the Hausman test, it may be safest to use the fixed effects model: it remains consistent when the effects are correlated with the explanatory variables, whereas the random effects model then yields inconsistent estimates. Of course, if the random effects are found to be independent of the covariates, the random effects model is the more appropriate because it provides a more efficient estimator than the fixed effects estimator.
This tutorial is drawn from my lecture notes, as given by Prof. Barry Reilly (Professor of Econometrics, University of Sussex, UK).
How to Perform the Hausman Test in EViews
First: Load the file into EViews and create Group data (see video on how to do this)
Second: Perform fixed effects estimation: Quick >> Estimate Equation >> Panel Options >> Fixed >> OK
[Figure: EViews Equation Estimation dialog box. Source: CrunchEconometrix]
Third: Perform random effects estimation: Quick >> Estimate Equation >> Panel Options >> Random >> OK
Fourth: Perform the Hausman test: View >> Fixed/Random Effects testing >> Correlated Random Effects – Hausman Test
Fifth: Interpret results:
Reject the null hypothesis if the prob-value is below 0.05, that is, if the test statistic is statistically significant at the 5% level. This implies that the individual effects ($a_i$) are correlated with the explanatory variables, so use the fixed effects estimator to run the analysis. Otherwise, use the random effects estimator.
[Watch video tutorial on performing the
Hausman test in EViews]
If you have comments or questions regarding how to perform the Hausman test, kindly post them in the comments section below.