Introduction to Panel Data Models
The panel data approach pools time series data with cross-sectional data. Depending on the application, it can comprise a sample of individuals, firms, countries, or regions over a specific time period. The general structure of such a model could be expressed as follows:
$$Y_{it} = a_0 + bX_{it} + u_{it}$$
where $u_{it} \sim \text{IID}(0, \sigma^2)$, $i = 1, 2, \ldots, N$ indexes the individual-level observations, and $t = 1, 2, \ldots, T$ indexes the time series observations.
In this application, it is assumed that $Y_{it}$ is a continuous variable. In this model, the observations of each individual, firm or country are simply stacked over time on top of one another. This is the standard pooled model, where intercepts and slope coefficients are homogeneous across all N cross-sections and through all T time periods. The application of OLS to this model ignores the temporal and spatial dimensions inherent in the data and thus throws away useful information. It is important to note that the temporal dimension captures the ‘within’ variation in the data while the spatial dimension captures the ‘between’ variation in the data. The pooled OLS estimator exploits both the ‘between’ and ‘within’ dimensions of the data but does not do so efficiently, because each observation is given equal weight in estimation. In addition, the unbiasedness and consistency of the estimator require that the explanatory variables are uncorrelated with any omitted factors. The limitations of OLS in such an application prompted interest in alternative procedures. There are a number of different panel estimators, but the most popular is the fixed effects (or ‘within’) estimator.
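The requirement that the explanatory variables be uncorrelated with omitted factors can be illustrated with a small simulation. The sketch below is written in plain Python rather than Stata, and everything in it (the data-generating process, the true slope of 2, the function names) is an assumption made for illustration. It builds a panel in which an unobserved unit effect is correlated with the regressor, then compares the pooled OLS slope with the ‘within’ slope.

```python
import random

def simulate_panel(n_units=200, n_periods=5, slope=2.0, seed=0):
    """Return (unit, x, y) rows in which the unit effect is correlated with x."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        a_i = rng.gauss(0.0, 1.0)            # unobserved unit effect (omitted factor)
        for _ in range(n_periods):
            x = a_i + rng.gauss(0.0, 1.0)    # x is correlated with a_i by construction
            y = a_i + slope * x + rng.gauss(0.0, 1.0)
            rows.append((i, x, y))
    return rows

def ols_slope(xs, ys):
    """Slope from a simple regression of y on x with an intercept."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_slope(rows):
    """'Within' slope: OLS on deviations from each unit's mean over time."""
    by_unit = {}
    for unit, x, y in rows:
        by_unit.setdefault(unit, []).append((x, y))
    dx, dy = [], []
    for obs in by_unit.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        dx.extend(x - x_bar for x, _ in obs)
        dy.extend(y - y_bar for _, y in obs)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

rows = simulate_panel()
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
within = within_slope(rows)
print(f"pooled OLS slope:  {pooled:.3f}")
print(f"within (FE) slope: {within:.3f}")
```

Under this assumed design the pooled slope should sit noticeably above the true value of 2, because the omitted unit effect is absorbed into the error term, while the ‘within’ slope should stay close to 2.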
Fixed Effects or Random Effects?
A question usually asked is which econometric model an investigator should use when modelling with panel data. The different models can generate considerably different results, and this has been documented in many empirical studies. For a model in which time effects are assumed absent for simplicity, the model to be estimated may be given by:
$$Y_{it} = a_i + bX_{it} + u_{it}$$
The question, therefore, is whether we treat $a_i$ as fixed or random. The following points are worth noting.
· The estimation of the fixed effects model is costly in terms of degrees of freedom. This is a statistical and not a computing cost. It is particularly problematic when N is large and T is small, a combination that characterizes most panel data applications currently encountered.
· The $a_i$ terms are taken to characterize (for want of a better expression) investigator ignorance. In the fixed effects model, does it make sense to treat one type of investigator ignorance ($a_i$) as fixed but another ($u_{it}$) as random?
· The fixed effects formulation is viewed as one where investigators make inferences conditional on the fixed effects in the sample.
· The random effects formulation is viewed as one where investigators make unconditional inferences with respect to the population of all effects.
· The random effects formulation treats the random effects as independent of the explanatory variables (i.e. $E(a_i X_{it}) = 0$). Violation of this assumption leads to bias and inconsistency in the estimated $b$ vector.
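To make the degrees-of-freedom cost in the first point concrete, the worked count below compares residual degrees of freedom under pooled OLS and fixed effects. The values N = 1000, T = 3 and k = 4 slope coefficients are illustrative assumptions, not figures from any study.

```latex
\begin{align*}
\text{Pooled OLS:}\quad & NT - (k + 1) = 3000 - 5 = 2995 \\
\text{Fixed effects:}\quad & NT - N - k = 3000 - 1000 - 4 = 1996
\end{align*}
```

Under these assumed values, estimating the N separate intercepts absorbs roughly a third of the available degrees of freedom, which is why the cost is most severe when N is large and T is small.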
Advantage and disadvantage of the fixed effects model
The main advantage of the fixed effects model is its relative ease of estimation and the fact that it does not require independence of the fixed effects from the other included explanatory variables. The main disadvantage is that it requires estimation of N separate intercepts. This causes problems because much of the variation that exists in the data may be used up in estimating these different intercept terms. As a consequence, the estimated effects (the $b$s) for other explanatory variables in the regression model may be imprecisely estimated. These might represent the more important parameters of interest from the perspective of policy. The fixed effects estimator is derived using the deviations between the cross-sectional observations and the average value over time for the cross-sectional unit. The problem is most acute, therefore, when there is little variation or movement in the characteristics over time, that is, when the variables are rarely changing or time-invariant. In essence, the effects of time-invariant variables are eliminated from the analysis, and those of rarely changing variables are estimated very imprecisely.
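The elimination of time-invariant variables can be seen by writing out the ‘within’ transformation. The sketch below starts from the model given earlier in this post and adds a hypothetical time-invariant regressor $Z_i$ with coefficient $c$; both are introduced here purely for illustration.

```latex
\begin{align*}
Y_{it} &= a_i + bX_{it} + cZ_i + u_{it} \\
\bar{Y}_i &= a_i + b\bar{X}_i + cZ_i + \bar{u}_i,
  \qquad \bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it} \\
Y_{it} - \bar{Y}_i &= b\,(X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)
\end{align*}
```

Subtracting the unit mean removes both $a_i$ and $cZ_i$, so $c$ cannot be estimated. By the same logic, a regressor that barely moves over time leaves $X_{it} - \bar{X}_i$ close to zero, and $b$ is then estimated from very little variation.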
Advantage and disadvantage of the random effects model
The main advantage of the random effects estimator is that it uses up fewer degrees of freedom in estimation and allows for the inclusion of time-invariant covariates. The main disadvantage of the model is the assumption that the random effects are independent of the included explanatory variables. It is fairly plausible that there may be unobservable attributes not included in the regression model that are correlated with the observable characteristics. This procedure, unlike fixed effects, does not allow for the elimination of the omitted heterogeneous effects.
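One way to see why random effects keeps time-invariant covariates but does not remove the omitted effects is the standard quasi-demeaning form of the estimator. In the sketch below, $\sigma_a^2$ (the variance of $a_i$), $a_0$ (the mean of $a_i$) and $\theta$ are notation assumed here for illustration; $\sigma_u^2$ is the variance of $u_{it}$.

```latex
\begin{align*}
\theta &= 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}} \\
Y_{it} - \theta\bar{Y}_i &= (1-\theta)\,a_0 + b\,(X_{it} - \theta\bar{X}_i) + v_{it},
  \qquad v_{it} = (1-\theta)(a_i - a_0) + (u_{it} - \theta\bar{u}_i)
\end{align*}
```

With $\theta = 0$ this is pooled OLS, and with $\theta = 1$ it is the ‘within’ transformation. For values in between, only a fraction of each unit mean is subtracted, so a time-invariant variable survives, but so does part of $a_i$ in the error term, which is why correlation between $a_i$ and $X_{it}$ is damaging.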
The Hausman Test
In determining which model is the more appropriate to use, a statistical test can be implemented. The Hausman test compares the random effects estimator to the ‘within’ estimator. If the null is rejected, this favours the ‘within’ estimator’s treatment of the omitted effects (i.e., it favours the fixed effects but only relative to the random effects). The use of the test in this case is to discriminate between a model where the omitted heterogeneity is treated as fixed and correlated with the explanatory variables, and a model where the omitted heterogeneity is treated as random and independent of the explanatory variables.
· If the omitted effects are uncorrelated with the explanatory variables, the random effects estimator is consistent and efficient. The fixed effects estimator is also consistent, but it is not efficient given the estimation of a large number of additional parameters (i.e., the fixed effects).
· If the effects are correlated with the explanatory variables, the fixed effects estimator is consistent but the random effects estimator is inconsistent.
The Hausman test provides the basis for discriminating between these two models, and the matrix version of the Hausman test is expressed as:
$$[b_{RE} - b_{FE}]\,[V(b_{FE}) - V(b_{RE})]^{-1}\,[b_{RE} - b_{FE}]' \sim \chi^2_k$$
where $k$ is the number of covariates (excluding the constant) in the specification. If the random effects are correlated with the explanatory variables, then there will be a statistically significant difference between the random effects and the fixed effects estimates. Thus, the null and alternative hypotheses are expressed as:
H0: Random effects are independent of explanatory variables
H1: H0 is not true.
The null hypothesis corresponds to the random effects model, and if the test statistic exceeds the relevant critical value, the random effects model is rejected in favour of the fixed effects model. In finite samples, the difference between the variance-covariance matrices, which must be inverted, may fail to be positive definite (it may be negative definite or negative semi-definite), thus yielding non-interpretable values for the chi-squared statistic.
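For a single covariate (k = 1) the matrix expression collapses to a ratio of scalars, which makes the arithmetic easy to check by hand. The sketch below is plain Python, not Stata output; the function name and the numerical estimates are hypothetical and chosen only to illustrate the calculation, including the finite-sample case where the variance difference is not positive.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single slope coefficient (k = 1).

    Returns (None, None) when V(b_FE) - V(b_RE) is not positive, the
    finite-sample case in which the statistic cannot be interpreted.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        return None, None
    stat = (b_re - b_fe) ** 2 / var_diff
    # With 1 degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical estimates, used only to illustrate the arithmetic.
stat, p_value = hausman_scalar(b_fe=0.50, b_re=0.62, var_fe=0.0040, var_re=0.0015)
print(f"H = {stat:.2f}, p = {p_value:.4f}")

# A case where the variance difference is negative: no usable statistic.
print(hausman_scalar(b_fe=0.50, b_re=0.60, var_fe=0.0010, var_re=0.0020))
```

With these assumed numbers the statistic is 0.12² / 0.0025 = 5.76, which exceeds the 5% critical value of 3.84 for one degree of freedom, so the random effects model would be rejected in this made-up example.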
The selection of one model over the other might be dictated by the nature of the application. For example, if the cross-sectional units were countries or states, it may be plausible to assume that the omitted effects are fixed in nature and not the outcome of a random draw. If, however, we are dealing with a sample of individuals or firms drawn from a population, the assumption of a random effects model has greater appeal. Ultimately, though, the choice of model is dictated empirically. If it does not prove possible to discriminate between the two models on the basis of the Hausman test, it may be safest to use the fixed effects model: it remains consistent whether or not the effects are correlated with the explanatory variables, whereas the random effects model yields inconsistent estimates if its independence assumption fails. Of course, if the random effects are found to be independent of the covariates, the random effects model is the most appropriate because it provides a more efficient estimator than the fixed effects estimator.
This tutorial is drawn from my lecture notes, as given by Prof. Barry Reilly (Professor of Econometrics, University of Sussex, UK).
How to Perform the Hausman Test in Stata
First: Open a log file, load the data into Stata, and work from a do-file (so that your research can be replicated).
Second: Inform Stata that you are using a panel, with ‘id’ as the cross-sectional indicator and ‘year’ as the time period indicator, to prepare for panel data analysis.
xtset id year
Third: Create year dummies (to capture time variations in the data).
tab year, gen(yr)
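Fourth: Run the fixed effects model and store the results. The panel identifier was already declared with xtset in the second step, so no i() option is needed. Here yr... stands for the last year dummy created in the third step. (eststo comes with the user-written estout package; Stata’s built-in estimates store does the same job.)
eststo fixed: xtreg y x1 x2 x3 x4 yr2-yr..., fe
Fifth: Run the random effects model and store the results.
eststo random: xtreg y x1 x2 x3 x4 yr2-yr..., re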
Sixth: Run the Hausman test.
hausman fixed random
Seventh: Interpret the results. Reject the null hypothesis if the p-value is below 0.05, that is, if the test statistic is statistically significant at the 5% level. Rejection implies that the individual effects ($a_i$) are correlated with the explanatory variables, so use the fixed effects estimator to run the analysis. Otherwise, use the random effects estimator.