**Introduction to Panel Data Models**

**The panel data approach pools time series data with cross-sectional data. Depending on the application, it can comprise a sample of individuals, firms, countries, or regions over a specific time period. The general structure of such a model could be expressed as follows:**

$$Y_{it} = a_0 + b X_{it} + u_{it}$$

where $u_{it} \sim \text{IID}(0, \sigma^2)$, with *i* = 1, 2, ..., *N* individual-level observations and *t* = 1, 2, ..., *T* time series observations.
In this application, it is assumed that $Y_{it}$ is a continuous variable. In this model, the observations of each individual, firm or country are simply stacked over time on top of one another. This is the standard pooled model, where intercepts and slope coefficients are *homogeneous* across all *N* cross-sections and through all *T* time periods. The application of OLS to this model *ignores* the temporal and spatial dimensions inherent in the data and thus throws away useful information. It is important to note that the temporal dimension captures the ‘within’ variation in the data while the spatial dimension captures the ‘between’ variation. The pooled OLS estimator exploits both the ‘between’ and ‘within’ dimensions of the data but does not do so efficiently, because each observation is given equal weight in estimation. In addition, the unbiasedness and consistency of the estimator require that the explanatory variables are uncorrelated with any omitted factors. The limitations of OLS in such an application prompted interest in alternative procedures. There are a number of different panel estimators, but the most popular is the fixed effects (or ‘within’) estimator.

**Fixed Effects or Random Effects?**

A common question is which econometric model an investigator should use when modelling with panel data. The different models can generate considerably different results, and this has been documented in many empirical studies. For a specification in which time effects are assumed absent for simplicity, the model to be estimated may be given by:

$$Y_{it} = a_i + \mathbf{b}\mathbf{X}_{it} + u_{it}$$
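To make the role of the $a_i$ terms concrete, the following is a minimal simulation sketch, not taken from the original lecture notes. It assumes a single regressor, a true slope of 2, 200 units observed over 5 periods, and unit effects that are deliberately built into $X_{it}$. The function names (`simulate_panel`, `ols_slope`, `within_transform`) and all parameter values are hypothetical choices for illustration. It compares the pooled OLS slope with the slope obtained after subtracting each unit's own time averages (the ‘within’ transformation).

```python
import random


def simulate_panel(n_units=200, n_periods=5, b=2.0, seed=0):
    """Simulate Y_it = a_i + b * X_it + u_it with a_i correlated with X_it."""
    rng = random.Random(seed)
    rows = []  # each row is (unit index, x, y)
    for i in range(n_units):
        a_i = rng.gauss(0.0, 1.0)          # unit-specific effect
        for _ in range(n_periods):
            x = a_i + rng.gauss(0.0, 1.0)  # X depends on a_i (assumption)
            y = a_i + b * x + rng.gauss(0.0, 1.0)
            rows.append((i, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope from a one-regressor OLS fit that includes an intercept."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def within_transform(rows):
    """Subtract each unit's own time average from its X and Y values."""
    sums = {}
    for i, x, y in rows:
        sx, sy, n = sums.get(i, (0.0, 0.0, 0))
        sums[i] = (sx + x, sy + y, n + 1)
    xs = [x - sums[i][0] / sums[i][2] for i, x, _ in rows]
    ys = [y - sums[i][1] / sums[i][2] for i, _, y in rows]
    return xs, ys


rows = simulate_panel()
pooled_b = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
within_b = ols_slope(*within_transform(rows))
print(f"pooled OLS slope:  {pooled_b:.3f}")
print(f"within (FE) slope: {within_b:.3f}")
```

Under these assumptions the pooled slope should drift away from the true value of 2, because the omitted $a_i$ is correlated with the regressor, while the within slope should stay close to it. How the $a_i$ terms are treated therefore matters for the results.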

The question, therefore, is: do we treat $a_i$ as fixed or random? The following points are worth noting.
1) The estimation of the fixed effects model is costly in terms of degrees of freedom. This is a statistical and not a computing cost. It is particularly problematic when *N* is large and *T* is small. The occurrence of large *N* and small *T* currently tends to characterize most panel data applications encountered.

2) The $a_i$ terms are taken to characterize (for want of a better expression) investigator ignorance. In the fixed effects model, does it make sense to treat one type of investigator ignorance ($a_i$) as fixed but another as random ($u_{it}$)?

3) The fixed effects formulation is viewed as one where investigators make inferences conditional on the fixed effects in the sample.

4) The random effects formulation is viewed as one where investigators make unconditional inferences with respect to the population of all effects.

5) The random effects formulation treats the random effects as independent of the explanatory variables (i.e. $E(a_i \mathbf{X}_{it}) = 0$). Violation of this assumption leads to bias and inconsistency in the $\mathbf{b}$ vector.

**Advantage and disadvantage of the fixed effects model**

The main advantage of the fixed effects model is its relative ease of estimation and the fact that it does not require independence of the fixed effects from the other included explanatory variables. The main disadvantage is that it requires estimation of *N* separate intercepts. This causes problems because much of the variation that exists in the data may be used up in estimating these different intercept terms. As a consequence, the estimated effects (the $\mathbf{b}$s) for other explanatory variables in the regression model may be imprecisely estimated, even though these might be the more important parameters of interest from the perspective of policy. The fixed effects estimator is derived using the deviations between the cross-sectional observations and the long-run average value for the cross-sectional unit. The loss of precision is most acute, therefore, when there is little variation or movement in the characteristics over time, *that is, when the variables are rarely changing or time-invariant*. In essence, the effects of these variables are eliminated from the analysis.

**Advantage and disadvantage of the random effects model**

The main advantage of the random effects estimator is that it uses up fewer degrees of freedom in estimation and *allows for the inclusion of time-invariant covariates*. The main disadvantage of the model is the assumption that the random effects are independent of the included explanatory variables. It is quite plausible that there are unobservable attributes not included in the regression model that are correlated with the observable characteristics. This procedure, unlike fixed effects, does not allow for the elimination of the omitted heterogeneous effects.

**The Hausman Test**

In determining which model is the more
appropriate to use, a statistical test can be implemented. The Hausman test
compares the random effects estimator to the ‘within’ estimator. If the null is
rejected, this favours the ‘within’ estimator’s treatment of the omitted
effects (i.e., it favours the fixed effects but only relative to the random
effects). The use of the test in this case is to discriminate between a model
where the omitted heterogeneity is treated as fixed and correlated with the
explanatory variables, and a model where the omitted heterogeneity is treated
as random and independent of the explanatory variables.

· If the omitted effects are uncorrelated with the explanatory variables, the random effects estimator is consistent and efficient. The fixed effects estimator is also consistent but not efficient, given the estimation of a large number of additional parameters (i.e., the fixed effects).

· If the effects are correlated with the explanatory variables, the fixed effects estimator is consistent but the random effects estimator is inconsistent.

The Hausman test provides the basis for discriminating between these two models, and the matrix version of the Hausman test is expressed as:

$$[\mathbf{b}_{RE} - \mathbf{b}_{FE}]\,[\mathbf{V}(\mathbf{b}_{FE}) - \mathbf{V}(\mathbf{b}_{RE})]^{-1}\,[\mathbf{b}_{RE} - \mathbf{b}_{FE}]' \sim \chi^2_k$$
where *k* is the number of covariates (excluding the constant) in the specification. If the random effects are correlated with the explanatory variables, then there will be a statistically significant difference between the random effects and the fixed effects estimates. Thus, the null and alternative hypotheses are expressed as:

$H_0$: Random effects are independent of explanatory variables

$H_1$: $H_0$ is not true.

The null hypothesis is the random effects model, and if the test statistic exceeds the relevant critical value, the random effects model is rejected in favour of the fixed effects model. In finite samples, the difference between the two variance-covariance matrices may turn out to be negative-definite (or negative semi-definite), so that inverting it yields non-interpretable values for the chi-squared statistic.
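As a rough illustration of the test statistic, the sketch below covers only the special case of a single covariate (*k* = 1), where the matrix expression above collapses to a ratio of scalars. The function name `hausman_scalar` and the numerical estimates passed to it are hypothetical and are not results from any real data set. The function returns nothing when the variance difference is not positive, which is the finite-sample problem just described.

```python
import math


def hausman_scalar(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for a single covariate (k = 1).

    Returns (statistic, p_value), or None when var_fe - var_re is not
    positive, in which case the statistic is not interpretable.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        return None
    statistic = (b_re - b_fe) ** 2 / var_diff
    # For a chi-squared variable with 1 degree of freedom,
    # P(chi2 > h) equals erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical estimates, for illustration only.
result = hausman_scalar(b_fe=1.0, var_fe=0.04, b_re=0.8, var_re=0.03)
if result is None:
    print("Variance difference is not positive; statistic not interpretable.")
else:
    statistic, p_value = result
    choice = "fixed effects" if p_value < 0.05 else "random effects"
    print(f"H = {statistic:.2f}, p = {p_value:.4f} -> use {choice}")
```

With more than one covariate the same logic applies, but the scalar division is replaced by the matrix inverse and the p-value comes from a chi-squared distribution with *k* degrees of freedom.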

The selection of one model over the other might be dictated by the nature of the application. For example, if the cross-sectional units were countries or states, it may be plausible to assume that the omitted effects are fixed in nature and not the outcome of a random draw. If, instead, we are dealing with a sample of individuals or firms drawn from a population, the assumption of a random effects model has greater appeal. The choice of model is, however, ultimately dictated empirically. If it does not prove possible to discriminate between the two models on the basis of the Hausman test, it may be safest to use the fixed effects model: a correlation between the effects and the explanatory variables does less damage there than in the random effects model, where a failure of the independence assumption results in inconsistent estimates. Of course, if the random effects are found to be independent of the covariates, the random effects model is the most appropriate because it provides a more **efficient** estimator than the fixed effects estimator.

*This tutorial is culled from my lecture notes as given by Prof. Barry Reilly (Professor of Econometrics, University of Sussex, UK).*

**How to Perform the Hausman Test in EViews**

**First**: Load the file into EViews and create **Group** data (see the video on how to do this)

**Second**: Perform fixed effects estimation:

**Quick >> Estimate Equation >> Panel Options >> Fixed >> OK**

EViews: Equation Estimation Dialog Box (Source: CrunchEconometrix)

**Third**: Perform random effects estimation:

**Quick >> Estimate Equation >> Panel Options >> Random >> OK**

**Fourth**: Perform the Hausman test:

**View >> Fixed/Random Effects testing >> Correlated Random Effects – Hausman Test**

**Fifth**: Interpret results:

Reject the null hypothesis if the prob-value is statistically significant at the 5% level (that is, if it is below 0.05). This implies that the individual effects ($a_i$) are correlated with the explanatory variables, so use the fixed effects estimator to run the analysis. Otherwise, use the random effects estimator.

**[Watch video tutorial on performing the Hausman test in EViews]**

If you still have comments or questions regarding how to perform the Hausman test, kindly post them in the comments section below.