Heteroscedasticity: you can try to pronounce it the way I do, *he-te-ro-sce-das-ti-ci-ty*. You see, it isn’t so difficult to pronounce after all. It happens to be the longest word in the econometrics dictionary, with 18 letters…yes, 18 letters! It can also be written as heteroskedasticity; whichever spelling you choose is fine, only be consistent with your choice. I will be sticking to heteroscedasticity…
Perhaps you have heard the word before, so what exactly is heteroscedasticity? It may seem like a mouthful, but the concept is very simple to grasp. It refers to disturbances (errors) whose variances are not constant in a given model; that is, the variance of the error term differs across observations. Put another way, the data show unequal variability (dispersion) across the range of values of a predictor variable. And what are disturbances? They are simply the error terms of the model. You may begin to think at this point that econometrics has a ton of jargon, and yes, you are absolutely right. But relax, you will understand the terms as you become more involved in its processes.
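In symbols, the definition can be written roughly as follows. The notation is my own assumption for illustration: $u_i$ is the disturbance for observation $i$, $x_i$ stands for the explanatory variables, and $\sigma^2$ is a variance.

$$
\text{Homoscedasticity: } \operatorname{Var}(u_i \mid x_i) = \sigma^2 \quad \text{for all } i
$$

$$
\text{Heteroscedasticity: } \operatorname{Var}(u_i \mid x_i) = \sigma_i^2, \quad \text{with } \sigma_i^2 \neq \sigma_j^2 \text{ for at least some } i \neq j
$$

The only difference is the subscript on the variance: under heteroscedasticity each observation may have its own error variance.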

So, again, what does heteroscedasticity mean? It means that the error variances are not constant across observations, and this matters because many models require them to be constant. For instance, one of the assumptions of ordinary least squares (OLS) is that the model must be homoscedastic.
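What happens when that assumption fails can be previewed with a small simulation. The sketch below is in Python with numpy rather than Stata, and everything in it is an assumption made for illustration: the data are simulated, the true intercept and slope are made-up values, and the error standard deviation is set equal to |x|. It is not a proof, only a rough picture of a slope estimate that stays centred on the true value while the usual standard error misstates how much the estimate really varies.

```python
import numpy as np

rng = np.random.default_rng(42)

n, reps = 200, 2000          # sample size and number of simulated samples
beta0, beta1 = 1.0, 2.0      # true intercept and slope (hypothetical values)

x = rng.normal(size=n)       # regressor, held fixed across samples
xc = x - x.mean()            # centred regressor
sxx = xc @ xc                # sum of squared deviations of x

# Heteroscedastic errors: the standard deviation of u_i is |x_i|
u = rng.normal(size=(reps, n)) * np.abs(x)
y = beta0 + beta1 * x + u    # one simulated sample per row

# OLS slope and intercept for every sample at once
slopes = (y @ xc) / sxx
intercepts = y.mean(axis=1) - slopes * x.mean()

# Usual (homoscedasticity-based) standard error of the slope
resid = y - intercepts[:, None] - slopes[:, None] * x
s2 = (resid ** 2).sum(axis=1) / (n - 2)
usual_se = np.sqrt(s2 / sxx)

mean_slope = slopes.mean()          # should sit close to beta1
empirical_sd = slopes.std(ddof=1)   # actual spread of the slope estimates
mean_usual_se = usual_se.mean()     # spread that the usual formula reports

print(f"true slope                : {beta1:.3f}")
print(f"mean of estimated slopes  : {mean_slope:.3f}")
print(f"actual sd of the slopes   : {empirical_sd:.3f}")
print(f"average usual std. error  : {mean_usual_se:.3f}")
```

In this particular design the usual standard error comes out too small. With other variance patterns it can be too large instead, so the direction of the error should not be taken as a general rule.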

*In the presence of heteroscedasticity:*

· OLS estimators are still linear, unbiased, consistent and asymptotically normally distributed. The regression estimates and the attendant predictions remain unbiased and consistent. But the estimators are inefficient (that is, they do not have minimum variance) in the class of linear unbiased estimators. Hence, OLS is not BLUE (Best Linear Unbiased Estimator), and the regression predictions are also inefficient, though consistent. What this means is that the usual standard errors are unreliable, so the regression output cannot be trusted for constructing confidence intervals or for making inferences.

· Heteroscedasticity causes statistical inference based on the usual *t* and *F* statistics to be invalid, even in large samples. As heteroscedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

*Causes of heteroscedasticity:*

Okay, now that we know that the presence of heteroscedasticity in a model can invalidate statistical tests of significance, it is important to know its causes. That is, what can lead to heteroscedasticity being evident in your data?

· The presence of outliers can lead to your model becoming heteroscedastic. And what are outliers? These are simply *bogus* figures in your data that stand out, very obvious to the prying eye. Running simple summary statistics on your data before any regression analysis can easily reveal outliers by showing both the minimum and maximum values of a variable. For example, you may have 30 years of inflation data for country *J* in which the yearly inflation figures hover around 9%, 7.5%, 8.2%, and then suddenly you observe an inflation rate of 58.7%. Since there is no economic phenomenon to support that outrageous figure, 58.7% is an outlier which may cause your model to become heteroscedastic.

· Wrongly specifying your model is another factor. This can be related to the functional form in which your model is specified. The functional form can be a log-log model (where the dependent variable and all or some of the explanatory variables are in natural logarithms, or logs for short); a log-level model (where only the dependent variable is transformed into natural logarithms and the explanatory variables are in their level forms, that is, not transformed); or the level-level form.

· Wrong data transformation. For instance, over-differencing a variable can be a cause. I have seen cases where a variable is already stationary in levels at the 10% significance level, yet students still go ahead and difference it in order to obtain stationarity at maybe the 1% or 5% level. This is not necessary. Once your variable is stationary in levels, that is, an *I*(0) series, just go ahead and run your analysis. Note that differencing the variable further may lead to heteroscedasticity.

· Poor data sampling methods may lead to heteroscedasticity, particularly when collecting primary data.

· Skewness
of one or more regressors (closely related to outliers being evident in the
data). Regressors are explanatory or independent variables.
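The summary-statistics check mentioned under outliers can be sketched in a few lines. This is a Python illustration using only the standard library, not a Stata session. The inflation figures are hypothetical (only 9%, 7.5%, 8.2% and 58.7% come from the example in the text), and the 1.5 × IQR rule used to flag values is a common rule of thumb that I am assuming here, not something the example prescribes.

```python
import statistics

# Hypothetical yearly inflation rates (%) for an imaginary country;
# 58.7 is the suspicious value from the example in the text.
inflation = [9.0, 7.5, 8.2, 8.8, 7.9, 58.7, 9.1, 8.4, 7.7, 8.6]

# Basic summary statistics: the maximum already looks out of place
summary = {
    "min": min(inflation),
    "max": max(inflation),
    "mean": statistics.mean(inflation),
    "median": statistics.median(inflation),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# Rule of thumb (an assumption): flag values lying more than
# 1.5 interquartile ranges below the first or above the third quartile.
q1, _, q3 = statistics.quantiles(inflation, n=4)
iqr = q3 - q1
outliers = [v for v in inflation if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
print("possible outliers:", outliers)
```

A flagged value is only a candidate. Whether it is truly *bogus* still depends on whether an economic event can explain it.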

*Detecting heteroscedasticity*

Now that we know what heteroscedasticity is and what causes it, how can it be detected? The truth is that there is no hard and fast rule for detecting heteroscedasticity. Therefore, more often than not, suspecting heteroscedasticity may be a case of educated guesswork, prior empirical experience or mere speculation. However, several formal and informal approaches can be used to detect its presence. The discussion here will be limited to the graphical approach (plotting the residuals from the regression against the estimated dependent variable), the Breusch-Pagan test and the White test.

So, let us take an **example** using JM Wooldridge’s GPA3.dta or GPA3.xls data to make this topic clearer. (Use the .xls file if Stata is not installed on your device, and run the analysis using any econometric software.)

[Figure: Regression output in Stata. Source: CrunchEconometrix]

The *F*-statistic is significant at the 1% level, and the R² reveals that about 48% of the variation in the dependent variable is explained by the independent variables.

But how do we know
if this model is heteroscedastic or not?

1). Start from the informal approach, which is plotting the residuals against the fitted values using the Stata commands *rvfplot* or *rvfplot, yline(0)* to see if there is a definite pattern. If a definite pattern exists, then the model is likely heteroscedastic.

**rvfplot, yline(0)**

In both plots a definite pattern is visible, which is evidence that the model is heteroscedastic.
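If you want a number to go with the picture, the idea behind the residual-versus-fitted plot can be imitated without any graphics. The Python sketch below is an illustration under stated assumptions: the data are simulated (this is not the GPA3 data), the error standard deviation is made to grow with x, and comparing the residual spread between the lower and upper halves of the fitted values is my own crude stand-in for eyeballing the plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data (an assumption for illustration, not the GPA3 data):
# the error standard deviation grows with x.
x = rng.uniform(1.0, 10.0, size=n)
u = rng.normal(size=n) * 0.5 * x
y = 2.0 + 1.5 * x + u

# Fit y = b0 + b1*x by OLS and keep fitted values and residuals
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Crude numeric version of the plot: residual spread for observations
# with low fitted values versus observations with high fitted values.
low = fitted <= np.median(fitted)
sd_low = resid[low].std(ddof=1)
sd_high = resid[~low].std(ddof=1)

print(f"residual sd, low fitted values : {sd_low:.2f}")
print(f"residual sd, high fitted values: {sd_high:.2f}")
```

A clearly larger spread in one half is the numeric counterpart of the fan or funnel shape you would look for in the plot. It is only a quick check and does not replace the formal tests.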

2). Conduct either the Breusch-Pagan or the White heteroscedasticity test after your regression to check whether the residuals have a changing variance. The Stata commands are *estat hettest* and *estat imtest, white*. If the obtained *p*-values are significant, then the model exhibits heteroscedasticity; otherwise, the model is homoscedastic.

**estat hettest**

Breusch-Pagan/Cook-Weisberg test for heteroscedasticity

Ho: Constant variance

Variables: fitted values of trmgpa

chi2(1) = 14.12

Prob > chi2 = 0.0002
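To see what a test of this kind computes, here is a minimal Python sketch of one common form of the Breusch-Pagan test: regress the squared residuals on the fitted values and use n times the R² of that auxiliary regression as a chi-squared(1) statistic. Several things are assumptions on my part. The function name `breusch_pagan_fitted` is hypothetical, the data are simulated, and this n·R² (Koenker) form is not necessarily the exact variant Stata reports by default, so the numbers should not be expected to match the output above.

```python
import math
import numpy as np

def breusch_pagan_fitted(y, X):
    """n * R^2 form of the Breusch-Pagan test, using fitted values.

    X must already contain a column of ones. Returns (statistic, p_value);
    the statistic is compared with a chi-squared(1) distribution.
    """
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    u2 = (y - fitted) ** 2                      # squared residuals

    # Auxiliary regression: squared residuals on a constant and fitted values
    Z = np.column_stack([np.ones(n), fitted])
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    ss_res = ((u2 - Z @ gamma) ** 2).sum()
    ss_tot = ((u2 - u2.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot

    stat = max(n * r2, 0.0)                     # guard against tiny negatives
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared(1) upper tail
    return stat, p_value

# Simulated data (hypothetical): one model with rising variance, one without
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y_het = 2.0 + 1.5 * x + rng.normal(size=n) * 0.5 * x   # variance rises with x
y_hom = 2.0 + 1.5 * x + rng.normal(size=n) * 2.0       # constant variance

stat_het, p_het = breusch_pagan_fitted(y_het, X)
stat_hom, p_hom = breusch_pagan_fitted(y_hom, X)
print(f"heteroscedastic data: chi2(1) = {stat_het:.2f}, p = {p_het:.4f}")
print(f"homoscedastic data  : chi2(1) = {stat_hom:.2f}, p = {p_hom:.4f}")
```

The reading is the same as for the Stata output: a small *p*-value leads to rejecting the null hypothesis of constant variance.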

**estat imtest, white**

White's test for Ho: homoscedasticity

against Ha: unrestricted heteroscedasticity

chi2(33) = 61.22

Prob > chi2 = 0.0020

The null hypothesis for both tests is that the model is homoscedastic. Since the *p*-values for both tests are significant, the null hypothesis is rejected in favour of the alternative hypothesis, which is evidence that the model is heteroscedastic.

*Controlling/Correcting heteroscedasticity*

Also, as a precaution, it is advisable to run your analysis using White’s heteroscedasticity-robust standard errors by including the *robust* option in the command line, as in this example:

**reg trmgpa crsgpa cumgpa tothrs sat hsperc female season, robust**

With this option, the standard errors are corrected for heteroscedasticity, so the problem is controlled in comparison to when the *robust* option is not used. The coefficient estimates themselves do not change.

**Assignment:** Using Wooldridge’s hprice1.dta or hprice1.xls data, how can you detect if the model is heteroscedastic and how will you correct it? Compare the usual standard errors with the obtained heteroscedasticity-robust standard errors. What do you observe?
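For readers who want to see what such a comparison involves outside Stata, here is a minimal Python sketch of the usual and the White "sandwich" standard errors computed side by side. The assumptions are as follows: the data are simulated (not hprice1 or GPA3), the function name `ols_with_se` is hypothetical, and the n/(n-k) small-sample factor (often called HC1) is the adjustment I understand Stata's *robust* option to apply, so please treat that correspondence as an assumption.

```python
import numpy as np

def ols_with_se(y, X):
    """Return OLS coefficients, usual SEs and HC1 robust SEs.

    X must already contain a column of ones.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef

    # Usual standard errors: s^2 * (X'X)^-1
    s2 = resid @ resid / (n - k)
    usual_se = np.sqrt(np.diag(s2 * xtx_inv))

    # White sandwich estimator with the n/(n-k) small-sample factor (HC1)
    meat = X.T @ (X * (resid ** 2)[:, None])
    robust_cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    robust_se = np.sqrt(np.diag(robust_cov))
    return coef, usual_se, robust_se

# Simulated heteroscedastic data (hypothetical): error sd equals |x|
rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 1.5 * x + rng.normal(size=n) * np.abs(x)

coef, usual_se, robust_se = ols_with_se(y, X)
print(f"slope estimate   : {coef[1]:.3f}")
print(f"usual std. error : {usual_se[1]:.3f}")
print(f"robust std. error: {robust_se[1]:.3f}")
```

The coefficients are computed once and only the standard errors differ, which is the pattern to look for when you compare your own regression output with and without the *robust* option.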

So, with this brief and practical tutorial, you can confidently run your regressions and test whether your model suffers from heteroscedasticity or not…good luck!

**Post your comments and questions….**
