Heteroscedasticity,
you can try to pronounce it the way I do: he-te-ro-sce-das-ti-ci-ty.
You see, it isn't so difficult to pronounce after all. It happens to be the longest word in the econometrics dictionary with 18 letters…yes, 18 letters! It can also be written as heteroskedasticity, but whichever way you choose to write it is fine; just be consistent with your choice. I will be sticking to heteroscedasticity…
Perhaps you have heard the word before, so what exactly is heteroscedasticity? It may seem like a mouthful, but the concept is very simple to grasp. It refers to disturbances (errors) whose variances are not constant in a given model. It occurs when the variance of the error terms differs across observations, that is, when the data have unequal variability (dispersion) across a given set of predictor variables. Again, what are disturbances? They are simply the unobserved error terms in a regression model, the part of the dependent variable that the regressors do not explain. You may begin to think at this point that econometrics has tons of jargon; yes, you are absolutely right. But relax, you will understand the terms as you become more involved in its processes.
So, again, what does heteroscedasticity mean? It means that in a given model the error variances are not constant across observations. This matters because one of the assumptions of ordinary least squares (OLS) is that the model must be homoscedastic, that is, that the error variance is constant.
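In symbols (standard textbook notation, with u_i as the disturbance for observation i and X as the set of regressors):

Var(u_i | X) = σ²  for all i    (homoscedasticity: constant error variance)
Var(u_i | X) = σ_i²             (heteroscedasticity: the variance changes across observations)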
In the presence of heteroscedasticity:
· OLS estimators are still linear, unbiased, consistent and asymptotically normally distributed. The regression estimates and the attendant predictions remain unbiased and consistent. But the estimators are inefficient (that is, they do not have minimum variance) in the class of linear unbiased estimators. Hence, OLS is not BLUE (Best Linear Unbiased Estimator), and the regression predictions are likewise inefficient, though consistent. What this means is that the usual OLS standard errors cannot be used to construct valid confidence intervals or for inference.
· Heteroscedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. Since heteroscedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.
Causes of heteroscedasticity:
Okay, now that we know that the presence of heteroscedasticity in a model can invalidate statistical tests of significance, it is important to know its causes. That is, what can lead to heteroscedasticity being evident in your data?
· The presence of outliers can lead to your model becoming heteroscedastic. And what are outliers? These are simply anomalous figures in your data that stand out, very obvious to prying eyes. Running a simple summary statistic on your data before any regression analysis can easily detect outliers by showing both the minimum and maximum values of a variable (a short Stata sketch follows this list). For example, you may have 30 years of inflation data for country J where, on average, the yearly inflation figures hover around 9%, 7.5% and 8.2%, and suddenly you observe an inflation rate of 58.7%. Since there is no economic phenomenon to support that outrageous figure, 58.7% is an outlier which may cause your model to become heteroscedastic.
· Wrongly specifying your model is another factor. This can relate to the functional form in which your model is specified. The functional form can be a log-log model (where the dependent variable and all or some of the explanatory variables are in natural logarithms, or logs for short); a log-level model (where only the dependent variable is transformed into natural logarithm while the explanatory variables are in their level forms, that is, not transformed); and lastly the level-level form, where no variable is transformed.
· Wrong data transformation, for instance over-differencing a variable, can be another cause. If a variable is already stationary in levels at, say, the 10% significance level, I have seen cases where students still go ahead and difference the same variable in order to obtain stationarity at the 1% or 5% level. This is not necessary. Once your variable is stationary in levels, that is, an I(0) series, just go ahead and run your analysis (the sketch after this list also shows the levels check). Note that differencing the variable further may lead to heteroscedasticity.
· A poor data sampling method may lead to heteroscedasticity, particularly when collecting primary data.
· Skewness
of one or more regressors (closely related to outliers being evident in the
data). Regressors are explanatory or independent variables.
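As a quick illustration of the outlier check and the stationarity check mentioned above, here is a minimal Stata sketch; the variable inflation and the time variable year are hypothetical placeholders for your own data:

summarize inflation, detail   // minimum, maximum and percentiles quickly expose outliers such as 58.7%
tsset year                    // declare the data as yearly time series
dfuller inflation             // Augmented Dickey-Fuller unit-root test in levels
// if the unit-root null is rejected, the series is I(0): use it in levels, do not difference it further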
Detecting heteroscedasticity
Now that we know what heteroscedasticity is and its causes, how can it be detected? The truth is that there is no hard and fast rule for detecting heteroscedasticity. Therefore, more often than not, detecting it may be a case of educated guesswork, prior empirical experience or mere speculation. However, several formal and informal approaches can be used to detect the presence of heteroscedasticity, but the discussion here will be limited to the graphical approach (plotting the residuals from the regression against the estimated dependent variable), the Breusch-Pagan test and the White test.
So, let us take an example using JM Wooldridge's GPA3.dta or GPA3.xls data to make this topic clearer (use the .xls file if Stata is not installed on your device and run the analysis using any econometric software).
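The model used throughout this post regresses term GPA (trmgpa) on the same set of regressors that appears in the robust command further below; a minimal sketch, assuming the file sits in your current working directory:

use GPA3.dta, clear
reg trmgpa crsgpa cumgpa tothrs sat hsperc female season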
[Figure: Regression output in Stata. Source: CrunchEconometrix]
But how do we know if this model is heteroscedastic or not?
1). Start with the informal approach, which is plotting the residuals (or their squares) against the fitted values to see if there is a definite pattern. In Stata, the commands rvfplot or rvfplot, yline(0) produce the residual-versus-fitted plot. If a definite pattern exists, then the model is heteroscedastic.
rvfplot, yline(0)
From both plots, a definite pattern is observed, evidencing that the model is heteroscedastic.
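If you prefer to construct the plot yourself, for instance to plot the squared residuals instead, here is a minimal sketch to run right after the regression above (uhat, uhat2 and yhat are just variable names chosen for this illustration):

predict yhat, xb        // fitted values of trmgpa
predict uhat, resid     // regression residuals
gen uhat2 = uhat^2      // squared residuals
scatter uhat2 yhat      // a definite pattern (e.g., fanning out) suggests heteroscedasticity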
2). Conduct either the Breusch-Pagan or the White heteroscedasticity test after your regression to check whether the residuals have a changing variance. The Stata commands are estat hettest and estat imtest, white. If the obtained p-values are significant, then the model exhibits heteroscedasticity; otherwise, the model is homoscedastic.
estat hettest

Breusch-Pagan/Cook-Weisberg test for heteroscedasticity
Ho: Constant variance
Variables: fitted values of trmgpa

chi2(1)      =   14.12
Prob > chi2  =   0.0002
estat imtest, white

White's test for Ho: homoscedasticity
against Ha: unrestricted heteroscedasticity

chi2(33)     =   61.22
Prob > chi2  =   0.0020
// the null hypotheses for both tests are that the model is homoscedastic. But since the p-values for both tests are significant, the null hypothesis is rejected in favour of the alternative, evidencing that the model is heteroscedastic
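For intuition, the Breusch-Pagan statistic is based on an auxiliary regression of the squared residuals on the fitted values. A minimal sketch of that idea, reusing the uhat2 and yhat variables created in the plotting sketch above:

reg uhat2 yhat    // auxiliary regression: a significant slope signals non-constant error variance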
Controlling/Correcting heteroscedasticity
Also, as a precaution, it is advisable to run your analysis using White's heteroscedasticity-robust standard errors by including the robust option in the command line, as in this example:
reg trmgpa crsgpa cumgpa tothrs sat hsperc female season, robust
By using this option, the problem of heteroscedasticity is controlled for, compared with when the robust option is not used: the coefficient estimates stay the same, but the standard errors (and therefore the t statistics and confidence intervals) become valid in the presence of heteroscedasticity.
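To see the comparison side by side, one convenient pattern is to store both sets of estimates and tabulate them (ols and robust_se are just names chosen here); this also sets you up for the assignment below:

reg trmgpa crsgpa cumgpa tothrs sat hsperc female season
estimates store ols
reg trmgpa crsgpa cumgpa tothrs sat hsperc female season, robust
estimates store robust_se
estimates table ols robust_se, se   // identical coefficients, different standard errors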
Assignment: Using Wooldridge's hprice1.dta or hprice1.xls data, how can you detect whether the model is heteroscedastic, and how will you correct it? Compare the usual standard errors with the obtained heteroscedasticity-robust standard errors. What do you observe?
So, with this brief and practical tutorial, you can confidently run your regressions and test whether your model suffers from heteroscedasticity or not… good luck!