## Wednesday, 10 January 2018

### Multicollinearity

Multicollinearity, pronounced mul-ti-co-lli-nea-ri-ty, is the second-longest word in the econometrics dictionary after heteroscedasticity. It contains 17 letters! It occurs when there is an exact (perfect) or near-exact linear relationship among the explanatory variables in a given model. Collinearity is the special case where the dependence is between two variables, in which case we say that the variables are collinear. For instance, when 80 to 100% of the variation in one explanatory variable is explained by another explanatory variable, separating the influence of each of them on the dependent variable (the regressand) becomes difficult, and interpreting the estimated coefficients from that model becomes problematic. This is because the variation in one regressor is largely, or in the extreme case completely, explained by another regressor in the same model.

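The consequences depend on whether the multicollinearity is perfect or less than perfect:

- With perfect multicollinearity, the regression coefficients are indeterminate (the collinear variables cannot be distinguished from one another) and their standard errors are infinite
- With less than perfect multicollinearity, the coefficients can be estimated, but their standard errors are very large
- Estimates remain unbiased but are unstable, so they can differ widely from one sample to the next
- Coefficients cannot be estimated with precision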

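Note: less than perfect multicollinearity does not violate any regression assumption (only perfect multicollinearity does, because the classical model assumes no exact linear relationship among the regressors). The OLS estimators are still BLUE (Best Linear Unbiased Estimators), so the minimum-variance property is not destroyed. The catch is that this minimum variance can itself be large.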
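To make "indeterminate" concrete, here is a minimal sketch in Python with NumPy. The data are simulated and the variable names (`x1`, `x2`, `x3`) are made up for illustration. It shows that an exact linear dependence leaves the design matrix short of full column rank, so two different coefficient vectors produce exactly the same fitted values, while a near-exact dependence keeps full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

x1 = rng.normal(size=n)
x2 = 2.0 * x1                               # exact linear function of x1 (perfect collinearity)
x3 = x1 + rng.normal(scale=0.05, size=n)    # almost, but not exactly, a copy of x1

# Design matrices with a constant term in the first column
X_perfect = np.column_stack([np.ones(n), x1, x2])
X_near = np.column_stack([np.ones(n), x1, x3])

rank_perfect = np.linalg.matrix_rank(X_perfect)
rank_near = np.linalg.matrix_rank(X_near)
print("rank with perfect collinearity:", rank_perfect, "of", X_perfect.shape[1], "columns")
print("rank with near collinearity:   ", rank_near, "of", X_near.shape[1], "columns")

# Two different coefficient vectors that give identical fitted values:
# the data cannot tell them apart, which is what "indeterminate" means.
beta_a = np.array([0.0, 1.0, 0.0])   # all of the effect attributed to x1
beta_b = np.array([0.0, 0.0, 0.5])   # all of the effect attributed to x2
fitted_a = X_perfect @ beta_a
fitted_b = X_perfect @ beta_b
print("same fitted values:", np.allclose(fitted_a, fitted_b))
```

In the near-collinear case the coefficients are identified, but only just, which is where the large standard errors come from.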

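Multicollinearity can be detected using r, the coefficient of correlation. If |r| = 1, perfect collinearity exists. So whenever you run your correlation matrix, look out for pairs of regressors where |r| > 0.8. As a rule of thumb, that tells us the respective variables are collinear. Bear in mind that pairwise correlations can miss a linear dependence that involves three or more regressors. Multicollinearity refers to linear relationships only, so regressors that are related non-linearly (for example X and X²) do not violate the no-multicollinearity assumption.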
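The correlation check can be sketched in a few lines of Python with NumPy. The helper `flag_collinear_pairs` and the variables `income`, `wealth` and `age` are hypothetical names on simulated data, and the 0.8 cut-off is the rule of thumb mentioned above rather than a formal test.

```python
import numpy as np

def flag_collinear_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for every pair of columns with |r| above the threshold."""
    corr = np.corrcoef(X, rowvar=False)   # columns of X are the variables
    flagged = []
    k = corr.shape[0]
    for i in range(k):
        for j in range(i + 1, k):         # upper triangle only, skip the diagonal
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

# Simulated regressors: wealth is built to move closely with income
rng = np.random.default_rng(1)
n = 200
income = rng.normal(size=n)
wealth = 0.95 * income + rng.normal(scale=0.2, size=n)
age = rng.normal(size=n)

X = np.column_stack([income, wealth, age])
pairs = flag_collinear_pairs(X, ["income", "wealth", "age"])
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.3f}")
```

Using the absolute value matters: a strong negative correlation is just as much a sign of collinearity as a strong positive one.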

A major problem associated with multicollinearity is that if r is high, the standard errors will be high and the computed t-statistics will be low, making it more likely that you fail to reject the null hypothesis when it is false. That is a Type II error: incorrectly retaining a false null hypothesis.
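A small simulation can illustrate how the standard error grows with the correlation between two regressors. This is only a sketch: the data-generating process, the true coefficients (0.5 each) and the function names `ols_standard_errors` and `simulate` are assumptions made for illustration.

```python
import numpy as np

def ols_standard_errors(X, y):
    """OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # estimated covariance matrix of beta
    return beta, np.sqrt(np.diag(cov))

def simulate(rho, n=500, seed=0):
    """Regress y on two regressors whose population correlation is rho."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, se = ols_standard_errors(X, y)
    return beta, se, beta / se                   # coefficients, standard errors, t-ratios

results = {rho: simulate(rho) for rho in (0.0, 0.8, 0.99)}
for rho, (beta, se, t) in results.items():
    print(f"rho = {rho:4.2f}  se(b1) = {se[1]:.3f}  t(b1) = {t[1]:.2f}")
```

The true coefficient on `x1` is the same in every run. Only the correlation changes, so any drop in the t-ratio comes from the inflated standard error, not from a weaker effect.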

How do you know if your model suffers from multicollinearity?

- High R²
- Few significant t-ratios
- Wider confidence intervals
- Signs of the beta coefficients that contradict a priori expectations
- Estimates that are sensitive to even small changes in model specification
- High pairwise correlation among the regressors
- The tolerance level and variance inflation factor (VIF), where tolerance = 1/VIF. A tolerance lower than 0.10, or equivalently a VIF above 10, is indicative of multicollinearity in a model. The higher the VIF, the stronger the evidence of multicollinearity.
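The VIF and tolerance in the last item can be computed by hand. The sketch below uses the standard definition: regress each regressor on all the others (plus a constant), take the R² of that auxiliary regression, and set VIF = 1 / (1 − R²). The function name `vif_and_tolerance` and the simulated data are illustrative assumptions.

```python
import numpy as np

def vif_and_tolerance(X):
    """VIF and tolerance for each column of X (regressors only, no constant column)."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression: column j on a constant and all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)    # grows without bound as r2 approaches 1
    return vifs, 1.0 / vifs

# Simulated regressors: x2 is close to a scaled copy of x1, x3 is unrelated
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.2, size=n)
x3 = rng.normal(size=n)

vifs, tolerances = vif_and_tolerance(np.column_stack([x1, x2, x3]))
for name, v, t in zip(["x1", "x2", "x3"], vifs, tolerances):
    verdict = "collinear" if v > 10 else "ok"
    print(f"{name}: VIF = {v:6.2f}  tolerance = {t:.3f}  ({verdict})")
```

Unlike the pairwise correlation matrix, the auxiliary regression uses all the other regressors at once, so it can also pick up a dependence that involves three or more variables.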

Correcting/controlling for multicollinearity:

- Collect more data
- Change the scope of analysis
- Do not include collinear variables in the same regression
- Drop the highly collinear variable
- Transform the collinear variable through differencing (however, the differenced error term is typically serially correlated, which violates the OLS assumption of no autocorrelation)

What I often do is drop the collinear variable. If that variable is very important to my model, I restructure my modelling in a step-wise fashion so that collinear variables are not included together in the same regression.

[Watch video on multicollinearity]
