## General Overview on Lag Selection

Since this blog is tailored for beginners in econometrics, I will not engage in an advanced discussion of the topic. Instead, I take an introductory approach so that a beginner can understand the essence of using lags in a model and the pitfalls that may occur if lags are used excessively. Interested readers who require more advanced information on lag selection can consult appropriate econometric textbooks. Having said that, in economics the dependence of a variable *Y* (the outcome variable or regressand) on another variable(s) *X* (the predictor variable or regressor) is rarely instantaneous. Very often, *Y* responds to *X* with a lapse of time. Such a lapse of time is called a *lag*. Therefore, in time series analysis, some level of care must be exercised when including lags in a model.
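To make the idea of a lag concrete, here is a minimal Python sketch (it is not part of the EViews workflow) that builds lagged copies of a short series. The function name `make_lags` and the numbers are hypothetical, chosen only for illustration. Notice that every extra lag costs one usable observation.

```python
def make_lags(series, max_lag):
    """Return rows (y_t, y_{t-1}, ..., y_{t-max_lag}) for every t with all lags available."""
    rows = []
    for t in range(max_lag, len(series)):
        # Current value followed by its lags 1..max_lag
        rows.append(tuple(series[t - j] for j in range(max_lag + 1)))
    return rows


# Hypothetical quarterly values, for illustration only
y = [100, 102, 105, 103, 108, 110]

rows = make_lags(y, max_lag=2)
for row in rows:
    print(row)

# 6 observations with 2 lags leave 6 - 2 = 4 usable rows
print(len(rows))
```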

**So how many lags should be used in a model?** There is no hard-and-fast rule on the choice of lag length. It is basically an empirical issue. As noted in Damodar Gujarati's *Basic Econometrics*, there is no *a priori* guide as to what the maximum length of the lag should be. The researcher must bear in mind that, as one estimates successive lags, there are fewer degrees of freedom left, making statistical inference somewhat unstable. Economists are usually not lucky enough to have a long series of data that lets them go on estimating numerous lags. More importantly, in economic time series data, successive values (lags) tend to be highly correlated, increasing the likelihood of multicollinearity in the model.
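The multicollinearity point can be illustrated with a small sketch. It uses a made-up trending series (not the tutorial data) and a hand-written `pearson` helper, both assumptions for illustration, to show how closely the first and second lags of a trending series move together.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up trending series: steady growth plus a small wiggle
y = [100 + 2 * t + 3 * math.sin(t) for t in range(40)]

lag1 = y[1:-1]   # y_{t-1} for t = 2, ..., 39
lag2 = y[:-2]    # y_{t-2} for t = 2, ..., 39

r = pearson(lag1, lag2)
print(round(r, 3))  # close to 1 for a trending series like this one
```

When two regressors are this strongly correlated, it becomes hard to separate their individual effects, which is why standard errors on lagged terms tend to be large.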

Also, according to Jeffrey Wooldridge’s *Introductory Econometrics: A Modern Approach*, with annual data the number of lags is typically small, 1 or 2 lags, in order not to lose degrees of freedom. With quarterly data, 1 to 8 lags is appropriate, and for monthly data, 6, 12 or 24 lags can be used given sufficient data points. Again, in the words of Damodar Gujarati’s *Basic Econometrics*, “the sequential search for the lag length opens the researcher to the charge of data mining”. He further stated that the nominal and true level of significance to test statistical hypotheses becomes an important issue in such sequential searches. For instance, if the lag length, *k*, is incorrectly specified, the researcher will have to contend with the problem of misspecification errors. In addition, because of the lags involved, distributed lag and/or autoregressive models raise the topic of causality in economic variables.

Hence, before you estimate a time series equation, it is necessary to decide on the maximum lag length. As I mentioned earlier, this is purely an empirical question. Suppose there are 40 observations in all. By including too many lagged values, your model consumes degrees of freedom, not to mention increasing the likelihood of multicollinearity. As noted in my previous tutorial on multicollinearity, it leads to imprecise estimation; that is, the standard errors tend to be inflated in relation to the estimated coefficients. As a result, based on the routinely computed *t* ratios, we may tend to declare (erroneously) that a lagged coefficient(s) is statistically insignificant. In the same vein, including too few lags will lead to specification errors. The easiest way out of this quagmire is to decide using a criterion like the Akaike or Schwarz and choose the model that gives the lowest values of these criteria. Most econometric packages easily compute the optimal lag length, but note that some trial and error is inevitable.

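The 40-observation example can be worked through with a few lines of arithmetic. This sketch assumes an equation with an intercept and the same number of lags of 3 variables (as in each equation of a three-variable VAR); the helper name `residual_df` is hypothetical.

```python
def residual_df(n_obs, n_vars, n_lags, intercept=True):
    """Residual degrees of freedom for one equation with n_lags lags of n_vars variables."""
    usable_obs = n_obs - n_lags                      # the first n_lags rows are lost
    n_coefs = n_vars * n_lags + (1 if intercept else 0)
    return usable_obs - n_coefs


# 40 observations and 3 variables, as a worked example
for p in (1, 2, 4, 8):
    print(p, residual_df(40, 3, p))
# 1 35
# 2 31
# 4 23
# 8 7
```

Each extra lag removes one observation and adds three coefficients, so with 8 lags only 7 degrees of freedom remain out of the original 40 observations.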
## Choosing Optimal Lags in EViews

For instance, if there are limited observations in a vector autoregressive (VAR) estimation, it is often advised to use the Akaike information criterion (AIC) in selecting the lag length, bearing in mind that more parsimonious models preserve degrees of freedom. For any given criterion, the lag length with the smallest criterion value is the preferred one. Most researchers prefer using the AIC, and my advice is always to select the lag length that gives the smallest value of the chosen criterion. Let us begin by showing how you can select the optimal lag order for your model and variables using the EViews analytical package.
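For reference, here is how the two criteria are commonly written in per-observation form. The symbols are introduced only for this sketch and are assumptions about notation, not taken from the EViews output: $\ell$ is the maximised log-likelihood of the model, $T$ the number of observations used, and $n$ the number of estimated parameters.

$$
\mathrm{AIC} = -\frac{2\ell}{T} + \frac{2n}{T}, \qquad
\mathrm{SC} = -\frac{2\ell}{T} + \frac{n \ln T}{T}
$$

Both reward a better fit (a larger $\ell$) and penalise extra parameters, but the Schwarz penalty $n \ln T / T$ exceeds the AIC penalty $2n / T$ whenever $\ln T > 2$, that is, for $T \geq 8$. For the same model, the Schwarz value is therefore normally the larger of the two, and the informative comparison is the value of one criterion across different lag lengths.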

Please note that in EViews, the procedure is simply to *run an initial VAR on the variables at level with the default settings and obtain the results*. I will go through the steps in detail.

For this tutorial, I will extract data from the **Gujarati and Porter Table 21.1** dataset. It is quarterly data on the United States from 1970 to 1991, which is 88 observations. The variables are *gdp* (gross domestic product), *pdi* (personal disposable income) and *pce* (personal consumption expenditure).

**Step 1: Load Data into EViews**

To import the Excel file into EViews, go to: **File** >> **Import** >> **Import from file** >> **Next** >> **Finish**. If it is correctly done, you obtain:

[Figure: EViews Workfile. Source: CrunchEconometrix]

From the EViews interface, the three variables *gdp*, *pce* and *pdi* are individually shown. Double-clicking on each variable shows them in separate sheets, like this:

[Figure: EViews Creating Group Data. Source: CrunchEconometrix]

**Step 2: Create Group Data**

But because I need to obtain the optimal lag for the model, it becomes necessary to open this data as a GROUP by putting all three variables in one worksheet. To do that: hold down the **Ctrl key** >> click on *gdp*, *pce* and *pdi* >> right-click on any part of the screen >> **Open** >> **as Group**:

[Figure: EViews - Open as Group Data. Source: CrunchEconometrix]

When you click **"as Group"**, you should have this:

[Figure: EViews Group Data. Source: CrunchEconometrix]

**Step 3: Run Unrestricted VAR model**

Now that our variables are grouped, next is to run an *unrestricted VAR model* with **the level** of the variables, taking different lags before deciding which model is the best. Remember, I am using quarterly data, which allows me to use up to 8 lags. But if yours is yearly data, you can use 2 lags at the most in order not to lose too many degrees of freedom, or if monthly data, up to 24 lags. The **unrestricted VAR** is chosen only on the assumption that the three variables are not cointegrated.

**Note: if the variables are cointegrated, you should run the vector error correction model.**

To run the *unrestricted VAR model*, go to: **Quick** >> **Estimate VAR** >> the dialog box opens:

[Figure: EViews VAR Specification. Source: CrunchEconometrix]

Type in all the variable names in the **Endogenous variables box** (note that under VAR there is no exogenous variable; all variables are endogenous). Since between 1 and 8 lags can be used because I am using quarterly data, I begin with 4 lags before deciding which model is the best.

Click **OK**. Here is the output (to save space, only the relevant part is shown):

[Figure: EViews Regression Output. Source: CrunchEconometrix]

The EViews output reports, among others, the AIC and Schwarz criterion. You will also observe that the output returned 2 sets of results. Those identified by the **red bracket** are for the respective endogenous variables, with each column representing the result for *gdp*, *pce* and *pdi* in that order. But the results we are most interested in are those identified by the **blue bracket**. These are the estimates for the VAR system. However, at this moment, we are only interested in the criteria. Between the AIC and Schwarz, the former’s value of **26.85144** is lower than that of Schwarz at **27.98004**. Bear in mind that the two criteria penalise extra parameters differently, so what matters for lag selection is how each criterion changes across lag lengths. In this tutorial, I base the lag selection on the AIC.

**Step 4: Choose Optimal Lag length for the Model**

However, rather than running the unrestricted VAR model repeatedly with different lag lengths before deciding on the best model to adopt, there is a simpler way of obtaining the optimal lag structure at once for a variety of information criteria. To do that, click on **View** >> **Lag Structure** >> **Lag Length Criteria** >> the **Lag Specification** dialog box opens:

[Figure: EViews Lag Specification Dialogue Box. Source: CrunchEconometrix]

Note: I put in 8 lags because I am at liberty to use up to 8 lags due to the nature of my data (quarterly). So, if yours is yearly data, you may put in 2.

Click **OK** to obtain the various information criteria from lag 0 to 8 shown below:

[Figure: EViews Model Lag Structure. Source: CrunchEconometrix]

From the output, the selected lag order is indicated by an **asterisk sign (*)**, which is distributed between lags 1 and 2, but mostly on lag order 2. The rule of thumb is that, for a given criterion, the lower the value, the better the model, and the AIC is lowest at **26.90693**. We can conclude that the optimal lag length for the model is 2, and the criterion I adopt for the model is the AIC.
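For readers who want to see what such a lag-length table involves, here is a rough Python sketch of the same idea using NumPy. It is an illustration under stated assumptions, not a reproduction of the EViews routine: the data are simulated (not Table 21.1), the function name `var_lag_criteria` is hypothetical, only the AIC and Schwarz criteria are computed in per-observation log-likelihood form, and every lag length is fitted on the same sample so the values are comparable.

```python
import numpy as np


def var_lag_criteria(data, max_lag):
    """AIC and SC for VAR(p), p = 0..max_lag, all fitted on the same sample."""
    data = np.asarray(data, dtype=float)
    n_obs, k = data.shape
    T = n_obs - max_lag                       # common estimation sample
    y = data[max_lag:]                        # left-hand side, shape (T, k)
    results = []
    for p in range(max_lag + 1):
        # Regressors: intercept plus lags 1..p of every variable
        cols = [np.ones((T, 1))]
        for j in range(1, p + 1):
            cols.append(data[max_lag - j:n_obs - j])
        X = np.hstack(cols)
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coefs
        sigma = resid.T @ resid / T           # ML residual covariance
        _, logdet = np.linalg.slogdet(sigma)
        loglik = -0.5 * T * (k * (1 + np.log(2 * np.pi)) + logdet)
        n_params = k * (k * p + 1)            # coefficients in the whole system
        aic = -2 * loglik / T + 2 * n_params / T
        sc = -2 * loglik / T + n_params * np.log(T) / T
        results.append((p, aic, sc))
    return results


# Simulated 3-variable VAR(2) with 88 observations (illustrative only)
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1, 0.0],
               [0.1, 0.4, 0.1],
               [0.0, 0.1, 0.3]])
A2 = 0.2 * np.eye(3)
sim = np.zeros((88, 3))
for t in range(2, 88):
    sim[t] = A1 @ sim[t - 1] + A2 @ sim[t - 2] + rng.normal(size=3)

table = var_lag_criteria(sim, max_lag=8)
best_aic = min(table, key=lambda r: r[1])[0]   # lag with the lowest AIC
best_sc = min(table, key=lambda r: r[2])[0]    # lag with the lowest SC
for p, aic, sc in table:
    print(f"{p}  AIC={aic:.4f}  SC={sc:.4f}")
print("lag chosen by AIC:", best_aic, "| lag chosen by SC:", best_sc)
```

Each criterion is minimised over the lag length separately, which is what the asterisks in a lag-length table mark. Passing a single column (an array of shape `(T, 1)`) gives the per-variable version of the same search.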
The same procedure can be adopted in obtaining the respective lags for each variable. For instance, to obtain it for *gdp*:

1. Double click on *gdp* >> **Quick** >> run the unrestricted VAR >> **OK** >> obtain the output
2. Click **View** >> **Lag Structure** >> **Lag Length Criteria** >> the **Lag Specification** dialog box opens >> **OK**

…and you obtain this:

[Figure: EViews - Lag Structure for gdp. Source: CrunchEconometrix]

From the output, the criterion adopted for the *gdp* model is the AIC, with the lowest figure of **9.937278**, meaning that the optimal lag length for *gdp* is 2.

Doing the same procedure for *pce*, here is the result:

[Figure: EViews - Lag Structure for pce. Source: CrunchEconometrix]

From the output, the optimal lag length for the *pce* model is 4 given the AIC value of **8.698617**, which is the lowest among the criteria; hence it is the criterion adopted for the *pce* model. For *pdi*, the optimal lag length is 1 given the AIC value of **9.602079**, shown below:

[Figure: EViews - Lag Structure for pdi. Source: CrunchEconometrix]

**Caveat:** There are also cases where the lag length used is the one selected by most of the criteria, such as HQ, SIC, AIC and LR, several of which are named after the econometricians who developed them. Some researchers prefer the Schwarz criterion when there are more than 4 variables and use the AIC when there are fewer than 4. As mentioned in the introductory part of this tutorial, the decision on the choice of lag is purely an empirical issue. Generally, we choose the lag length for which the values of most of these lag length criteria are minimised, indicated by asterisks in the EViews output.

**[Watch video tutorial on optimal lag selection using EViews]**

Having gone through this tutorial, it will be easy to understand and know how to determine the optimal lags for a model regardless of the analytical package used. Remember that the “Lag length criteria” view gives a definite way of selecting the optimal lags after estimating the initial VAR model. Also, VAR and ARDL models are susceptible to the arbitrary use of lags, which may erode the degrees of freedom, weaken the significance of the coefficients, induce autocorrelation and weaken the strength of diagnostic tests.
