## How to Interpret Regression Output in Stata

This period happens to be the *dissertation semester* for undergraduate students in most universities, at least for those with an undisrupted academic calendar :). The students are at different stages of their *project*, as it is commonly called. Some are yet to wrap up chapter one, which gives the “study background” and frames the research hypotheses, objectives and questions. Some have moved on to chapter two, reviewing the literature relevant to their scope of study. Others have gone further and are developing both the theoretical and empirical frameworks for chapter three, though not without the usual teething problems (but they’ll get around them, somehow :)). A handful have done even better and progressed to chapter four, attempting to analyse their data.

Since chapters one to three are specific to each student’s scope of research, but a regression output is common to all (although actual outcomes differ), I decided to write this tutorial explaining the basic features of a regression output. This write-up is also a response to requests from readers on (1) what some specific figures in a regression output are and (2) how to interpret the results. Let me state here that regardless of the analytical software, whether Stata, EViews, SPSS, R, Python, Excel etc., the core contents of a regression output are common to all analytical packages.

For instance, in undertaking an ordinary least squares (OLS) estimation using any of these applications, the regression output will churn out the ANOVA (analysis of variance) table, *F*-statistic, *R*-squared, prob-values, coefficients, standard errors, *t*-statistics, degrees of freedom, 95% confidence intervals and so on. These are the basic features of a regression output regardless of your model and/or estimation technique. However, the issue is: what do they mean, and how can they be interpreted and related to your study?

Hence, the essence of this tutorial is to teach students the relevance of these features and how to interpret their results. I will be using the **Stata** analytical package to explain a regression output, but you can practise along using any analytical package of your choice. (**EViews** and **Excel** users: see the corresponding "How-to-interpret regression output" tutorials.)

**An Example: Using Gujarati and Porter Dataset Table7_12.dta or Table7_12.xlsx**

Note: In this tutorial I will not be discussing stationarity or cointegration analysis (those topics will be covered in subsequent tutorials). Since the purpose is simply to explain the basic features of a regression output, I will only be doing a simple linear regression analysis (a bivariate analysis) with only one explanatory variable.

The dataset is on the United States from 1960 to 2009 (50 years of data). The outcome variable is consumption expenditure (*pce*) and the explanatory variable is income (*income*).

**First step: Load data in Excel format into Stata**

Here is the data in Excel format:

[Figure: Data in Excel format. Source: CrunchEconometrix]

And here is the data in Stata format:

[Figure: Data in Stata format. Source: CrunchEconometrix]
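The screenshots do not show the commands used to bring the spreadsheet into Stata. Here is a minimal sketch, assuming Table7_12.xlsx sits in the current working directory and that its first row holds the variable names (*obs*, *pce*, *income* and so on). Both are assumptions about the file layout, not something stated in this tutorial:

```stata
* Read the Excel sheet; firstrow treats row 1 as variable names (assumed layout)
import excel using "Table7_12.xlsx", firstrow clear

* Inspect what was loaded before going any further
describe
list in 1/5

* Alternatively, if you have the Stata-format file, load it directly:
* use "Table7_12.dta", clear
```

Running *describe* right after the import is a cheap way to confirm that the variables came in as numbers rather than text.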

**Second step: Set the time variable in Stata for analysis**

Before analysing the data, set up the time variable in readiness for the regression. The general code is:

*tsset timevar*

In my case, the time variable is *obs*, and my code becomes:

*tsset obs*

and Stata responds with:

[Figure: Time set command in Stata. Source: CrunchEconometrix]

*tsset* stands for **“time series set”** and, as you can see, the begin year is 1960 and the end year is 2009. Make a habit of doing this after loading your data and before you begin your regressions. A plain *regress* will run without it, but time-series operators and tests require it.

**Third step: Visualise the relationship between the variables**

Before analysing the data, it is always good to graph the dependent variable against the key explanatory variable (using a scatter plot) in order to observe the pattern between them. It gives you an idea of what to expect in your actual analysis.

So, to graph *pce* and *income*, the Stata code is:

*twoway (scatter pce income)*

The scatter diagram indicates a positive relationship between the two variables:

[Figure: Scatter plot of the variables. Source: CrunchEconometrix]

This positive relationship seems plausible because the more income you have, the more you’ll want to consume, unless you are very frugal :).

**Fourth step: The scientific investigation**

Now we want to scientifically investigate the relationship between *pce* and *income*. The Stata code is:

*regress pce income*

(You have simply told Stata to regress the dependent variable, *pce*, on the explanatory variable, *income*.) The output is shown as:

[Figure: Regression output in Stata. Source: CrunchEconometrix]

**Fifth step: The features of a regression output**

So what do these figures mean? I will explain each feature in turn.

**Source**: there are two sources of variation in the dependent variable, *pce*: the part explained by the regression (i.e., the **Model**) and the part due to randomness (the **Residual**).

**SS**: stands for sum of squares, reported for the Model (explained variation in *pce*), the Residual (unexplained variation in *pce*) and their Total. After running the regression, not all the observed values of *pce* fall on the regression line, which traces the fitted values (*pce_hat*). The vertical gaps between the observed points and the line are known as residuals. The variation that can be explained by the model is known as the **Explained Sum of Squares** (ESS), while the variation that is due to randomness and lies outside the model is known as the **Residual Sum of Squares** (RSS).
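In symbols, the three rows of the SS column are related as sketched below for an OLS regression that includes an intercept. The notation is introduced here for illustration only: $\widehat{pce}_t$ is the fitted value, $\overline{pce}$ is the sample mean of *pce*, and $n$ is the number of observations.

```latex
\underbrace{\sum_{t=1}^{n}\left(pce_t-\overline{pce}\right)^2}_{\text{Total SS}}
=\underbrace{\sum_{t=1}^{n}\left(\widehat{pce}_t-\overline{pce}\right)^2}_{\text{ESS (Model)}}
+\underbrace{\sum_{t=1}^{n}\left(pce_t-\widehat{pce}_t\right)^2}_{\text{RSS (Residual)}}
```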

To graph the observed values (*pce*) together with the linear prediction (*pce_hat*), the Stata code is:

*scatter pce income || lfit pce income*

As observed from the graph, not all the points fall on the predicted line. Some lie above it, while some lie beneath it. These gaps are the **residuals** (in other words, the remnants left over after the regression analysis).

To obtain the predicted values, the Stata command is:

*predict pce_hat*

and to obtain the residuals, the Stata command is:

*predict pce_resid, residuals*

(Without the *residuals* option, *predict* returns the fitted values again.)

[Figure: Predicted and residual values of the dependent variable. Source: CrunchEconometrix]
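Here is a short sketch of how the fitted values and residuals can be checked once they exist, assuming Table7_12.dta is in the working directory. The variable name *check* is made up for this illustration. For OLS with an intercept, the residuals should average to zero (up to rounding), and each observed *pce* should equal its fitted value plus its residual:

```stata
use "Table7_12.dta", clear
regress pce income

* Fitted values (xb is the default) and residuals
predict double pce_hat, xb
predict double pce_resid, residuals

* Observed = fitted + residual, so this gap should be numerically zero
generate double check = pce - (pce_hat + pce_resid)

* The means of pce_resid and check should both be approximately 0
summarize pce_resid check
```

Storing the new variables as *double* keeps rounding error out of the comparison.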

If the predicted line lies above a point, it means that *pce* is over-predicted (that is, *pce - pce_hat* is negative), and if it lies beneath a point, it implies that *pce* is under-predicted (that is, *pce - pce_hat* is positive). The sum and the mean of the residuals equal zero.

**df**: this is the degrees of freedom, calculated as *k - 1* (for the model) and *n - k* (for the residuals), where *n* = number of observations and *k* = number of estimated parameters (including the intercept). Here *n* = 50 and *k* = 2, so the model has 1 degree of freedom and the residuals have 48.

**MS**: stands for mean square and is obtained by dividing **SS** by **df**, i.e. *SS/df*.

**No. of obs**: the data span is from 1960 to 2009 = 50 years, so there are 50 observations.

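*F***-stat**: tests whether the explanatory variable, *income*, is significant in explaining the outcome variable, *pce* (with several explanatory variables, it tests whether they are jointly significant). It is the ratio of the Model MS to the Residual MS. The higher the *F*-stat, the stronger the evidence that the model explains something.

**Prob>F**: this is the probability value that indicates the statistical significance of the *F* ratio. You will prefer to have a *prob*-value that is less than 0.05.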
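To see where the figures in the ANOVA table come from, the sketch below rebuilds them from the results Stata stores after *regress* (`e(mss)`, `e(rss)`, `e(df_m)`, `e(df_r)`, `e(F)`). It assumes Table7_12.dta is in the working directory, and the scalar names are made up for illustration:

```stata
use "Table7_12.dta", clear
quietly regress pce income

* Mean squares: each SS divided by its degrees of freedom
scalar ms_model = e(mss) / e(df_m)
scalar ms_resid = e(rss) / e(df_r)

* F is the ratio of the two mean squares; compare with Stata's own e(F)
scalar f_manual = ms_model / ms_resid
display "df (model, residual): " e(df_m) ", " e(df_r)
display "F by hand = " f_manual "   F from Stata = " e(F)

* Prob > F: upper-tail area of the F distribution beyond the F-stat
display "Prob > F = " Ftail(e(df_m), e(df_r), e(F))
```

If the two *F* values agree, you have confirmed that the *F*-stat is nothing more than Model MS divided by Residual MS.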

*R***-squared**: gives the proportion of the variation in *pce* that is explained by *income*. The higher the *R*², the better the fit of the model and the more predictive power the variables have, although an *R*² that equals 1 will elicit some suspicion. In a simple regression like this one, *R* is actually the correlation coefficient between the two variables, which implies that *R*² is the square of the correlation between *pce* and *income*.
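Both readings of *R*-squared can be checked by hand. The sketch below assumes Table7_12.dta is in the working directory and uses made-up scalar names. It computes the explained share of total variation, then the squared correlation between the two variables (the latter equality holds for a regression with a single explanatory variable):

```stata
use "Table7_12.dta", clear
quietly regress pce income

* R-squared as explained variation over total variation
scalar r2_manual = e(mss) / (e(mss) + e(rss))
scalar r2_stata  = e(r2)
display "ESS/TSS = " r2_manual "   Stata's e(r2) = " r2_stata

* Squared correlation between pce and income
quietly correlate pce income
scalar r2_corr = r(rho)^2
display "corr^2  = " r2_corr
```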

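*Adjusted R***-squared**: this is the *R*² adjusted for the number of explanatory variables in the model. Unlike *R*², which never falls when a variable is added, the adjusted version penalises each additional explanatory variable, so it falls when a new variable adds too little explanatory power.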
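For reference, the usual textbook formula is sketched below, with $n$ the number of observations and $k$ the number of estimated parameters including the intercept:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k}
```

With $n = 50$ and $k = 2$, the factor $(n-1)/(n-k)$ is $49/48$, so the adjustment is small in this example.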

**Coeff**: this is the slope coefficient, the estimate for *income*. It tells you how much *pce* changes, on average, for a one-unit change in *income*. The sign of the coefficient also tells you the direction of the relationship: a positive (negative) sign implies a positive (negative) relationship.

**_cons**: this is the hypothetical outcome on *pce* if *income* is zero. It is also the intercept for the model.

**Std. error**: this is the estimated standard deviation of the coefficient estimate. That is, since you cannot be sure about the exact value of the coefficient on *income*, the estimate would vary somewhat from sample to sample. The standard error therefore shows how far the slope coefficient estimate is expected to deviate from its true value.

*t***-stat**: this measures the number of standard errors that the coefficient is from zero. It is obtained by *coeff/std. error*. As a rule of thumb, a *t*-stat above 2 in absolute value is sufficient evidence against the null hypothesis (that the coefficient is zero) at about the 5% level.

**P>|t|**: there are several ways to look at this. (1) It is the smallest significance level at which the null hypothesis can be rejected. (2) It is the probability of obtaining a slope coefficient estimate at least as far from zero as the one in the data if the actual slope coefficient were zero. (3) It is read off the *t* distribution, using the *t*-stat and the residual degrees of freedom (df). (4) It tells whether the relationship is statistically significant or not.

So, if the *p*-value is 0.4, then an estimate this far from zero would turn up about 40% of the time even if the true slope coefficient were zero. This is not good enough as evidence. A very low *p*-value gives much stronger grounds for rejecting the null hypothesis. Hence, a *p*-value of 0.02 implies that such an estimate would arise only 2% of the time if the slope coefficient were truly zero, so you can reject the null at the 5% level. This is very comforting! :)

**95% confidence interval**: this is the range of values for the slope coefficient that is consistent with the data at the 95% confidence level. If the coefficient is significant at the 5% level, the interval will not contain zero; otherwise, it will.
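The *t*-stat, *p*-value and confidence interval can all be rebuilt from the coefficient and its standard error. The sketch below assumes Table7_12.dta is in the working directory and uses made-up scalar names. `_b[income]` and `_se[income]` are Stata's stored coefficient and standard error after *regress*:

```stata
use "Table7_12.dta", clear
quietly regress pce income

* t-stat: coefficient divided by its standard error
scalar t_income = _b[income] / _se[income]
display "t = " t_income

* Two-sided p-value from the t distribution with the residual df
scalar p_income = 2 * ttail(e(df_r), abs(t_income))
display "P>|t| = " p_income

* 95% confidence interval: estimate +/- critical value * standard error
scalar t_crit = invttail(e(df_r), 0.025)
scalar ci_lo  = _b[income] - t_crit * _se[income]
scalar ci_hi  = _b[income] + t_crit * _se[income]
display "95% CI: [" ci_lo ", " ci_hi "]"
```

Compare the three displayed values with the *income* row of the regression table. If the interval excludes zero, the *p*-value will be below 0.05, and vice versa.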

**Assignment:**

Use the Gujarati and Porter dataset Table7_12.dta or Table7_12.xlsx.

(1) With *pce* as the dependent variable and *gdpi* as the explanatory variable, plot the graph of *pce* and *gdpi*. What do you observe?

(2) Run your regression. Can you interpret the table and the features?

(3) Plot the predicted line. What are your observations?

I have taken you through the basic features of a regression output using the Stata analytical software on an ordinary least squares (OLS) model in a simple linear regression. Hence, you now have the basic idea of what the *F*-stat, *t*-stat, df, SS, MS, prob>F, p>|t|, confidence interval, *R*², coefficient and standard error stand for.

Practise the assignment and if you still have further questions, kindly post them below.
