Title: Review I: Basics
1Review I Basics
- Introduction to Econometrics
- Conducting an Econometric Study
- The Summation Operator
- Properties of Random Variables
- Probability Distribution
- Some Useful Results
2What Is Econometrics?
- Possible answers
- 1. Econometrics is the science of testing
economic theories.
- 2. Econometrics is the set of tools used for
forecasting future values of economic variables,
such as a firm's sales, the growth rate of an
economy, or stock prices.
- 3. Econometrics is the process of fitting
mathematical models to economic data.
- 4. Econometrics is the art and science of using
historical data to make quantitative policy
recommendations in government and business.
3What Is Econometrics?
- All these answers are correct!
- At a broad level, we define:
- Econometrics is the study of the application of
statistical methods to economic problems.
- Econometric methods are widely used in Finance,
Labour Economics, Microeconomics,
Macroeconomics, Public Finance, Marketing,
Industrial Organization, Health, etc.
4Why study Econometrics?
- It is rare in economics (and in many other areas without
labs!) to have experimental data.
- We need to use non-experimental, or observational,
data to make inferences.
- It is important to be able to apply economic theory to
real-world data.
5Why study Econometrics?
- An empirical analysis uses data to test a theory
or to estimate a relationship.
- A formal economic model can be tested.
- Theory may be ambiguous as to the effect of some
policy change; econometrics can be used to evaluate
the program.
6Types of Data Cross Sectional
- Cross-sectional data is usually assumed to be a random sample
from the population.
- Each observation is a new individual, firm, etc.
with information at a point in time.
- If the data is not a random sample, we have a
sample-selection problem.
7Cross Sectional Data Structure
- A cross-sectional data set on wages and other
individual characteristics
9Types of Data Panel (balanced or unbalanced)
- We can pool random cross sections and treat them much like
a normal cross section. We just need to
account for time differences.
- We can follow the same randomly sampled individuals
over time; this is known as panel data or
longitudinal data.
10Pooled Cross Sectional Data Structure
- Pooled cross sections: two years of housing
prices when there was a reduction in property taxes
in 2002
11Panel Data Structure
- A two year panel data set on City Crime Statistics
13Types of Data Time Series
- Time series data has a separate observation for
each time period, e.g. stock prices, inflation
rate, unemployment rate, etc.
- Since time series data is not a random sample,
there are different problems to consider.
- Trends and seasonality will be important.
- Trend stationary vs. difference stationary.
14Time Series Data Structure
- Time series data for important macroeconomic
variables (1971-2000)
16The Question of Causality
- Simply establishing a relationship between
variables is rarely sufficient.
- We want the effect to be considered causal.
- If we've truly controlled for enough other
variables, then the estimated ceteris paribus
effect can often be considered to be causal.
- It can be difficult to establish causality.
17Example Measuring the Return to Education
- A model of human capital investment implies
getting more education should lead to higher
earnings.
- In the simplest case, this implies an equation
like
$\text{Earnings} = \beta_0 + \beta_1\,\text{Education} + \varepsilon$
18Example (cont.)
- The estimate of $\beta_1$ is the return to education,
but can it be considered causal?
- While the error term, $\varepsilon$, includes other factors
affecting earnings, we want to control for as much
as possible.
- Some things are still unobserved, which can be
problematic.
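As a rough illustration of the earnings equation above, the following sketch fits the line by ordinary least squares using only the Python standard library. The data are synthetic and the "true" coefficients (`TRUE_B0`, `TRUE_B1`) are assumptions chosen for the example; the helper name `ols_simple` is also made up here. Because the simulated error is drawn independently of education, the slope estimate lands near the assumed value. With real data, unobserved factors correlated with education could break that.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Synthetic data (an assumption for illustration, not real wage data).
rng = random.Random(0)
TRUE_B0, TRUE_B1 = 1.0, 0.1
education = [rng.randint(8, 20) for _ in range(5000)]
# The error term is independent of education by construction.
earnings = [TRUE_B0 + TRUE_B1 * e + rng.gauss(0, 0.5) for e in education]

b0_hat, b1_hat = ols_simple(education, earnings)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```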
19Conducting an Econometric Study
- Step 1: Develop a research question (hypothesis)
- Step 2: Develop an economic model to frame the
question
- Step 3: Collect data to estimate the
parameters of the model
- Step 4: Model specification and testing
- Step 5: Present the findings and interpret the
results (prediction or forecasting?)
20Step 1 Develop a Research Question
- Two Important Considerations
- The question has to be feasible
- Must have an objective answer (positive vs.
normative)
- Answer can be found using econometric methods
- The Question has to be Practical
- Set up an economic model
- Collect relevant data
- Analyze those data within the available time frame
21Examples
- Does Reducing Class Size Improve Elementary
School Education?
- Is There Racial Discrimination in the Market
for Home Loans?
- How Much Do Cigarette Taxes Reduce Smoking?
- What Will the Rate of Inflation Be Next Year?
22Examples (cont.)
- Economic Model of Crime
- Job Training and Worker Productivity
- Effects of Fertilizer on Crop Yield
- The Effect of Law Enforcement on City Crime
Levels
- The Effect of the Minimum Wage on Unemployment
23Data Source
- Canadian sources: Visit the Data Liberation
Initiative at the University of Manitoba Library
to see all Canadian sources of data.
- US and international sources: Browse the Web links on
the CD-ROM that comes with the text.
- If you are looking for a particular data set for
your undergraduate research paper, contact Gary
Strike, Dafoe Library, tel. 474-7086, for more
information and assistance.
24The Summation Notation
- The sum of a large number of terms occurs
frequently in econometrics. There is an
abbreviated notation for such sums. The upper
case Greek letter Σ (sigma) is used to indicate a
summation and the terms are generally indexed by
subscripts.
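A minimal sketch of the notation in Python, assuming a small made-up list `x` and constant `c`. It checks three standard properties of the summation operator numerically: summing a constant n times gives n·c, a constant factor can be pulled outside the sum, and deviations from the sample average sum to zero. These are the usual textbook properties and may not match exactly what the slides listed.

```python
# Made-up numbers, chosen only to illustrate the notation.
x = [2.0, 4.0, 6.0, 8.0]
c = 3.0
n = len(x)

total = sum(x)       # sum over i = 1..n of x_i
x_bar = total / n    # sample average: (1/n) * sum of x_i

sum_of_constant = sum(c for _ in x)        # equals n * c
scaled_sum = sum(c * xi for xi in x)       # equals c * sum of x_i
deviations = sum(xi - x_bar for xi in x)   # deviations from the mean sum to 0

print(total, x_bar, sum_of_constant, scaled_sum, deviations)
```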
25Examples
26Properties of the Summation Operator
27Sample Average
28Results
29Properties of Random Variables
- What is a random variable?
- It takes a single, specific value.
- We don't know in advance what value it takes.
- We do know all possible values it may take.
- We know the probability that it will take any one
of those possible values.
- Expected value (or population mean value): The
expected value of a discrete random variable X
with possible values $x_1, \dots, x_k$ and probabilities $f(x_j) = P(X = x_j)$ is
$E(X) = \mu = \sum_{j=1}^{k} x_j f(x_j)$
30Properties of Expected Value
31Properties of Expected Value
- The law of large numbers: Suppose one repeatedly
observes different realized values of a random
variable and calculates the mean of the realized
values. The mean will tend to be close to the
expected value; the more times one observes the
random variable, the closer the mean will tend to
be.
- The expected value of a random variable (say, a
stock price) does not tell us how much it will go
up or down. The variance provides a measure of
how far the random variable is likely to be
from its mean.
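The law of large numbers can be seen in a small simulation. This sketch assumes a fair six-sided die (expected value 3.5) and a fixed random seed; the function name `sample_mean_of_rolls` is made up for the example. The gap between the sample mean and 3.5 should tend to shrink as the number of rolls grows, although any single run can deviate.

```python
import random

rng = random.Random(42)
EXPECTED = 3.5  # E(X) for a fair six-sided die: (1 + 2 + ... + 6) / 6

def sample_mean_of_rolls(n_rolls):
    """Roll the die n_rolls times and return the mean of the realized values."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls

means = {n: sample_mean_of_rolls(n) for n in (10, 1000, 100000)}
for n, m in means.items():
    print(f"n = {n:>6}: sample mean = {m:.4f}, gap = {abs(m - EXPECTED):.4f}")
```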
32Properties of Random Variables
- For a discrete random variable, the variance,
$\sigma^2 = E[(X - \mu)^2]$, is calculated by
$\sigma^2 = \sum_{j} (x_j - \mu)^2 f(x_j)$, where $f(x_j) = P(X = x_j)$.
- Since the variance is the average value of the
squared distance between X and $\mu$, it is measured in squared units and does not
have an easy interpretation.
- The standard deviation is a more useful measure because it is in the same units as X.
The standard deviation $\sigma$ of a random variable is
equal to the square root of the variance of the
random variable.
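A short worked example, assuming a fair six-sided die as the discrete random variable. It computes the expected value, variance, and standard deviation directly from the probability function, using exact fractions so that no rounding enters until the square root.

```python
from fractions import Fraction
import math

# Fair six-sided die: values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
std_dev = math.sqrt(variance)                                # square root of the variance

print(mean, variance, round(std_dev, 4))  # prints: 7/2 35/12 1.7078
```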
33Properties of Variance
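- 1. $\mathrm{Var}(\text{constant}) = 0$
- 2. If X and Y are two independent random
variables, then
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ and
- $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
- 3. If b is a constant then $\mathrm{Var}(X + b) = \mathrm{Var}(X)$
- 4. If a is a constant then $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
- 5. If a and b are constants then $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
- 6. If X and Y are two independent random
variables and a and b are constants then
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$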
34Covariance
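- Covariance: For two discrete random variables X
and Y with $E(X) = \mu_X$ and $E(Y) = \mu_Y$, the
covariance between X and Y is defined as
$\mathrm{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$.
- To compute the covariance, we use the following
formula, where $f(x, y)$ is the joint probability function:
$\sigma_{XY} = \sum_{x} \sum_{y} (x - \mu_X)(y - \mu_Y) f(x, y)$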
35Covariance
- In general, the covariance between two random
variables can be positive or negative. If two
random variables move in the same direction, then
the covariance will be positive; if they move in
opposite directions, the covariance will be
negative.
- Properties
- 1. If X and Y are independent random variables,
their covariance is zero, since then $E(XY) = E(X)E(Y)$.
- 2. $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
- 3. $\mathrm{Cov}(Y, Y) = \mathrm{Var}(Y)$
36Correlation Coefficient
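- The covariance gives the sign of the relationship, but its
magnitude depends on the units of measurement, so it does not show how strongly the variables are
positively or negatively related. The correlation
coefficient provides a unit-free measure of how strongly
the variables are (linearly) related to each other.
- For two random variables X and Y with $E(X) = \mu_X$
and $E(Y) = \mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, the correlation coefficient is
defined as
$\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$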
37Correlation Coefficient
- 1. Like the covariance, the correlation
coefficient can be positive or negative; it has the same
sign as the covariance.
- 2. The correlation coefficient always lies
between -1 and +1. A value of -1 means perfectly negatively
correlated and +1 means perfectly positively
correlated.
- 3. Variances of correlated variables
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
- $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y)$
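The variance rules for correlated variables also hold exactly for sample moments, which makes them easy to check numerically. The sketch below uses made-up paired observations and hand-written helpers (`mean`, `cov`, `var`, `corr` are names chosen here, with the n - 1 divisor as an assumption) to compare both sides of each identity and to confirm that the correlation coefficient falls between -1 and +1.

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance with divisor n - 1."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)

def var(v):
    return cov(v, v)  # Cov(X, X) = Var(X)

def corr(u, v):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return cov(u, v) / (var(u) ** 0.5 * var(v) ** 0.5)

# Hypothetical paired observations, made up for illustration.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0, 1.0, 5.0, 4.0, 9.0]

sums = [a + b for a, b in zip(x, y)]
diffs = [a - b for a, b in zip(x, y)]

lhs_sum, rhs_sum = var(sums), var(x) + var(y) + 2 * cov(x, y)
lhs_diff, rhs_diff = var(diffs), var(x) + var(y) - 2 * cov(x, y)
rho = corr(x, y)

print(lhs_sum, rhs_sum)
print(lhs_diff, rhs_diff)
print(rho)
```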
38The Normal Distribution
- The normal family of distributions occurs much
more often in econometrics than any other
parametric family.
- One reason for this is that the sum of a large
number of independent random variables has an
approximately normal distribution.
- Normal distributions are symmetrical about the
mean, and the normal probability curve is the
familiar bell-shaped curve. The mean, median, and
mode are equal for this family of distributions.
39Shape of the Normal Distribution
40Normal Distribution
- A normally distributed random variable X with
mean $\mu_X$ and variance $\sigma_X^2$ is written as
$X \sim N(\mu_X, \sigma_X^2)$.
- Standard normal: A normally distributed random
variable Z with mean 0 and variance 1 is written
as $Z \sim N(0, 1)$.
41Normal Distribution
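- The PDF of a normally distributed random
variable X with mean $\mu_X$ and variance $\sigma_X^2$ is given
by
$f(x) = \dfrac{1}{\sigma_X \sqrt{2\pi}} \exp\!\left( -\dfrac{(x - \mu_X)^2}{2\sigma_X^2} \right)$
- The PDF of a standard normal random variable Z is
given by
$\phi(z) = \dfrac{1}{\sqrt{2\pi}} \exp\!\left( -\dfrac{z^2}{2} \right)$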
42Properties of N.D.
- 1. The normal distribution curve is symmetrical
around its mean value.
- 2. The PDF of the distribution is highest at
its mean value and falls off in both directions. That is, the probability of
obtaining a value of a normally distributed r.v.
far away from its mean value becomes
progressively smaller.
43Properties of N.D. (cont.)
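- 3. Approximately 68% of the area under the
normal curve lies between $\mu_X - \sigma_X$ and $\mu_X + \sigma_X$.
- Approximately 95% of the area under the normal
curve lies between $\mu_X - 2\sigma_X$ and $\mu_X + 2\sigma_X$.
- Approximately 99.7% of the area under the normal
curve lies between $\mu_X - 3\sigma_X$ and $\mu_X + 3\sigma_X$.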
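These three areas can be reproduced from the error function in Python's standard library, since for any normal variable the probability of falling within k standard deviations of the mean equals erf(k/√2). The helper name `prob_within` is made up for this sketch.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```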
44Properties of N.D. (cont.)
- 4. A normally distributed random variable is
fully described by its two parameters, mean and
variance.
- 5. A linear combination of two or more normally
distributed random variables is itself normally
distributed.
- 6. For a normal distribution, skewness is zero
and kurtosis is 3.
45Properties of N.D. (cont.)
- (Note: With moments taken about the mean, skewness is (square of the 3rd
moment)/(cube of the 2nd moment) and kurtosis is
(4th moment)/(square of the 2nd moment).)
- 7. Z-transformation
- If $X \sim N(\mu_X, \sigma_X^2)$ then $Z = (X - \mu_X)/\sigma_X \sim N(0, 1)$.
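To see the Z-transformation at work, the sketch below uses `statistics.NormalDist` from the Python standard library. The mean, standard deviation, and cutoff are hypothetical values picked for illustration. The probability computed directly from the distribution of X matches the one computed from the standard normal after standardizing.

```python
from statistics import NormalDist

# Hypothetical parameters chosen for illustration.
mu_x, sigma_x = 100.0, 15.0
x_value = 130.0

# Standardize: subtract the mean and divide by the standard deviation.
z_value = (x_value - mu_x) / sigma_x

p_direct = NormalDist(mu_x, sigma_x).cdf(x_value)  # P(X <= 130) under N(100, 15^2)
p_standard = NormalDist(0.0, 1.0).cdf(z_value)     # P(Z <= 2) under N(0, 1)

print(z_value, round(p_direct, 4), round(p_standard, 4))
```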
46Useful Results
47Properties of t distribution
- The t distribution, like the normal distribution,
is symmetric. It is a little wider and
flatter than the standard normal distribution.
- The mean of the t distribution, like that of the standard
normal distribution, is zero, but its variance is
k/(k-2), where k is the d.f. Thus, the variance is
defined only for d.f. > 2.
48χ² Distribution
- If $X \sim N(0, 1)$, then $X^2$ is also random and it
follows the $\chi^2$ (chi-square) distribution with 1
d.f. That is, $X^2 \sim \chi^2(1)$.
- If k independent random variables $X_1, X_2, \dots, X_k$
are all distributed N(0,1), then the sum of their
squares is also random and has the $\chi^2$
distribution with k d.f. That is, $\sum_{i=1}^{k} X_i^2 \sim \chi^2(k)$.
49Properties of χ² Distribution
- 1. Unlike the normal distribution, the $\chi^2$
distribution takes only nonnegative values and
ranges from 0 to $\infty$.
- 2. Unlike the normal distribution, the $\chi^2$
distribution is a skewed distribution, the degree
of skewness depending on the d.f. For
relatively few d.f., the distribution is highly
skewed to the right, but as the d.f. increase,
the distribution becomes increasingly symmetrical
and approaches the normal distribution.
50Properties of χ² Distribution
- 3. The expected value of a $\chi^2$ r.v. is k and its
variance is 2k, where k is the d.f.
- 4. If $Z_1$ and $Z_2$ are two independent $\chi^2$ variables
with $k_1$ and $k_2$ d.f., respectively, then their sum
is also a $\chi^2$ variable with d.f. $(k_1 + k_2)$.
- 5. If $X \sim N(\mu, \sigma^2)$ then $(X - \mu)^2/\sigma^2 \sim \chi^2(1)$. If
$X_1, X_2, \dots, X_k$ are independent and all $N(\mu, \sigma^2)$ then
$\sum_{i=1}^{k} (X_i - \mu)^2/\sigma^2 \sim \chi^2(k)$
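Properties 1 and 3 can be checked by simulation. This sketch assumes k = 3 degrees of freedom, a fixed seed, and 50,000 draws, all arbitrary choices for illustration. It builds each chi-square draw as a sum of squared independent standard normals, then compares the sample mean and variance with k and 2k. The match is approximate because it depends on the random draws.

```python
import random

rng = random.Random(1)
k = 3            # degrees of freedom (assumed for the example)
n_draws = 50000  # number of simulated chi-square draws

# Each draw is the sum of k squared independent N(0, 1) variables.
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]

sample_mean = sum(draws) / n_draws
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n_draws - 1)

print(f"sample mean = {sample_mean:.3f} (theory: {k})")
print(f"sample variance = {sample_var:.3f} (theory: {2 * k})")
print(f"smallest draw = {min(draws):.5f} (never negative)")
```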
51F-Distribution
- If $Z_1$ and $Z_2$ are independently distributed $\chi^2$
variables with $k_1$ and $k_2$ d.f., respectively, then
the variable $(Z_1/k_1)/(Z_2/k_2) \sim F_{k_1, k_2}$.
- Like the $\chi^2$ distribution, the F-distribution is
also skewed to the right and also ranges between
0 and $\infty$.
- Like the $\chi^2$ distribution, the F-distribution
approaches the normal distribution as $k_1$ and $k_2$,
the d.f., become large.
- The square of a t distributed random variable
with k d.f. has an F-distribution with 1 and k
d.f. That is, $t_k^2 \sim F_{1, k}$.
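The last result follows in one step from the usual construction of a t variable, which is assumed here because the slides do not spell it out: a standard normal Z divided by the square root of an independent chi-square variable V over its d.f. Squaring that ratio gives a ratio of two independent chi-square variables, each divided by its d.f., which is the definition of F above.

```latex
\begin{align*}
t_k &= \frac{Z}{\sqrt{V/k}},
  \qquad Z \sim N(0,1),\quad V \sim \chi^2(k),\quad Z \text{ and } V \text{ independent},\\[4pt]
t_k^2 &= \frac{Z^2}{V/k} = \frac{Z^2/1}{V/k} \sim F_{1,k},
  \qquad \text{since } Z^2 \sim \chi^2(1) \text{ is independent of } V .
\end{align*}
```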