Title: DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES
This sequence explains how you can include 
qualitative explanatory variables in your 
regression model. Suppose that you have data on 
the annual recurrent expenditure, COST, and the 
number of students enrolled, N, for a sample of 
secondary schools, of which there are two types: 
regular and occupational. The occupational 
schools aim to provide skills for specific 
occupations and they tend to be relatively 
expensive to run because they need to maintain 
specialized workshops.
One way of dealing with the difference in the 
costs would be to run separate regressions for 
the two types of school. However, this would have 
the drawback that you would be running 
regressions with two small samples instead of one 
large one, with an adverse effect on the 
precision of the estimates of the coefficients.
Regular school: $COST = b_1 + b_2 N + u$
Occupational school: $COST = b_1' + b_2 N + u$
Another way of handling the difference would be 
to hypothesize that the cost function for 
occupational schools has an intercept b1' that is 
greater than that for regular schools.
Effectively, we are hypothesizing that the annual 
overhead cost is different for the two types of 
school, but the marginal cost is the same. The 
marginal cost assumption is not very plausible 
and we will relax it in due course.
Let us define d to be the difference in the 
intercepts: d = b1' - b1.
Occupational school: $COST = b_1 + d + b_2 N + u$
Then b1' = b1 + d and we can rewrite the cost 
function for occupational schools as shown.
Combined equation: $COST = b_1 + d \cdot OCC + b_2 N + u$
Regular school (OCC = 0): $COST = b_1 + b_2 N + u$
Occupational school (OCC = 1): $COST = b_1 + d + b_2 N + u$
We can now combine the two cost functions by 
defining a dummy variable OCC that has value 0 
for regular schools and 1 for occupational 
schools.
Dummy variables always have two values, 0 or 1. 
If OCC is equal to 0, the cost function becomes 
that for regular schools. If OCC is equal to 1, 
the cost function becomes that for occupational 
schools.
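As a minimal sketch of how the combined equation behaves, the Python function below evaluates the non-random part of the cost function for a given enrolment and school type. The function name and the parameter values are hypothetical placeholders chosen only for illustration; they are not estimates from the data.

```python
def expected_cost(n, occ, b1, d, b2):
    """Non-random part of the combined equation: b1 + d*OCC + b2*N."""
    if occ not in (0, 1):
        raise ValueError("OCC is a dummy variable and must be 0 or 1")
    return b1 + d * occ + b2 * n


# Hypothetical parameter values, for illustration only.
B1, D, B2 = 10_000.0, 50_000.0, 300.0

regular = expected_cost(500, 0, B1, D, B2)       # reduces to b1 + b2*N
occupational = expected_cost(500, 1, B1, D, B2)  # reduces to b1 + d + b2*N

# At any given N the two cost functions differ by exactly d.
print(regular, occupational, occupational - regular)
```

Whatever value of N is chosen, the gap between the two school types equals d, which is what it means for the dummy variable to shift the intercept only.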
We will now fit a function of this type using 
actual data for a sample of 74 secondary schools 
in Shanghai.
School  Type           COST      N    OCC
1       Occupational   345,000   623  1
2       Occupational   537,000   653  1
3       Regular        170,000   400  0
4       Occupational   526,000   663  1
5       Regular        100,000   563  0
6       Regular         28,000   236  0
7       Regular        160,000   307  0
8       Occupational    45,000   173  1
9       Occupational   120,000   146  1
10      Occupational    61,000    99  1
The table shows the data for the first 10 schools 
in the sample. The annual cost is measured in 
yuan, one yuan being worth about 20 cents U.S. at 
the time. N is the number of students in the 
school. OCC is the dummy variable for the type of 
school.
. reg COST N OCC

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   56.86
       Model |  9.0582e+11     2  4.5291e+11           Prob > F      =  0.0000
    Residual |  5.6553e+11    71  7.9652e+09           R-squared     =  0.6156
-------------+------------------------------           Adj R-squared =  0.6048
       Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   89248

------------------------------------------------------------------------------
        COST |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           N |   331.4493   39.75844     8.337   0.000     252.1732    410.7254
         OCC |   133259.1   20827.59     6.398   0.000     91730.06    174788.1
       _cons |  -33612.55   23573.47    -1.426   0.158    -80616.71    13391.61
------------------------------------------------------------------------------
We now run the regression of COST on N and OCC, 
treating OCC just like any other explanatory 
variable, despite its artificial nature. The 
Stata output is shown. We will begin by 
interpreting the regression coefficients.
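To illustrate what "treating OCC just like any other explanatory variable" means in practice, here is a rough sketch in plain Python that fits COST on a constant, N, and OCC by ordinary least squares. It uses only the ten schools listed in the table, reading the fourth COST entry as 526,000, so its estimates will not match the Stata output for the full sample of 74. The helper names `solve` and `ols` are made up for this example.

```python
# First ten schools from the table: (COST in yuan, N, OCC).
ROWS = [
    (345_000, 623, 1), (537_000, 653, 1), (170_000, 400, 0),
    (526_000, 663, 1), (100_000, 563, 0), (28_000, 236, 0),
    (160_000, 307, 0), (45_000, 173, 1), (120_000, 146, 1),
    (61_000, 99, 1),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(y, columns):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


cost = [float(r[0]) for r in ROWS]
const = [1.0] * len(ROWS)
n_students = [float(r[1]) for r in ROWS]
occ = [float(r[2]) for r in ROWS]  # the dummy enters as an ordinary 0/1 column

b1, b2, d = ols(cost, [const, n_students, occ])
residuals = [c - (b1 + b2 * n + d * o) for c, n, o in zip(cost, n_students, occ)]
print(f"intercept={b1:.0f}, slope on N={b2:.1f}, coefficient on OCC={d:.0f}")
```

The point of the sketch is that nothing special is done for OCC: it is simply another column of the regressor matrix, and its coefficient is the estimated shift in the intercept.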
Fitted equation: $\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$
Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$
The regression results have been rewritten in 
equation form. From this equation we can derive cost 
functions for the two types of school by setting 
OCC equal to 0 or 1. If OCC is equal to 0, we get 
the equation for regular schools, as shown. It 
implies that the marginal cost per student per 
year is 331 yuan and that the annual overhead 
cost is -34,000 yuan. A negative overhead cost 
does not make sense, and it 
suggests that the model is misspecified in some 
way. We will come back to this later.
Occupational school (OCC = 1): $\widehat{COST} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$
The coefficient of the dummy variable is an 
estimate of d, the extra annual overhead cost of 
an occupational school. Putting OCC equal to 1, 
we estimate the annual overhead cost of an 
occupational school to be 99,000 yuan. The 
marginal cost is the same as for regular schools. 
 It must be, given the model specification.
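The arithmetic behind the two fitted cost functions can be checked with a few lines of Python. This is only a sketch using the rounded coefficients quoted in the text; the constant and function names are invented for the example.

```python
# Rounded coefficients from the fitted equation in the text (yuan).
INTERCEPT = -34_000  # intercept for regular schools
D_OCC = 133_000      # extra overhead cost of an occupational school
SLOPE_N = 331        # marginal cost per student


def fitted_cost(n, occ):
    """Fitted annual cost for a school with n students; occ is 0 or 1."""
    return INTERCEPT + D_OCC * occ + SLOPE_N * n


occupational_intercept = fitted_cost(0, 1)                 # -34,000 + 133,000
gap_at_500 = fitted_cost(500, 1) - fitted_cost(500, 0)     # same gap at any N
marginal_cost = fitted_cost(501, 1) - fitted_cost(500, 1)  # same for both types

print(occupational_intercept, gap_at_500, marginal_cost)
```

With the unrounded Stata coefficients the occupational intercept would be -33,612.55 + 133,259.1, or roughly 99,600, so the figure of 99,000 reflects the rounding of the two terms before adding them.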
The scatter diagram shows the data and the two 
cost functions derived from the regression 
results.
We will perform a t test on the coefficient of 
the dummy variable. Our null hypothesis is H0: d 
= 0 and our alternative hypothesis is H1: d ≠ 
0. In words, our null hypothesis is that there is 
no difference in the overhead costs of the two 
types of school. The t statistic is 6.40, so the 
null hypothesis is rejected at the 0.1 percent significance level. 
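Each t statistic in the output is the coefficient divided by its standard error, since the null hypothesis value is 0. The short Python sketch below recomputes the t column from the coefficients and standard errors reported by Stata; the dictionary layout is just a convenience for this example.

```python
# (coefficient, standard error) pairs as reported in the Stata output.
ESTIMATES = {
    "N": (331.4493, 39.75844),
    "OCC": (133259.1, 20827.59),
    "_cons": (-33612.55, 23573.47),
}

# t = (estimate - 0) / standard error, for H0: coefficient = 0.
t_stats = {name: coef / se for name, (coef, se) in ESTIMATES.items()}

for name, t in t_stats.items():
    print(f"{name:>6}: t = {t:.3f}")
```

The values agree with the t column of the output to three decimal places (8.337, 6.398, and -1.426).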
We can perform t tests on the other coefficients 
in the usual way. The t statistic for the 
coefficient of N is 8.34, so we conclude that the 
marginal cost is (very) significantly different 
from 0.
In the case of the intercept, the t statistic is 
-1.43, so we do not reject the null hypothesis 
H0: b1 = 0. Thus one explanation of the 
nonsensical negative overhead cost of regular 
schools might be that they do not actually have 
any overheads and our estimate is a random 
number. A more realistic version of this 
hypothesis is that b1 is positive but small (as 
you can see, the 95 percent confidence interval 
includes positive values) and the error term is 
responsible for the negative estimate. As already 
noted, a further possibility is that the model is 
misspecified in some way. We will continue to 
develop the model in the next sequence.
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES
$COST = b_1 + d_T\,TECH + d_W\,WORKER + d_V\,VOC + b_2 N + u$
This sequence explains how to extend the dummy 
variable technique to handle a qualitative 
explanatory variable which has more than two 
categories. In the previous sequence we used a 
dummy variable to differentiate between regular 
and occupational schools when fitting a cost 
function. In actual fact there are two types of 
regular secondary school in Shanghai. There are 
general schools, which provide the usual academic 
education, and vocational schools. 
As their name implies, the vocational schools are 
meant to impart occupational skills as well as 
give an academic education. However, the 
vocational component of the curriculum is 
typically quite small and the schools are similar 
to the general schools. Often they are just 
general schools with a couple of workshops 
added. Likewise there are two types of 
occupational school. There are technical schools 
training technicians and skilled workers schools 
training craftsmen.
So now the qualitative variable has four 
categories. The standard procedure is to choose 
one category as the reference category and to 
define dummy variables for each of the others. In 
general it is good practice to select the most 
normal or basic category as the reference 
category, if one category is in some sense more 
normal or basic than the others. In the Shanghai 
sample it is sensible to choose the general 
schools as the reference category. They are the 
most numerous and the other schools are 
variations of them.
Accordingly we will define dummy variables for 
the other three types. TECH will be the dummy 
for the technical schools: TECH is equal to 1 if 
the observation relates to a technical school, 0 
otherwise. Similarly we will define dummy 
variables WORKER and VOC for the skilled workers 
schools and the vocational schools. 
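The construction of the three dummies can be sketched in Python as follows. The category labels and the function name `make_dummies` are assumptions made for this illustration; the original data set may code school type differently.

```python
# Hypothetical labels for the four school types; "general" is the reference category.
SCHOOL_TYPES = ("general", "technical", "workers", "vocational")


def make_dummies(school_type):
    """Return (TECH, WORKER, VOC) for one school, with general schools as the reference."""
    if school_type not in SCHOOL_TYPES:
        raise ValueError(f"unknown school type: {school_type!r}")
    return (
        int(school_type == "technical"),   # TECH
        int(school_type == "workers"),     # WORKER
        int(school_type == "vocational"),  # VOC
    )


for kind in SCHOOL_TYPES:
    print(kind, make_dummies(kind))
```

A general school gets 0 for all three dummies, and every other school gets a 1 in exactly one of them.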
Each of the dummy variables will have a 
coefficient which represents the extra overhead 
costs of the schools, relative to the reference 
category. Note that you do not include a dummy 
variable for the reference category, and that is 
the reason that the reference category is usually 
described as the omitted category. 
General school (TECH = WORKER = VOC = 0): $COST = b_1 + b_2 N + u$
If an observation relates to a general school, 
the dummy variables are all 0 and the regression 
model is reduced to its basic components.
Technical school (TECH = 1; WORKER = VOC = 0): $COST = (b_1 + d_T) + b_2 N + u$
If an observation relates to a technical school, 
TECH will be equal to 1 and the other dummy 
variables will be 0. The regression model 
simplifies as shown.
Skilled workers school (WORKER = 1; TECH = VOC = 0): $COST = (b_1 + d_W) + b_2 N + u$
Vocational school (VOC = 1; TECH = WORKER = 0): $COST = (b_1 + d_V) + b_2 N + u$
The regression model simplifies in a similar 
manner in the case of observations relating to 
skilled workers schools and vocational schools.
[Figure: cost functions for the four types of school, with COST on the vertical axis and N on the horizontal axis. The intercepts are b1 (General), b1 + dV (Vocational), b1 + dW (Workers), and b1 + dT (Technical).]
The diagram illustrates the model graphically. 
The d coefficients are the extra overhead costs 
of running technical, skilled workers, and 
vocational schools, relative to the overhead cost 
of general schools.
Note that we do not make any prior assumption 
about the size, or even the sign, of the d 
coefficients. They will be estimated from the 
sample data.
The scatter diagram shows the data for the entire 
sample, differentiating by type of school.
. reg COST N TECH WORKER VOC

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =   29.63
       Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
    Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
-------------+------------------------------           Adj R-squared =  0.6107
       Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
        COST |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           N |   342.6335    40.2195     8.519   0.000     262.3978    422.8692
        TECH |   154110.9   26760.41     5.759   0.000     100725.3    207496.4
      WORKER |   143362.4    27852.8     5.147   0.000     87797.57    198927.2
         VOC |   53228.64   31061.65     1.714   0.091    -8737.646    115194.9
       _cons |  -54893.09   26673.08    -2.058   0.043    -108104.4   -1681.748
------------------------------------------------------------------------------
Here is the Stata output for this regression. 
The coefficient of N indicates that the marginal 
cost per student per year is 343 yuan.
The coefficients of TECH, WORKER, and VOC are 
154,000, 143,000, and 53,000, respectively, and 
should be interpreted as the additional annual 
overhead costs, relative to those of general 
schools.
The constant term is -55,000, implying that the 
overhead cost of a general academic school 
is -55,000 yuan per year. Obviously this is 
nonsense and indicates that something is wrong 
with the model.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
The top line shows the regression result in 
equation form. We will derive the implicit cost 
functions for each type of school.
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
In the case of a general school, the dummy 
variables are all 0 and the equation reduces to 
the intercept and the term involving N.
The annual marginal cost per student is estimated 
at 343 yuan. The annual overhead cost per school 
is estimated at -55,000 yuan. Obviously a 
negative amount is inconceivable.
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$
The extra annual overhead cost for a technical 
school, relative to a general school, is 154,000 
yuan. Hence we derive the implicit cost function 
for technical schools.
Skilled workers school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$
Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$
And similarly the extra overhead costs of skilled 
workers and vocational schools, relative to 
those of general schools, are 143,000 and 53,000 
yuan, respectively.
Note that in each case the annual marginal cost 
per student is estimated at 343 yuan. The model 
specification assumes that this figure does not 
differ according to type of school.
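The four implicit cost functions can be reproduced with a short Python sketch based on the rounded coefficients quoted in the text. The dictionary keys and function names are hypothetical and serve only this illustration.

```python
# Rounded estimates from the fitted equation in the text (yuan).
INTERCEPT = -55_000  # general schools, the reference category
SLOPE_N = 343        # marginal cost per student, common to all school types
EXTRA_OVERHEAD = {
    "general": 0,
    "technical": 154_000,
    "workers": 143_000,
    "vocational": 53_000,
}


def implied_intercept(kind):
    """Overhead cost implied for one school type: reference intercept plus its dummy coefficient."""
    return INTERCEPT + EXTRA_OVERHEAD[kind]


def fitted_cost(kind, n):
    """Fitted annual cost of a school of the given type with n students."""
    return implied_intercept(kind) + SLOPE_N * n


for kind in EXTRA_OVERHEAD:
    print(f"{kind:>10}: COST = {implied_intercept(kind):,} + {SLOPE_N}N")
```

Only the intercept changes from one school type to another; the slope on N is the same in all four functions because the specification forces it to be.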
The four cost functions are illustrated 
graphically.
We can perform t tests on the coefficients in the 
usual way. The t statistic for N is 8.52, so the 
marginal cost is (very) significantly different 
from 0, as we would expect.
The t statistic for the technical school dummy is 
5.76, indicating that the annual overhead cost of 
a technical school is (very) significantly 
greater than that of a general school, again as 
expected.
Similarly for skilled workers schools, the t 
statistic being 5.15.
In the case of vocational schools, however, the t 
statistic is only 1.71 (two-sided p value 0.091), so the 
overhead cost of such a school is not 
significantly different from that of a general 
school at the 5 percent level. This is not surprising, given that the 
vocational schools are not much different from 
the general schools. Note that the null 
hypotheses for the tests on the coefficients of 
the dummy variables are that the overhead costs 
of the other schools are not different from those 
of the general schools.
Finally we will perform an F test of the joint 
explanatory power of the dummy variables as a 
group. The null hypothesis is H0: dT = dW = dV = 
0. The alternative hypothesis is that at least 
one d is different from 0.
The residual sum of 
squares in the specification including the dummy 
variables is $5.41 \times 10^{11}$.
. reg COST N

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   46.82
       Model |  5.7974e+11     1  5.7974e+11           Prob > F      =  0.0000
    Residual |  8.9160e+11    72  1.2383e+10           R-squared     =  0.3940
-------------+------------------------------           Adj R-squared =  0.3856
       Total |  1.4713e+12    73  2.0155e+10           Root MSE      =  1.1e+05

------------------------------------------------------------------------------
        COST |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           N |   339.0432   49.55144     6.842   0.000     240.2642    437.8222
       _cons |    23953.3   27167.96     0.882   0.381    -30205.04    78111.65
------------------------------------------------------------------------------
The residual sum of squares in the specification 
excluding the dummy variables is $8.92 \times 10^{11}$.
The reduction in RSS when we include the dummies 
is therefore $(8.92 - 5.41) \times 10^{11}$. We will check 
whether this reduction is significant with the 
usual F test.
The numerator in the F ratio is the reduction in 
RSS divided by the cost, which is the 3 degrees 
of freedom given up when we estimate three 
additional coefficients (the coefficients of the 
dummies).
The denominator is RSS for the specification 
including the dummy variables, divided by the 
number of degrees of freedom remaining after they 
have been added. The F ratio is therefore 14.92.
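The F ratio described above can be reproduced with a few lines of Python. This is a sketch of the standard calculation, with a function name invented for the example; it is evaluated once with the rounded residual sums of squares used in the text and once with the unrounded values from the two Stata outputs.

```python
def f_statistic(rss_restricted, rss_unrestricted, extra_params, df_remaining):
    """F ratio for the joint explanatory power of a group of added variables."""
    numerator = (rss_restricted - rss_unrestricted) / extra_params  # reduction in RSS per df given up
    denominator = rss_unrestricted / df_remaining                   # RSS per df remaining
    return numerator / denominator


# Rounded RSS values as quoted in the text: 3 dummies added, 69 df remaining.
f_rounded = f_statistic(8.92e11, 5.41e11, 3, 69)

# Unrounded RSS values from the Residual rows of the two regressions.
f_full = f_statistic(8.9160e11, 5.4138e11, 3, 69)

print(f"rounded inputs: F = {f_rounded:.2f}; unrounded inputs: F = {f_full:.2f}")
```

The rounded inputs give the 14.92 quoted in the text, while the unrounded inputs give about 14.88. The difference is due to rounding only and does not affect the conclusion of the test.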
F tables do not give the critical value for 3 and 
69 degrees of freedom, but it must be lower than 
the critical value with 3 and 60 degrees of 
freedom. This is 6.17, at the 0.1 percent 
significance level. 
Since the F statistic exceeds this critical 
value, we reject H0 at the 0.1 percent 
significance level. This is not surprising, since 
the t tests show that TECH and WORKER have highly 
significant coefficients.

THE EFFECTS OF CHANGING THE REFERENCE CATEGORY
In the previous sequence we chose general 
academic schools as the reference (omitted) 
category and defined dummy variables for the 
other categories. This enabled us to compare the 
overhead costs of the other schools with those of 
general schools and to test whether the 
differences were significant. 
However, suppose that we were interested in 
testing whether the overhead costs of skilled 
workers' schools were different from those of the 
other types of school. How could we do this? It 
is possible to perform a t test using the 
variance-covariance matrix of the regression 
coefficients to calculate the relevant standard 
errors. But this is tedious and it is easy to 
make arithmetical errors.
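
As a rough sketch of what that manual calculation involves: the variance of a difference between two estimated coefficients is the sum of their variances minus twice their covariance, all taken from the estimated variance-covariance matrix. The function and argument names below are hypothetical, and no numbers are supplied because the covariance matrix is not reproduced in these slides.

```python
import math


def t_stat_for_difference(coef_a, coef_b, var_a, var_b, cov_ab):
    """t statistic for H0: coef_a - coef_b = 0.

    Uses var(a - b) = var(a) + var(b) - 2 * cov(a, b), where the variances
    and the covariance come from the estimated variance-covariance matrix
    of the regression coefficients.
    """
    se_diff = math.sqrt(var_a + var_b - 2.0 * cov_ab)
    return (coef_a - coef_b) / se_diff
```

Here `coef_a` and `coef_b` would be two dummy coefficients from the regression with general schools as the reference category, for example those of TECH and WORKER.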
It is much simpler to re-run the regression 
making skilled workers' schools the reference 
category. Now we need to define a dummy variable 
GEN for the general schools.
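
The construction of the dummy variables can be sketched as follows. This is a minimal illustration, not the procedure used to prepare the actual data set: the function name is hypothetical and the four-school sample is made up purely to show the coding.

```python
def make_dummies(school_types, categories, reference):
    """Return one 0/1 column per category, leaving out the reference category."""
    included = [c for c in categories if c != reference]
    return {c: [1 if t == c else 0 for t in school_types] for c in included}


# Hypothetical sample with one school of each type (illustration only).
CATEGORIES = ["GEN", "TECH", "WORKER", "VOC"]
sample_types = ["GEN", "TECH", "WORKER", "VOC"]

# Skilled workers' schools as the reference: no WORKER column is created.
dummies = make_dummies(sample_types, CATEGORIES, reference="WORKER")
print(dummies)
```

Passing `reference="GEN"` instead would reproduce the coding of the previous sequence, with columns for TECH, WORKER, and VOC.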

$$COST = b_1 + d_T\,TECH + d_V\,VOC + d_G\,GEN + b_2 N + u$$

The model is shown in equation form. Note that 
there is no longer a dummy variable for skilled 
workers' schools since they form the reference 
category. 

Skilled workers' school ($TECH = VOC = GEN = 0$):

$$COST = b_1 + b_2 N + u$$

In the case of observations relating to skilled 
workers' schools, all the dummy variables are 0 
and the model simplifies to the intercept and the 
term involving N.

Technical school ($TECH = 1$; $VOC = GEN = 0$):

$$COST = (b_1 + d_T) + b_2 N + u$$

In the case of observations relating to technical 
schools, TECH is equal to 1 and the intercept 
increases by an amount $d_T$.

Note that $d_T$ should now be interpreted as the 
extra overhead cost of a technical school 
relative to that of a skilled workers' school.

Vocational school ($VOC = 1$; $TECH = GEN = 0$):

$$COST = (b_1 + d_V) + b_2 N + u$$

General school ($GEN = 1$; $TECH = VOC = 0$):

$$COST = (b_1 + d_G) + b_2 N + u$$

Similarly one can derive the implicit cost 
functions for vocational and general schools, 
their d coefficients also being interpreted as 
their extra overhead costs relative to those of 
skilled workers' schools.

[Figure: COST plotted against N, showing four parallel cost lines labelled Technical, Workers, Vocational, and General, with intercepts $b_1 + d_T$, $b_1$, $b_1 + d_V$, and $b_1 + d_G$ respectively.]

This diagram illustrates the model graphically. 
Note that the d shifts are measured from the line 
for skilled workers' schools.

. reg COST N TECH VOC GEN

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

Here is the Stata output for the regression. We 
will focus first on the regression coefficients.

$$\widehat{COST} = 88{,}000 + 11{,}000\,TECH - 90{,}000\,VOC - 143{,}000\,GEN + 343N$$

The regression result is shown written as an 
equation.

Skilled workers' school ($TECH = VOC = GEN = 0$):

$$\widehat{COST} = 88{,}000 + 343N$$

Putting all the dummy variables equal to 0, we 
obtain the equation for the reference category, 
the skilled workers' schools.

Technical school ($TECH = 1$; $VOC = GEN = 0$):

$$\widehat{COST} = 88{,}000 + 11{,}000 + 343N = 99{,}000 + 343N$$

Putting TECH equal to 1 and VOC and GEN equal to 
0, we obtain the equation for the technical 
schools.

Vocational school ($VOC = 1$; $TECH = GEN = 0$):

$$\widehat{COST} = 88{,}000 - 90{,}000 + 343N = -2{,}000 + 343N$$

General school ($GEN = 1$; $TECH = VOC = 0$):

$$\widehat{COST} = 88{,}000 - 143{,}000 + 343N = -55{,}000 + 343N$$

And similarly we obtain the equations for the 
vocational and general schools, putting VOC and 
GEN equal to 1 in turn.
Note that the cost functions turn out to be 
exactly the same as when we used general schools 
as the reference category.
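
A quick numerical check of this equivalence, as a sketch only: the coefficients below are copied from the two Stata outputs (`reg COST N TECH WORKER VOC`, with general schools as the reference, and `reg COST N TECH VOC GEN`, with skilled workers' schools as the reference). The helper name `intercepts` is hypothetical.

```python
# Intercept and dummy coefficients as displayed in the two Stata outputs.
GENERAL_AS_REFERENCE = {"_cons": -54893.09, "TECH": 154110.9,
                        "WORKER": 143362.4, "VOC": 53228.64}
WORKER_AS_REFERENCE = {"_cons": 88469.29, "TECH": 10748.51,
                       "VOC": -90133.74, "GEN": -143362.4}


def intercepts(coefs, school_types):
    """Implied intercept per school type: _cons plus that type's dummy coefficient.

    The reference category has no dummy, so its shift is taken to be 0.
    """
    return {t: coefs["_cons"] + coefs.get(t, 0.0) for t in school_types}


TYPES = ["GEN", "TECH", "WORKER", "VOC"]
by_general = intercepts(GENERAL_AS_REFERENCE, TYPES)
by_worker = intercepts(WORKER_AS_REFERENCE, TYPES)

for t in TYPES:
    # The two specifications agree up to rounding in the displayed coefficients.
    print(t, round(by_general[t]), round(by_worker[t]))
```

The slope on N (342.6335) is identical in both outputs, so equal intercepts imply identical cost functions for every type of school.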
Consequently the scatter diagram with regression 
lines is exactly the same as before.

The goodness of fit, whether measured by $R^2$, 
RSS, or the standard error of the regression (the 
estimate of the standard deviation of u, here 
denoted Root MSE), is likewise not affected by 
the change.

But the t tests are affected. In particular, the 
null hypothesis that a dummy variable coefficient 
is equal to 0 now has a different meaning.

For example, the t statistic for the technical 
school coefficient is for the null hypothesis 
that the overhead costs of technical schools are 
the same as those of skilled workers' schools.
The t ratio in question is only 0.35, so the null 
hypothesis is not rejected.

The t ratio for the coefficient of VOC is -2.65, 
so one concludes that the overheads of vocational 
schools are significantly lower than those of 
skilled workers' schools, at the 1 percent 
significance level.

General schools clearly have lower overhead costs 
than the skilled workers' schools, according to 
the regression: the coefficient of GEN has a 
t ratio of -5.15.

. reg COST N TECH WORKER VOC

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

. reg COST N TECH VOC GEN

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

Note that there are some differences in the 
standard errors. The standard error of the 
coefficient of N is unaffected. 

The one test involving the dummy variables that 
can be performed with either specification is the 
test of whether the overhead costs of general 
schools and skilled workers' schools are 
different. The choice of specification can make 
no difference to the outcome of this test. The 
only difference is that the sign of the 
regression coefficient is reversed: it is 
positive (WORKER) in the first specification and 
negative (GEN) in the second. The standard error 
is the same, so the t statistic has the same 
absolute magnitude and the outcome of the test 
must be the same.
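
The point can be checked directly from the displayed figures. The following sketch simply divides each coefficient by its standard error; the function and variable names are hypothetical.

```python
def t_ratio(coef, std_err):
    """t statistic for H0: coefficient = 0."""
    return coef / std_err


# Coefficient and standard error as displayed in the two outputs.
# WORKER, with general schools as the reference category:
t_worker_vs_general = t_ratio(143362.4, 27852.8)
# GEN, with skilled workers' schools as the reference category:
t_general_vs_worker = t_ratio(-143362.4, 27852.8)

print(round(t_worker_vs_general, 3), round(t_general_vs_worker, 3))  # 5.147 -5.147
```

Only the sign changes, so a two-sided test gives the same outcome under either specification.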
 . reg COST N T