1
An Efficient Data Envelopment Analysis with a
large data set in Stata
  • 15-16 July, 2010
  • Boston10 Stata Conference
  • Choonjoo Lee, Kyoung-Rok Lee
  • sarang90_at_kndu.ac.kr, bloom.rampike_at_gmail.com
  • Korea National Defense University

2
Contents
  • Part I. A Large Data Set in Stata/DEA
  • Large Data Set in DEA?
  • Computational Aspects of Large Data Set
  • The Scope of this Study
  • Efficiency Matters in Stata/DEA/Linear
    Programming
  • Tasks to be covered
  • Part II. Malmquist Index Analysis with the Panel
    Data
  • Basic Concept of Malmquist Index
  • The User Written Command malmq

3
  • Part I. A Large Data Set in Stata/DEA
  • Large Data Set in DEA?
  • Computational Aspects of Large Data Set
  • The Scope of this Study
  • Efficiency Matters in Stata/DEA/Linear
    Programming
  • Tasks to be covered

4
Large Data Set in DEA?
  • Graphical illustration of DEA concept

5
Large Data Set in DEA?
  • Limits on Variables and Observations Set by the
    Features of DEA Domain Programs (Language)
  • Statistical Package-based DEA Programs
  • Spreadsheet-based DEA Programs
  • Language-based DEA Codes
  • Performance of the Linear Program (LP): Efficiency
    and Accuracy
  • LP is the Critical Component of a DEA Program
  • Approaches to Solving the LP: Simplex and Interior
    Point Methods (IPMs)
  • → Numerous Variants of the Basic LP Approaches
  • DEA Report Format (User Interface Design)
  • Results (input, output)
  • Graphical Display
  • Log

6
Computational Aspects of Large Data Set
  • Matrix Size for the Data Set in Matrix Format
  • Number of rows and columns (variables and
    observations) allowed by the program
  • The storage limit of the computer memory
  • Upgrades in computer technology and the way data
    in memory are accessed
  • Matrix Density
  • Number of nonzeros in the matrix
  • How many elements of the matrix are zero?
  • A Computationally Demanding Procedure of DEA due
    to the LP
  • In the worst case, the number of simplex iterations
    needed to solve a problem grows exponentially as a
    function of the numbers of variables and observations
  • Numerical Difficulties
  • Inaccuracy and inefficiency due to floating-point
    arithmetic with finite precision
  • Limited numerical precision due to the binary
    representation of numbers
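The sketch below makes "matrix size" and "matrix density" concrete. It builds the constraint matrix of the input-oriented CRS envelopment problem for one DMU and counts its nonzeros. It uses the store data of Cooper et al. (2006) listed later in the slides. The helpers `envelopment_matrix`, `density` and `dense_work` are hypothetical illustrations. They are not part of the dea command.

```python
# Hypothetical helpers: size and density of the input-oriented CRS
# envelopment LP (rows = inputs + outputs, columns = theta + one lambda per DMU).

def envelopment_matrix(X, Y, o):
    """X[i][j]: input i of DMU j; Y[r][j]: output r of DMU j; o: evaluated DMU."""
    n = len(X[0])
    rows = []
    for xi in X:
        # theta * x_io - sum_j lambda_j * x_ij >= 0
        rows.append([xi[o]] + [-xi[j] for j in range(n)])
    for yr in Y:
        # sum_j lambda_j * y_rj >= y_ro  (theta does not appear)
        rows.append([0] + [yr[j] for j in range(n)])
    return rows

def density(A):
    """Return (nonzeros, total entries, share of nonzeros)."""
    total = sum(len(row) for row in A)
    nonzeros = sum(1 for row in A for a in row if a != 0)
    return nonzeros, total, nonzeros / total

def dense_work(n, m, s):
    """Entries stored across all n envelopment LPs (one LP per DMU)."""
    return n * (m + s) * (n + 1)

# Store data of Cooper et al. (2006), Table 3-7, stores A-E
X = [[10, 15, 20, 25, 12],    # employees
     [20, 15, 30, 15, 9]]     # floor area
Y = [[70, 100, 80, 100, 90],  # sales
     [6, 3, 5, 2, 8]]         # profit

A = envelopment_matrix(X, Y, o=0)   # LP for DMU A
nnz, total, share = density(A)
print(len(A), "x", len(A[0]), "matrix:", nnz, "nonzeros of", total, f"({share:.2f})")
print("entries over all LPs:", dense_work(n=5, m=2, s=2))
```

The envelopment LP is solved once per DMU, and each LP has n + 1 columns. The total dense storage therefore grows roughly with n squared as observations are added.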

7
The Scope of this Study
  • Performance of DEA code
  • Linear Program/Simplex Method
  • Computational Technique
  • Illustration
  • Panel Data in DEA
  • Malmquist Index Analysis

8
Efficiency Matters in Stata/DEA/LP
  • DEA programs demand heavy computation
  • Computation time depends heavily on the number of
    observations (DMUs), variables (inputs, outputs),
    the LP process, etc.
  • Stata holds the data in RAM (memory)
  • Memory size therefore matters for large data sets

9
Efficiency Matters in Stata/DEA/LP
  • Performance of input-oriented DEA models

Model | Computation (sec) | Memory | Major areas revised
5-2-2-V1 | 20 | 1G |
5-2-2-V2 (released) | < 2 | < 300M | Basic feasible solution
5-5-5-V3 | < 1 | < 300M | Revised simplex method
365-1-5-V1 | ? | 6G |
365-1-5-V2 | 14600 | 6G | Two-stage LP
365-1-5-V3 (under development) | 20 | < 300M | Mata, tolerance
? Stata SE
10
Efficiency Matters in Stata/DEA/LP
  • Understanding the difference in computational cost
    per iteration

Method | Operation | Pivoting | Pricing | Total
Tableau simplex | Multiplication, division | $(m+1)(n-m+1)$ | - | $m(n-m)+n+1$
Tableau simplex | Addition, subtraction | $m(n-m+1)$ | - | $m(n-m+1)$
Revised simplex | Multiplication, division | $(m+1)^2$ | $m(n-m)$ | $m(n-m)+(m+1)^2$
Revised simplex | Addition, subtraction | $m(m+1)$ | $m(n-m)$ | $m(n+1)$
  • What happens if the number of observations (n)
    becomes much larger than the number of
    variables (m)?
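With dense arithmetic, the multiplication totals in the table differ only in their last terms. The tableau has n + 1 and the revised method has (m + 1)^2. The revised method therefore needs fewer multiplications per iteration exactly when n > m^2 + 2m. In practice it is usually preferred for large sparse problems, because it works with the original sparse columns. The sketch below simply evaluates the table's formulas. The function names are hypothetical.

```python
# Per-iteration operation counts taken from the comparison table
# (m = number of constraint rows, n = number of columns).

def tableau_ops(m, n):
    mult = (m + 1) * (n - m + 1)          # pivoting only; no separate pricing
    add = m * (n - m + 1)
    return mult, add

def revised_ops(m, n):
    mult = (m + 1) ** 2 + m * (n - m)     # pivoting + pricing
    add = m * (m + 1) + m * (n - m)       # simplifies to m * (n + 1)
    return mult, add

for m, n in [(4, 20), (4, 400), (40, 400)]:
    t, r = tableau_ops(m, n), revised_ops(m, n)
    print(f"m={m:3d} n={n:4d}  tableau (mult, add)={t}  revised (mult, add)={r}")
```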

11
Efficiency Matters in Stata/DEA/LP
  • Tableau and Revised Simplex in DEA/LP
  • Data
  • Source: Cooper et al. (2006), Table 3-7

Store | Employee (input) | Area (input) | Sales (output) | Profit (output)
A | 10 | 20 | 70 | 6
B | 15 | 15 | 100 | 3
C | 20 | 30 | 80 | 5
D | 25 | 15 | 100 | 2
E | 12 | 9 | 90 | 8
12
Efficiency Matters in Stata/DEA/LP
  • Tableau and Revised Simplex in DEA/LP
  • For DMU A

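Store | Employee (input) | Area (input) | Sales (output) | Profit (output)
A | 10 | 20 | 70 | 6
  • The Basic DEA Models

Orientation | Constant Returns to Scale (CRS) | Variable Returns to Scale (VRS)
Input oriented | $\min_{\theta,\lambda} \theta$ s.t. $\theta x_A - X\lambda \ge 0,\ Y\lambda - y_A \ge 0,\ \lambda \ge 0$ | $\min_{\theta,\lambda} \theta$ s.t. $\theta x_A - X\lambda \ge 0,\ Y\lambda - y_A \ge 0,\ e\lambda = 1,\ \lambda \ge 0$
Output oriented | $\max_{\eta,\mu} \eta$ s.t. $x_A - X\mu \ge 0,\ \eta y_A - Y\mu \le 0,\ \mu \ge 0$ | $\max_{\eta,\mu} \eta$ s.t. $x_A - X\mu \ge 0,\ \eta y_A - Y\mu \le 0,\ e\mu = 1,\ \mu \ge 0$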
13
Efficiency Matters in Stata/DEA/LP
  • Program Structure

14
Efficiency Matters in Stata/DEA/LP
  • Program Syntax
  • dea ivars = ovars [if] [in] [, rts(crs|vrs|drs|irs)
    ort(in|out) stage(1|2) trace saving(filename)]
  • rts(crs|vrs|drs|irs) specifies the returns
    to scale. The default, rts(crs), specifies
    constant returns to scale.
  • ort(in|out) specifies the orientation. The
    default is ort(in), meaning input-oriented DEA.
  • stage(1|2) specifies how all efficiency slacks
    are identified. The default is stage(2),
    meaning two-stage DEA.
  • trace specifies that all the sequences displayed
    in the Results window be saved in the dea.log
    file. By default, only the final results are saved
    in the dea.log file.
  • saving(filename) specifies that the results be
    saved in filename.dta.

15
Efficiency Matters in Stata/DEA/LP
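  • Develop the Basic Data Bank (input-oriented CRS,
    DMU A)
  • Canonical form

$$
\begin{aligned}
\min\ & \theta \\
\text{s.t.}\ & 10\theta - 10\lambda_A - 15\lambda_B - 20\lambda_C - 25\lambda_D - 12\lambda_E \ge 0 \\
& 20\theta - 20\lambda_A - 15\lambda_B - 30\lambda_C - 15\lambda_D - 9\lambda_E \ge 0 \\
& 70\lambda_A + 100\lambda_B + 80\lambda_C + 100\lambda_D + 90\lambda_E \ge 70 \\
& 6\lambda_A + 3\lambda_B + 5\lambda_C + 2\lambda_D + 8\lambda_E \ge 6 \\
& \lambda_A, \dots, \lambda_E \ge 0
\end{aligned}
$$

  • Standard form

$$
\begin{aligned}
\min\ & \theta \\
\text{s.t.}\ & 10\theta - 10\lambda_A - 15\lambda_B - 20\lambda_C - 25\lambda_D - 12\lambda_E - s_1^- + x_1 = 0 \\
& 20\theta - 20\lambda_A - 15\lambda_B - 30\lambda_C - 15\lambda_D - 9\lambda_E - s_2^- + x_2 = 0 \\
& 70\lambda_A + 100\lambda_B + 80\lambda_C + 100\lambda_D + 90\lambda_E - s_1^+ + x_3 = 70 \\
& 6\lambda_A + 3\lambda_B + 5\lambda_C + 2\lambda_D + 8\lambda_E - s_2^+ + x_4 = 6 \\
& \lambda, s^-, s^+, x \ge 0
\end{aligned}
$$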
16
Efficiency Matters in Stata/DEA/LP
  • Model V1 Tableau DEA

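Basis | X | θ | λA | λB | λC | λD | λE | S1- | S2- | S1+ | S2+ | x1 | x2 | x3 | x4 | RHS | MRT
X | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | 0 |
x1 | 0 | 10 | -10 | -15 | -20 | -25 | -12 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
x2 | 0 | 20 | -20 | -15 | -30 | -15 | -9 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
x3 | 0 | 0 | 70 | 100 | 80 | 100 | 90 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 70 |
x4 | 0 | 0 | 6 | 3 | 5 | 2 | 8 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 6 |

X | 1 | 30 | 46 | 73 | 35 | 62 | 77 | -1 | -1 | -1 | -1 | 0 | 0 | 0 | 0 | 76 |
x1 | 0 | 10 | -10 | -15 | -20 | -25 | -12 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
x2 | 0 | 20 | -20 | -15 | -30 | -15 | -9 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
x3 | 0 | 0 | 70 | 100 | 80 | 100 | 90 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 70 | 70/90
x4 | 0 | 0 | 6 | 3 | 5 | 2 | 8 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 6 | 6/8

X | 1 | 30 | -47/4 | 353/8 | -105/8 | 171/4 | 0 | -1 | -1 | -1 | 69/8 | 0 | 0 | 0 | -77/8 | 73/4 |
x1 | 0 | 10 | -1 | -21/2 | -25/2 | -22 | 0 | -1 | 0 | 0 | -3/2 | 1 | 0 | 0 | 3/2 | 9 |
x2 | 0 | 20 | -53/4 | -93/8 | -195/8 | -51/4 | 0 | 0 | -1 | 0 | -9/8 | 0 | 1 | 0 | 9/8 | 27/4 |
x3 | 0 | 0 | 5/2 | 265/4 | 95/4 | 155/2 | 0 | 0 | 0 | -1 | 45/4 | 0 | 0 | 1 | -45/4 | 5/2 | 10/265
λE | 0 | 0 | 6/8 | 3/8 | 5/8 | 2/8 | 1 | 0 | 0 | 0 | -1/8 | 0 | 0 | 0 | 1/8 | 6/8 | 2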
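The first Phase I iteration above can be reproduced with exact rational arithmetic. This is a minimal sketch with these assumptions:

* The column order is the tableau's: X, θ, λA to λE, S1-, S2-, S1+, S2+, x1 to x4, RHS.
* The entering column follows the slides' rule: the largest positive objective-row entry enters.

`pivot`, `entering` and `ratio_test` are hypothetical helpers, not functions of the dea command.

```python
from fractions import Fraction as F

# Column order as on the slide: X, theta, lamA..lamE, S1-, S2-, S1+, S2+, x1..x4, RHS
COLS = ["X", "theta", "lamA", "lamB", "lamC", "lamD", "lamE",
        "S1-", "S2-", "S1+", "S2+", "x1", "x2", "x3", "x4", "RHS"]

obj = [F(v) for v in [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, 0]]
rows = [[F(v) for v in r] for r in [
    [0, 10, -10, -15, -20, -25, -12, -1, 0, 0, 0, 1, 0, 0, 0, 0],   # x1
    [0, 20, -20, -15, -30, -15, -9, 0, -1, 0, 0, 0, 1, 0, 0, 0],    # x2
    [0, 0, 70, 100, 80, 100, 90, 0, 0, -1, 0, 0, 0, 1, 0, 70],      # x3
    [0, 0, 6, 3, 5, 2, 8, 0, 0, 0, -1, 0, 0, 0, 1, 6],              # x4
]]

def pivot(obj, rows, r, c):
    """Gauss-Jordan pivot on rows[r][c]; returns the new objective row and rows."""
    pr = [v / rows[r][c] for v in rows[r]]
    def eliminate(row):
        f = row[c]
        return [a - f * b for a, b in zip(row, pr)]
    new_rows = [pr if i == r else eliminate(row) for i, row in enumerate(rows)]
    return eliminate(obj), new_rows

def entering(obj):
    """Column with the largest positive objective-row entry (X and RHS excluded)."""
    c = max(range(1, len(obj) - 1), key=lambda j: obj[j])
    return c if obj[c] > 0 else None

def ratio_test(rows, c):
    """(minimum RHS / entry, row index) over rows with a positive entry in column c."""
    ratios = [(row[-1] / row[c], i) for i, row in enumerate(rows) if row[c] > 0]
    return min(ratios) if ratios else None

# Price out the artificial basis: add every constraint row to the objective row
obj = [a + sum(row[j] for row in rows) for j, a in enumerate(obj)]

c = entering(obj)                  # lamE (77)
ratio, r = ratio_test(rows, c)     # 6/8 in the x4 row
obj, rows = pivot(obj, rows, r, c)
print(COLS[c], "enters in row", r + 1, "with ratio", ratio)
print("objective row:", [str(v) for v in obj])
```

Calling `entering` and `ratio_test` again on the updated tableau gives the next step. λB enters, and the minimum ratio is 10/265 (= 2/53) in the x3 row. The λE row has ratio (6/8)/(3/8) = 2.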
17
Efficiency Matters in Stata/DEA/LP
  • Model V1 Tableau DEA
  • Efficiency score (θ*) of DMU A is 14/15

Basis | Z | θ | λA | λB | λC | λD | λE | S1- | S2- | S1+ | S2+ | RHS | MRT
Z | 1 | 0 | 0 | -11/70 | -32/35 | -89/70 | 0 | -39/350 | 1/175 | -1/70 | 0 | 1 |
λA | 0 | 0 | 1 | 1/7 | 6/21 | -33/21 | 0 | -6/35 | 3/35 | -1/70 | 0 | 1 | 35/3
θ | 0 | 1 | 0 | -11/70 | -32/35 | -267/210 | 0 | -39/350 | 1/175 | -1/70 | 0 | 1 | 175/1
S2+ | 0 | 0 | 0 | 41/7 | 43/21 | 152/21 | 0 | 4/105 | -2/105 | -159/1855 | 1 | 0 |
λE | 0 | 0 | 0 | 49/8 | 59/24 | 182/21 | 1 | 1/6 | -1/12 | -159/2120 | 0 | 0 |

Z | 1 | 0 | -1/15 | -1/6 | -14/15 | -7/6 | 0 | -1/10 | 0 | -1/75 | 0 | 14/15 |
S2- | 0 | 0 | 35/3 | 5/3 | 10/3 | -55/3 | 0 | -2 | 1 | -1/6 | 0 | 35/3 |
θ | 0 | 1 | -1/15 | -1/6 | -14/15 | -7/6 | 0 | -1/10 | 0 | -1/15 | 0 | 14/15 |
S2+ | 0 | 0 | 2/9 | 53/9 | 19/9 | 62/9 | 0 | 0 | 0 | -4/45 | 1 | 2/9 |
λE | 0 | 0 | 35/36 | 451/72 | 177/72 | 257/36 | 1 | 0 | 0 | -4/45 | 0 | 35/36 |
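The optimality of θ* = 14/15 for DMU A can also be checked without the tableau, using LP duality. If a feasible envelopment solution and a feasible multiplier (ratio-form) solution have equal objective values, both are optimal. The intensities λE = 7/9 and weights v = (1/10, 0), u = (1/75, 0) below were worked out by hand for this illustration. They are an assumption to be verified, not output of the dea command. The weights match the entries -1/10 and -1/75 under S1- and S1+ in the objective row of the last tableau.

```python
from fractions import Fraction as F

# Store data (Cooper et al. 2006): one column per DMU, A..E
X = [[10, 15, 20, 25, 12],    # employees
     [20, 15, 30, 15, 9]]     # area
Y = [[70, 100, 80, 100, 90],  # sales
     [6, 3, 5, 2, 8]]         # profit
o = 0                          # evaluate DMU A

# Candidate envelopment (primal) solution, chosen by hand
theta = F(14, 15)
lam = [0, 0, 0, 0, F(7, 9)]

def primal_feasible(theta, lam):
    inputs_ok = all(theta * xi[o] - sum(l * x for l, x in zip(lam, xi)) >= 0 for xi in X)
    outputs_ok = all(sum(l * y for l, y in zip(lam, yr)) - yr[o] >= 0 for yr in Y)
    return inputs_ok and outputs_ok and all(l >= 0 for l in lam)

# Candidate multiplier (dual) solution: input weights v, output weights u
v = [F(1, 10), F(0)]
u = [F(1, 75), F(0)]

def dual_feasible(v, u):
    normalized = sum(vi * xi[o] for vi, xi in zip(v, X)) == 1
    ratios_ok = all(
        sum(ur * yr[j] for ur, yr in zip(u, Y)) - sum(vi * xi[j] for vi, xi in zip(v, X)) <= 0
        for j in range(len(X[0])))
    return normalized and ratios_ok and all(w >= 0 for w in v + u)

dual_value = sum(ur * yr[o] for ur, yr in zip(u, Y))
# Equal primal and dual objective values imply both solutions are optimal
print(primal_feasible(theta, lam), dual_feasible(v, u), theta == dual_value)
```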
18
Efficiency Matters in Stata/DEA/LP
  • Model V3 Revised DEA

19
Efficiency Matters in Stata/DEA/LP
  • Model V3 Revised DEA
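  • Step 1: Set up the initial tableau factors.
  • Step 2: Find the entering variable.
  • Step 3: Find the leaving variable.
  • Step 4: Update the tableau (update the basis).

Basis | X | θ | λA | λB | λC | λD | λE | S1- | S2- | S1+ | S2+ | x1 | x2 | x3 | x4 | RHS
X | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | 0
x1 | 0 | 10 | -10 | -15 | -20 | -25 | -12 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
x2 | 0 | 20 | -20 | -15 | -30 | -15 | -9 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0
x3 | 0 | 0 | 70 | 100 | 80 | 100 | 90 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 70
x4 | 0 | 0 | 6 | 3 | 5 | 2 | 8 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 6
(Partitions marked on the slide: N, the nonbasic columns θ to S2+, with objective-row entries cN; B, the basic columns x1 to x4, with objective-row entries cB; b, the RHS column.)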
20
Efficiency Matters in Stata/DEA/LP
  • Model V3 Revised DEA
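  • - 1st step: the initial tableau factors
  • $B$, $x_B = B^{-1}b$
    $c_B$, $c_B B^{-1}$
  • - 2nd step: finding the entering variable
  • Compute $c_N - c_B B^{-1} N$; the variable with the
    maximum value is selected as the entering variable

θ | λA | λB | λC | λD | λE | S1- | S2- | S1+ | S2+
30 | 46 | 73 | 35 | 62 | 77 | -1 | -1 | -1 | -1
Max: λE (77)
- 3rd step: finding the leaving variable from the entering column $B^{-1}N_{\lambda_E}$

$\min_i \{ (x_B)_i / (B^{-1}N_{\lambda_E})_i : (B^{-1}N_{\lambda_E})_i > 0 \} = \min\{\cdot, \cdot, 70/90, 6/8\} = 6/8$ (x4 leaves the basis)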
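The first three steps can be written with explicit matrices. This is a minimal sketch in exact arithmetic, with these assumptions:

* The initial basis consists of the artificial variables x1 to x4, so B is the identity and c_B = (-1, -1, -1, -1).
* c_N = 0.
* The largest reduced cost enters, as on the slide.

The helper names are hypothetical.

```python
from fractions import Fraction as F

names_N = ["theta", "lamA", "lamB", "lamC", "lamD", "lamE", "S1-", "S2-", "S1+", "S2+"]
N = [  # nonbasic columns, one row per constraint
    [10, -10, -15, -20, -25, -12, -1, 0, 0, 0],
    [20, -20, -15, -30, -15, -9, 0, -1, 0, 0],
    [0, 70, 100, 80, 100, 90, 0, 0, -1, 0],
    [0, 6, 3, 5, 2, 8, 0, 0, 0, -1],
]
b = [0, 0, 70, 6]
c_B = [-1, -1, -1, -1]
c_N = [0] * len(names_N)
B_inv = [[F(int(i == j)) for j in range(4)] for i in range(4)]  # identity

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# Step 1: x_B = B^-1 b and simplex multipliers pi = c_B B^-1
x_B = matvec(B_inv, b)
pi = vecmat(c_B, B_inv)

# Step 2: reduced costs c_N - pi N; the largest value enters
d = [cj - sum(p * N[i][j] for i, p in enumerate(pi)) for j, cj in enumerate(c_N)]
q = max(range(len(d)), key=lambda j: d[j])

# Step 3: ratio test on the transformed entering column B^-1 N_q
col = matvec(B_inv, [row[q] for row in N])
ratio, r = min((x_B[i] / col[i], i) for i in range(len(col)) if col[i] > 0)
print(names_N[q], "enters;", "x%d" % (r + 1), "leaves; ratio", ratio)
```

Only the entering column is transformed. This is the main saving of the revised method over updating the full tableau.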
21
Efficiency Matters in Stata/DEA/LP
  • Model V3 Revised DEA
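  • - 4th step: update the tableau (λE enters, x4 leaves)

Basis | X | θ | λA | λB | λC | λD | λE | S1- | S2- | S1+ | S2+ | x1 | x2 | x3 | x4 | RHS
X | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | 0
x1 | 0 | 10 | -10 | -15 | -20 | -25 | -12 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
x2 | 0 | 20 | -20 | -15 | -30 | -15 | -9 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0
x3 | 0 | 0 | 70 | 100 | 80 | 100 | 90 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 70
x4 | 0 | 0 | 6 | 3 | 5 | 2 | 8 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 6
(Partitions marked on the slide: N with cN, B with cB, and b, as in the initial tableau.)

Updated tableau (columns of λE and x4 exchanged between N and B):
Basis | X | θ | λA | λB | λC | λD | x4 | S1- | S2- | S1+ | S2+ | x1 | x2 | x3 | λE | RHS
X | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | 0 | 0
x1 | 0 | 10 | -10 | -15 | -20 | -25 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | -12 | 0
x2 | 0 | 20 | -20 | -15 | -30 | -15 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | -9 | 0
x3 | 0 | 0 | 70 | 100 | 80 | 100 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 90 | 70
λE | 0 | 0 | 6 | 3 | 5 | 2 | 1 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 8 | 6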
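A revised simplex code does not rebuild the whole tableau; it only has to update B⁻¹. The sketch below uses the product-form (eta-matrix) update, one common technique described for example in Maros (2003). It is not necessarily the update used in Model V3. After λE replaces x4, x_B = B⁻¹b gives 9, 27/4, 5/2 and 3/4. These values match the right-hand side of the Model V1 tableau after its first pivot. `eta_update` is a hypothetical helper.

```python
from fractions import Fraction as F

def eta_update(B_inv, col, r):
    """Product-form update of B^-1 when the transformed entering column
    col = B^-1 a_q replaces the basic variable of row r."""
    m = len(B_inv)
    # eta column: -col[i]/col[r] off the pivot row, 1/col[r] on it
    eta = [-col[i] / col[r] if i != r else 1 / col[r] for i in range(m)]
    new = []
    for i in range(m):
        if i == r:
            new.append([eta[r] * B_inv[r][j] for j in range(m)])
        else:
            new.append([B_inv[i][j] + eta[i] * B_inv[r][j] for j in range(m)])
    return new

B_inv = [[F(int(i == j)) for j in range(4)] for i in range(4)]  # basis x1..x4
col = [F(v) for v in (-12, -9, 90, 8)]   # lamE column (B^-1 = I, so unchanged)
B_inv = eta_update(B_inv, col, r=3)      # lamE replaces x4 in row 4

b = [0, 0, 70, 6]
x_B = [sum(a * x for a, x in zip(row, b)) for row in B_inv]
print([str(v) for v in x_B])   # basic values of x1, x2, x3, lamE
```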
22
Tasks to be covered
  • Computational Accuracy
  • Example: Obtaining an Inverse Matrix
  • Matrix D

1 1.341099143 -61.13394928 0.4455321 1.883781314 2.587946653
0 0 0 0.0588235 0 0
0 0.116421975 -6.672515869 -0.110761 0.495342732 -0.097138606
0 -0.172319263 -19.71403694 -0.262333 -0.074690066 1.54739666
0 -0.046367686 -4.060891628 -0.082268 -0.009800959 0.25169459
0 0.105886854 4.651313305 0.1136269 -0.015884314 0.037229143
23
Tasks to be covered
  • Computational Accuracy
  • Example: Obtaining an Inverse Matrix
  • Inverse of matrix D computed by Stata/Mata luinv(D)

1 162470623.2 -4.022811871 -81235306 487411816.6 81235289.98
0 -147760451.4 -0.087162294 73880208 -443281245.5 -73880196.74
0 3410527.559 0.007873073 -1705264 10231581.38 1705263.517
0 16.99999999 -2.96E-17 -2.77E-08 1.66E-07 2.77E-08
0 86785601.44 2.18378179 -43392792 260356746.7 43392788.04
0 31184842.39 0.196004759 -15592418 93554511.28 15592419.02
24
Tasks to be covered
  • Computational Accuracy
  • Example: Obtaining an Inverse Matrix
  • Inverse of matrix D computed by Stata/Mata luinv(D)

25
Tasks to be covered
  • Computational Accuracy
  • Example: Obtaining an Inverse Matrix
  • $DD^{-1}$ in Stata/Mata (default tolerance)

1 5.96E-08 2.36E-08 -3.73E-08 5.96E-08 -7.45E-08
0 1.000000003 -1.74E-18 -1.63E-09 9.78E-09 1.63E-09
0 4.66E-10 1 -1.63E-09 -2.98E-08 -3.96E-09
0 -1.49E-08 1.81E-09 1 0 -7.45E-09
0 -2.79E-09 2.95E-10 4.66E-10 0.999999989 -1.40E-09
0 4.66E-09 3.84E-11 -1.28E-09 7.45E-09 1.000000001
  • Shouldn't this be the identity matrix?
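The slides show matrix D only in rounded form, so it cannot be reproduced exactly. The sketch below substitutes the 6 × 6 Hilbert matrix, a standard ill-conditioned example. The effect is the same: with exact rational arithmetic, A·A⁻¹ is exactly the identity, while in binary floating point it is only approximately so. The inversion uses Gauss-Jordan elimination with partial pivoting, not the LU decomposition behind Mata's luinv(). The helper names are hypothetical.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting; works for float or Fraction entries."""
    n = len(A)
    one = type(A[0][0])
    M = [list(row) + [one(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

def max_residual(A, A_inv):
    """Largest absolute deviation of A * A_inv from the identity matrix."""
    n = len(A)
    worst = 0
    for i in range(n):
        for j in range(n):
            s = sum(A[i][k] * A_inv[k][j] for k in range(n))
            worst = max(worst, abs(s - (1 if i == j else 0)))
    return worst

n = 6
hilbert_exact = [[F(1, i + j + 1) for j in range(n)] for i in range(n)]
hilbert_float = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

exact_res = max_residual(hilbert_exact, inverse(hilbert_exact))   # exactly 0
float_res = max_residual(hilbert_float, inverse(hilbert_float))   # small, typically nonzero
print(exact_res, float_res)
```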

26
Tasks to be covered
  • Computational Accuracy
  • Example: Obtaining an Inverse Matrix
  • $DD^{-1}$ in Excel

1 5.96046E-08 -7.77156E-16 7.45058E-09 -5.96046E-08 -1.49012E-08
0 0.999999999 2.72414E-17 0 7.31257E-09 0
0 4.19095E-09 1 6.98492E-10 1.49012E-08 7.21775E-09
0 1.49012E-08 0 0.999999996 0 0
0 9.31323E-10 -3.46945E-17 -4.65661E-10 0.999999996 -9.31323E-10
0 -4.88944E-09 4.85723E-17 4.19095E-09 -2.42144E-08 1
  • Where does the computational inaccuracy come from?

27
Tasks to be covered
  • Computational Accuracy
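  • One Possible Reason: Decimal vs. Binary
    Numbers
  • 17 (decimal number)
  • 17 / 2 = 8, remainder 1
  • 8 / 2 = 4, remainder 0
  • 4 / 2 = 2, remainder 0
  • 2 / 2 = 1, remainder 0
  • 1 / 2 = 0, remainder 1
  • 10001 (binary number: the remainders read from
    bottom to top)
  • How does a computer store a = 0.75, b = 0.7 + 0.05,
    and c = 0.6 + 0.1 + 0.05?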
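The repeated-division procedure and the question of how a, b and c are stored can be explored directly. This sketch uses Python floats, which are IEEE 754 double precision, the same format as Mata's real numbers. It also rounds through single precision, which Stata uses for variables of its default storage type, float. `to_binary` is a hypothetical helper.

```python
from decimal import Decimal
import struct

def to_binary(n):
    """Repeated division by 2; the remainders read bottom-up give the bits."""
    bits = []
    while n > 0:
        n, rem = divmod(n, 2)
        bits.append(str(rem))
    return "".join(reversed(bits)) or "0"

print(to_binary(17))

a = 0.75
b = 0.7 + 0.05
c = 0.6 + 0.1 + 0.05

# Decimal(x) shows the exact binary value that the double actually stores
for name, x in [("a", a), ("b", b), ("c", c), ("0.7", 0.7), ("0.05", 0.05)]:
    print(name, Decimal(x))

# Round-trip 0.7 through 4-byte single precision
single = struct.unpack("f", struct.pack("f", 0.7))[0]
print("0.7 as float32:", Decimal(single))
```

0.75 = 1/2 + 1/4 has a finite binary expansion and is stored exactly. The binary expansions of 0.7, 0.6, 0.1 and 0.05 never terminate, so these values are rounded. Sums that are equal in decimal may or may not compare equal after rounding, and the result depends on the precision used.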

28
Tasks to be covered
  • Accuracy
  • Tolerance
  • to set an upper or lower limit on the number of
    iterations
  • to stop an unattended run if the algorithm falls
    into a cycle
  • Preprocessing: Scaling
  • to reduce the numerical gap between large and
    small values and obtain a numerically safe solution
  • Example: Rank(D)
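Tolerance and scaling interact, and the slide's rank(D) example shows how. In the sketch below, a pivot whose magnitude is at or below an absolute tolerance is treated as zero. A nearly singular matrix can then lose rank, and so can a nonsingular but badly scaled one. Row scaling fixes the second case. This is a minimal sketch, assuming Gaussian elimination with partial pivoting and an absolute tolerance. That is not necessarily how Stata/Mata computes rank. `numerical_rank` and `scale_rows` are hypothetical helpers.

```python
def numerical_rank(A, tol):
    """Rank by Gaussian elimination with partial pivoting; a pivot whose
    magnitude is <= tol is treated as zero."""
    M = [list(map(float, row)) for row in A]
    rows, cols = len(M), len(M[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        p = max(range(rank, rows), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) <= tol:
            continue                      # column treated as dependent
        M[rank], M[p] = M[p], M[rank]
        for i in range(rank + 1, rows):
            f = M[i][c] / M[rank][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

def scale_rows(A):
    """Divide every row by its largest absolute entry (simple equilibration)."""
    scaled = []
    for row in A:
        s = max(abs(v) for v in row)
        scaled.append([v / s for v in row] if s else list(row))
    return scaled

A1 = [[1.0, 1.0], [1.0, 1.0 + 1e-10]]        # nearly singular
A2 = [[1e-12, 2e-12], [1e-12, 3e-12]]        # nonsingular but tiny entries

print(numerical_rank(A1, 1e-12), numerical_rank(A1, 1e-6))
print(numerical_rank(A2, 1e-9), numerical_rank(scale_rows(A2), 1e-9))
```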

29
  • Part II. Malmquist Index Analysis with the Panel
    Data
  • Basic Concept of Malmquist Index
  • The User Written Command malmq

30
Basic Concept of Malmquist Index
  • The Malmquist Productivity Index (MPI) measures
    productivity change over time and can be
    decomposed into changes in efficiency and in
    technology.

31
Basic Concept of Malmquist Index
32
Basic Concept of Malmquist Index
The input-oriented MPI can be expressed in terms
of input-oriented CRS efficiency, as in Equations 1
and 2, using the observations at times t and t+1.
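One common way to write the index follows the structure in Cooper et al. (2006). Here $\theta^{s}(x^{u}, y^{u})$ denotes the input-oriented CRS efficiency of the time-$u$ observation measured against the time-$s$ frontier. This notation is assumed for illustration and may differ from the symbols used in Equations 1 and 2:

$$
\mathrm{MPI} = \left[ \frac{\theta^{t}(x^{t+1}, y^{t+1})}{\theta^{t}(x^{t}, y^{t})} \cdot \frac{\theta^{t+1}(x^{t+1}, y^{t+1})}{\theta^{t+1}(x^{t}, y^{t})} \right]^{1/2}
$$

With these Farrell-type efficiency scores, MPI > 1 indicates productivity progress from t to t+1, and MPI < 1 indicates regress.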
33
Basic Concept of Malmquist Index
The geometric-mean form of the input-oriented MPI can
be decomposed into input-oriented technical change
and input-oriented efficiency change, as given in
Equation 4.
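The decomposition can be checked numerically. Efficiency change (catch-up) is $EC = \theta^{t+1}(x^{t+1},y^{t+1}) / \theta^{t}(x^{t},y^{t})$. Technical change (frontier shift) is $TC = [\theta^{t}(x^{t},y^{t}) / \theta^{t+1}(x^{t},y^{t}) \cdot \theta^{t}(x^{t+1},y^{t+1}) / \theta^{t+1}(x^{t+1},y^{t+1})]^{1/2}$. Their product equals the geometric-mean MPI, as in Cooper et al. (2006). The sketch uses hypothetical scores for a single DMU. The function name and arguments are illustrative assumptions, not the interface of malmq.

```python
from math import sqrt, isclose

def malmquist(e_tt, e_t_t1, e_t1_t, e_t1_t1):
    """Input-oriented MPI and its decomposition from four CRS efficiency scores.

    e_tt    : period-t observation against the period-t frontier
    e_t_t1  : period-(t+1) observation against the period-t frontier
    e_t1_t  : period-t observation against the period-(t+1) frontier
    e_t1_t1 : period-(t+1) observation against the period-(t+1) frontier
    """
    mpi = sqrt((e_t_t1 / e_tt) * (e_t1_t1 / e_t1_t))
    eff_change = e_t1_t1 / e_tt                                # catch-up
    tech_change = sqrt((e_tt / e_t1_t) * (e_t_t1 / e_t1_t1))   # frontier shift
    return mpi, eff_change, tech_change

# Hypothetical scores for one DMU, for illustration only
mpi, ec, tc = malmquist(e_tt=0.8, e_t_t1=0.9, e_t1_t=0.7, e_t1_t1=0.85)
print(f"MPI={mpi:.4f}  EC={ec:.4f}  TC={tc:.4f}  EC*TC equals MPI: {isclose(mpi, ec * tc)}")
```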
34
The User written command malmq
  • Program Syntax
  • malmq ivars = ovars [if] [in] [, ort(in|out)
    period(varname) trace saving(filename)]
  • ort(in|out) specifies the orientation. The
    default is ort(in), meaning input-oriented DEA.
  • period(varname) identifies the time variable.
  • trace specifies that all the sequences displayed
    in the Results window be saved in the malmq.log
    file. By default, only the final results are saved
    in the malmq.log file.
  • saving(filename) specifies that the results be
    saved in filename.dta.

35
The User written command malmq
  • Example
  • Data

36
The User written command malmq
  • Example
  • Result

37
The User written command malmq
  • Example
  • Result

38
Notes
  • The data and code related to the presentation
    will be available from the Conference website.

39
References
  • Cooper, W. W., Seiford, L. M., Tone, K. (2006).
    Introduction to Data Envelopment Analysis and Its
    Uses. Springer Science+Business Media.
  • Ji, Y., Lee, C. (2010). Data Envelopment
    Analysis. The Stata Journal, 10(2),
    267-280.
  • Lee, C., Ji, Y. (2009). Data Envelopment
    Analysis in Stata. DC09 Stata Conference.
  • Maros, I. (2003). Computational Techniques
    of the Simplex Method. Kluwer Academic Publishers.