Title: PM1
1Process mining: Discovering Process Models from Event Logs
- Prof.dr.ir. Wil van der Aalst
- Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands.
2Outline
- Who we are ...
- IT group
- selected research projects
- Process mining
- purpose
- basic idea
- (re)discovery problem
- mining algorithm $\alpha(W)$
- comparison
- example/tools
- case study
- Conclusion
3Who we are ...
4Information Technology (IT) group at EUT
- IT group (35 persons), Department of Technology Management, Eindhoven University of Technology.
- Three subgroups
- Business Process Management (workflow management, Petri nets, mining, ...)
- ICT Architectures (agents, transactions, ...)
- Software Engineering (software quality, ...)
5Selected research projects
- process mining
- workflow verification
- workflow patterns
- web services composition languages
- case handling
- XRL/flower
- business process improvement
- ...
- In most cases using/extending Petri net theory!
6Workflow verification: Woflan
- Can interface with Staffware, Protos, COSA, Meteor.
- Can handle Event-driven Process Chains (ARIS).
7Workflow patterns
- The academic response
- A quest for the basic requirements
- 20 basic patterns
- 20 systems evaluated
- Joint work with QUT, ATOS, etc.
- http://www.tm.tue.nl/it/research/patterns
- +/- 150 pageviews per working day (>25,000 in total)
8Web services composition languages
- Also process support.
- Standards considered are BPML, BPEL4WS, XLANG, WSFL, WSCI.
- Joint work with QUT (Brisbane, Australia).
9Process mining
- Team members
- Wil van der Aalst
- Ton Weijters
- Laura Maruster
- Ana-Karla Medeiros
- Boudewijn van Dongen
- Eric Verbeek
10Business Process Management
11No feedback loop
12The basic idea
process mining
13Toy example
case 1 task A, case 2 task A, case 3 task A, case 3 task B, case 1 task B,
case 1 task C, case 2 task C, case 4 task A, case 2 task B, case 2 task D,
case 5 task A, case 4 task C, case 1 task D, case 3 task C, case 3 task D,
case 4 task B, case 5 task E, case 5 task D, case 4 task D
ABCD: cases 1, 3
ACBD: cases 2, 4
AED: case 5
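The grouping of interleaved events into one trace per case can be sketched in a few lines of Python. This is only an illustration of the idea on the toy log; the function names `traces_per_case` and `cases_per_trace` are made up for this sketch and do not come from any of the tools mentioned in these slides.

```python
from collections import defaultdict

# Events of the toy example in recording order: (case id, task).
events = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"), (2, "C"),
    (4, "A"), (2, "B"), (2, "D"), (5, "A"), (4, "C"), (1, "D"), (3, "C"),
    (3, "D"), (4, "B"), (5, "E"), (5, "D"), (4, "D"),
]

def traces_per_case(events):
    """Return {case id: trace}, keeping the recorded order of tasks per case."""
    traces = defaultdict(str)
    for case, task in events:
        traces[case] += task
    return dict(traces)

def cases_per_trace(events):
    """Return {trace: sorted list of the cases that followed this trace}."""
    grouped = defaultdict(list)
    for case, trace in traces_per_case(events).items():
        grouped[trace].append(case)
    return {trace: sorted(cases) for trace, cases in grouped.items()}

print(cases_per_trace(events))
```

Only the set of distinct traces (ABCD, ACBD, AED) is needed as input for the mining step; the case identifiers serve to separate the interleaved events.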
14Result: a Petri net model
$\alpha(W)$
ABCD, ACBD, AED
Petri nets are used as the formalism; the target language can be different, e.g., Event-driven Process Chains.
15Focus of this presentation is on the following
theoretical question
16
- Assumption: complete workflow logs without noise.
- Let $T$ be a set of tasks. $\sigma \in T^*$ is a workflow trace and $W \subseteq T^*$ is a workflow log.
- Let $W$ be a workflow log over $T$, i.e., $W \subseteq T^*$. Let $a, b \in T$:
- $a >_W b$ if and only if there is a trace $\sigma = t_1 t_2 t_3 \ldots t_{n-1}$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in W$ and $t_i = a$ and $t_{i+1} = b$,
- $a \rightarrow_W b$ if and only if $a >_W b$ and not ($b >_W a$),
- $a \#_W b$ if and only if not ($a >_W b$) and not ($b >_W a$), and
- $a \parallel_W b$ if and only if $a >_W b$ and $b >_W a$.
- Let $N = (P, T, F)$ be a sound WF-net, i.e., $N \in \mathcal{W}$. $W$ is a workflow log of $N$ if and only if $W \subseteq T^*$ and every trace $\sigma \in W$ is a firing sequence of $N$ starting in state $[i]$, i.e., $(N, [i])[\sigma\rangle$.
- $W$ is a complete workflow log of $N$ if and only if (1) for any workflow log $W'$ of $N$: $>_{W'} \subseteq >_W$, and (2) for any $t \in T$ there is a $\sigma \in W$ such that $t \in \sigma$.
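The four ordering relations can be computed directly from a log. The following Python sketch assumes that a trace is written as a string of single-letter task names; the function names `direct_succession` and `footprint` and the ASCII labels `->`, `<-`, `||`, `#` are choices made for this illustration only.

```python
from itertools import product

def direct_succession(log):
    """a >_W b: task a is directly followed by task b in at least one trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def footprint(log):
    """Classify every ordered pair of tasks as '->', '<-', '||' or '#'."""
    gt = direct_succession(log)
    tasks = sorted({task for trace in log for task in trace})
    relation = {}
    for a, b in product(tasks, repeat=2):
        if (a, b) in gt and (b, a) in gt:
            relation[a, b] = "||"   # parallel: both orders were observed
        elif (a, b) in gt:
            relation[a, b] = "->"   # causal: only a before b was observed
        elif (b, a) in gt:
            relation[a, b] = "<-"   # reverse causal: only b before a
        else:
            relation[a, b] = "#"    # never directly after each other
    return relation

log = {"ABCD", "ACBD", "AED"}
fp = footprint(log)
print(fp["A", "B"], fp["B", "C"], fp["B", "E"])  # prints: -> || #
```

Exactly one of the four labels is assigned to every ordered pair, which reflects that the relations partition the set of task pairs.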
17Example 1
$W = \{ABCD, ACBD, AED\}$
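$A >_W B$, $A >_W C$, $A >_W E$, $B >_W C$, $B >_W D$, $C >_W B$, $C >_W D$, $E >_W D$
$A \rightarrow_W B$, $A \rightarrow_W C$, $A \rightarrow_W E$, $B \rightarrow_W D$, $C \rightarrow_W D$, $E \rightarrow_W D$
$B \parallel_W C$, $C \parallel_W B$
$\#_W$: rest
$X \rightarrow_W Y$ xor $Y \rightarrow_W X$ xor $X \#_W Y$ xor $X \parallel_W Y$
Log is complete if the relation $>_W$ cannot be extended.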
18Example 2
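$W = \{ABCD, ACBD\}$ is complete
$A >_W B$, $A >_W C$, $B >_W C$, $B >_W D$, $C >_W B$, $C >_W D$
$A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$
$B \parallel_W C$, $C \parallel_W B$
$\#_W$: rest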
19Example 3
$W = \{ABD, ACD\}$ is complete
$A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$
$A >_W B$, $A >_W C$, $B >_W D$, $C >_W D$
$\parallel_W$: none
$\#_W$: rest
20Causal relations imply connecting places
- Let $N = (P, T, F)$ be a sound WF-net and let $W$ be a complete workflow log of $N$. For any $a, b \in T$: $a \rightarrow_W b$ implies $a\bullet \cap \bullet b \neq \emptyset$.
- I.e., if there is a causal relation between two transitions according to the workflow log, then there has to be a place connecting these two transitions.
- Surprisingly this holds for any sound WF-net!
$A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$
21Connecting places often imply causal relations
- Let $N = (P, T, F)$ be a sound SWF-net and let $W$ be a complete workflow log of $N$. For any $a, b \in T$: $a\bullet \cap \bullet b \neq \emptyset$ and $b\bullet \cap \bullet a = \emptyset$ implies $a \rightarrow_W b$.
- No short loops (i.e., loops of length 1 or 2).
- Structured Workflow Nets (SWF-nets) have no
implicit places and the following two constructs
cannot be used
22Example 4: loops of length 1 are harmful
$A \rightarrow_W B$, $A \rightarrow_W D$, $B \rightarrow_W D$
There is a place connecting B to B but not $B \rightarrow_W B$.
23Example 5: loops of length 2 are harmful
There is a place connecting B to C but not $B \rightarrow_W C$ (because C can be followed directly by B).
$A \rightarrow_W B$, $B \rightarrow_W D$
There is a place connecting C to B but not $C \rightarrow_W B$ (because B can be followed directly by C).
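The effect of short loops on the causal relation can be reproduced with two small logs. The nets of Examples 4 and 5 are only shown as figures, so the logs below are assumptions: they were chosen such that the derived causal relations coincide with the ones listed on the two slides. The helper name `causal` is hypothetical.

```python
def causal(log):
    """Pairs (a, b) where a is directly followed by b but never b by a."""
    gt = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    return {(a, b) for (a, b) in gt if (b, a) not in gt}

# Assumed log for a loop of length 1: B may be repeated between A and D.
loop1 = {"AD", "ABD", "ABBD"}
# Assumed log for a loop of length 2: after B, the detour C leads back to B.
loop2 = {"ABD", "ABCBD"}

print(sorted(causal(loop1)))  # [('A', 'B'), ('A', 'D'), ('B', 'D')]
print(sorted(causal(loop2)))  # [('A', 'B'), ('B', 'D')]
```

In the first log B directly follows itself, so the pair (B, B) is seen in both directions and is not causal. In the second log B and C directly follow each other in both orders, so neither (B, C) nor (C, B) is causal. In both cases the places that form the loop leave no trace in the causal relation.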
24Example 6: Implicit places remain undetected
$A \rightarrow_W B$, $B \rightarrow_W C$
More complex examples can be given showing that the two other requirements for non-SWF-nets are needed.
25Parallelism can often be detected
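- Let $N = (P, T, F)$ be a sound SWF-net such that for any $a, b \in T$: $a\bullet \cap \bullet b = \emptyset$ or $b\bullet \cap \bullet a = \emptyset$, and let $W$ be a complete workflow log of $N$.
- If $a, b \in T$ and $a\bullet \cap b\bullet \neq \emptyset$, then $a \#_W b$.
- If $a, b \in T$ and $\bullet a \cap \bullet b \neq \emptyset$, then $a \#_W b$.
- If $a, b, t \in T$, $a \rightarrow_W t$, $b \rightarrow_W t$, and $a \#_W b$, then $a\bullet \cap b\bullet \cap \bullet t \neq \emptyset$.
- If $a, b, t \in T$, $t \rightarrow_W a$, $t \rightarrow_W b$, and $a \#_W b$, then $\bullet a \cap \bullet b \cap t\bullet \neq \emptyset$.
- This is a complex way of stating that for sound SWF-nets without short loops, it is possible to distinguish XOR-splits from AND-splits and XOR-joins from AND-joins.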
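Read operationally, the statements above suggest a simple test on the log: two tasks that share a causal predecessor (or successor) belong to an XOR-split (or XOR-join) if they never directly follow each other, and to an AND-split (or AND-join) if they occur in both orders. The Python sketch below illustrates this reading on the log of Example 1; the function names are hypothetical, and the result is only meaningful under the assumptions of the theorem (sound SWF-net, no short loops, complete log).

```python
def relations(log):
    """Return the direct succession, causal and parallel relations of a log."""
    gt = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    causal = {(a, b) for (a, b) in gt if (b, a) not in gt}
    parallel = {(a, b) for (a, b) in gt if (b, a) in gt}
    return gt, causal, parallel

def _choice_or_parallel(gt, parallel, a, b):
    """'XOR' if a # b, 'AND' if a || b, otherwise None."""
    if (a, b) in parallel:
        return "AND"
    if (a, b) not in gt and (b, a) not in gt:
        return "XOR"
    return None

def split_type(log, t, a, b):
    """Type of the split after t towards a and b (requires t -> a and t -> b)."""
    gt, causal, parallel = relations(log)
    if (t, a) not in causal or (t, b) not in causal:
        raise ValueError("t must be causally followed by both a and b")
    kind = _choice_or_parallel(gt, parallel, a, b)
    return f"{kind}-split" if kind else "undetermined"

def join_type(log, a, b, t):
    """Type of the join of a and b before t (requires a -> t and b -> t)."""
    gt, causal, parallel = relations(log)
    if (a, t) not in causal or (b, t) not in causal:
        raise ValueError("both a and b must be causally followed by t")
    kind = _choice_or_parallel(gt, parallel, a, b)
    return f"{kind}-join" if kind else "undetermined"

log = {"ABCD", "ACBD", "AED"}
print(split_type(log, "A", "B", "C"), split_type(log, "A", "B", "E"))
print(join_type(log, "B", "C", "D"), join_type(log, "B", "E", "D"))
```

For this log, B and C are parallel after A, whereas B and E exclude each other, so A is followed by an AND-split towards B and C and by a choice between that branch and E.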
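26Mining algorithm $\alpha(W)$
- Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows.
- $T_W = \{ t \in T \mid \exists_{\sigma \in W}\; t \in \sigma \}$,
- $T_I = \{ t \in T \mid \exists_{\sigma \in W}\; t = first(\sigma) \}$,
- $T_O = \{ t \in T \mid \exists_{\sigma \in W}\; t = last(\sigma) \}$,
- $X_W = \{ (A,B) \mid A \subseteq T_W \wedge B \subseteq T_W \wedge \forall_{a \in A} \forall_{b \in B}\; a \rightarrow_W b \wedge \forall_{a_1,a_2 \in A}\; a_1 \#_W a_2 \wedge \forall_{b_1,b_2 \in B}\; b_1 \#_W b_2 \}$,
- $Y_W = \{ (A,B) \in X_W \mid \forall_{(A',B') \in X_W}\; A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
- $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
- $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
- $\alpha(W) = (P_W, T_W, F_W)$.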
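The definition can be followed step by step in code. The Python sketch below is a naive illustration, not the implementation used in the tools presented later: it enumerates all pairs of task subsets and is therefore only usable for small examples. It assumes that traces are non-empty strings of single-letter task names and that $A$ and $B$ are non-empty, which the definition leaves implicit. The representation of a place as the tuple `("p", A, B)` and the helper `nonempty_subsets` are choices made for this sketch.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All non-empty subsets of items, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def alpha(log):
    """Return (P_W, T_W, F_W) for a log given as a set of trace strings."""
    gt = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    causal = {(a, b) for (a, b) in gt if (b, a) not in gt}

    def unrelated(a, b):  # a #_W b
        return (a, b) not in gt and (b, a) not in gt

    t_w = {task for trace in log for task in trace}   # all tasks in the log
    t_i = {trace[0] for trace in log}                 # first tasks
    t_o = {trace[-1] for trace in log}                # last tasks

    subsets = nonempty_subsets(t_w)
    # X_W: every a in A causes every b in B; A and B are internally unrelated.
    x_w = [(A, B) for A in subsets for B in subsets
           if all((a, b) in causal for a in A for b in B)
           and all(unrelated(a1, a2) for a1 in A for a2 in A)
           and all(unrelated(b1, b2) for b1 in B for b2 in B)]
    # Y_W: keep only the maximal pairs of X_W.
    y_w = [(A, B) for (A, B) in x_w
           if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2)
                      for (A2, B2) in x_w)]

    places = {("p", A, B) for (A, B) in y_w} | {"i_W", "o_W"}
    arcs = {(a, ("p", A, B)) for (A, B) in y_w for a in A}
    arcs |= {(("p", A, B), b) for (A, B) in y_w for b in B}
    arcs |= {("i_W", t) for t in t_i}
    arcs |= {(t, "o_W") for t in t_o}
    return places, t_w, arcs

places, transitions, arcs = alpha({"ABCD", "ACBD", "AED"})
internal = sorted(("".join(sorted(p[1])), "".join(sorted(p[2])))
                  for p in places if isinstance(p, tuple))
print(internal)  # [('A', 'BE'), ('A', 'CE'), ('BE', 'D'), ('CE', 'D')]
```

For the toy log this yields four internal places plus the source place `i_W` and the sink place `o_W`: after A there is a choice between E and the parallel branches B and C, and all branches join before D.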
27Solution to the rediscovery problem
- Let $N = (P, T, F)$ be a sound SWF-net and let $W$ be a complete workflow log of $N$. If for all $a, b \in T$: $a\bullet \cap \bullet b = \emptyset$ or $b\bullet \cap \bullet a = \emptyset$, then $\alpha(W) = N$ modulo renaming of places.
- I.e., any sound SWF-net without short loops can be rediscovered!
28Example 7: Sound SWF-net without short loops
29Example 8: A WF-net with an implicit place
$\alpha(W)$
30Example 9: Loop of length 1
$\alpha(W)$
31Example 10: Loop of length 2
$\alpha(W)$
32Example 11: Loop of length 3
$\alpha(W)$
No problem!
33Example 12: Non-free-choice constructs may be harmful
$\alpha(W)$
34Example 13: Free-choice is not enough
$\alpha(W)$
Behaviorally equivalent!
35Example 14: Example with hidden tasks?
Any suggestions?
36Simplification!
$\alpha(W)$
Behaviorally equivalent!
37Results and issues
- Proven to be correct for a large class of processes.
- Notion of completeness is needed (direct successor relation).
- Can handle parallelism and time.
- Open issues
- noise
- incomplete logs
- data
- advanced process patterns (hidden tasks, NFC, etc.)
- behavioral equivalence
- On each of these issues we have some preliminary results.
38Scientific competition
- J.E. Cook (and A.L. Wolf): New Mexico State University / University of Colorado, USA
- J. Herbst (and D. Karagiannis): DaimlerChrysler, Germany
- R. Agrawal, D. Gunopulos, M.K. Maxeiner, K. Küspert, and F. Leymann: IBM, Germany
- G. Schimm: OFFIS, Germany
- S.Y. Hwang et al.: Sun Yat-Sen University, Taiwan
- M. Golani and S.S. Pinter: IBM, Israel
- D. Grigori, F. Casati, et al.: HP, USA
- Our approach differs because we incorporate time and noise and take parallelism as a starting point.
39Practical competition (ARIS PPM)
- IDS Scheer's ARIS Process Performance Manager.
- No process mining but interesting links with
systems like SAP.
40Tools/standards for process mining
41Example: processing customer orders
Example in Staffware: 7 tasks and all basic routing constructs
42Fragment of Staffware log
Case 21
Diractive Description   Event          User                yyyy/mm/dd hh:mm
---------------------------------------------------------------------------
                        Start          swdemo@staffw_edl   2003/02/05 15:00
Register order          Processed To   swdemo@staffw_edl   2003/02/05 15:00
Register order          Released By    swdemo@staffw_edl   2003/02/05 15:00
Prepare shipment        Processed To   swdemo@staffw_edl   2003/02/05 15:00
(Re)send bill           Processed To   swdemo@staffw_edl   2003/02/05 15:00
(Re)send bill           Released By    swdemo@staffw_edl   2003/02/05 15:01
Receive payment         Processed To   swdemo@staffw_edl   2003/02/05 15:01
Prepare shipment        Released By    swdemo@staffw_edl   2003/02/05 15:01
Ship goods              Processed To   swdemo@staffw_edl   2003/02/05 15:01
Ship goods              Released By    swdemo@staffw_edl   2003/02/05 15:02
Receive payment         Released By    swdemo@staffw_edl   2003/02/05 15:02
Archive order           Processed To   swdemo@staffw_edl   2003/02/05 15:02
Archive order           Released By    swdemo@staffw_edl   2003/02/05 15:02
                        Terminated                         2003/02/05 15:02
Case 22
43Fragment of XML file
<?xml version="1.0"?>
<!DOCTYPE WorkFlow_log SYSTEM "http://www.tm.tue.nl/it/research/workflow/mining/WorkFlow_log.dtd">
<WorkFlow_log>
  <source program="staffware"/>
  <process id="main_process">
    <case id="case_0">
      <log_line>
        <task_name>Case start</task_name>
        <event kind="normal"/>
        <date>05-02-2003</date>
        <time>15:04</time>
      </log_line>
      <log_line>
        <task_name>Register order</task_name>
        <event kind="schedule"/>
        <date>05-02-2003</date>
        <time>15:04</time>
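A log in this format can be read with a standard XML parser. The Python sketch below uses only the element and attribute names visible in the fragment. Since the fragment on the slide is cut off, the sample document in the code is an assumed completion: the closing tags were added for the illustration and the DOCTYPE line was left out. The function name `read_events` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed completion of the fragment (closing tags added, DOCTYPE omitted).
SAMPLE = """<?xml version="1.0"?>
<WorkFlow_log>
  <source program="staffware"/>
  <process id="main_process">
    <case id="case_0">
      <log_line>
        <task_name>Case start</task_name>
        <event kind="normal"/>
        <date>05-02-2003</date>
        <time>15:04</time>
      </log_line>
      <log_line>
        <task_name>Register order</task_name>
        <event kind="schedule"/>
        <date>05-02-2003</date>
        <time>15:04</time>
      </log_line>
    </case>
  </process>
</WorkFlow_log>"""

def read_events(xml_text):
    """Return a list of (case id, task name, event kind, date, time) tuples."""
    root = ET.fromstring(xml_text)
    events = []
    for case in root.iter("case"):
        for line in case.iter("log_line"):
            events.append((
                case.get("id"),
                line.findtext("task_name"),
                line.find("event").get("kind"),
                line.findtext("date"),
                line.findtext("time"),
            ))
    return events

for event in read_events(SAMPLE):
    print(event)
```

The resulting tuples keep the case identifier with every event, so they can be grouped into one trace per case before mining.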
44EMiT
Focus on time and causality.
45Thumb
Focus on noise.
46Thumb is able to deal with noise (D/F-graphs)
10% noise
no noise
causality
47Real case: CJIB
- Processing of fines
- 130,136 cases
- 99 different activities
48Process in EMiT
49Complete process model
Validated by CJIB
50SAP R/3
51Conclusion
- Process mining is both a scientific and practical challenge.
- Preliminary results are promising.
- Challenging problems
- Finding the right data in real information systems.
- Dealing with noise and incompleteness.
- Dealing with advanced synchronization patterns.
- Dealing with hidden tasks/behavioral equivalence.