Title: Sequence Classification Using Statistical Pattern Recognition
- José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis
- Computer Science Department
- Universidad Carlos III de Madrid
- Avda. de la Universidad, 30. 28911 Leganés, Spain
- {jiglesia, ledezma, masm}@inf.uc3m.es
Outline
- Motivation and Introduction
- Sequence classification
- Our approach
- Library Creation
- Classification
- Target Environment
- Description
- Experiments and Results
- Conclusions and Future Works
Motivation
- Opponent behavior modelling / classification
- (Environment: soccer simulation domain)
Introduction
- Behavior Classification
- Behavior as sequence of elements
- Sequence Classification
- Sequence: a set of elements ordered so that they can be labelled with the positive integers (Merriam-Webster Dictionary)
Sequence classification
- Given
- Classes $C = \{c_1, c_2, \dots, c_n\}$
- Sequence $E = e_1, e_2, \dots, e_n$
- Determine
- Which class $c_i \in C$ does the sequence $E$ belong to?
Our approach
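[Figure: overview of the approach. Library Creation: labelled sequences (Sequence 1 / Class 1, Sequence 2 / Class 2, ..., Sequence n / Class n) are stored in the Pattern Library. Classification: the sequence to classify is compared on-line with the library through Compare_Patterns, one comparison per class, to give the classification result. Example UNIX command sequences shown: "pwd fs fg", "vi man ls", "finger more ls ...", "vi more ls".]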
Library Creation
- Trie (from "retrieval") data structure
- Special search tree used for storing elements and their prefixes.
- Every node
- represents an element
- stores useful information (times appeared, ...)
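The node description above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `TrieNode`, `insert` and `count_of` are hypothetical, and the only "useful information" kept per node here is an appearance counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    """One trie node: an element plus the information stored with it."""
    element: Optional[str] = None   # None marks the root
    count: int = 0                  # times this element appeared after this prefix
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


def insert(root: TrieNode, sequence: List[str]) -> None:
    """Insert a sequence, incrementing the counter of every node on its path."""
    node = root
    for element in sequence:
        node = node.children.setdefault(element, TrieNode(element))
        node.count += 1


def count_of(root: TrieNode, sequence: List[str]) -> int:
    """Return how many times `sequence` was stored as a path from the root."""
    node = root
    for element in sequence:
        if element not in node.children:
            return 0
        node = node.children[element]
    return node.count


root = TrieNode()
insert(root, ["pwd", "vi", "pwd"])
insert(root, ["vi", "pwd", "ls"])
print(count_of(root, ["pwd", "vi"]))  # 1
print(count_of(root, ["ls"]))         # 0: "ls" never starts a stored path
```

Keeping the children in a dictionary keyed by element makes each step of a lookup a single hash access, so finding a prefix costs time proportional to its length.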
Library Creation - An example trie
- Sequence to insert initially in the trie
- pwd → vi → pwd → vi → pwd → ls
- Sub-sequence length: 3
- Sub-sequences to insert in the trie
- pwd → vi → pwd and vi → pwd → ls
[Figure: the trie is built step by step from the Root by inserting the sub-sequences pwd → vi → pwd and vi → pwd → ls, obtained from the sequence pwd → vi → pwd → vi → pwd → ls.]
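The splitting step of the example can be sketched as follows. The function name `split_into_subsequences` is hypothetical, and the sketch assumes consecutive, non-overlapping chunks, which is what the example shows for length 3; how a shorter trailing chunk should be handled is not stated in the slides, so here it is simply kept.

```python
def split_into_subsequences(sequence, length):
    """Cut `sequence` into consecutive, non-overlapping chunks of `length` elements.

    Assumption: a trailing chunk shorter than `length` is kept as it is.
    """
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


commands = ["pwd", "vi", "pwd", "vi", "pwd", "ls"]
subsequences = split_into_subsequences(commands, 3)
print(subsequences)  # [['pwd', 'vi', 'pwd'], ['vi', 'pwd', 'ls']]
```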
Library Creation - Evaluating Dependences
- Evaluate the relation/dependence between an element and its prefix
- Two approaches
- Frequency-based method.
- Statistical dependence method.
- Our approach: the statistical value used is the chi-square value.
- This value is stored in every node of the trie
Library Creation - Evaluating Dependences
Expected value of each cell:
$$E_{ij} = \frac{\text{Row}_i\ \text{Total} \times \text{Column}_j\ \text{Total}}{\text{Grand Total}}$$
Chi-square value:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
2 x 2 Contingency Table
- $O_{11}$: How many times the current node/element is followed by its prefix.
- $O_{12}$: How many times the current node/element is followed by a different prefix.
- $O_{21}$: How many times a different prefix (of the same length) is followed by the same node.
- $O_{22}$: How many times a different prefix (of the same length) is followed by a different node.
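As a worked illustration of the two formulas, the sketch below computes the chi-square value of a table of observed counts. The function name `chi_square` and the example counts are hypothetical (they are not taken from the UNIX data), and the sketch assumes every expected count is non-zero.

```python
def chi_square(observed):
    """Chi-square statistic of an r x k table of observed counts O_ij."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # E_ij = (Row_i Total x Column_j Total) / Grand Total
            e_ij = row_totals[i] * column_totals[j] / grand_total
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    return chi2


# Hypothetical 2 x 2 contingency table [[O11, O12], [O21, O22]].
table = [[10, 20], [30, 40]]
print(round(chi_square(table), 4))  # 0.7937
```

For this table the expected counts are 12, 18, 28 and 42, so the sum is 4/12 + 4/18 + 4/28 + 4/42, which is about 0.7937.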
Library Creation - Evaluating Dependences
[Figure: a Sequence Pattern Trie, drawn from its Root.]
- A Sequence Pattern Trie is created for each class.
Classification
[Figure: the same overview, detailing Classification. The on-line sequence to classify is stored in a Testing Trie, which Compare_Patterns compares with the Class Trie of each class in the Pattern Library.]
Classification Comparing Process
[Figure: Class Trie and Testing Trie, each drawn from its Root; nodes are labelled with element, count and chi-square value. Testing Trie: pwd 3 → vi 1 (5.1) → who 1 (4.3); vi 2 → who 2 (3.5). Class Trie nodes: ls 2, pwd 3, vi 2, vi 1 (7.1), pwd 2 (1.5), pwd 1 (7.3), ls 1 (0.3).]
- If the node (and its prefix) are in both tries
- If ( abs(Chi2TestingTrie - Chi2ClassTrie) ≤ ThresholdValue ): similarity between both tries.
- Result: ElementTestingTrie, PrefixTestingTrie, Chi2TestingTrie
- Example: if ( abs(5.1 - 7.1) ≤ ThresholdValue ): similarity between both tries.
- Result: vi, pwd, 5.1
Classification Comparing Process
- If the node (and its prefix) are only in the Testing Trie: difference between both tries.
- Result: ElementTestingTrie, PrefixTestingTrie, (Chi2TestingTrie × -1)
- Example: Result: who, pwd → vi, (-4.3)
- Example: Result: who, vi, (-3.5)
Classification Comparing Process
- Result
- Element1, Prefix1, Value1
- Element2, Prefix2, Value2
- Element3, Prefix3, Value3
- Element4, Prefix4, Value4
- ...
- Elementn, Prefixn, Valuen
- Each comparison (ClassTrie, TestingTrie) yields a comparison value.
Classification Comparing Process
- Result
- vi, pwd, 5.1
- who, pwd → vi, -4.3
- who, vi, -3.5
- Comparison Value: 5.1 - 4.3 - 3.5 = -2.7
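The comparing process can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: each trie is flattened into a dictionary keyed by (prefix, element), the function name `compare_patterns` and the threshold 2.5 are hypothetical, the test is written as "less than or equal", and a node present in both tries but outside the threshold is assumed to contribute nothing (the slides do not cover that case). Only the class-trie node needed by the example is included.

```python
def compare_patterns(testing_trie, class_trie, threshold):
    """Compare a testing trie with a class trie.

    Both tries are given as {(prefix, element): chi_square} dictionaries.
    Returns the list of (element, prefix, value) results and their sum.
    """
    results = []
    for (prefix, element), chi2_testing in testing_trie.items():
        if (prefix, element) in class_trie:
            chi2_class = class_trie[(prefix, element)]
            if abs(chi2_testing - chi2_class) <= threshold:
                # Similarity between both tries: keep the testing value.
                results.append((element, prefix, chi2_testing))
            # Assumption: outside the threshold, nothing is added.
        else:
            # Only in the testing trie: difference, value multiplied by -1.
            results.append((element, prefix, -chi2_testing))
    comparison_value = sum(value for _, _, value in results)
    return results, comparison_value


testing_trie = {
    (("pwd",), "vi"): 5.1,
    (("pwd", "vi"), "who"): 4.3,
    (("vi",), "who"): 3.5,
}
class_trie = {(("pwd",), "vi"): 7.1}  # other class-trie nodes omitted

results, comparison_value = compare_patterns(testing_trie, class_trie, threshold=2.5)
print(results)
print(round(comparison_value, 1))  # -2.7
```

Summing the result values reproduces the comparison value of the example, 5.1 - 4.3 - 3.5 = -2.7.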
Classification
[Figure: the same overview with the output of each comparison. Every Compare_Patterns box yields one comparison value, and the sequence to classify is assigned to the class with the greatest comparison value.]
Environment: UNIX command line sequences
- Command histories of 9 UNIX computer users over 2 years
- UCI Repository of ML Databases: Newman, C., Hettich, S., Merz, C. (1998)
- Raw command history: Start session 1 cd /private/docs ls -laF more cat foo.txt bar.txt zorch.txt > a.txt exit End session 1 Start session 2 cd /games/ xquake fg
- Parsed sequence: SOF cd <1> ls -laF more cat <3> > <1> exit EOF
- <1>: one "file name" argument; <3>: three "file name" arguments
Experiments: UNIX command line sequences
- 9 files (users) containing from about 10,000 to 60,000 commands each.
- 1. Extracting patterns: a trie is created for each user → Pattern Library
- 2. Classification algorithm: the sequence to classify (sequences of very different sizes) is classified in the class with the greatest value (result value).
- 3. Evaluating the result
- Calculate
- if the sequence is correctly classified: difference between the greatest value and the second greatest value (+)
- otherwise: difference between the real classification value and the greatest value (-)
- (The greater the difference, the better the classification)
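The evaluation step can be sketched as follows. The function name `evaluate` is hypothetical; the comparison values are the ones listed in the worked example at the end of this deck (User On-Line vs Class User0 to Class User8), and the sketch assumes there are at least two classes.

```python
def evaluate(values, real_class):
    """Return (predicted_class, evaluation_value) for one classified sequence.

    `values` maps each class to its comparison value.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    predicted, runner_up = ranked[0], ranked[1]
    if predicted == real_class:
        # Correct: greatest value minus second greatest value (positive).
        difference = values[predicted] - values[runner_up]
    else:
        # Wrong: real class value minus greatest value (negative).
        difference = values[real_class] - values[predicted]
    return predicted, difference


values = {
    "User0": 21, "User1": 49, "User2": 9, "User3": 3, "User4": 12,
    "User5": 29, "User6": -1, "User7": 0, "User8": 11,
}
print(evaluate(values, "User1"))  # ('User1', 20)
print(evaluate(values, "User2"))  # ('User1', -40)
```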
Results: UNIX command line sequences
[Figure: UNIX commands classification for User 6. Classification value (average of 25 simulation results) against the length of the sequence to classify.]
Results: UNIX command line sequences
[Figure: minimum length for classifying a UNIX computer user correctly. Axes: length of the sequence to classify; UNIX computer user (class).]
Conclusions
- A threshold must be found
- Long time for creating the tries
- Results depend on the length of the sub-sequences used to create the trie
Conclusions
- Effective method to classify UNIX users
- If a behavior can be represented by sequences, the proposed classification method can be used
- If a new class is added, only its trie must be created (the others are not modified)
- This method could be used for other tasks
- sequence prediction, sequence clustering
- RoboCup Coach 2006 Competition (successful results)
Future Works
- Pattern Library → One trie for all classes (users).
- Classification method without threshold value
- Analysis comparing our approach to others (HMMs)
Thank you!
Questions
Related to Questions...
Experiments: UNIX command line sequences
[Figure: Sequence Classification. A Test User (real class: ClassUser1) is compared on-line with every class in the Pattern Library (USER 0 / Class0, USER 1 / Class1, ..., USER 8 / Class8). Example sequences shown: "SOF ls -laF More cd <4>", "SOF cd <1> ls -laF more cat <3> >", "SOF ls <1> exit <1> ls -laF xquake fg", "SOF vi <1> vi <3> ls -la cat <2>".]
- User On-Line vs Class User0 → 21
- User On-Line vs Class User1 → 49
- User On-Line vs Class User2 → 9
- User On-Line vs Class User3 → 3
- User On-Line vs Class User4 → 12
- User On-Line vs Class User5 → 29
- User On-Line vs Class User6 → -1
- User On-Line vs Class User7 → 0
- User On-Line vs Class User8 → 11
- User On-Line → Class c with the greatest value: Class User1 (49). Correctly classified.
- Evaluation value: 49 - 29 = 20
Experiments: UNIX command line sequences
[Figure: Sequence Classification. The same comparison for a Test User whose real class is ClassUser2.]
- Comparison values for Class User0 to Class User8: 21, 49, 9, 3, 12, 29, -1, 0, 11
- User On-Line → Class c with the greatest value: Class User1 (49). Not correctly classified.
- Evaluation value: 9 - 49 = -40