Title: GUI implementation for Supervised and Unsupervised
1GUI implementation for Supervised and
Unsupervised SUBDUE System
2Introduction
- Data Mining is a very vast field, which is
rapidly developing today. It is the use of
artificial intelligence to find common,
interesting, or previously unknown patterns in
large databases. - One method for discovering knowledge in
structural data is the identification of common
substructures (concepts represented as graphs)
within the data.
2
3Introduction (Contd..)
- The SUBDUE system, developed by Cook and Holder
Cook and Holder, 1999 performs data mining on
databases represented as graphs, i.e. it
discovers interesting substructures in structural
data. - This project deals with the conversion of a
textual representation of the data into a
graphical visualization of the data The input
data given to the program is the outputs provided
by the existing SUBDUE.
3
4The Subdue System
- The SUBDUE system is a data-mining tool that
discovers interesting substructures in structural
data - By compressing previously discovered
substructures in the data multiple passes of
SUBDUE produce a hierarchical description of the
structural regularities in the data.
4
5Unsupervised Subdue
Example of the input graph to the unsupervised
version of the subdue and the discovered
substructure.
5
6Unsupervised (Contd..)
- The substructure discovery algorithm used by
SUBDUE is beam search. A substructure consists of
a definition and a set of instances. The
substructure definition is a connected set of
vertices and edges that define a SUBGRAPH within
G. An instance of a substructure is a SUBGRAPH of
G that matches, graph theoretically, to the
definition. The algorithm for the discovery is
given below.
6
7Unsupervised Contd.)
- SUBDUE (Graph, BeamWidth, MaxBest, MaxSubSize,
Limit) - ParentList
- ChildList
- BestList
- ProcessedSubs0
- Create a substructure from each unique vertex
label and its single-vertex instances insert the
resulting substructures in ParentList - While ProcessedSubs lt Limit and ParentList is
not empty do - While ParentList is not empty do
- Parent RemoveHead (ParentList)
- Extend each instance of Parent in all possible
ways - Group the extended instances into Child
substructures - For each Child do
- If SizeOf (Child) lt MaxSubSize then
- Evaluate the Child
7
8Unsupervised Contd.)
Insert Child in ChildList in order by value If
Length (ChildList)gtBeamWidth then Destroy the
substructure at the end of the ChildList Processed
Subs ProcessedSubs 1 Insert Parent in
BestList in order by value If Length (BestList) gt
MaxBest then Destroy the substructure at the end
of the BestList Switch ParentList and
ChildList return BestList
8
9Supervised
- SUBDUE performs supervised graph based relational
concept learning. The SUBDUE concept learner
accepts both a positive graph and a negative
graph , and evaluates substructures based on
their compression of the positive graph and lack
of compression on the negative graph.
9
10Supervised Contd.
10
11Supervised (Contd.)
- In the figure above each object has a shape and
is related to other objects using the binary
relation on and shape. The discovered
substructure is as shown in the figure. As seen
above the SUBDUE discovered substructure is from
the positive example and not from the negative
example. For this example the best substructure,
which gives the maximum compression, is triangle
on a square.
11
12Subdue - Chemical Compounds
- A DNA sequence can be represented as a very
simple linear graph, and higher-level
relationships between different parts of a
sequence can be mapped to additional edges in
graph. - SUBDUE system discovers patterns in the input
graph in polynomial time. - SUBDUE system is capable of discovering known
patterns in the DNA sequence of yeast,as well as
patterns in yeast DNA that are known to be
important in other organism, but which have not
yet been shown to play a role in yeast.
12
13Subdue - Chemical Compounds
- A DNA sequence can be represented as a very
simple linear graph, and higher-level
relationships between different parts of a
sequence can be mapped to additional edges in
graph. - SUBDUE system discovers patterns in the input
graph in polynomial time. - SUBDUE system is capable of discovering known
patterns in the DNA sequence of yeast,as well as
patterns in yeast DNA that are known to be
important in other organism, but which have not
yet been shown to play a role in yeast.
13
14Subdue - Chemical Compounds
- The figure shown above is the backbone
representation which gives more meaningful graphs
then the linear representation. - This representation separated the base names (A,
C, T, G) from the vertices representing
themselves. - The backbone representation mimics the actual
chemical structure of the DNA molecule, in which
the DNA bases are connected by deoxyribose sugars
to a linear phosphate backbone.
14
15GUI Design
- Requirements
- The requirements of the GUI are as follows.
- File Dialog Boxes should be added for better user
access to the input files. - The entire visual representation of the graphs
needs to be shown on the screen. Sometimes these
representations exceed the length of the screen.
To accommodate these large graphs, scrollbars
need to be incorporated into the design.
15
16GUI Design
- User interfaces must be provided so that the user
can interact with the GUI for displaying the
results of each and every iteration. So a button
called Next Iteration, which activates the
display of substructures on the screen, needs to
be incorporated. - A Button called the Compress button should be
provided to the GUI. This button enables the user
to see the compressed graph.
16
17GUI Design
- For the supervised version of SUBDUE, both
negative and positive graphs need to be
displayed. - The vertices of the graph should display their
labels inside the vertices. - Since directed edges are used, the arrows with
appropriate directions should be displayed. - The language to be used for implementation should
be portable and be able to run from a browser. It
should also have good GUI components. So JDK 1.2
was used to implement the program.
17
18Implemenation
- Unsupervised SUBDUE
- Unsupervised SUBDUE GUI requires two input files.
A position file which determines the graph
position and the output file from the SUBDUE
which is parsed as per the requirements of the
GUI by the conversion program. - The driver class of the applet is the
unsupervised class. This class initializes the
applet. When user clicks the next iteration
button canvas2 class is invoked and best
substructures found in that iteration is
displayed. These substructures are arranged by
their MDL value.
18
19Implemenation
- Unsupervised SUBDUE
- When user clicks the compress button then canvas1
class is invoked. This class compresses the input
graph by replacing the instances of the best
substructure of the iteration by single vertices.
The compressed graph will be further compressed
when the using the results of that
iteration.when-compressed button is clicked. The
flow diagrams are as shown below.
19
20Implemenation
- Supervised GUI Implemenation
- Supervised SUBDUE GUI requires three input
files.Two position files which determines the
positive graph position and the negative graph
position and the output file from the SUBDUE
which is parsed as per the requirements of the
GUI by the conversion program. - The driver class of the applet is the supervised
class. This class initializes the applet. When
user clicks the next iteration button canvas2
class is invoked and best substructures found in
that iteration is displayed. These substructures
are arranged by their MDL value.
20
21Implemenation
- Supervised GUI Implemenation
- When user clicks the compress button then canvas1
class is invoked. This class compresses the input
graph by replacing the instances of the best
substructure of the iteration by single vertices.
The compressed graph will be further compressed
when the using the results of that
iteration.when-compressed button is clicked. The
flow diagrams are as shown below.
21
22Implemenation
- Parser for the GUI input.
- The Conversion program parses the output from the
subdue to make it compatible to the GUI program.
This can parse both supervised and unsupervised
GUI.The conversion program takes in longer
phrases and replaces it by shorter ones It
eliminates blank lines and arranges the output in
the format required for the GUI input. - Each line starts with a small phrase for E.g.
It(iteration number ), C(for compression),
v(vertices),e(edges),Val(Value), It(instances).
22
23Implementation
- Parser for the GUI input.
- The phrase Best substructures indicates the
starting of the iteration and the phrase Graph
is compressed using best substructure. Indicates
a compressed graph information
23
24Demo Of the Project
24