GUI implementation for Supervised and Unsupervised - PowerPoint PPT Presentation

Slides: 25
Provided by: pxsre
1
GUI implementation for Supervised and
Unsupervised SUBDUE System
2
Introduction
  • Data mining is a vast and rapidly developing
    field. It uses artificial intelligence techniques
    to find common, interesting, or previously
    unknown patterns in large databases.
  • One method for discovering knowledge in
    structural data is the identification of common
    substructures (concepts represented as graphs)
    within the data.

3
Introduction (Contd..)
  • The SUBDUE system, developed by Cook and Holder
    [Cook and Holder, 1999], performs data mining on
    databases represented as graphs; that is, it
    discovers interesting substructures in structural
    data.
  • This project converts a textual representation
    of the data into a graphical visualization. The
    input to the program is the output produced by
    the existing SUBDUE system.

4
The Subdue System
  • The SUBDUE system is a data-mining tool that
    discovers interesting substructures in structural
    data.
  • By compressing previously discovered
    substructures in the data, multiple passes of
    SUBDUE produce a hierarchical description of the
    structural regularities in the data.

5
Unsupervised Subdue
Example of an input graph to the unsupervised
version of SUBDUE and the discovered
substructure.
6
Unsupervised (Contd..)
  • The substructure discovery algorithm used by
    SUBDUE is a beam search. A substructure consists
    of a definition and a set of instances. The
    substructure definition is a connected set of
    vertices and edges that defines a subgraph within
    the input graph G. An instance of a substructure
    is a subgraph of G that matches the definition
    graph-theoretically. The discovery algorithm is
    given below.

7
Unsupervised (Contd.)
SUBDUE(Graph, BeamWidth, MaxBest, MaxSubSize, Limit)
  ParentList = {}
  ChildList = {}
  BestList = {}
  ProcessedSubs = 0
  Create a substructure from each unique vertex label
    and its single-vertex instances; insert the
    resulting substructures in ParentList
  While ProcessedSubs < Limit and ParentList is not
      empty do
    While ParentList is not empty do
      Parent = RemoveHead(ParentList)
      Extend each instance of Parent in all possible
        ways
      Group the extended instances into Child
        substructures
      For each Child do
        If SizeOf(Child) < MaxSubSize then
          Evaluate the Child

8
Unsupervised (Contd.)
          Insert Child in ChildList in order by value
          If Length(ChildList) > BeamWidth then
            Destroy the substructure at the end of
              ChildList
      ProcessedSubs = ProcessedSubs + 1
      Insert Parent in BestList in order by value
      If Length(BestList) > MaxBest then
        Destroy the substructure at the end of BestList
    Switch ParentList and ChildList
  return BestList
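
The beam search above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SUBDUE implementation: the graph encoding (a dict of vertex labels plus a list of directed edge triples), the `signature` grouping by label multisets (a stand-in for a real graph-isomorphism test), and the size-ratio `value` function (a stand-in for the MDL measure) are all hypothetical choices made to keep the example short.

```python
from collections import defaultdict

def subdue(vertices, edges, beam_width=4, max_best=3, max_sub_size=4, limit=50):
    """vertices: {id: label}; edges: list of directed (u, v, label) triples.
    Returns up to max_best (value, signature, instances) tuples, best first."""
    incident = defaultdict(list)          # vertex id -> indices of touching edges
    for i, (u, v, _) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)
    graph_size = len(vertices) + len(edges)

    def signature(inst):
        # Assumption: label multisets stand in for a real isomorphism test.
        vs, es = inst
        return (tuple(sorted(vertices[v] for v in vs)),
                tuple(sorted((vertices[edges[e][0]], edges[e][2],
                              vertices[edges[e][1]]) for e in es)))

    def value(sig, instances):
        # Assumption: a size ratio replaces MDL. Each non-overlapping
        # instance is counted as if it were replaced by a single vertex.
        sub_size = len(sig[0]) + len(sig[1])
        used, count = set(), 0
        for vs, _ in sorted(instances, key=lambda inst: sorted(inst[0])):
            if used.isdisjoint(vs):
                used |= vs
                count += 1
        compressed = graph_size - count * sub_size + count
        return graph_size / (sub_size + compressed)

    def ranked_insert(lst, entry, max_len):
        # Insert in order by value, then drop entries past max_len.
        lst.append(entry)
        lst.sort(key=lambda t: -t[0])
        del lst[max_len:]

    # One initial substructure per unique vertex label.
    groups = defaultdict(set)
    for v in vertices:
        inst = (frozenset([v]), frozenset())
        groups[signature(inst)].add(inst)
    parent_list = [(value(s, insts), s, insts) for s, insts in groups.items()]
    seen = set(groups)                    # signatures already evaluated
    best_list, processed = [], 0

    while processed < limit and parent_list:
        child_list = []
        while parent_list:
            parent = parent_list.pop(0)
            children = defaultdict(set)
            for vs, es in parent[2]:      # extend every instance by one edge
                for v in vs:
                    for e in incident[v]:
                        if e not in es:
                            a, b, _ = edges[e]
                            inst = (vs | {a, b}, es | {e})
                            children[signature(inst)].add(inst)
            for sig, insts in children.items():
                # Size is measured here as the number of vertices.
                if sig not in seen and len(sig[0]) < max_sub_size:
                    seen.add(sig)
                    ranked_insert(child_list, (value(sig, insts), sig, insts),
                                  beam_width)
            processed += 1
            ranked_insert(best_list, parent, max_best)
        parent_list = child_list          # "switch" the two lists
    return best_list
```

Calling `subdue(vertices, edges)` on a small graph in which two triangles each sit on a square ranks the "triangle on square" pattern first under this simplified value function.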
9
Supervised
  • SUBDUE performs supervised, graph-based
    relational concept learning. The SUBDUE concept
    learner accepts both a positive graph and a
    negative graph, and evaluates substructures based
    on their compression of the positive graph and
    lack of compression of the negative graph.
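
One way to picture this evaluation is a cost that rewards shrinking the positive graph and penalizes shrinking the negative graph. The sketch below is an illustrative assumption, not SUBDUE's actual measure: it uses plain size counts (vertices plus edges) where SUBDUE uses description length, and the function names are hypothetical.

```python
def compressed_size(graph_size, sub_size, n_instances):
    # Each non-overlapping instance is replaced by a single vertex.
    return graph_size - n_instances * sub_size + n_instances

def supervised_cost(pos_size, neg_size, sub_size, pos_instances, neg_instances):
    """Lower is better. Sizes are vertex-plus-edge counts (an assumption)."""
    pos_compressed = compressed_size(pos_size, sub_size, pos_instances)
    neg_compressed = compressed_size(neg_size, sub_size, neg_instances)
    # Cost of the substructure itself, plus what remains of the positive
    # graph, plus a penalty for every unit removed from the negative graph.
    return sub_size + pos_compressed + (neg_size - neg_compressed)

# A substructure found only in the positive graph...
only_positive = supervised_cost(20, 20, 3, pos_instances=4, neg_instances=0)
# ...costs less than one that compresses both graphs equally.
in_both = supervised_cost(20, 20, 3, pos_instances=4, neg_instances=4)
```

Under this toy cost, a substructure that appears only in the positive graph is preferred over one that appears equally often in both.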

10
Supervised (Contd.)

11
Supervised (Contd.)
  • In the figure above, each object has a shape and
    is related to other objects through the binary
    relations "on" and "shape". The discovered
    substructure is shown in the figure. The
    substructure discovered by SUBDUE occurs in the
    positive example and not in the negative
    example. For this example, the best substructure,
    which gives the maximum compression, is a
    triangle on a square.

12
Subdue - Chemical Compounds
  • A DNA sequence can be represented as a very
    simple linear graph, and higher-level
    relationships between different parts of a
    sequence can be mapped to additional edges in
    the graph.
  • The SUBDUE system discovers patterns in the
    input graph in polynomial time.
  • The SUBDUE system is capable of discovering known
    patterns in the DNA sequence of yeast, as well as
    patterns in yeast DNA that are known to be
    important in other organisms but have not yet
    been shown to play a role in yeast.
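
The linear representation can be illustrated with a short Python sketch. The graph encoding (a dict of vertex labels and a list of directed edge triples) and the edge label `"next"` are assumptions made for this example; the slides do not specify the labels used.

```python
def dna_to_linear_graph(sequence):
    """Return (vertices, edges) for a DNA string.

    vertices maps position -> base letter; edges link each position to the
    following one with the hypothetical label "next".
    """
    vertices = {i: base for i, base in enumerate(sequence)}
    edges = [(i, i + 1, "next") for i in range(len(sequence) - 1)]
    return vertices, edges

vertices, edges = dna_to_linear_graph("ACGT")
```

Higher-level relationships between distant parts of the sequence would then be extra entries appended to `edges`.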

14
Subdue - Chemical Compounds
  • The figure shown above is the backbone
    representation, which gives more meaningful
    graphs than the linear representation.
  • This representation separates the base names (A,
    C, T, G) from the vertices that represent the
    positions in the sequence.
  • The backbone representation mimics the actual
    chemical structure of the DNA molecule, in which
    the DNA bases are connected by deoxyribose sugars
    to a linear phosphate backbone.
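
A minimal Python sketch of a backbone-style graph follows. The vertex label `"backbone"` and the edge labels `"next"` and `"base"` are hypothetical; the slides describe the idea but not the exact labels.

```python
def dna_to_backbone_graph(sequence):
    """Return (vertices, edges) with base names split off the backbone.

    Positions 0..n-1 are backbone vertices chained by "next" edges; each
    one points to a separate vertex (ids n..2n-1) holding the base name.
    """
    n = len(sequence)
    vertices, edges = {}, []
    for i, base in enumerate(sequence):
        vertices[i] = "backbone"        # position along the strand
        vertices[n + i] = base          # the base name as its own vertex
        edges.append((i, n + i, "base"))
        if i > 0:
            edges.append((i - 1, i, "next"))
    return vertices, edges

vertices, edges = dna_to_backbone_graph("ACG")
```

Because every backbone vertex carries the same label, repeated patterns differ only in the attached base vertices, which is one plausible reason this form yields more meaningful substructures.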

15
GUI Design
  • Requirements
  • The requirements of the GUI are as follows.
  • File dialog boxes should be added to give the
    user easier access to the input files.
  • The entire visual representation of the graphs
    needs to be shown on the screen. Because these
    representations sometimes exceed the size of the
    screen, scrollbars need to be incorporated into
    the design.

16
GUI Design
  • The user must be able to interact with the GUI
    to display the results of each iteration. A
    button called Next Iteration, which displays the
    substructures on the screen, therefore needs to
    be incorporated.
  • A button called Compress should be added to the
    GUI. This button lets the user see the
    compressed graph.

17
GUI Design
  • For the supervised version of SUBDUE, both
    negative and positive graphs need to be
    displayed.
  • Vertex labels should be displayed inside the
    vertices.
  • Since directed edges are used, arrows showing
    the edge directions should be displayed.
  • The implementation language should be portable,
    able to run from a browser, and have good GUI
    components, so JDK 1.2 was used to implement the
    program.

18
Implementation
  • Unsupervised SUBDUE
  • The unsupervised SUBDUE GUI requires two input
    files: a position file, which determines the
    graph position, and the output file from SUBDUE,
    which the conversion program parses into the
    format required by the GUI.
  • The driver class of the applet is the
    unsupervised class, which initializes the
    applet. When the user clicks the Next Iteration
    button, the canvas2 class is invoked and the
    best substructures found in that iteration are
    displayed. These substructures are ordered by
    their minimum description length (MDL) value.

19
Implementation
  • Unsupervised SUBDUE
  • When the user clicks the Compress button, the
    canvas1 class is invoked. This class compresses
    the input graph by replacing each instance of the
    best substructure of the iteration with a single
    vertex. The compressed graph will be further
    compressed, using the results of that iteration,
    when the Compress button is clicked. The flow
    diagrams are as shown below.
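
The compression step described on this slide, replacing each instance of a substructure with a single vertex, can be sketched in Python. This is not the applet's canvas1 code; the graph encoding, the instance format (vertex-id set plus edge-index set), and the placeholder label `"SUB_1"` are assumptions for illustration.

```python
def compress(vertices, edges, instances, sub_label="SUB_1"):
    """Replace each non-overlapping instance with one new vertex.

    vertices: {id: label}; edges: list of (u, v, label) triples;
    instances: list of (vertex_id_set, edge_index_set) pairs.
    """
    new_vertices = dict(vertices)
    remap = {}                            # old vertex id -> replacement id
    removed_edges = set()
    next_id = max(vertices, default=-1) + 1
    for vs, es in instances:
        for v in vs:
            remap[v] = next_id
            del new_vertices[v]
        new_vertices[next_id] = sub_label
        removed_edges |= set(es)          # edges inside the instance vanish
        next_id += 1
    # Edges that crossed an instance boundary now attach to the new vertex.
    new_edges = [(remap.get(u, u), remap.get(v, v), label)
                 for i, (u, v, label) in enumerate(edges)
                 if i not in removed_edges]
    return new_vertices, new_edges

vertices = {1: "triangle", 2: "square", 3: "triangle", 4: "square", 5: "circle"}
edges = [(1, 2, "on"), (3, 4, "on"), (2, 5, "on"), (4, 5, "on")]
instances = [({1, 2}, {0}), ({3, 4}, {1})]
small_vertices, small_edges = compress(vertices, edges, instances)
```

Running `compress` again on its own output, with instances found in a later iteration, gives the repeated compression that produces a hierarchical description.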

20
Implementation
  • Supervised GUI Implementation
  • The supervised SUBDUE GUI requires three input
    files: two position files, which determine the
    positions of the positive graph and the negative
    graph, and the output file from SUBDUE, which
    the conversion program parses into the format
    required by the GUI.
  • The driver class of the applet is the supervised
    class, which initializes the applet. When the
    user clicks the Next Iteration button, the
    canvas2 class is invoked and the best
    substructures found in that iteration are
    displayed. These substructures are ordered by
    their MDL value.

21
Implementation
  • Supervised GUI Implementation
  • When the user clicks the Compress button, the
    canvas1 class is invoked. This class compresses
    the input graph by replacing each instance of the
    best substructure of the iteration with a single
    vertex. The compressed graph will be further
    compressed, using the results of that iteration,
    when the Compress button is clicked. The flow
    diagrams are as shown below.

22
Implementation
  • Parser for the GUI input.
  • The conversion program parses the output from
    SUBDUE to make it compatible with the GUI
    program. It handles the output of both the
    supervised and unsupervised versions. The
    conversion program replaces longer phrases with
    shorter ones, eliminates blank lines, and
    arranges the output in the format required for
    the GUI input.
  • Each line starts with a short tag, e.g.
    It (iteration number), C (compression),
    v (vertices), e (edges), Val (value),
    It (instances).
23
Implementation
  • Parser for the GUI input.
  • The phrase "Best substructures" indicates the
    start of an iteration, and the phrase "Graph is
    compressed using best substructure." indicates
    information about a compressed graph.
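
A conversion step of this kind might look like the Python sketch below. Only the two marker phrases come from the slides; the exact layout of the SUBDUE output and of the GUI input is not given there, so the output tags (`It <n>` and `C`) and the pass-through of all other lines are assumptions.

```python
def convert(lines):
    """Shorten marker phrases and drop blank lines (hypothetical format)."""
    out, iteration = [], 0
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # eliminate blank lines
        if text.startswith("Best substructures"):
            iteration += 1                # a new iteration begins here
            out.append(f"It {iteration}")
        elif text.startswith("Graph is compressed using best substructure"):
            out.append("C")               # compressed-graph section follows
        else:
            out.append(text)              # other lines pass through unchanged
    return out

sample = [
    "Best substructures",
    "",
    "v 1 triangle",
    "Graph is compressed using best substructure.",
    "   ",
    "Best substructures",
]
converted = convert(sample)
```

A real converter would also rewrite the vertex, edge, value, and instance lines into their short tags; that part is omitted here because the slides do not show the source phrases.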

24
Demo of the Project