Views: Fundamental Building Blocks in the Process of Knowledge Discovery

1
Views: Fundamental Building Blocks in the Process
of Knowledge Discovery
  • Hideo Bannai (1), Yoshinori Tamada (2),
    Osamu Maruyama (3), Kenta Nakai (1), Satoru Miyano (1)
  • (1) Human Genome Center, Inst. of Medical Science,
    Univ. of Tokyo
  • (2) Dept. of Mathematical Sciences, Tokai Univ.
  • (3) Faculty of Mathematics, Kyushu Univ.

2
Motivation
  • Speed up the computational knowledge discovery
    process as a whole

3
The Knowledge Discovery Process
  • The keys to success:
  • Develop a good algorithm for the problem
  • Find or generate appropriate attributes
  • Allow human intervention

(figure, after Fayyad et al. 1996: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Evaluation/Interpretation)
4
Contents
  • Basic Concepts: Entity, View, View Operation,
    View Design
  • Example discovery process: characterization of
    sub-cellular localization signals

5
Basic Concepts - Entity
  • An entity is any object.
  • Example: the set E of S. cerevisiae gene names.
    Each gene name is an entity.
  • YAL069W, YAL068C, YLR188W, ...

6
Basic Concepts - View
  • A view is a function over entities.
  • v : E → R; we call R the range set.
  • Example: a view over the set of S. cerevisiae
    gene names

(figure: each gene name is mapped to its DNA sequence; sequences shown: atgatcgtaaataa, atgcatcgtaaatc, atgattgtaagaat)
7
Basic Concepts - View Operation
  • A view operator is a function that creates new
    views from existing views.
  • A function ψ : R → R′ over range sets induces a
    view operator Ψ : v ↦ v′
  • v : E → R, ψ : R → R′
  • Ψ[v] = ψ ∘ v = v′ : E → R → R′
  • Ψ[v](e) = ψ(v(e))
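
The induced operator is ordinary function composition. A minimal Python sketch, assuming views are represented as plain functions; the names `induce`, `seq_view`, and the placeholder entities `g1` to `g3` are illustrative and do not come from the slides:

```python
from typing import Callable, TypeVar

E = TypeVar("E")    # entity type
R = TypeVar("R")    # range set of the existing view
R2 = TypeVar("R2")  # range set of the new view


def induce(psi: Callable[[R], R2]) -> Callable[[Callable[[E], R]], Callable[[E], R2]]:
    """Lift a function psi over range sets to a view operator."""
    def operator(v: Callable[[E], R]) -> Callable[[E], R2]:
        # The new view applies v first, then psi: e -> psi(v(e)).
        return lambda e: psi(v(e))
    return operator


# Hypothetical entities; the sequences are the three shown on the slides,
# but their pairing with these placeholder names is an assumption.
SEQUENCES = {"g1": "atgatcgtaaataa", "g2": "atgcatcgtaaatc", "g3": "atgattgtaagaat"}


def seq_view(entity: str) -> str:
    """A view from entity names to DNA sequences."""
    return SEQUENCES[entity]


# len : str -> int induces an operator that turns seq_view into a length view.
length_view = induce(len)(seq_view)
```

Here `length_view("g1")` evaluates `len(seq_view("g1"))`, so a new view over the same entities is obtained without touching the entity set.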

8
Basic Concepts - View Operation
  • Example: codon translation

(figure: the DNA sequence view is translated into an amino acid sequence view; results shown: MIVNNTH, MHRKSLRR, MIVRMIRL)
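
Codon translation can be sketched as a function over the range set of a sequence view. The Python sketch below assumes the standard genetic code and drops any incomplete trailing codon; `translate` and `CODON_TABLE` are illustrative names. The DNA strings in the transcript are shorter than the protein strings shown, so only the first four residues of each can be checked against them.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "tcag"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring an incomplete trailing codon.

    Stop codons are emitted as '*' rather than ending the translation.
    """
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

Composing `translate` with a DNA sequence view gives an amino acid sequence view over the same genes.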
9
Basic Concepts - View Operation
  • Example: string matching

(figure: parameter pattern = atcgt; results shown: true, true, false)
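
String matching is a view operation with a parameter: fixing the pattern yields a function from strings to booleans. A minimal sketch, assuming exact substring matching (the slide does not say whether matching is exact or approximate); `contains` is an illustrative name:

```python
from typing import Callable


def contains(pattern: str) -> Callable[[str], bool]:
    """Return a function that tests whether `pattern` occurs in a string."""
    return lambda s: pattern in s


# The three DNA sequences shown earlier in the slides.
sequences = ["atgatcgtaaataa", "atgcatcgtaaatc", "atgattgtaagaat"]
has_atcgt = contains("atcgt")
matches = [has_atcgt(s) for s in sequences]
```

With the pattern `atcgt`, `matches` agrees with the true, true, false column on the slide.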
10
Basic Concepts - View Operation
  • Hypothesis generation is a view operation
    (it produces a hypothesis view)
  • Let E be the entity set and V the set of
    available views
  • Supervised learning (v: target view)
  • SLearner[E, V, v] → v′, where v′(e) = v(e) for
    most e ∈ E
  • Unsupervised learning
  • ULearner[E, V] → v′
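
To make the signature of a supervised learner concrete, here is a deliberately trivial sketch: it returns whichever available view agrees with the target view on the most entities. This is an illustration of "entities, views, and a target view in, a hypothesis view out", not the learning algorithm used by the authors; `s_learner` and the toy labels are assumptions.

```python
from typing import Callable, Iterable, Sequence

View = Callable[[str], bool]


def s_learner(entities: Sequence[str], views: Iterable[View], target: View) -> View:
    """Pick the view that agrees with `target` on the most entities."""
    def agreement(v: View) -> int:
        return sum(v(e) == target(e) for e in entities)
    return max(views, key=agreement)


# Toy setting: the entities are the DNA strings themselves.
entities = ["atgatcgtaaataa", "atgcatcgtaaatc", "atgattgtaagaat"]


def has_atcgt(s: str) -> bool:
    return "atcgt" in s


def has_gatt(s: str) -> bool:
    return "gatt" in s


# Hypothetical target labels, invented for this example only.
toy_labels = {entities[0]: True, entities[1]: True, entities[2]: False}
hypothesis = s_learner(entities, [has_atcgt, has_gatt], toy_labels.__getitem__)
```

The returned `hypothesis` is itself a view, so it can be fed into further view operators or into the next discovery task.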

11
Basic Concepts - View Operation
  • Example: ID3 algorithm and decision tree view

(figure: ID3 producing a decision tree view; labels shown: POS, POS, NEG, NEG, POS, POS)
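
ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection criterion, with views playing the role of attributes; the toy entities and views are assumptions for illustration, and the full tree construction is omitted.

```python
from collections import Counter
from math import log2
from typing import Callable, Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(entities: Sequence, view: Callable, label: Callable) -> float:
    """Entropy reduction obtained by partitioning the entities by view(e)."""
    groups: dict = {}
    for e in entities:
        groups.setdefault(view(e), []).append(label(e))
    remainder = sum(len(g) / len(entities) * entropy(g) for g in groups.values())
    return entropy([label(e) for e in entities]) - remainder


# Toy data: six entities labeled POS, POS, NEG, NEG, POS, POS.
toy_entities = list(range(6))
toy_label = lambda e: "NEG" if e in (2, 3) else "POS"
separating_view = lambda e: e in (2, 3)  # splits the classes perfectly
parity_view = lambda e: e % 2 == 0       # carries no class information here

gain_separating = information_gain(toy_entities, separating_view, toy_label)
gain_parity = information_gain(toy_entities, parity_view, toy_label)
```

A split that separates the classes perfectly recovers the full label entropy as gain, while the parity split gains nothing, so ID3 would choose the former.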
12
View Design
  • Views and view operators can be combined to
    create new views.
  • We call the structure of this combination the
    view design
  • Careful design of views by experts provides an
    interface for human intervention

13
The Knowledge Discovery Process
  • Data Preparation: transformation of the entity
    set using views
  • Data Mining: generation of new views from
    existing views and entities
  • Interpretation/Evaluation: evaluation of the
    generated views
  • Knowledge Consolidation: using newly created
    views for the next discovery task

14
Sub-cellular Localization Signals
  • Proteins are carried to specific locations after
    being synthesized in the cytosol.
  • A short amino acid sequence holds the information
    (localization signal)
  • Signals at the N-terminus:
  • signal peptide (SP)
  • mitochondrial targeting peptide (mTP)
  • chloroplast transit peptide (cTP)

15
Previous Work
  • PSORT (Nakai & Kanehisa, 1992)
  • An expert system to predict localization sites
  • Looks at the amino acid frequency of the first 20
    residues → linear discrimination
  • MitoProt (Claros & Vincens, 1996)
  • Linear discrimination using 47 numerical
    attributes that try to model various knowledge
    about the signals (mitochondrial only)
  • TargetP (Emanuelsson et al., 2000)
  • Neural network system (best predictor so far)

16
Our Approach
  • Find simple, interpretable rules that are still
    accurate
  • Design views based on discussion with an expert
  • Data:
  • 940 sequences: SP (269), mTP (368), cTP (141),
    Other (162)

17
View Design for Experiments
View Design 1 (binary classifier)
  • Can we find structural characteristics?

(figure: MLRSAVRLAGKDVRFGEDA → substring (0, 15) → MLRSAVRLAGKDVRF → alphabet indexing (R → 1, Other → 0) → 001000100000010 → pattern match 0100010 → True)
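
View Design 1 can be read as a chain of three view operations: take a substring, apply an alphabet indexing, then test for a pattern. A minimal sketch using the values from the slide, assuming the pair (0, 15) means (start, length) and that the pattern test is an exact substring match; all function names are illustrative:

```python
from typing import Callable, Dict


def substring(start: int, length: int) -> Callable[[str], str]:
    """Extract `length` characters beginning at 0-based position `start`."""
    return lambda s: s[start:start + length]


def alphabet_index(mapping: Dict[str, str], default: str) -> Callable[[str], str]:
    """Rewrite each character via `mapping`; unmapped characters become `default`."""
    return lambda s: "".join(mapping.get(ch, default) for ch in s)


def contains(pattern: str) -> Callable[[str], bool]:
    """Exact substring test (an assumption about the matching used)."""
    return lambda s: pattern in s


def compose(*steps: Callable) -> Callable:
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


view_design_1 = compose(
    substring(0, 15),
    alphabet_index({"R": "1"}, default="0"),
    contains("0100010"),
)
result = view_design_1("MLRSAVRLAGKDVRFGEDA")
```

Each parameter (substring position, indexing, pattern) is a knob the learner can search over while the overall design, chosen with the expert, stays fixed.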
18
View Design for Experiments
View Design 2 (binary classifier)
  • What about overall composition?

(figure: MLRSAVRLAGKDVRFGEDA → substring (5, 10) → VRLAGKDVRF → AAindex mapping (Net Charge) → 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0 → 0.1 → compared with 0.8 → False)
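
View Design 2 follows the same chaining idea with a numeric index in place of a pattern. The sketch below is illustrative only: the charge table (K, R = +1; D, E = -1; others 0), the use of the mean as the aggregate, and the "at least the threshold" comparison are all assumptions, so it does not reproduce the intermediate numbers on the slide.

```python
from typing import Dict

# Assumed per-residue values; a real run would load an AAindex entry instead.
NET_CHARGE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def window_mean(seq: str, start: int, length: int, index: Dict[str, float]) -> float:
    """Mean index value over seq[start:start+length]; 0.0 for an empty window."""
    window = seq[start:start + length]
    if not window:
        return 0.0
    return sum(index.get(aa, 0.0) for aa in window) / len(window)


def view_design_2(seq: str, start: int = 5, length: int = 10,
                  index: Dict[str, float] = NET_CHARGE,
                  threshold: float = 0.8) -> bool:
    """Boolean view: is the mean index value of the window at least `threshold`?"""
    return window_mean(seq, start, length, index) >= threshold


mean_charge = window_mean("MLRSAVRLAGKDVRFGEDA", 5, 10, NET_CHARGE)
decision = view_design_2("MLRSAVRLAGKDVRFGEDA")
```

Under these assumed values the window VRLAGKDVRF falls well below the threshold, so the view returns False for the example sequence.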
19
View Design for Experiments
View Design 3 (binary classifier)
(figure: View Design 3 = View Design 1 combined with View Design 2 by Logical And)
20
View Design for Experiments
  • Decision list of View Design 2 and 3
  • Hypothesis generation:
  • DList[E, {view2, view3}, v] → (decision list in the figure)

(figure: decision list applied to a sequence)
    P1: SP? (View Design 2)       yes → SP,    no → P2
    P2: Other? (View Design 3)    yes → Other, no → P3
    P3: mTP? (View Design 3)      yes → mTP,   no → cTP
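
A decision list evaluates its predicates in order and returns the label of the first one that fires, falling back to a default label. A minimal sketch of that control flow, with placeholder predicates standing in for the learned views P1 to P3 (the real predicates are instances of View Design 2 and 3 and are not given in the transcript):

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[Dict[str, bool]], bool]


def decision_list(rules: List[Tuple[Predicate, str]], default: str) -> Callable[[Dict[str, bool]], str]:
    """Build a classifier that returns the label of the first matching rule."""
    def classify(entity: Dict[str, bool]) -> str:
        for predicate, label in rules:
            if predicate(entity):
                return label
        return default
    return classify


# Placeholder entities: dicts recording whether each hypothetical predicate holds.
classify = decision_list(
    [
        (lambda e: e["p1"], "SP"),     # P1: SP?
        (lambda e: e["p2"], "Other"),  # P2: Other?
        (lambda e: e["p3"], "mTP"),    # P3: mTP?
    ],
    default="cTP",
)
```

Order matters: an entity that satisfies both P1 and P3 is labeled SP, because P3 is never consulted.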
21
Results
  • 5-fold cross validation
  • Accuracy was close to TargetP

            Our view          TargetP
    SP      237/269 (88.1%)   245/269 (91.1%)
    mTP     304/368 (82.6%)   300/368 (81.5%)
    cTP     112/141 (79.4%)   120/141 (85.1%)
    Other   141/162 (87.0%)   137/162 (84.0%)
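
Summing the per-class counts gives one overall figure per predictor, which makes "close to TargetP" concrete. The short calculation below uses only the counts listed on this slide; the variable names are illustrative.

```python
# (correct with our views, correct with TargetP, number of sequences) per class
counts = {
    "SP": (237, 245, 269),
    "mTP": (304, 300, 368),
    "cTP": (112, 120, 141),
    "Other": (141, 137, 162),
}

total = sum(n for _, _, n in counts.values())
ours_correct = sum(ours for ours, _, _ in counts.values())
targetp_correct = sum(tp for _, tp, _ in counts.values())

ours_overall = 100.0 * ours_correct / total
targetp_overall = 100.0 * targetp_correct / total
print(f"Our views: {ours_correct}/{total} = {ours_overall:.1f}%")
print(f"TargetP:   {targetp_correct}/{total} = {targetp_overall:.1f}%")
```

On these counts the two predictors differ by 8 sequences out of 940, with the view-based rules ahead on mTP and Other and TargetP ahead on SP and cTP.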

22
Results
  • Other vs. (cTP, mTP)

(figure: View Design 1 and View Design 2 combined by Logical And)
23
Results
  • The rules discovered were interpreted by an
    expert
  • We were able to extract interpretable knowledge
    about the signals.
  • Hydropathy of SP
  • A pattern capturing the amphiphilic α-helix of
    cTP and mTP
  • The expert was surprised that such simple rules
    could predict so accurately

24
Hypothesis Creator
  • Library implementing views and view operators
    http://www.HypothesisCreator.net/

25
HC Core Team
  • Yoshinori Tamada
  • Osamu Maruyama
  • Hideo Bannai
  • Satoru Miyano

26
Collaborators
  • Takao Goto
  • Hirotaka Katou
  • Kentaro Kawamoto
  • Hideo Miyake
  • Kaori Samuraki

27
Alphabet Indexing
  • A mapping from one character to another.
  • It is generally used to map an alphabet onto
    a smaller alphabet.
  • Example:
  • K, R, D, E → 1
  • A, C, F, G, H, I, L, M, N, P, Q, S, T, V, W, Y → 2
  • MALRAVRSVRA → 22212212212
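
An alphabet indexing is a character-level lookup table, which Python's `str.translate` applies directly. A minimal sketch using the two classes from this slide; `make_indexing` is an illustrative name:

```python
from typing import Callable, Dict


def make_indexing(classes: Dict[str, str]) -> Callable[[str], str]:
    """Build an alphabet indexing from {new_symbol: original_letters}."""
    table = {}
    for symbol, letters in classes.items():
        for ch in letters:
            # str.translate expects a mapping from Unicode code points.
            table[ord(ch)] = symbol
    return lambda s: s.translate(table)


charged_vs_rest = make_indexing({"1": "KRDE", "2": "ACFGHILMNPQSTVWY"})
indexed = charged_vs_rest("MALRAVRSVRA")
```

Characters absent from the table are left unchanged by `str.translate`, so an unexpected symbol in the input stays visible in the output instead of being silently reclassified.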

28
AAindex Database
  • An entry in the AAindex database is a mapping
    from an amino acid to a numeric value,
    representing biochemical properties of the
    amino acid.
  • Example