Title: Talk in Taiwan, 27/8/2001
1. From Informatics to Bioinformatics
Limsoon Wong, Kent Ridge Digital Labs, Singapore
2. What is Bioinformatics?
3. What are the Themes of Bioinformatics?
- Bioinformatics = Data Mgmt + Knowledge Discovery
- Data Mgmt = Integration + Transformation + Cleansing
- Knowledge Discovery = Statistics + Algorithms + Databases
4. What are the Benefits of Bioinformatics?
- To the patient
  - Better drug, better treatment
- To the pharma
  - Save time, save cost, make more
- To the scientist
  - Better science
5. Data Integration
- A DOE "impossible query"
  - For each gene on a given cytogenetic band, find its non-human homologs.
6. Data Integration Results
    sybase-add (name: "GDB", ...)
    create view L from locus_cyto_location using GDB
    create view E from object_genbank_eref using GDB
    select
      accn: g.genbank_ref, nonhuman-homologs: H
    from
      L as c, E as g,
      (select u
       from g.genbank_ref.na-get-homolog-summary as u
       where not(u.title string-islike "Human") andalso
             not(u.title string-islike "H.sapien")) as H
    where
      c.chrom_num = "22" andalso
      g.object_id = c.locus_id andalso
      not (H = { })
- Using Kleisli
  - Clear
  - Succinct
  - Efficient
  - Handles
    - heterogeneity
    - complexity
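The logic of the query above can be sketched in plain Python over toy data. This is only an illustration of the nested join and filter, not Kleisli itself: the table contents, the accession names, and the `get_homolog_summary` helper are hypothetical stand-ins, and `string-islike` is assumed here to behave like a substring test.

```python
# Toy stand-ins for the two GDB tables named in the query (hypothetical data).
locus_cyto_location = [
    {"locus_id": 1, "chrom_num": "22"},
    {"locus_id": 2, "chrom_num": "22"},
    {"locus_id": 3, "chrom_num": "7"},
]
object_genbank_eref = [
    {"object_id": 1, "genbank_ref": "ACC_A"},
    {"object_id": 2, "genbank_ref": "ACC_B"},
    {"object_id": 3, "genbank_ref": "ACC_C"},
]

# Hypothetical replacement for the remote na-get-homolog-summary call.
HOMOLOGS = {
    "ACC_A": [{"title": "Human gene X"}, {"title": "Mouse gene X"}],
    "ACC_B": [{"title": "H.sapiens gene Y"}],
    "ACC_C": [{"title": "Rat gene Z"}],
}


def get_homolog_summary(accession):
    """Return homolog summaries for an accession (toy lookup)."""
    return HOMOLOGS.get(accession, [])


def nonhuman_homologs(chrom_num):
    """For each locus on the chromosome, collect its non-human homologs."""
    results = []
    for c in locus_cyto_location:
        if c["chrom_num"] != chrom_num:
            continue
        for g in object_genbank_eref:
            if g["object_id"] != c["locus_id"]:
                continue
            # Inner subquery: drop titles that mention human sequences.
            h = [
                u for u in get_homolog_summary(g["genbank_ref"])
                if "Human" not in u["title"] and "H.sapien" not in u["title"]
            ]
            # Keep only genes that still have at least one homolog.
            if h:
                results.append({"accn": g["genbank_ref"], "nonhuman-homologs": h})
    return results


print(nonhuman_homologs("22"))
```

Each result keeps the homologs as a nested collection inside the record, which is the shape the `select` clause asks for.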
7. Data Warehousing
    (uid: 6138971,
     title: "Homo sapiens adrenergic ...",
     accession: "NM_001619",
     organism: "Homo sapiens",
     taxon: 9606,
     lineage: ["Eukaryota", "Metazoa", ...],
     seq: "CTCGGCCTCGGGCGCGGC...",
     feature: {
       (name: "source", continuous: true,
        position: [(accn: "NM_001619", start: 0, end: 3602, negative: false)],
        anno: [(anno_name: "organism", descr: "Homo sapiens"), ...]),
       ...})
- Motivation
  - efficiency
  - availability
  - denial of service
  - data cleansing
- Requirements
  - efficient to query
  - easy to update
  - model data naturally
8. Data Warehousing Results
- A relational DBMS is insufficient because it forces us to fragment data into 3NF.
- Kleisli turns a flat relational DBMS into a nested relational DBMS. It can use flat relational DBMSs such as Sybase, Oracle, MySQL, etc. as its updatable complex object store. It can even use all of these systems simultaneously!
    ! Log in
    oracle-cplobj-add (name: "db", ...)
    ! Define table
    create table GP (uid: "NUMBER", detail: "LONG") using db
    ! Populate table with GenPept reports
    select uid: x.uid, detail: x
    into GP
    from aa-get-seqfeat-general "PTP" as x
    using db
    ! Map GP to that table
    create view GP from GP using db
    ! Run a query to get title of 131470
    select x.detail.title from GP as x where x.uid = 131470
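The idea of keeping a whole nested report in one column of a flat table, keyed by `uid`, can be sketched as follows. This is a minimal illustration under stated assumptions, not how Kleisli encodes its objects: SQLite stands in for Oracle, JSON text stands in for the `LONG` column encoding, and the sample report and its title are made up.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory flat table with one key column and one blob-like column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GP (uid INTEGER PRIMARY KEY, detail TEXT)")
    return conn


def put_report(conn, report):
    """Serialise the whole nested report into the detail column."""
    conn.execute(
        "INSERT OR REPLACE INTO GP (uid, detail) VALUES (?, ?)",
        (report["uid"], json.dumps(report)),
    )


def get_title(conn, uid):
    """Fetch one report by uid and project out its title field."""
    row = conn.execute("SELECT detail FROM GP WHERE uid = ?", (uid,)).fetchone()
    return None if row is None else json.loads(row[0])["title"]


conn = open_store()
put_report(conn, {
    "uid": 131470,
    "title": "hypothetical report title",  # made-up value for illustration
    "feature": [
        {"name": "source",
         "anno": [{"anno_name": "organism", "descr": "Homo sapiens"}]},
    ],
})
print(get_title(conn, 131470))
```

The flat table never needs to know the shape of the nested data, so the report does not have to be fragmented into normalised tables; the cost is that filtering on nested fields happens after deserialisation.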
9. Epitope Prediction
TRAP-559AA MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYS
E EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN LNDNAIH
LYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS LLSTNLPYGRTNLTDA
LLQVRKHLNDRINRENANQLVVIL TDGIPDSIQDSLKESRKLSDRGVKI
AVFGIGQGINVAFNR FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAV
CVEVEK TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ CE
EERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI IDNNPQEPSPN
PEEGKDENPNGFDLDENPENPPNPDIPEQ KPNIPEDSEKEVPSDVPKNP
EDDREENFDIPKKPENKHDN QNNLPNDKSDRNIPYSPLPPKVLDNERKQ
SDPQSQDNNGN RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREE
HE KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP GAATPY
AGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
10. Epitope Prediction Results
- Prediction by our ANN model for HLA-A11
  - 29 predictions
  - 22 epitopes
  - 76% specificity
- Prediction by BIMAS matrix for HLA-A1101
  - Number of experimental binders, grouped by BIMAS rank: 19 (52.8%), 5 (13.9%), 12 (33.3%)
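The figures on this slide are consistent with simple ratios, which the short check below reproduces. It assumes, since the slide does not say so, that the reported 76 is the share of the 29 predictions that turned out to be epitopes, and that the values in parentheses are percentages of the total number of experimental binders.

```python
# ANN model: share of predictions confirmed as epitopes.
predictions = 29
epitopes = 22
ann_share = round(100 * epitopes / predictions)  # 22 / 29 is about 0.7586

# BIMAS: experimental binders in each rank group, as given on the slide.
binders_per_group = [19, 5, 12]
total_binders = sum(binders_per_group)
group_shares = [round(100 * n / total_binders, 1) for n in binders_per_group]

print(ann_share, total_binders, group_shares)
```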
11. Gene Expression Analysis
- Clustering gene expression profiles
- Classifying gene expression profiles
- find stable differentially expressed genes
12. Gene Expression Analysis Results
- The Discovery System
  - Correlation test
  - Voter selection
  - Class prediction
13. Protein Interaction Extraction
What are the protein-protein interaction pathways from the latest reported discoveries?
14. Protein Interaction Extraction Results
- Rule-based system for processing free texts in scientific abstracts
- Specialized in
  - extracting protein names
  - extracting protein-protein interactions
(Figure: example output involving Jak1)
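A rule-based extractor of this kind can be illustrated with a single pattern of the form "protein, interaction verb, protein". The sketch below is a toy under stated assumptions and not the system described on the slide: the protein lexicon (beyond Jak1), the verb list, and the sample sentences are all hypothetical and chosen only to exercise the rule.

```python
import re

# Hypothetical lexicon of protein names and interaction verbs.
PROTEIN = r"(?:Jak1|Stat3|Tyk2)"
VERB = r"(?:binds|activates|phosphorylates|interacts with)"

# One rule: <protein> <verb> <protein>, matched on word boundaries.
RULE = re.compile(rf"\b({PROTEIN})\s+({VERB})\s+({PROTEIN})\b")


def extract_interactions(text):
    """Return (subject, verb, object) triples for every match of the rule."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RULE.finditer(text)]


# Made-up sentences used only to test the pattern.
sample = "Jak1 phosphorylates Stat3. In this toy text, Tyk2 interacts with Jak1."
print(extract_interactions(sample))
```

A practical system would need many more rules to cope with passive voice, negation, and name variants; a single pattern only shows the general shape of the approach.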
15. Transcription Start Prediction
16. Transcription Start Prediction Results
17. Medical Record Analysis
- Looking for patterns that are
  - valid
  - novel
  - useful
  - understandable
18. Medical Record Analysis Results
- DeEPs, a novel emerging pattern method
- Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
- Works for gene expression data
19. Behind the Scenes
- Research
  - Vladimir Bajic
  - Vladimir Brusic
  - Jinyan Li
  - See-Kiong Ng
  - Limsoon Wong
  - Louxin Zhang
- Business
  - Peter Saunders
- Industry Assignees
  - Hao Han (gX)
  - Rahul Despande (MC)
- Engineering
  - Allen Chong
  - Judice Koh
  - SPT Krishnan
  - Seng Hong Seah
  - Guanglan Zhang
  - Zhuo Zhang
- Students
  - Huiqing Liu
  - Song Zhu
  - Kun Yu