Databases as Analytical Engines for Drug Discovery

1
Databases as Analytical Engines for Drug Discovery

Susie Stephens
Principal Product Manager, Life Sciences
Oracle Corporation
susie.stephens_at_oracle.com

2
Outline

Data Challenges
Case Studies
Summary

3
Access Distributed Data
Data sources: External Sites, MySQL, Flat files, Sybase, SRS
Access methods: UltraSearch, Distributed query, DBlinks, Transparent Gateway, Generic Connectivity, External Table
4
Integrate a Variety of Data Types

CLOBs
XML
Text
Images
Video
Relational
User-Defined Objects

Nucleotide Sequences
Gene Expression Data
Papers
Cell Histology Images
Protein Folding Video
SwissProt
KEGG
Chemical Structures

XML
5
Manage Vast Quantities of Data

Partitioning
Oracle Data Guard
Real Application Clusters (RAC)
Automated Storage Management
Adaptive Instance Tuning
Automated Application and SQL Tuning
Automated Database Diagnostic Monitor (ADDM)
Scheduling

6
Collaborate Securely

Integrated communications
Single enterprise search
Flexible access
Fine grained access control
Auditing
Workflow
Personalized portal

7
Find Patterns and Insights

Oracle Data Mining
Find relationships and clusters
Oracle Discoverer and Oracle OLAP
Interactive query and drill-down
Statistics
mean, stdev, median, correlations, linear regression
Oracle Text
Cluster and classify documents of interest
Table Functions
Implement complex algorithms within the database

8
Outline

Data Challenges
Case Studies
Summary

9
Regular Expression Searches

A powerful method of describing both simple and
complex patterns for searching and manipulating
Multilingual regular expression support for SQL
and PL/SQL string types
Follows POSIX-style regexp syntax
Supports standard regexp operators
Includes common extensions such as
case-insensitive matching, sub-expression
back-references, etc.
Compatible with popular regexp implementations
such as GNU, Perl, and Awk
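
The features listed above can be tried outside the database. The following is a minimal sketch in Python rather than Oracle SQL, so that it runs standalone; Python's `re` module is Perl-style rather than strictly POSIX, and the sample strings are made up for illustration.

```python
import re

# Case-insensitive matching: find "kinase" regardless of capitalisation.
text = "Tyrosine Kinase and casein KINASE II"
kinase_hits = re.findall(r"kinase", text, flags=re.IGNORECASE)

# Sub-expression back-reference: \1 must repeat whatever group 1 matched,
# so this pattern finds a residue that occurs three times in a row.
triple = re.search(r"([A-Z])\1\1", "MSQTREDQQQAP")

# Search and replace: collapse each run of whitespace into one space.
cleaned = re.sub(r"\s+", " ", "MSQ  TRE\n\tDSV")

print(kinase_hits)
print(triple.group(0))
print(cleaned)
```

In Oracle 10g the corresponding building blocks are the `regexp_like`, `regexp_instr`, `regexp_substr`, and `regexp_replace` functions used in the case studies that follow.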

10
Case Study: Retrieve Protein Data from SGD Using Regular Expressions
Case study courtesy of Prolexys Pharmaceuticals, Inc.
11
HTTP Raw Data
</script> </head><body><body bgcolor='FFFFFF'>
<table cellpadding="2" width="100" cellspacing="0" border="0">
<tr><td colspan="4"><hr width="100" /></td></tr>
<tr><td valign="middle" align="right"><a href="http://www.yeastgenome.org/"><img alt="SGD" border="0" src="http://www.yeastgenome.org/images/SGD-to.gif" /></a></td>
<th valign="middle" nowrap="1">Quick Search</th>
<td valign="middle" align="left"><form method="post" action="http://db.yeastgenome.org/cgi-bin/SGD/search/quickSearch" enctype="application/x-www-form-urlencoded"> <input type="text" name="query" size="13" /><input type="submit" name="Submit" value="Submit" /> </form></td>
<th valign="middle" align="left"><a href="http://www.yeastgenome.org/sitemap.html">Site Map</a> <a href="http://www.yeastgenome.org/HelpContents.shtml">Help</a> <a href="http://www.yeastgenome.org/SearchContents.shtml">Full Search</a> <a href="http://www.yeastgenome.org/">Home</a></th></tr>
<tr><td align="left" colspan="4"><table cellpadding="1" width="100" cellspacing="0" border="0"><tr align="center" bgcolor="navajowhite">
<td><font size="-1"><a href="http://www.yeastgenome.org/ComContents.shtml">Community Info</a></font></td>
<td><font size="-1"><a href="http://www.yeastgenome.org/SubmitContents.shtml">Submit Data</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast2sgd">BLAST</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/web-primer">Primers</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/PATMATCH/nph-patmatch">PatMatch</a></font></td>
<td><font size="-1"><a href="http://db.yeastgenome.org/cgi-bin/SGD/seqTools">Gene/Seq Resources</a></font></td>
<td><font size="-1"><a href="http://www.yeastgenome.org/Vl-yeast.shtml">Virtual Library</a></font></td>
<td><font size="-1"><a href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Contact SGD</a></font></td>
</tr></table></td></tr>
<tr><td colspan="4"><hr width="100" /></td></tr></table>
<table cellpadding="0" width="100" cellspacing="0" border="0"><tr><td width="10"><br /></td><td valign="middle" align="center" width="80"><h1>Sequence for a region of YDR099W/BMH2</h1></td><td valign="middle" align="right" width="10"></td></tr></table>
<p /><center><a target="infowin" href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Send questions or suggestions to SGD</a></center>
<p /><p /><center><a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast2sgd?name=YDR099W&amp;suffix=prot">BLAST search</a> <a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-fastasgd?name=YDR099W&amp;suffix=prot">FASTA search</a></center>
<p /><center><hr width="35" /></center>
<p /><font color="FF0000"><strong>Protein translation of the coding sequence.</strong></font>
<p /><p />Other Formats Available: <a href="http://db.yeastgenome.org/cgi-bin/SGD/getSeq?map=pmap&amp;seq=YDR099W&amp;flankl=0&amp;flankr=0&amp;rev=">GCG</a><pre>&gt;YDR099W Chr 4
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK
</pre><hr size="2" width="75">
<table width="100"><tr><td valign="top" align="left"><a href="http://www.yeastgenome.org/"><img border="0" src="http://www.yeastgenome.org/images/arrow.small.up.gif" />Return to SGD</a></td>
<td valign="bottom" align="right"><form method="post" action="http://db.yeastgenome.org/cgi-bin/SGD/suggestion" enctype="application/x-www-form-urlencoded" target="infowin" name="suggestion"> <input type="hidden" name="script_name" value="/cgi-bin/SGD/getSeq" /><input type="hidden" name="server_name" value="db.yeastgenome.org" /><input type="hidden" name="query_string" value="seq=YDR099W&amp;flankl=0&amp;flankr=0&amp;map=p3map" /><a href="javascript:document.suggestion.submit()">Send a Message to the SGD Curators<img border="0" src="http://www.yeastgenome.org/images/mail.gif" /></a> </form></td></tr></table></body></html>
12
Function to Parse out AA Sequence
create or replace function orf2seq (
  p_orf in varchar2 ) return varchar2 is
  v_stream clob;
  strt     number;
begin
  -- Retrieve the HTTP stream
  v_stream := httpuritype.getclob(httpuritype.createuri(
    'http://db.yeastgenome.org/cgi-bin/SGD/getSeq?seq='||p_orf||
    '&flankl=0&flankr=0&map=p3map') );
  -- Trim off the head of the stream
  strt := dbms_lob.instr(v_stream, 'Submit', 1, 1);
  -- Strip out control characters, new lines, etc.
  v_stream := regexp_replace(dbms_lob.substr(v_stream, 4000, strt),
                             '[[:cntrl:]]', '');
  -- Return the AA sequence
  return(regexp_substr(dbms_lob.substr(v_stream, 4000, strt),
                       '[[:upper:]]{10,}'));
end;
/
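
The same three parsing steps (skip to the 'Submit' marker, strip control characters, take the first long run of uppercase letters) can be sketched outside the database. This Python version is only an illustration under stated assumptions: `extract_sequence` and `sample_page` are hypothetical names, and the page text is a shortened stand-in for the real SGD response, so no network access is needed.

```python
import re

def extract_sequence(html, marker="Submit"):
    """Return the first run of 10+ uppercase letters found after `marker`."""
    start = html.find(marker)
    if start == -1:
        return ""
    tail = html[start:]
    # Remove control characters (newlines, tabs, ...) so that the wrapped
    # sequence lines join into one continuous string.
    tail = re.sub(r"[\x00-\x1f\x7f]", "", tail)
    match = re.search(r"[A-Z]{10,}", tail)
    return match.group(0) if match else ""

# Shortened, made-up stand-in for the page shown on the previous slide.
sample_page = (
    '<input type="submit" name="Submit" value="Submit" />\n'
    "<h1>Sequence for a region of YDR099W/BMH2</h1>\n"
    "<pre>>YDR099W Chr 4\n"
    "MSQTREDSVYLAKLAEQAERY\n"
    "EEMVENMKAVASSGQELSVEE\n"
    "</pre>"
)

print(extract_sequence(sample_page))
```

The threshold of ten uppercase letters is what separates the sequence from short uppercase tokens such as the ORF name, which is interrupted by digits.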
13
AA Sequence for ORF YDR099W
SQL> select orf2seq('YDR099W') from dual;

ORF2SEQ('YDR099W')
--------------------------------------------------------------------------------
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
GQEDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK

Elapsed: 00:00:01.24

SQL> insert into pseq (orf_id, sequence)
  2  values ('YDR099W', orf2seq('YDR099W'));
14
Case Study: Motif Searching in Proteins

PROSITE: database of protein sequence motifs
ID   TYR_PHOSPHO_SITE; PATTERN
AC   PS00007;
DT   APR-1990 (CREATED); APR-1990 (DATA UPDATE); APR-1990 (INFO UPDATE)
DE   Tyrosine kinase phosphorylation site
PA   [RK]-x(2,3)-[DE]-x(2,3)-Y
CC   /TAXO-RANGE=??E?V; /SITE=5,phosphorylation;
CC   /SKIP-FLAG=TRUE;
DO   PDOC00007;
Source: http://www.expasy.org/prosite/ps_frequent_patterns.txt
TKP Pattern: [RK]-x(2,3)-[DE]-x(2,3)-Y
R=Arginine, K=Lysine, D=Aspartate, E=Glutamate, Y=Tyrosine, x=any AA
Oracle10g Regular Expression Equivalent:
[RK].{2,3}[DE].{2,3}Y

Case study courtesy of Prolexys Pharmaceuticals, Inc.
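
The translation from PROSITE syntax to a regular expression is mechanical for a pattern like this one. The sketch below, in Python, covers only the elements that appear on this slide (literal residues, bracketed residue sets, `x`, and repeat counts); `prosite_to_regex`, `motif_offsets`, and the test sequence are hypothetical and chosen for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression.

    Handles only literal residues, [..] residue sets, x (any residue)
    and (n) / (n,m) repeat counts.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")
    regex = regex.replace("(", "{").replace(")", "}")
    return regex

def motif_offsets(sequence, regex):
    """1-based start positions of successive non-overlapping matches."""
    return [m.start() + 1 for m in re.finditer(regex, sequence)]

tkp = prosite_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y")
print(tkp)

# Made-up sequence containing two occurrences of the motif.
print(motif_offsets("AAKAADAAYGGGRTTTEPPPYAA", tkp))
```

The 1-based offsets mirror what `regexp_instr` reports in the query on the next slide, where the last argument selects the first, second, third, or fourth occurrence.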
15
SQL to Retrieve All Proteins Interacting with TKP
select distinct
  substr(a.refseq_id, 1, 9) refseq_id,
  length(a.seq_string_varchar) seq_length,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 1) motif_offs1,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 2) motif_offs2,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 3) motif_offs3,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 4) motif_offs4
from target_db a, y2h_interaction_p b
where a.refseq_id like 'NP%'
  and regexp_like(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y')
  and (substr(a.refseq_id,1,9) = b.bait_refseq or
       substr(a.refseq_id,1,9) = b.prey_refseq)
16
Query Results
REFSEQ_ID    SEQ_LENGTH  MOTIF1_OFFS  MOTIF2_OFFS  MOTIF3_OFFS  MOTIF4_OFFS
-----------  ----------  -----------  -----------  -----------  -----------
NP_003961          1465           14          202          347          537
NP_003968           330          241            0            0            0
NP_003983           490            8           50           62           93
NP_004001          3562         3085            0            0            0
...
MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRS
RSHDRLPYQRRYRERRDSDTYRCEERSPSFGEDYYGPSRSRHRRRSRERG
PYRTRKHAHHCHKRRTRSCSSASSRSQQSSKRTGRSVEDDKEGHLVCRIG
DWLQERYEIVGNLGEGTFGKVVECLDHARGKSQVALKIIRNVGKYREAAR
LEINVLKKIKEKDKENKFLCVLMSDWFNFHGHMCIAFELLGKNTFEFLKE
NNFQPYPLPHVRHMAYQLCHALRFLHENQLTHTDLKPENILFVNSEFETL
YNEHKSCEEKSVKNTSIRVADFGSATFDHEHHTTIVATRHYRPPEVILEL
GWAQPCDVWSIGCILFEYYRGFTLFQTHENREHLVMMEKILGPIPSHMIH
RTRKQKYFYKGGLVWDENSSDGRYVKENCKPLKSYMLQDSLEHVQLFDLM
RRMLEFDPAQRITLAEALLHPFFAGLTPEERSFHTSRNPSR
17
SQL to Retrieve Motif Frequency by Protein
select c.refseq_id "Refseq ID",
       rs2desc(c.refseq_id) "Protein Description",
       a.cnt "Repetitions",
       b.ps_ac "Prosite AC",
       b.descr "Motif Description"
from motif_data a, ps_data b, target_dbp c
where a.ps_ac = b.ps_ac
  and a.sequence_id = c.sequence_id
order by 3 desc, 1
18
Query Results
Refseq ID    Protein Description                                             Repetitions  Prosite AC  Motif Description
-----------  --------------------------------------------------------------  -----------  ----------  -------------------------------------
NP_055995.2  spectrin repeat containing, nuclear envelope 2                  145          PS00006     Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                        132          PS00006     Casein kinase II phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                             115          PS00006     Casein kinase II phosphorylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                          110          PS00006     Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                        102          PS00005     Protein kinase C phosphorylation site
NP_005520.2  heparan sulfate proteoglycan 2 (perlecan)                       97           PS00008     N-myristoylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                          97           PS00005     Protein kinase C phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                             96           PS00005     Protein kinase C phosphorylation site
NP_115495.1  monogenic, audiogenic seizure susceptibility 1 homolog (mouse)  95           PS00006     Casein kinase II phosphorylation site
...
19
Regular Expression Searches Quote

"Thanks to Oracle 10g's Regular Expressions (RE)
query support, it's no longer necessary to export
data from the database, process it with a RE
enabled tool and then import the data back into
the database. Now, RE processing can be handled
with a single query." - Marcel Davidson, Head of
Database Administration, Myriad Proteomics

20
Oracle Data Mining BLAST

Implemented using a table function interface
BLAST search functions can be placed in SQL
queries
Different functions for match and align
Combination of SQL queries and BLAST is very
powerful and flexible

21
Case Study: BLAST as a Sequence Identification Tool

Identify proteins with high sequence similarity
and their functional class
select function, COUNT(seq_id) f_count
from (select t.seq_id, t.score, t.expect,
             g.function
      from SwissProt_DB g,
           Table(BLASTP_MATCH(
                   'AEQAERYDDMAAAMKRY',
                   cursor (select seq_id, sequence
                           from SwissProt_DB),
                   5)) t                 /* expect_value */
      where t.seq_id = g.seq_id)
group by function                        /* swissprot kw */
order by f_count

(Diagram: data flow of the query. BLASTP_MATCH takes query_sequence, parameters and SwissProt_DB and returns seq_id, score, expect; the join with SwissProt_DB on t.seq_id = g.seq_id yields seq_id, function; GROUP BY yields function, f_count.)
22
Case Study: Homology Search between Yeast and Human Data
(Diagram: the Yeast Protein Interactome with proteins A, B, C and the Human Protein Interactome with proteins X, Y, Z, linked by Homology Mapping. Interactions within each interactome are determined experimentally with Y2H; homology between yeast and human proteins is inferred through BLAST.)
Interlogs: (AX, BY) and (AX, BZ)
Case study courtesy of Prolexys Pharmaceuticals, Inc.
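
The interlog idea on this slide can be written down as a small set operation: an interaction in one organism is paired with an interaction in the other when both partners are homologs. The Python sketch below is an illustration only; `find_interologs` is a hypothetical helper and the toy data simply reuses the letters from the diagram.

```python
def find_interologs(yeast_pairs, human_pairs, homologs):
    """Pair yeast interactions with human interactions between their homologs.

    yeast_pairs / human_pairs: iterables of (protein, protein) interactions.
    homologs: dict mapping a yeast protein to a list of human homologs.
    Returns a sorted list of ((yeast1, human1), (yeast2, human2)) tuples.
    """
    human = {frozenset(pair) for pair in human_pairs}
    found = set()
    for y1, y2 in yeast_pairs:
        for h1 in homologs.get(y1, ()):
            for h2 in homologs.get(y2, ()):
                if frozenset((h1, h2)) in human:
                    found.add(((y1, h1), (y2, h2)))
    return sorted(found)

# Toy data: A-B interact in yeast; X-Y and X-Z interact in human;
# BLAST-style homology maps A to X and B to both Y and Z.
yeast_pairs = [("A", "B")]
human_pairs = [("X", "Y"), ("X", "Z")]
homologs = {"A": ["X"], "B": ["Y", "Z"], "C": []}

for interolog in find_interologs(yeast_pairs, human_pairs, homologs):
    print(interolog)
```

With this input the two results correspond to (AX, BY) and (AX, BZ) from the slide.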
23
Batch BLAST Human (query) vs. Yeast (subject)

-- c1: cursor over the human query sequences (declaration not shown on the slide)
for v1 in c1 loop
  insert into yeast_human_homolog (
    human_refseq,
    yeast_orf_name,
    score,
    expect
  )
  select
    v1.refseq_id,
    t.t_seq_id,
    t.score,
    t.expect
  from
    table ( blastp_match (
              v1.sequence_string,
              cursor ( select a.yeast_acn, a.yeast_seq
                       from yeast_prot_seq a )
            )
          ) t;
end loop;

24
BLAST Results
Yeast    Yeast    Human        Human        Expect 1  Expect 2
Gene 1   Gene 2   Refseq 1     Refseq 2
-------  -------  -----------  -----------  --------  --------
YAR018C  YIL061C  NP_XXXXX1.1  NP_YYYYY1.1  4.79E-12  4.58E-06
YBL016W  YDL159W  NP_XXXXX2.1  NP_YYYYY2.1  1.11E-08  5.25E-10
YBL016W  YDL159W  NP_XXXXX3.1  NP_YYYYY3.1  2.63E-10  9.04E-11
YBL016W  YDL159W  NP_XXXXX4.1  NP_YYYYY4.1  4.57E-07  8.33E-09
YBL016W  YDL159W  NP_XXXXX5.1  NP_YYYYY5.1  1.57E-22  1.11E-08
YBL063W  YIL061C  NP_XXXXX6.1  NP_YYYYY6.1  3.17E-64  8.67E-06
YBL063W  YIL061C  NP_XXXXX7.1  NP_YYYYY7.1  2.30E-06  4.58E-06
YBR109C  YDR356W  NP_XXXXX8.1  NP_YYYYY8.1  1.78E-07  7.74E-11
YBR109C  YDR356W  NP_XXXXX9.1  NP_YYYYY9.1  1.24E-08  7.74E-11
YBR109C  YDR356W  NP_XXXX10.1  NP_YYYY10.1  5.19E-07  2.80E-20
YBR109C  YDR356W  NP_XXXX11.1  NP_YYYY11.1  3.92E-10  4.39E-11
YBR109C  YFR014C  NP_XXXX12.1  NP_YYYY12.1  3.67E-48  6.91E-17
YBR109C  YOL016C  NP_XXXX13.1  NP_YYYY13.1  3.67E-48  1.82E-17
Yeast Interactors
Human Interactors
Interlogs
25
BLAST Quote

"Oracle 10g's new BLAST feature will enable us to
easily integrate multiple types of genomic and
proteomic data for complicated queries used in
the mining of our proprietary protein-protein
interaction and cDNA sequence datasets." - Jake
Chen, Principal Bioinformatics Scientist, Myriad
Proteomics

26
Spatial Network Data Model

Data model for managing graph (link-node)
structures
Rich graph analysis functions
Supports variety of network structures
(hierarchical, directed, undirected, random,
scale-free)
Framework for applying network constraints and
rules (e.g. path length, cost, minimum bounding
rectangle)
Bundled Java visualiser APIs for 3rd party
tools, application development
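
As a rough illustration of a link-node model with a path-length constraint, the sketch below builds an undirected graph from an edge list and runs a breadth-first search with an optional hop limit. It is written in Python and does not use the Oracle network data model API; `shortest_path` and the node names are hypothetical.

```python
from collections import deque

def shortest_path(edges, start, goal, max_hops=None):
    """Breadth-first search over an undirected graph.

    Returns the list of nodes on a shortest path, or None when the goal
    cannot be reached within max_hops links.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Constraint: do not extend paths that already use max_hops links.
        if max_hops is not None and len(path) - 1 >= max_hops:
            continue
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Made-up links between biological entities.
edges = [("gene1", "protein1"), ("protein1", "pathway1"), ("pathway1", "compound1")]

print(shortest_path(edges, "gene1", "compound1"))
print(shortest_path(edges, "gene1", "compound1", max_hops=2))
```

The second call returns None because the only route needs three links, which is how a path-length rule prunes candidate routes.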

27
Case Study: Integration Architecture
Native Formats
NREF
EMBL
GO
KEGG
BIND
AFCS
Distributed Database layer

Data type determines available routes
Routes can be determined using semantics

NDM layer (semantic layer)
Nodes
Edges
Graph
Network Route
Case study courtesy of Beyond Genomics, Inc.
28
Network Data Model Quote

"Beyond Genomics, Inc., as a leading systems
biology company, believes that Oracle 10g's
network data model will significantly advance the
integration of metabolomic, proteomic,
transcriptomic, and clinical data sets and the
applications that derive value from these data."
- Eric Neumann, Vice President Strategic
Informatics, Beyond Genomics, Inc.

29
Oracle Data Mining

Unsupervised Learning
Hierarchical K-means Cluster
O-Cluster
Non-Negative Matrix Factorization
Apriori
Supervised Learning
Naïve Bayes
Adaptive Bayes Network
Support Vector Machines
Predictor Variance
ODM can mine structured data, text data, or
structured and text data

30
K-Means Clustering

Hierarchical k-means produces tree of clusters
All splits are binary
Each cluster has a centroid and a histogram
Achieves a reliable solution in a single run
Ranked rules that describe the attributes of each cluster
Cluster assignments are probabilistic using a
Bayesian model
Operates on very deep datasets by using a
summarization module

31
Case Study: Brain Tumor Clustering

Collection of 42 Human Brain tumors and 7,129
gene expression profiles
Clustering of samples according to their gene
expression profiles
It is an example of class and taxonomy discovery
Does the data cluster according to the known
biological classes?

42 Tumor Samples
Normal Cerebellum MD (4)
Malignant Gliomas MGlio (10)
Medulloblastomas MD (10)
Rhabdoid tumors Rhabdoid (10)
Primitive Neuroectodermal PNET (8)

Pomeroy et al Nature 415, 24, p436 (2002).
32
ODM Hierarchical k-Means Clustering
(Cluster tree diagram with Nodes 1 to 7; the leaf clusters are labeled Glioblastoma, Normal, Medulloblastoma, and Rhabdoid.)
33
Literature Results using Hierarchical Clustering
From Pomeroy et al Nature 415, 24, p436 (2002).
34
Association Rules

Captures frequent co-occurrences of
items/attribute values
(A, B) => C: occurrence of A and B together
implies C
Can be applied in different scenarios
Market basket analysis
Pattern discovery
Predictive applications
ODM uses SQL-based implementation of Apriori
algorithm
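
The confidence values reported on the following slides are ratios of counts. The sketch below, in Python, evaluates support and confidence for a single rule; it is not the Apriori algorithm itself (which also searches for the frequent item sets), and `rule_stats` and the five patient records are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence for the rule antecedent => consequent.

    records: list of dicts; antecedent and consequent: dicts mapping an
    attribute name to the required value.
    """
    def matches(record, conditions):
        return all(record.get(key) == value for key, value in conditions.items())

    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Made-up records, not the clinical table from the case study.
patients = [
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "S"},
    {"Stage": "M0", "Outcome": "F"},
    {"Stage": "M1", "Outcome": "F"},
]

support, confidence = rule_stats(patients, {"Stage": "M0"}, {"Outcome": "S"})
print(support, confidence)
```

Here three of the four stage M0 patients had a successful outcome, so the rule Stage=M0 THEN Outcome=S has confidence 0.75 in this toy table.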

35
Case Study: Analysis of Trends in a Patient Group
Clinical table of 60 medulloblastoma patients
7 clinical attributes:
Subtype: classic or desmoplastic medulloblastoma
Size (tumor size): T1-T4
Stage: M0-M4
Sex: M, F
Age (range): 0-5, 5-10, 10-15
Outcome: S (treatment success), F (treatment failure)
Chemo (regime type): 0,1,2,3,4,5,6
Pomeroy et al Nature 415, 24, p436 (2002).
36
Association Rules Results
Over 100 rules reflecting factual or known relationships in data
Age=1 THEN Sex=M (confidence 0.8)
Interpretation: Most 5-10 year-old patients are male
Subtype=Desmoplastic THEN Stage=M0 (confidence 0.79)
Interpretation: Most desmoplastic patients in the study have stage M0
37
Association Rules Results
Other interesting trends
Stage=M0 THEN Outcome=S (confidence 0.74)
Interpretation: Stage M0 vs non-M0 is a predictor of treatment outcome
Stage=M0 AND Size=T3 AND Chemo=1 THEN Outcome=S (confidence 0.92)
Interpretation: Most patients with stage M0, size T3 who received chemo regime 1 had good response to treatment
38
Support Vector Machines

SVM provides a very general multi-purpose and
powerful classifier
SVM does not require feature selection and can
work well with thousands of input features
SVM is accurate and can approximate complex
functional relationships
SVM works in binary, multi-class, sparse (text)
classification and regression
SVM is easy to train and apply and can be used
in discovery mode or in production automated
methodologies

39
Case Study: Classification of Normal Human Tissue
and Tumors

Multiple Examples (14) of normal human tissue and
tumors
Could a single model distinguish normal vs
cancer?
Train set 200 samples, test set 80 samples
Microarray profiles for 7,129 genes

Normal Tissue vs. Cancer
S. Ramaswamy et al, Proc. Natl. Acad. Sci. USA
98 15149-15154 (2001)
40
Support Vector Machines Results
Normal vs. Cancer (Multiple types): SVM Test Set Predictions
                 Predicted Normal   Predicted Cancer
Actual Normal    16                 10
Actual Cancer    3                  51
Test set accuracy: 83.75% (Naïve Bayes: 75%)
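
The reported test set accuracy follows directly from the four counts on this slide. The short Python check below recomputes it; the dictionary layout and variable names are illustrative only.

```python
# Confusion matrix from the slide, keyed by (actual, predicted).
confusion = {
    ("Normal", "Normal"): 16,
    ("Normal", "Cancer"): 10,
    ("Cancer", "Normal"): 3,
    ("Cancer", "Cancer"): 51,
}

total = sum(confusion.values())
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
accuracy = correct / total

print(total, correct, accuracy)
```

The 16 + 51 = 67 correct predictions out of 80 test samples give 0.8375, matching the 83.75 figure on the slide.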
41
Classification of Multiple Tumor Types
DNA Microarray Data for 14 Tumor Classes
Published Datasets

S. Ramaswamy et al, Proc. Natl. Acad. Sci. USA
98 15149-15154 (2001)
C. Yeang et al, Procs. of ISMB 2001.
Bioinformatics Discovery Note, 11-7, (2001)

42
Results of Multiple Tumor Type Analysis

Gene expression profiles for 7,129 genes
Datasets tumor type composition
9 minutes training time on 500MHz Netra
78.3% accuracy for multi-tumor molecular
classification

Tumor Class      Train  Test    Tumor Class        Train  Test
Breast (BR)      8      3       Uterus (UT)        8      2
Prostate (PR)    8      2       Leukemia (LE)      24     6
Lung (LU)        8      3       Renal (RE)         8      3
Colorectal (CO)  8      5       Pancreas (PA)      8      3
Lymphoma (LY)    16     6       Ovary (OV)         8      3
Bladder (BL)     8      3       Mesothelioma (MS)  8      3
Melanoma (ML)    8      2       Brain (BR)         16     4
43
Outline

Data Challenges
Case Studies
Summary

44
Summary

Databases have functionality to access and
integrate distributed data
There are data management, performance and
security benefits to performing analytics in
databases
A range of analytical functionality is now
available in databases
