Title: Databases as Analytical Engines for Drug Discovery
1Databases as Analytical Engines for Drug Discovery
- Susie Stephens, Principal Product Manager, Life Sciences
- Oracle Corporation
- susie.stephens_at_oracle.com
2Outline
- Data Challenges
- Case Studies
- Summary
3Access Distributed Data
External Sites
UltraSearch
Distributed query
MySQL
Flat files
Sybase
SRS
DBlinks
Transparent Gateway
Generic Connectivity
Transparent Gateway
External Table
4Integrate a Variety of Data Types
- CLOBs
- XML
- Text
- Images
- Video
- Relational
- User-Defined Objects
- Nucleotide Sequences
- Gene Expression Data
- Papers
- Cell Histology Images
- Protein Folding Video
- SwissProt
- KEGG
- Chemical Structures
XML
5Manage Vast Quantities of Data
- Partitioning
- Oracle Data Guard
- Real Application Clusters (RAC)
- Automated Storage Management
- Adaptive Instance Tuning
- Automated Application and SQL Tuning
- Automated Database Diagnostic Monitor (ADDM)
- Scheduling
6Collaborate Securely
- Integrated communications
- Single enterprise search
- Flexible access
- Fine-grained access control
- Auditing
- Workflow
- Personalized portal
7Find Patterns and Insights
- Oracle Data Mining
- Find relationships and clusters
- Oracle Discoverer and Oracle OLAP
- Interactive query and drill-down
- Statistics
- mean, stdev, median, correlations, linear regression
- Oracle Text
- Cluster and classify documents of interest
- Table Functions
- Implement complex algorithms within the database
8Outline
- Data Challenges
- Case Studies
- Summary
9Regular Expression Searches
- A powerful method of describing both simple and complex patterns for searching and manipulating
- Multilingual regular expression support for SQL and PL/SQL string types
- Follows POSIX-style regexp syntax
- Supports standard regexp operators
- Includes common extensions such as case-insensitive matching, sub-expression back-references, etc.
- Compatible with popular regexp implementations like GNU, Perl, Awk
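The two extensions named above, case-insensitive matching and sub-expression back-references, can be illustrated outside the database. The following is a minimal sketch in Python rather than Oracle SQL; the function name and the sample nucleotide string are made up for illustration, and Python's `re` module does not accept POSIX bracket classes such as `[[:upper:]]`, so plain ranges are used instead.

```python
import re

# Hypothetical helper: find the first dinucleotide that is immediately repeated.
# ([acgt]{2}) captures two bases; \1 is a back-reference to that same capture.
TANDEM_REPEAT = re.compile(r"([acgt]{2})\1", re.IGNORECASE)

def first_tandem_repeat(sequence):
    """Return (1-based offset, matched text) of the first repeat, or None."""
    match = TANDEM_REPEAT.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.group(0)

if __name__ == "__main__":
    # The pattern is written in lower case but matches upper-case input
    # because of the case-insensitive flag.
    print(first_tandem_repeat("TTGAGACC"))
```

In Oracle 10g the same idea would be expressed through the match-parameter argument of the `REGEXP_*` functions; that mapping is not shown here.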
10Case Study: Retrieve Protein Data from SGD using Regular Expressions
Case study courtesy of Prolexys Pharmaceuticals,
Inc.
11HTTP Raw Data
</script> </head><body><body bgcolor='FFFFFF'>
<table cellpadding="2" width="100" cellspacing="0" border="0"><tr><td colspan="4"><hr width="100" /></td></tr>
<tr><td valign="middle" align="right"><a href="http://www.yeastgenome.org/"><img alt="SGD" border="0" src="http://www.yeastgenome.org/images/SGD-to.gif" /></a></td>
<th valign="middle" nowrap="1">Quick Search</th>
<td valign="middle" align="left"><form method="post" action="http://db.yeastgenome.org/cgi-bin/SGD/search/quickSearch" enctype="application/x-www-form-urlencoded"> <input type="text" name="query" size="13" /><input type="submit" name="Submit" value="Submit" /> </form></td>
<th valign="middle" align="left"><a href="http://www.yeastgenome.org/sitemap.html">Site Map</a> <a href="http://www.yeastgenome.org/HelpContents.shtml">Help</a> <a href="http://www.yeastgenome.org/SearchContents.shtml">Full Search</a> <a href="http://www.yeastgenome.org/">Home</a></th></tr>
<tr><td align="left" colspan="4"><table cellpadding="1" width="100" cellspacing="0" border="0"><tr align="center" bgcolor="navajowhite">
<td><font size="-1"><a href="http://www.yeastgenome.org/ComContents.shtml">Community Info</a></font></td>
<td><font size="-1"><a href="http://www.yeastgenome.org/SubmitContents.shtml">Submit Data</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast2sgd">BLAST</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/web-primer">Primers</a></font></td>
<td><font size="-1"><a href="http://seq.yeastgenome.org/cgi-bin/SGD/PATMATCH/nph-patmatch">PatMatch</a></font></td>
<td><font size="-1"><a href="http://db.yeastgenome.org/cgi-bin/SGD/seqTools">Gene/Seq Resources</a></font></td>
<td><font size="-1"><a href="http://www.yeastgenome.org/Vl-yeast.shtml">Virtual Library</a></font></td>
<td><font size="-1"><a href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Contact SGD</a></font></td></tr></table></td></tr>
<tr><td colspan="4"><hr width="100" /></td></tr></table>
<table cellpadding="0" width="100" cellspacing="0" border="0"><tr><td width="10"><br /></td><td valign="middle" align="center" width="80"><h1>Sequence for a region of YDR099W/BMH2</h1></td><td valign="middle" align="right" width="10"></td></tr></table>
<p /><center><a target="infowin" href="http://db.yeastgenome.org/cgi-bin/SGD/suggestion">Send questions or suggestions to SGD</a></center>
<p /><p /><center><a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast2sgd?name=YDR099W&suffix=prot">BLAST search</a> <a target="infowin" href="http://seq.yeastgenome.org/cgi-bin/SGD/nph-fastasgd?name=YDR099W&suffix=prot">FASTA search</a></center>
<p /><center><hr width="35" /></center>
<p /><font color="FF0000"><strong>Protein translation of the coding sequence.</strong></font>
<p /><p />Other Formats Available <a href="http://db.yeastgenome.org/cgi-bin/SGD/getSeq?map=pmap&seq=YDR099W&flankl=0&flankr=0&rev=">GCG</a>
<pre>>YDR099W Chr 4
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEERNLLSVAYKNVIGARRAS
WRIVSSIEQKEESKEKSEHQVELIRSYRSKIETELTKISDDILSVLDSHLIPSATTGESK
VFYYKMKGDYHRYLAEFSSGDAREKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFS
VFYYEIQNSPDKACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISES
GQEDQQQQQQQQQQQQQQQQAPAEQTQGEPTK
</pre><hr size="2" width="75">
<table width="100"><tr><td valign="top" align="left"><a href="http://www.yeastgenome.org/"><img border="0" src="http://www.yeastgenome.org/images/arrow.small.up.gif" />Return to SGD</a></td>
<td valign="bottom" align="right"><form method="post" action="http://db.yeastgenome.org/cgi-bin/SGD/suggestion" enctype="application/x-www-form-urlencoded" target="infowin" name="suggestion"> <input type="hidden" name="script_name" value="/cgi-bin/SGD/getSeq" /><input type="hidden" name="server_name" value="db.yeastgenome.org" /><input type="hidden" name="query_string" value="seq=YDR099W&flankl=0&flankr=0&map=p3map" /><a href="javascript:document.suggestion.submit()">Send a Message to the SGD Curators<img border="0" src="http://www.yeastgenome.org/images/mail.gif" /></a> </form></td></tr></table></body></html>
12Function to Parse out AA Sequence
create or replace function orf2seq (
  p_orf in varchar2 ) return varchar2 is
  v_stream clob;
  strt     number;
begin
  -- Retrieve the HTTP stream
  v_stream := httpuritype.getclob(httpuritype.createuri(
    'http://db.yeastgenome.org/cgi-bin/SGD/getSeq?seq=' || p_orf ||
    '&flankl=0&flankr=0&map=p3map'));
  -- Trim off the head of the stream
  strt := dbms_lob.instr(v_stream, 'Submit', 1, 1);
  -- Strip out control characters, new lines, etc.
  v_stream := regexp_replace(dbms_lob.substr(v_stream, 4000, strt), '[[:cntrl:]]', '');
  -- Return the AA sequence
  return(regexp_substr(dbms_lob.substr(v_stream, 4000, strt), '[[:upper:]]{10,}'));
end;
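The three parsing steps in the function above (skip to a marker, strip control characters, take the first long run of upper-case letters) can be tried without a database or a network connection. This is a minimal Python sketch, not the authors' code: the function name `extract_aa_sequence` and the shortened sample page are assumptions made for illustration, and the page is an in-memory string rather than a live HTTP response.

```python
import re
from typing import Optional

def extract_aa_sequence(page: str, marker: str = "Submit") -> Optional[str]:
    """Return the first run of 10 or more upper-case letters after the marker."""
    # Trim off the head of the page, as the PL/SQL version does with dbms_lob.instr.
    start = page.find(marker)
    tail = page[start:] if start >= 0 else page
    # Strip control characters (newlines included) so wrapped FASTA lines join up.
    tail = re.sub(r"[\x00-\x1f\x7f]", "", tail)
    # Equivalent in spirit to the POSIX pattern [[:upper:]]{10,}.
    match = re.search(r"[A-Z]{10,}", tail)
    return match.group(0) if match else None

# Hypothetical, heavily shortened stand-in for the SGD page.
SAMPLE_PAGE = (
    '<form><input type="submit" name="Submit" value="Submit" /></form>\n'
    "<pre>>YDR099W Chr 4\n"
    "MSQTREDSVYLAKLAEQAERY\n"
    "EEMVENMKAV\n"
    "</pre>"
)

if __name__ == "__main__":
    print(extract_aa_sequence(SAMPLE_PAGE))
```

The approach relies on the sequence being the first long upper-case run after the marker; any earlier all-caps text of ten or more letters on the page would be returned instead.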
13AA Sequence for ORF YDR099W
SQL> select orf2seq('YDR099W') from dual;
ORF2SEQ('YDR099W')
--------------------------------------------------------------------------------
MSQTREDSVYLAKLAEQAERYEEMVENMKAVASSGQELSVEE
RNLLSVAYKNVIGARRASWRIVSSIEQKEESKEKSEHQVELIRSYRSKIE
TELTKISDDILSVLDSHLIPSATTGESKVFYYKMKGDYHRYLAEFSSGDA
REKATNSSLEAYKTASEIATTELPPTHPIRLGLALNFSVFYYEIQNSPDK
ACHLAKQAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDISESGQ
EDQQQQQQQQQQQQQQQQQAPAEQTQGEPTK
Elapsed: 00:00:01.24
SQL> insert into pseq (orf_id, sequence)
  2  values ('YDR099W', orf2seq('YDR099W'));
14Case Study: Motif Searching in Proteins
- PROSITE: database of protein sequence motifs
- ID TYR_PHOSPHO_SITE PATTERN
- AC PS00007
- DT APR-1990 (CREATED) APR-1990 (DATA UPDATE) APR-1990 (INFO UPDATE)
- DE Tyrosine kinase phosphorylation site
- PA [RK]-x(2,3)-[DE]-x(2,3)-Y
- CC /TAXO-RANGE=??E?V
- CC /SITE=5,phosphorylation
- CC /SKIP-FLAG=TRUE
- DO PDOC00007
- Source: http://www.expasy.org/prosite/ps_frequent_patterns.txt
- TKP Pattern: [RK]-x(2,3)-[DE]-x(2,3)-Y
- R=Arginine, K=Lysine, D=Aspartate, E=Glutamate, Y=Tyrosine, x=any AA
- Oracle10g Regular Expression Equivalent
- [RK].{2,3}[DE].{2,3}Y
Case study courtesy of Prolexys Pharmaceuticals,
Inc.
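The translation from PROSITE notation to a regular expression shown on this slide is mechanical, so it can be sketched as a small converter. The Python below is an illustrative sketch, not part of the original case study: `prosite_to_regex` is a hypothetical helper that covers only the common elements (single residues, `x`, `[...]` alternatives, `{...}` exclusions and `(n)` or `(n,m)` repeats) and ignores anchors and other PROSITE features.

```python
import re

# One PROSITE element: a residue spec, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into an equivalent regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # x means any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # {P} means anything but P
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return "".join(parts)

if __name__ == "__main__":
    regex = prosite_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y")
    print(regex)
    print(re.search(regex, "MHHCKRYRSPEPDPY").start() + 1)  # 1-based offset
```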
15SQL to Retrieve All Proteins Interacting with TKP
select distinct
  substr(a.refseq_id, 1, 9) refseq_id,
  length(a.seq_string_varchar) seq_length,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 1) motif_offs1,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 2) motif_offs2,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 3) motif_offs3,
  regexp_instr(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y', 1, 4) motif_offs4
from target_db a, y2h_interaction_p b
where a.refseq_id like 'NP%'
  and regexp_like(a.seq_string_varchar, '[RK].{2,3}[DE].{2,3}Y')
  and (substr(a.refseq_id,1,9) = b.bait_refseq or
       substr(a.refseq_id,1,9) = b.prey_refseq);
16Query Results
REFSEQ_ID    SEQ_LENGTH MOTIF1_OFFS MOTIF2_OFFS MOTIF3_OFFS MOTIF4_OFFS
------------ ---------- ----------- ----------- ----------- -----------
NP_003961          1465          14         202         347         537
NP_003968           330         241           0           0           0
NP_003983           490           8          50          62          93
NP_004001          3562        3085           0           0           0
...
MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRS
RSHDRLPYQRRYRERRDSDTYRCEERSPSFGEDYYGPSRSRHRRRSRERG
PYRTRKHAHHCHKRRTRSCSSASSRSQQSSKRTGRSVEDDKEGHLVCRIG
DWLQERYEIVGNLGEGTFGKVVECLDHARGKSQVALKIIRNVGKYREAAR
LEINVLKKIKEKDKENKFLCVLMSDWFNFHGHMCIAFELLGKNTFEFLKE
NNFQPYPLPHVRHMAYQLCHALRFLHENQLTHTDLKPENILFVNSEFETL
YNEHKSCEEKSVKNTSIRVADFGSATFDHEHHTTIVATRHYRPPEVILEL
GWAQPCDVWSIGCILFEYYRGFTLFQTHENREHLVMMEKILGPIPSHMIH
RTRKQKYFYKGGLVWDENSSDGRYVKENCKPLKSYMLQDSLEHVQLFDLM
RRMLEFDPAQRITLAEALLHPFFAGLTPEERSFHTSRNPSR
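The offset columns in the query results report the 1-based start of the first four non-overlapping motif matches, with 0 where there is no further match. A minimal Python sketch of that behaviour follows; `motif_offsets` is a hypothetical helper, and the test string is only the first 109 residues of the sequence printed above. The offsets it yields agree with the NP_003983 row, which suggests (the slide does not say so) that the printed sequence belongs to that row.

```python
import re

# Tyrosine kinase phosphorylation site motif from the case study.
MOTIF = re.compile(r"[RK].{2,3}[DE].{2,3}Y")

def motif_offsets(sequence, occurrences=4):
    """1-based start offsets of the first N matches, padded with 0 for none."""
    starts = [m.start() + 1 for m in MOTIF.finditer(sequence)][:occurrences]
    return starts + [0] * (occurrences - len(starts))

# First 109 residues of the sequence shown on the results slide.
SEQUENCE_PREFIX = (
    "MHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRS"
    "RSHDRLPYQRRYRERRDSDTYRCEERSPSFGEDYYGPSRSRHRRRSRERG"
    "PYRTRKHAHH"
)

if __name__ == "__main__":
    print(motif_offsets(SEQUENCE_PREFIX))
```

Like `regexp_instr` with an occurrence argument, `finditer` resumes the search after the end of the previous match, so overlapping sites are not reported.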
17SQL to Retrieve Motif Frequency by Protein
select c.refseq_id "Refseq ID",
       rs2desc(c.refseq_id) "Protein Description",
       a.cnt "Repetitions",
       b.ps_ac "Prosite AC",
       b.descr "Motif Description"
from motif_data a, ps_data b, target_dbp c
where a.ps_ac = b.ps_ac
  and a.sequence_id = c.sequence_id
order by 3 desc, 1;
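The query above reads precomputed counts from `motif_data`; the slides do not show how that table was filled. One plausible way to produce such counts is sketched below in Python. Everything here is an assumption for illustration: the two regular expressions are the PROSITE patterns for PS00005 (`[ST]-x-[RK]`) and PS00006 (`[ST]-x(2)-[DE]`) as commonly published, the protein identifiers and sequences are toy data, and matches are counted without overlap.

```python
import re

# Assumed regex forms of two PROSITE patterns (verify against PROSITE before use).
MOTIFS = {
    "PS00005": r"[ST].[RK]",      # Protein kinase C phosphorylation site
    "PS00006": r"[ST].{2}[DE]",   # Casein kinase II phosphorylation site
}

def motif_counts(proteins):
    """Return (protein_id, prosite_ac, count) rows, most frequent first."""
    rows = []
    for protein_id, sequence in proteins.items():
        for prosite_ac, regex in MOTIFS.items():
            count = len(re.findall(regex, sequence))  # non-overlapping matches
            if count:
                rows.append((protein_id, prosite_ac, count))
    # Mirrors "order by 3 desc, 1": count descending, then identifier.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows

# Toy sequences, not real proteins.
TOY_PROTEINS = {"toy_1": "SAKTEEDSGR", "toy_2": "TEEDSEED"}

if __name__ == "__main__":
    for row in motif_counts(TOY_PROTEINS):
        print(row)
```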
18Query Results
Refseq ID    Protein Description                                             Repetitions  Prosite AC  Motif Description
-----------  --------------------------------------------------------------  -----------  ----------  -------------------------------------
NP_055995.2  spectrin repeat containing, nuclear envelope 2                          145  PS00006     Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                                132  PS00006     Casein kinase II phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                                     115  PS00006     Casein kinase II phosphorylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                                  110  PS00006     Casein kinase II phosphorylation site
NP_056363.1  bullous pemphigoid antigen 1, 230/240kDa                                102  PS00005     Protein kinase C phosphorylation site
NP_005520.2  heparan sulfate proteoglycan 2 (perlecan)                                97  PS00008     N-myristoylation site
NP_066267.1  ankyrin 3, node of Ranvier (ankyrin G)                                   97  PS00005     Protein kinase C phosphorylation site
NP_001139.2  ankyrin 2, neuronal                                                      96  PS00005     Protein kinase C phosphorylation site
NP_115495.1  monogenic, audiogenic seizure susceptibility 1 homolog (mouse)           95  PS00006     Casein kinase II phosphorylation site
...
19Regular Expression Searches Quote
- "Thanks to Oracle 10g's Regular Expressions (RE) query support, it's no longer necessary to export data from the database, process it with a RE enabled tool and then import the data back into the database. Now, RE processing can be handled with a single query."
- Marcel Davidson, Head of Database Administration, Myriad Proteomics
20Oracle Data Mining BLAST
- Implemented using a table function interface
- BLAST search functions can be placed in SQL queries
- Different functions for match and align
- Combination of SQL queries and BLAST is very powerful and flexible
21Case Study: BLAST as a Sequence Identification Tool
- Identify protein with high sequence similarity and the functional class
select function, COUNT(seq_id) f_count
from (select t.seq_id, t.score, t.expect, g.function
      from SwissProt_DB g,
           Table(BLASTP_MATCH(
             'AEQAERYDDMAAAMKRY',
             cursor (select seq_id, sequence
                     from SwissProt_DB),
             5)) t /* expect_value */
      where t.seq_id = g.seq_id)
group by function /* swissprot kw */
order by f_count;
[Diagram: query plan. BLASTP_MATCH (query_sequence, parameters) over SwissProt_DB returns seq_id, score, expect; joined to SwissProt_DB on t.seq_id = g.seq_id to give seq_id, function; GROUP BY yields function, f_count]
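The shape of the query above (a table function that streams match rows, joined back to an annotation table and then grouped) can be mimicked in a few lines of Python. This is only a structural sketch: the scoring function is a crude stand-in (length of the longest common substring), not BLAST, and the identifiers, sequences and function labels in `TOY_DB` are invented for illustration.

```python
from collections import Counter

def longest_common_substring(a, b):
    """Length of the longest substring shared by a and b (toy similarity score)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def toy_match(query, rows, min_score):
    """Generator playing the role of the table function: yields (seq_id, score)."""
    for seq_id, sequence in rows:
        score = longest_common_substring(query, sequence)
        if score >= min_score:
            yield seq_id, score

# Invented annotation table: seq_id -> (sequence, function keyword).
TOY_DB = {
    "S1": ("MAEQAERYDDMAA", "signaling"),
    "S2": ("AERYDDMKK", "signaling"),
    "S3": ("GGGGGGGG", "structural"),
    "S4": ("QAERYEEM", "chaperone"),
}

def function_counts(query, min_score=5):
    rows = ((seq_id, seq) for seq_id, (seq, _) in TOY_DB.items())
    # Join each hit back to its function keyword, then group and count.
    return Counter(TOY_DB[seq_id][1] for seq_id, _ in toy_match(query, rows, min_score))

if __name__ == "__main__":
    print(function_counts("AEQAERYDDMAAAMKRY"))
```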
22Case Study: Homology Search between Yeast and Human Data
Yeast Protein Interactome
Human Protein Interactome
Homology Mapping
A
X
Determined experimentally with Y2H
C
Determined experimentally with Y2H
B
Y
Z
Inferred through BLAST
Interlogs (AX, BY) and (AX, BZ)
Case study courtesy of Prolexys Pharmaceuticals,
Inc.
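The interlog idea on this slide can be written down as a small join: take each experimentally determined interaction in one species and map both partners through the homology table. The Python below is a sketch under an assumed reading of the diagram, namely that A and B are yeast proteins interacting in Y2H, X is a human homolog of A, and Y and Z are human homologs of B; the function name is hypothetical.

```python
def infer_interlogs(yeast_interactions, homologs):
    """Return candidate interlogs as ((yeast_a, human_x), (yeast_b, human_y)) pairs."""
    interlogs = set()
    for a, b in yeast_interactions:
        # Every combination of a homolog of A with a homolog of B is a candidate.
        for x in homologs.get(a, ()):
            for y in homologs.get(b, ()):
                interlogs.add(((a, x), (b, y)))
    return interlogs

if __name__ == "__main__":
    yeast_interactions = [("A", "B")]           # determined experimentally (Y2H)
    homologs = {"A": ["X"], "B": ["Y", "Z"]}    # inferred through BLAST
    for pair in sorted(infer_interlogs(yeast_interactions, homologs)):
        print(pair)
```

With the data above this yields the two pairs written on the slide as (AX, BY) and (AX, BZ). Proteins without a homolog, such as C under this reading, contribute nothing.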
23Batch BLAST Human (query) vs. Yeast (subject)
for v1 in c1 loop
  insert into yeast_human_homolog (
    human_refseq,
    yeast_orf_name,
    score,
    expect
  )
  select
    v1.refseq_id,
    t.t_seq_id,
    t.score,
    t.expect
  from
    table ( blastp_match (
      v1.sequence_string,
      cursor ( select a.yeast_acn, a.yeast_seq
               from yeast_prot_seq a )
    )
  ) t;
end loop;
24BLAST Results
Yeast    Yeast    Human        Human        Expect 1  Expect 2
Gene 1   Gene 2   Refseq 1     Refseq 2
-------  -------  -----------  -----------  --------  --------
YAR018C  YIL061C  NP_XXXXX1.1  NP_YYYYY1.1  4.79E-12  4.58E-06
YBL016W  YDL159W  NP_XXXXX2.1  NP_YYYYY2.1  1.11E-08  5.25E-10
YBL016W  YDL159W  NP_XXXXX3.1  NP_YYYYY3.1  2.63E-10  9.04E-11
YBL016W  YDL159W  NP_XXXXX4.1  NP_YYYYY4.1  4.57E-07  8.33E-09
YBL016W  YDL159W  NP_XXXXX5.1  NP_YYYYY5.1  1.57E-22  1.11E-08
YBL063W  YIL061C  NP_XXXXX6.1  NP_YYYYY6.1  3.17E-64  8.67E-06
YBL063W  YIL061C  NP_XXXXX7.1  NP_YYYYY7.1  2.30E-06  4.58E-06
YBR109C  YDR356W  NP_XXXXX8.1  NP_YYYYY8.1  1.78E-07  7.74E-11
YBR109C  YDR356W  NP_XXXXX9.1  NP_YYYYY9.1  1.24E-08  7.74E-11
YBR109C  YDR356W  NP_XXXX10.1  NP_YYYY10.1  5.19E-07  2.80E-20
YBR109C  YDR356W  NP_XXXX11.1  NP_YYYY11.1  3.92E-10  4.39E-11
YBR109C  YFR014C  NP_XXXX12.1  NP_YYYY12.1  3.67E-48  6.91E-17
YBR109C  YOL016C  NP_XXXX13.1  NP_YYYY13.1  3.67E-48  1.82E-17
Yeast Interactors
Human Interactors
Interlogs
25BLAST Quote
- "Oracle 10g's new BLAST feature will enable us to easily integrate multiple types of genomic and proteomic data for complicated queries used in the mining of our proprietary protein-protein interaction and cDNA sequence datasets."
- Jake Chen, Principal Bioinformatics Scientist, Myriad Proteomics
26Spatial Network Data Model
- Data model for managing graph (link-node) structures
- Rich graph analysis functions
- Supports variety of network structures (hierarchical, directed, undirected, random, scale-free)
- Framework for applying network constraints and rules (e.g. path length, cost, minimum bounding rectangle)
- Bundled Java visualiser and APIs for 3rd party tools, application development
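A link-node model with a path-length constraint, as described in the bullets above, can be illustrated with a breadth-first search over a list of links. This Python sketch is not the Oracle Network Data Model API; the function name, the undirected treatment of links and the node names in the example are assumptions chosen for illustration.

```python
from collections import deque

def shortest_path(links, start, goal, max_links=None):
    """Breadth-first search over undirected links; None if no path within the limit."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Constraint: do not extend paths that already use max_links links.
        if max_links is not None and len(path) - 1 >= max_links:
            continue
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Hypothetical biological network: nodes are entities, links are relationships.
LINKS = [
    ("geneA", "proteinA"),
    ("proteinA", "pathway1"),
    ("pathway1", "compoundX"),
    ("proteinA", "proteinB"),
]

if __name__ == "__main__":
    print(shortest_path(LINKS, "geneA", "compoundX"))
    print(shortest_path(LINKS, "geneA", "compoundX", max_links=2))
```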
27Case Study: Integration Architecture
Native Formats
NREF
EMBL
GO
KEGG
BIND
AFCS
Distributed Database layer
- Data type determines available routes
- Routes can be determined using semantics
NDM layer (semantic layer)
Nodes
Edges
Graph
Network Route
Case study courtesy of Beyond Genomics, Inc.
28Network Data Model Quote
- "Beyond Genomics, Inc., as a leading systems
biology company, believes that Oracle 10g's
network data model will significantly advance the
integration of metabolomic, proteomic,
transcriptomic, and clinical data sets and the
applications that derive value from these data."
Eric Neumann, Vice President Strategic
Informatics, Beyond Genomics, Inc.
29Oracle Data Mining
- Unsupervised Learning
- Hierarchical K-means Cluster
- O-Cluster
- Non-Negative Matrix Factorization
- Apriori
- Supervised Learning
- Naïve Bayes
- Adaptive Bayes Network
- Support Vector Machines
- PredictorVariance
- ODM can mine structured data, text data, or
structured and text data
30K-Means Clustering
- Hierarchical k-means produces tree of clusters
- All splits are binary
- Each cluster has a centroid and a histogram
- Achieves a reliable solution in a single run
- Ranked rules that describe attributes for cluster
- Cluster assignments are probabilistic using a Bayesian model
- Operates on very deep datasets by using a summarization module
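The first two bullets, a tree of clusters built from binary splits, can be sketched with a plain bisecting k-means on one-dimensional data. This is an illustrative Python sketch and not ODM's algorithm: it has no histograms, rules, Bayesian assignment or summarization, it always splits the leaf with the largest within-cluster scatter, and the deterministic min/max initialisation is an assumption made so the example is repeatable.

```python
def scatter(points):
    """Sum of squared distances to the mean (within-cluster scatter)."""
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

def two_means(points, iterations=20):
    """Split 1-D points into two groups with k-means (k=2)."""
    c0, c1 = min(points), max(points)   # deterministic initial centroids
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if abs(p - c0) <= abs(p - c1)]
        right = [p for p in points if abs(p - c0) > abs(p - c1)]
        new_c0, new_c1 = sum(left) / len(left), sum(right) / len(right)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return left, right

def hierarchical_kmeans(points, n_leaves):
    """Repeatedly apply a binary split to the leaf with the largest scatter."""
    leaves = [list(points)]
    while len(leaves) < n_leaves:
        worst = max(leaves, key=scatter)
        if scatter(worst) == 0:          # nothing left to split
            break
        leaves.remove(worst)
        leaves.extend(two_means(worst))
    return sorted(leaves, key=min)

if __name__ == "__main__":
    print(hierarchical_kmeans([1, 2, 3, 10, 11, 12, 50, 51], 3))
```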
31Case Study: Brain Tumor Clustering
- Collection of 42 human brain tumors and 7,129 gene expression profiles
- Clustering of samples according to their gene expression profiles
- It is an example of class and taxonomy discovery
- Does the data cluster according to the known biological classes?
- 42 Tumor Samples
- Normal Cerebellum MD (4)
- Malignant Gliomas MGlio (10)
- Medulloblastomas MD (10)
- Rhabdoid tumors Rhabdoid (10)
- Primitive Neuroectodermal PNET (8)
Pomeroy et al Nature 415, 24, p436 (2002).
32ODM Hierarchical k-Means Clustering
Node 1
Node 2
Node 3
Node 6
Node 4
Node 7
Node 5
Glioblastoma Normal
Medulloblastoma Rhabdoid
Cluster Cluster
Cluster Cluster
33Literature Results using Hierarchical Clustering
From Pomeroy et al Nature 415, 24, p436 (2002).
34Association Rules
- Captures frequent co-occurrences of items/attribute values
- (A, B) => C: occurrence of A and B together implies C
- Can be applied in different scenarios
- Market basket analysis
- Pattern discovery
- Predictive applications
- ODM uses SQL-based implementation of Apriori
algorithm
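The strength of a rule such as (A, B) => C is usually summarised by its support (how often A, B and C occur together) and its confidence (how often C occurs when A and B do). The sketch below computes both in Python for a toy set of transactions; it shows the two measures only, not the Apriori search itself, and the function name and data are made up for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions containing every antecedent item.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions containing antecedent and consequent items together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Toy transactions, not data from the case studies.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]

if __name__ == "__main__":
    support, confidence = rule_stats(TRANSACTIONS, {"A", "B"}, {"C"})
    print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here A and B occur together in three of the five transactions and C accompanies them in two, giving a support of 2/5 and a confidence of 2/3.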
35Case Study: Analysis of Trends in a Patient Group
Clinical table of 60 medulloblastoma patients, 7 clinical attributes:
- Subtype: classic or desmoplastic medulloblastoma
- Size (tumor size): T1-T4
- Stage: M0-M4
- Sex: M, F
- Age (range): 0-5, 5-10, 10-15
- Outcome: S (treatment success), F (treatment failure)
- Chemo (regime type): 0,1,2,3,4,5,6
Pomeroy et al Nature 415, 24, p436 (2002).
36Association Rules Results
Over 100 rules reflecting factual or known relationships in data
- Age=1 THEN Sex=M (confidence 0.8)
  Interpretation: Most 5-10 year-old patients are male
- Subtype=Desmoplastic THEN Stage=M0 (confidence 0.79)
  Interpretation: Most desmoplastic patients in the study have stage M0
37Association Rules Results
Other interesting trends
- Stage=M0 THEN Outcome=S (confidence 0.74)
  Interpretation: Stage M0 vs non-M0 is a predictor of treatment outcome
- Stage=M0 AND Size=T3 AND Chemo=1 THEN Outcome=S (confidence 0.92)
  Interpretation: Most patients with stage M0, size T3 who received chemo regime 1 had good response to treatment
38Support Vector Machines
- SVM provides a very general multi-purpose and powerful classifier
- SVM does not require feature selection and can work well with thousands of input features
- SVM is accurate and can approximate complex functional relationships
- SVM works in binary, multi-class, sparse (text) classification and regression
- SVM is easy to train and apply and can be used in discovery mode or in production automated methodologies
39Case Study: Classification of Normal Human Tissue and Tumors
- Multiple examples (14) of normal human tissue and tumors
- Could a single model distinguish normal vs cancer?
- Train set: 200 samples, test set: 80 samples
- Microarray profiles for 7,129 genes
Normal Tissue vs. Cancer
S. Ramaswamy et al, Proc. Natl. Acad. Sci. USA 98: 15149-15154 (2001)
40Support Vector Machines Results
Normal vs. Cancer (Multiple types): SVM Test Set Predictions
                Predicted Normal   Predicted Cancer
Actual Normal                 16                 10
Actual Cancer                  3                 51
Test set accuracy: 83.75% (Naïve Bayes: 75%)
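The reported accuracy follows directly from the four counts in the table: (16 + 51) correct out of 80 test samples. The short Python sketch below reproduces that arithmetic and also derives the per-class rates, which the slide does not state; the function and key names are illustrative.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Summary rates for a binary confusion matrix (positive class = cancer)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # cancers correctly called cancer
        "specificity": tn / (tn + fp),   # normals correctly called normal
    }

if __name__ == "__main__":
    # Counts from the SVM test set table: actual normal 16/10, actual cancer 3/51.
    metrics = confusion_metrics(tn=16, fp=10, fn=3, tp=51)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

The derived rates show that most of the errors are normal samples predicted as cancer (10 of 26), while only 3 of 54 cancers are missed.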
41Classification of Multiple Tumor Types
DNA Microarray Data for 14 Tumor Classes
Published Datasets
- S. Ramaswamy et al, Proc. Natl. Acad. Sci. USA 98: 15149-15154 (2001)
- C. Yeang et al, Procs. of ISMB 2001. Bioinformatics Discovery Note, 11-7, (2001)
42Results of Multiple Tumor Type Analysis
- Gene expression profiles for 7,129 genes
- Datasets tumor type composition
- 9 minutes training time on 500MHz Netra
- 78.3% accuracy for multi-tumor molecular classification
Tumor Class       Train  Test    Tumor Class        Train  Test
Breast (BR)           8     3    Uterus (UT)            8     2
Prostate (PR)         8     2    Leukemia (LE)         24     6
Lung (LU)             8     3    Renal (RE)             8     3
Colorectal (CO)       8     5    Pancreas (PA)          8     3
Lymphoma (LY)        16     6    Ovary (OV)             8     3
Bladder (BL)          8     3    Mesothelioma (MS)      8     3
Melanoma (ML)         8     2    Brain (BR)            16     4
43Outline
- Data Challenges
- Case Studies
- Summary
44Summary
- Databases have functionality to access and integrate distributed data
- There are data management, performance and security benefits to performing analytics in databases
- A range of analytical functionality is now available in databases