Data Management for Frontiers at the Interface Between Computing and Biology



1
Data Management for Frontiers at the Interface
Between Computing and Biology
  • Jim Gray
  • Microsoft Research

2
Cosmic Questions
  • Where are we today?
  • Where in 5 years?
  • What are the key questions?
  • What am I doing next?
  • What are the barriers?
  • What hinders collaboration?
  • What changes needed in education?

3
How much information is there?
[Figure: scale of data volumes, labeled Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with the marker "Everything! Recorded"]
  • Soon everything can be recorded and indexed
  • Most bytes will never be seen by humans.
  • Human attention is the precious resource.
  • Automatic: capture, store, organize, analyze, summarize
  • Manual: visualize/iterate

[Figure markers: All Books (multimedia); All LoC books (words); A Movie; A Photo; A Book]
[Figure footnote, small prefixes as negative powers of ten: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli]

4
Plumbing
  • Everything can be online
  • Storage is nearing $1K/terabyte
  • Networking is $1/delivered GB
  • Software is cheap or free
  • Systems are becoming self-managing

5
Data Management Systems
  • Can ingest/store/search/analyze terabytes
  • Numbers
  • Text
  • Some progress on objects
  • But semantics have to come from the domain
  • Good science and engineering, but flopped in the
    marketplace.

6
Basic Problems
  • Data Acquisition
  • I do not have much to say here
  • Data Ingest
  • This is a huge problem
  • Data Organization & Access
  • This is what databases are good at for text and
    numbers
  • For semantic data it requires domain-specific
    tools.
  • Data Publication/ Discovery/ Interchange
  • Requires good standards
  • We have syntactic standards; semantic standards
    are needed.

7
My #1 Problem: Data Interchange (includes
publication and discovery)
  • What does the data mean?
  • The answer is 42.
  • Units?
  • Precision? Accuracy?
  • How was the number derived?
  • How can you tell me what it means (without us
    talking on the phone or you visiting my
    laboratory)?
  • Need standard terminology, and standard formats.
  • Hard to do for new stuff.
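
A minimal sketch of the questions on this slide, assuming a hypothetical `Measurement` record: the class, its field names, and the sample unit and derivation are illustrative and do not come from the talk. It shows the bare number 42 carried together with its unit, uncertainty, and derivation, so a remote reader could interpret it without a phone call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical self-describing value; all field names are illustrative."""
    value: float
    unit: str           # a bare 42 says nothing about units
    uncertainty: float  # absolute uncertainty, in the same unit as value
    derivation: str     # free-text note on how the number was obtained

    def describe(self) -> str:
        # Render the value with the context a remote reader would need.
        return (f"{self.value} +/- {self.uncertainty} {self.unit} "
                f"({self.derivation})")


# Sample values are made up for illustration.
answer = Measurement(42.0, "mg/L", 0.5, "mean of 3 assay runs")
print(answer.describe())
```

Even this only pushes the problem down a level: the reader still has to agree on what the unit and the derivation text mean, which is where standard terminology comes in.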

8
Great Hope & Promise
  • XML is the answer
  • Reality: XML is one layer up from Unicode.
  • Can describe structured information
  • But not process, not meaning, not
  • Answer 2: Objects
  • SOAP, Web Services,
  • Probably a better answer
  • But still needs tools to make it workable.
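
The point that XML describes structure but not meaning can be sketched as follows, using Python's standard `xml.etree.ElementTree`. The element name `answer`, the attribute name `unit`, and the helper `extract` are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with the same structure.
doc_a = "<result><answer>42</answer></result>"
doc_b = '<result><answer unit="mg/L">42</answer></result>'


def extract(doc: str):
    """Return the answer text and its unit attribute (None if absent)."""
    answer = ET.fromstring(doc).find("answer")
    # The parser recovers structure and strings only. Nothing in XML itself
    # says what "mg/L" means or how the 42 was derived.
    return answer.text, answer.get("unit")


for doc in (doc_a, doc_b):
    print(extract(doc))
```

Both documents are well-formed and parse equally well; whether the second is any more meaningful depends entirely on an agreement outside the XML about what `unit` denotes.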

9
  • Discussion

10
Gifford's List
  • Data Interchange
  • Scale: what's big?
  • Quality: how do you keep it up?
  • DBs need more semantics.