Database Management IS698 - PowerPoint PPT Presentation

1
Database Management IS698
  • Week 1: Motivation & DBMS Architecture Overview
  • Min Song
  • IS
  • NJIT

2
About me
  • I was a visiting professor in the Department of
    Computer and Information Sciences at Temple
    University.
  • I also worked full time at Thomson Scientific for
    about seven and a half years as a senior software
    engineer.
  • I received my Ph.D. from the College of
    Information Science and Technology, Drexel
    University, in 2005, and my M.S. degree from the
    School of Information and Library Sciences at
    Indiana University in 1996.

3
My Research Interests
  • Text Mining
  • Digital Libraries
  • Bioinformatics
  • Information Retrieval
  • Large scale Web-based Systems

4
Recent Research Activities
  • Co-organizer of First International Workshop on
    Text Mining in Bioinformatics (TMBIO 2006) in
    conjunction with CIKM.
  • Co-editor of Handbook on Text and Web Mining
    Technologies, Idea Group Inc.
  • Guest editor of a special issue of BMC
    Bioinformatics

5
Why databases? Why DB research?
  • The technology trend angle: emphasis in CS/IS
    research has shifted from computation to
    information management. Evidence:
  • Hardware: high-performance computer companies
    are on hard times (Thinking Machines, KSR, Cray,
    SGI?). The exemplary success story in massive
    parallelism is Teradata (now sold by NCR), which
    has been around since the 70's. "Shared-nothing"
    (sometimes called "clusters", etc.) successes
    have been largely database-centric.
  • "Low-end" users: the scramble to webspace
    reflects a desire to give/receive info. Success
    of these efforts is questionable, and the
    disorganization will get worse as things grow.
  • "High-end" users: scientists, the biggest users
    of high-powered computation, now have data
    management problems that exceed their appetite
    for cycles.
  • Other researchers: architecture, OS,
    theoreticians, and AI are all moving this way.
  • PS: you will see all this in the job market!

6
Why databases? Why DB research? (continued)
  • The utilitarian angle: "Databases are the boring
    part of accounting"? Not anymore! Interesting,
    world-changing apps:
  • digital libraries
  • "digital asset management", i.e., multimedia
    entertainment
  • digital mapping and geo apps
  • scientific applications: earth science, DNA,
    molecular docking, experiment management, etc.
  • decision support, data analysis, and "mining"

7
Why databases? Why DB research? (continued)
  • The intellectual angle:
  • Big, beautiful ideas: relational model and
    languages, concurrency control, query processing,
    etc.
  • Real, meaty systems work: the serious 24x7, high
    performance, complex systems engineering domain
  • Room for both kinds of contributions, separately
    and simultaneously
  • plenty of room to take an idea from theory to
    practice
  • lots of useful research left to do

8
An outline of ongoing database research
  • Big: massive datasets
  • Tertiary storage: EOSDIS, 1 Tb/day, keep it all
    for 15 years
  • Parallelism: data parallelism is natural in a
    DBMS. How do we run DB operations in parallel and
    balance load well? WalMart: 365 nodes, 6 Tb
    online, a 4-billion-row table, 200 million
    updates daily, 4000 queries/day, 1500 users/week,
    4 min DS response time with an average of 60000
    rows out.
  • Data Analysis, Data Mining and Text Mining: given
    huge amounts of data, try to find interesting
    information in the data. What is the "killer
    query"?
  • Wide: wide-scale distribution
  • World-Wide Web (a bad example)
  • Distributed databases for the 00's: autonomous
    "pay play" databases

9
An outline of ongoing database research (continued)
  • Complex: complex datatypes and their associated
    lookups
  • complex base types: geographic data, multimedia,
    scientific data, CAD data, textual data
  • complex objects
  • extensible query processing engines
  • indexing new data types
  • Old and heterogeneous: the data integration
    problem
  • schema integration: trying to figure out how
    different schemas fit together. Hard!!!
  • DBMS integration: trying to semi-transparently
    glue different kinds of database systems together

10
DBMS History
  • late 60's: network (CODASYL) and hierarchical
    (IMS) DBMS.
  • Low-level "record-at-a-time" DML, i.e. physical
    data structures reflected in DML (no data
    independence)
  • 1970: Codd's paper. The most influential paper in
    DB research. Set-at-a-time DML. Data
    independence. Allows for schema and physical
    storage structures to change "under the covers".
    Truly important theory, led to "paradigm shift"
    in thinking and in practice. (Papadimitriou:
    "as clear a paradigm shift as we can hope to find
    in computer science".) Turing award.
  • early-to-mid 70's: raging debate between the two
    camps. "Great debate" in 1975.
  • mid 70's: 2 full-function (sort of) prototypes.
    Ancestors of essentially all of today's
    commercial systems.
  • Ingres: UCB, 1974-77
  • a "pickup team", including Stonebraker and Wong.
    Early and pioneering. Begat Ingres Corp (CA),
    Sybase, MS SQL Server, Britton-Lee, Wang's PACE.
  • System R: IBM San Jose (now Almaden)
  • 15 PhDs. Begat IBM's SQL/DS and DB2, Oracle, HP's
    Allbase, Tandem's Non-Stop SQL. System R arguably
    got more stuff "right".
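
The contrast between record-at-a-time and set-at-a-time DML on this slide can be illustrated with a small sketch. This is not code from the lecture: the `emp` table, its rows, and both function names are hypothetical, and Python's built-in `sqlite3` module stands in for a relational DBMS. The first function walks records one by one, so the access path lives in the application; the second states the result it wants and leaves the access path to the system.

```python
import sqlite3

# Hypothetical employee records, used only for illustration.
RECORDS = [("ann", "db", 120), ("bob", "os", 100), ("cy", "db", 90)]


def total_record_at_a_time(records, dept):
    """Record-at-a-time style: the program loops over stored records itself."""
    total = 0
    for _name, record_dept, salary in records:
        if record_dept == dept:
            total += salary
    return total


def total_set_at_a_time(conn, dept):
    """Set-at-a-time style: declare the result; the DBMS chooses the plan."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM emp WHERE dept = ?", (dept,)
    ).fetchone()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", RECORDS)

before_index = total_set_at_a_time(conn, "db")

# Data independence: the physical structure changes, the query text does not.
conn.execute("CREATE INDEX emp_dept ON emp (dept)")
after_index = total_set_at_a_time(conn, "db")

print(total_record_at_a_time(RECORDS, "db"), before_index, after_index)
```

Adding the index changes how the rows can be found but not the query or its answer, which is the sense of "under the covers" above.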

11
DBMS History (continued)
  • Both were viable starting points and proved the
    practicality of the relational approach.
    Beautiful example of theory -> practice!!
  • early 80's: commercialization of relational
    systems
  • mid 80's: SQL becomes "intergalactic standard".
  • DB2 becomes IBM's flagship product.
  • IMS "sunsetted"
  • today: network and hierarchical essentially dead
    (though commonly in use!)
  • relational is mainstream
  • SQL (and perhaps RDBMS) too flawed to last in
    current form.
  • semantically flawed in various ways (Date, 1985).
  • in an effort to fix it up, standards committees
    are making a mess
  • design by committee leads to kitchen sink
  • standards body as designers, rather than
    codifiers
  • leads to wasting time (Sybase) or irrelevance of
    the standard (Informix and IBM shipping SQL3
    before it was standardized)
  • various players in research, industry and both
    scrambling to standardize the "next thing"

12
Modern DBMS taxonomy
  • Functionality: RDBMS, OODBMS, ORDBMS.
  • RDBMS: query in, data out.
  • simple data model: tables with rows and columns,
    simple data types.
  • widely standardized definitions, languages
  • clean mathematical foundation
  • OODBMS: term is somewhat nebulous. Usually, a
    persistent programming environment.
  • no queries (or only VERY simple ones).
  • data model comes from PL, includes lots of good
    OO stuff.
  • theoretical "foundations" after the fact, very
    complicated.
  • ORDBMS: term is getting better defined as
    products mature (Informix, IBM)
  • an attempt to provide best of both worlds:
    queries and rich data types.
  • query interface.
  • Rich data types with lots of OO features, esp.
    object identity, type-extensibility and
    inheritance.
  • Basic "outer" data type is relation, with
    extensible data types in the fields.
  • relational theory applies to outer operations
    only
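
The ORDBMS idea of an outer relation whose fields hold extensible data types can be sketched roughly as follows. This is an illustration under stated assumptions, not how any product named above works: SQLite is not an ORDBMS, and the `Point` type, the `site` table, and the `dist_from_origin` function are all hypothetical. The sketch only uses Python's `sqlite3` adapter, converter, and user-defined-function hooks to mimic a user-defined type and a type-specific function inside an ordinary query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical user-defined type stored in a single column."""
    x: float
    y: float


def adapt_point(point):
    # Python value -> stored text form, e.g. "3.0;4.0".
    return f"{point.x};{point.y}"


def convert_point(raw):
    # Stored bytes -> Python value; sqlite3 passes converters raw bytes.
    x_text, y_text = raw.decode().split(";")
    return Point(float(x_text), float(y_text))


def dist_from_origin(stored):
    # User-defined function; it receives the stored text form of a Point.
    x, y = (float(part) for part in stored.split(";"))
    return (x * x + y * y) ** 0.5


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.create_function("dist_from_origin", 1, dist_from_origin)

# The outer data type is still a relation; only the field type is new.
conn.execute("CREATE TABLE site (name TEXT, loc point)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [("a", Point(3.0, 4.0)), ("b", Point(6.0, 8.0))],
)

near = conn.execute(
    "SELECT name, loc FROM site WHERE dist_from_origin(loc) < 6 ORDER BY name"
).fetchall()
print(near)
```

The selection and ordering remain ordinary relational operations, while the meaning of `loc` and `dist_from_origin` comes from the type extension, which matches the point that relational theory applies to the outer operations only.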

13
Modern DBMS taxonomy (continued)
  • Implementation:
  • Single-Site (i.e. traditional)
  • Parallel: lots of tightly-coupled machines solve
    one query together. A database supercomputer.
  • Distributed: geographically distributed machines,
    each "hosting" different data, participate in a
    more loosely coupled manner
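
As a rough sketch of how a parallel DBMS lets many machines solve one query together, the example below partitions a table across nodes, has each node aggregate only its own partition, and then merges the partial results. Everything here is an assumption made for illustration: the `sales` rows, the function names, the modulo partitioning, and the use of threads in one process as a stand-in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Assign each (store_id, amount) row to a node by store_id modulo n_nodes."""
    parts = [[] for _ in range(n_nodes)]
    for store_id, amount in rows:
        parts[store_id % n_nodes].append((store_id, amount))
    return parts


def local_sum(part):
    """Work done on one node: aggregate only the rows that node holds."""
    totals = Counter()
    for store_id, amount in part:
        totals[store_id] += amount
    return totals


def parallel_sum_by_store(rows, n_nodes=4):
    """Run the per-node aggregates concurrently, then merge the partial sums."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


# Hypothetical rows: (store_id, sale amount).
sales = [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]
result = parallel_sum_by_store(sales, n_nodes=2)
print(result)
```

Load balance in this sketch depends entirely on how evenly `partition` spreads the rows; if one store dominated the data, one node would do most of the work.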