1. Database design process - PowerPoint PPT Presentation

1 / 48
About This Presentation
Title:

1. Database design process

Description:

Contextual design: designers get immersed in the workplace. ... apply in a top-down fashion, starting from high-level output and processes. Teuhola AdvDB-1 ... – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 49
Provided by: jukkat
Category:

less

Transcript and Presenter's Notes

Title: 1. Database design process


1
1. Database design process
  • Assumed environment
  • Large database (big volume and large number of
    data types)
  • Multiple users (various requirements and views)
  • Transaction processing (e.g. banking, insurance,
    travel agency, etc.)
  • Nonstop operation

2
Organizational context
  • Information resource management (IRM)
  • Data is a corporate resource key to successful
    management
  • More and more functions are computerizedlarge
    volumes of data must be available
  • Complexity of data grows
  • Tendency to consolidate information resources

3
Valuable features of db systems
  • Data independence immunity to physical changes
  • Different external views to the same data
  • Integration of multiple application areas
  • High-level interfaces (like SQL) make it simple
    to develop new applications.
  • Ad-hoc queries for casual userseffective
    routine transactions

4
New trend decentralization
  • Personal computers and database-like software
    (Excel, Access)
  • Distributed and client-server DBMSsfaster local
    processing
  • Web databases
  • Information repositories (data dictionary
    systems) manage the overall metadata(schemas,
    storage structures, access paths, users,
    transactions, statistics)

5
Information system life cycle (macro life cycle)
  • 1. Feasibility analysis
  • 2. Requirements collection and analysis
  • 3. Design
  • 4. Implementation
  • 5. Validation and testing
  • 6. Operation and maintenance

6
Database system life cycle(micro life cycle)
  • 1. System definition 5. Application conversion
  • 2. Design 6. Testing and validation
  • 3. Implementation 7. Operation
  • 4. Loading of data 8. Monitoring and
    maintenance

7
Database design goals
  • Satisfy user needs
  • Support processing requirements
  • Natural structuring of information
  • Satisfy restrictions on
  • response time
  • processing time
  • storage space

8
Main phases of database design
  • 1. Requirements analysis
  • 2. Conceptual database design
  • 3. Choice of DBMS
  • 4. Data model mapping
  • 5. Physical database design
  • 6. Implementation and tuning

9
Important levels
  • Conceptual designIndependent of DBMS and data
    model,performance is not an issue
  • Logical designIndependent of DBMS, but the data
    model to be used in implementation is decided
  • Physical designTailoring to a specific DBMS,
    optimization of performance.

10
Dichotomy
  • (a) Data content and structure
  • (b) Data processing and applications
  • Closely intertwined
  • Different methdologies emphasize one or the other
    (data-driven vs. process-driven database design).

11
Phase 1 Requirements collection and analysis
  • Identify users and applications
  • Study existing documents (reports, forms,
    manuals)
  • Analyse transaction types, frequencies and flow
    of information
  • Find out users desires and priorities
  • Results are represented with text, diagrams,
    tables, charts, formal specification methods.

12
Viewpoints on requirements
  • Requirements collection is a crucial step for the
    success of database design.
  • Initial requirements often incomplete.
  • Data content should include both current and
    potential future needs.
  • Qualitative requirements guide logical design.
  • Quantitative requirements guide physical design.

13
User participation in requirements collection
  • Increases customer satisfaction
  • JAD Joint Application Design
  • Contextual design designers get immersed in the
    workplace.
  • Workflow scenarios
  • Quick prototypes

14
Processing needs
  • processes and their dependencies
  • internal structure of processes
  • required input data
  • ways of processing
  • quantities of data to be processed
  • execution frequencies
  • triggering conditions
  • inter-process communication and parallelism

15
Techniques for requirements collection
  • Top-down Hierarchical decomposition
  • Bottom-up Combine existing processes and tasks
    stepwise into larger subsystems
  • Backward-forward From known results, derive the
    processes and input data -apply in a top-down
    fashion, starting from high-level output and
    processes.

16
Results of requirements collection
  • List of data elements and their definitions
  • Description of processes and work
    tasks(frequencies, target data types, data
    volumes)
  • List of steering and planning functions
  • High-level guidelines and policies(connections
    within the organization)
  • Anticipated changes and their effects on processes

17
Automated tools for requirements collection
  • Mainly for documentation
  • CASE Computer Aided Software (Systems)
    Engineering
  • Data dictionary Repository of all
    meta-information of the enterprise larger
    thanthe database system catalog.

18
Phase 2 Conceptual db design
  • (a) Conceptual schema design
  • Entities
  • Attributes
  • Relationships
  • Generalization/specialization hierarchies
  • (b) Conceptual transaction design
  • Important transaction types (retrieval/update)
  • Characteristics (access pattern, control flow,
    I/O, frequency, response time limit)

19
Conceptual schema characteristics
  • DBMS-independent
  • High-level description of semantic content
  • Expressive
  • Simple (understandable)
  • Minimal (no overlaps)
  • Diagrammatic representation
  • Formal (accurate, unambiguous)

20
Specification of entities
  • Categories
  • concrete, functional, and abstract entities
  • Information to be recorded
  • name, synonyms, description
  • estimated number of occurrences
  • integrity constraints
  • entity life cycle
  • usage authorisation

21
Specification of attributes
  • Categories
  • identifiers, descriptors
  • Information to be recorded
  • name, synonyms, description, purpose of use
  • related entity/relationship
  • form and structure
  • dependency on other attributes
  • life cycle rules of change
  • usage authorisation

22
Specification of relationships
  • Categories
  • Structural, ordinal, containment
  • Information to be recorded
  • name, synonyms, description, life cycle
  • degree component entity types and their roles
  • cardinality ratios totality/partiality
  • Attributes
  • Integrity constraints
  • Usage authority

23
Specification of specialisation/generalisation
hierarchies
  • As general classes as possible
  • A subclass should possess clearly distinctive
    attributes (relationships)
  • Common attributes as high as possible
  • Define a subclass if its key differs from that of
    the superclass
  • Try to achieve total relationships (minimum ratio
    gt 0).

24
Schema design strategies
  • 1) Centralized (one-shot) approach
  • 2) Schema integration piecewise development
  • Incremental strategies
  • 1) Top-down start with high-level abstractions
  • 2) Bottom-up start with basic abstractions
  • 3) Inside-out Start with the central concepts
  • 4) Mixed strategy

25
Schema (view) integration
  • Develop first subschemas of different subsystems,
    and then integrate them.
  • Find the semantically best compromise
  • Software tools help, but automatic integration
    not possible
  • No performance optimization
  • Controlled redundancy

26
Reasons for differing subschemas
  • Different user views
  • Defective conceptions about state of affairs
  • Relativism Alternative (valid) ways of
    describing the same thing
  • Errors may occur, leading to inconsistencies
  • Naming is not unique

27
Conflicts between views
  • Levels of correspondence
  • Identical, equivalent, compatible, incompatible
  • Examples of conflicts
  • Entity conflicts with entity attribute
  • Entity conflicts with relationship attribute
  • Entity conflicts with relationship

28
View integration methodology
  • 1) Preparation of integrationDecide the order,
    gather extra information
  • 2) Find correspondences and conflictsName,
    type, domain, constraint conflicts
  • 3) Solve the conflictsFind the best compromises
  • 4) Merge the compatible views
  • 5) Tune/restructure

29
Evaluation of integration result
  • Completeness
  • The result is a superset of the views
  • Minimality
  • The same concept only once
  • Transitive relationships and generalizations
    eliminated
  • Understandability
  • The best naming and modelling alternatives

30
Phase 2b Transaction design
  • Parallel with conceptual schema design
  • The important transactions must be known
  • Dictate the schema contents
  • Specification functional behavior and I/O
  • Process modelling tools becoming popular (UML
    state transition diagrams, activity diagrams,
    sequence diagrams, collaboration diagrams)

31
Phase 3 Choice of DBMS
  • Technical factors
  • Type of DBMS, storage structures, access paths,
    user interfaces, API, query languages, browsers,
    report generators, graphics, etc.
  • Economic and organizational factors
  • Software, hardware, maintenance costs
  • Database creation/conversion costs
  • Personnel, training, operating costs

32
Phase 4 Logical database design
  • Two stages
  • DBMS-independent mapping EER schema ?
    logical schema
  • Tailoring the logical schema into the special
    features of the selected DBMS
  • Result
  • DDL (Data Definition Language) descriptions of
    the logical and external database schemas

33
Notes on logical design
  • Can be (partially) automated
  • Additional constraints are often needed to
    represent the full semantics of concepts
  • For relational databases, the mapping result is
    usually quite well normalised.But Functional
    dependencies between attributes should be
    detected (? BCNF).Also multi-valued dependencies
    may be found, resulting in further normalisation.

34
Phase 5 Physical database design
  • Physical internal storage schema
  • Optimisation problem
  • Basis transactions.
  • Target function e.g. overall weighted processing
    time, or number of I/O operations.
  • Additional constraints for response time
  • Combinatory optimization hard to solve
  • No general methodology exists.

35
Approaches to physical design
  • Analytical solution
  • For each alternative storage structure, construct
    a model ( cost formula), to find the best ways
    to execute the processing tasks.
  • Experimental solution
  • With an existing DBMS, test the performance of
    different storage alternatives. A query optimiser
    may be utilised to avoid loading db.

36
DBMS features affecting the physical design
  • 1. Storage structures and access methods
  • 2. Query optimization
  • 3. Locking method
  • 4. Catalog management
  • 5. Buffer management
  • 6. Logging costs
  • 7. Other factors hardware, operating system,
    network, etc.

37
Application features affecting the physical
database design
  • Data contents
  • Primary/candidate keys (? unique index)
  • Queries and transactions
  • Target relations and attributes
  • Search key (selection)
  • Type of operation (selection, join, ...)
  • Frequency of invocation
  • Time constraints

38
Decisions about indexing
  • Which attributes to index?(Logical keys access
    keys)
  • What multi-attribute indexes to create?
  • Should an index be clustering or not?(Only one
    indexper relation can be clustering)
  • Type of index? (B-tree, hash)
  • Dynamic hash? (No explicit index)

39
Heuristic index selection algor.
  • 1. Assign indexes for all attributes.
  • 2. Randomize the order of relations.
  • 3. Find the best clustering attribute for each
    relation.
  • 4. Store the current solution.
  • 5. Repeat 2-4 a few times and take the best
    solution.
  • 6. Randomize the order of relations.
  • 7. Test dropping one index at a time (accept if
    profit).
  • 8. Store the current solution.
  • 9. Repeat 6-8 a few times and take the best
    solution.
  • 10. Repeat 1-9 a few times and take the best
    solution.

40
Other physical design decisions
  • Page size (if not fixed).
  • Different types of clustering.
  • Horizontal partitioning
  • Vertical partitioning
  • Replication
  • Denormalization
  • Allocation of relations and indexes to different
    disk units declustering a single relation on
    several disk units.

41
Phase 6 Implementation and
tuning
  • Responsibility of DBA (db administrator)
  • DDL SDL statements are compiled and executed
  • Loading of database Enter new values or convert
    from the old (conversion routines need to be
    written)
  • Transactions are programmed using DML
  • Database usage can start.

42
Goals of database tuning
  • Make applications run faster
  • Lower the response times
  • Improve overall throughput of transactions
  • Avoid excessive lock contention
  • Minimize logging overhead
  • Optimize buffer size
  • Optimize scheduling of processes
  • Allocate resources efficiently

43
Basis of tuning Statistics
  • Sizes of tables
  • Distinct values per table column
  • Execution frequencies of transactions
  • Storage allocation statistics
  • Page I/O of disks hot spots
  • Locking/logging rates
  • Number of levels and leaf pages in indexes

44
Tuning of indexes
  • Possible problems
  • Some queries lack an index
  • Some indexes are not utilized
  • Some indexes are subject to frequent updates
  • Options to change indexing
  • Drop/create an index dynamically
  • Rebuild an existing (deteriorated) index
  • Switch between dense/non-dense index

45
Tuning of logical schema
  • Denormalization Put together attributes which
    are frequently needed together.
  • Alternative 3NF/BCNF solutions
  • Vertical partitioning Separate frequently used
    attributes from seldom used ones
  • Replication of attributes
  • Horizontal partitioning Subsets of tuples based
    on some classifying attribute.

46
Tuning of queries
  • Ensure that existing indexes are really used
  • Arithmetic expressions, substring comparisons and
    nested queries may prevent index usage.
  • Rephrase complex WHERE-conditions so that indexes
    can be better utilized.
  • Test replacing OR-conditions with UNION.
  • Avoid unnecessary DISTINCT keywords(causing
    duplicate elimination by sorting)
  • Choose join conditions to use clustering index

47
Tuning of queries (cont.)
  • Avoid unnecessary temporary result tables.
  • Use temporary tables to prevent repeated
    evaluation of nested subqueries.
  • Try replacing nested (IN, ALL, ANY, SOME)
    queries by unnested ones (joins)
  • Avoid excessive use of views.
  • Transform NOT-conditions by positive expressions,
    if possible.

48
Automated design tools
  • Typical CASE-tool facilities
  • Diagramming (e.g. EER)
  • Model mapping (EER ? Rel. DDL)
  • Normalization
  • Desirable characteristics
  • Simple interface
  • Both analytical and heuristic tools
  • Trade-off analysis, design verification
  • Visual design results
Write a Comment
User Comments (0)
About PowerShow.com