Title: 1. Database design process
1. Database design process
- Assumed environment
- Large database (big volume and large number of data types)
- Multiple users (various requirements and views)
- Transaction processing (e.g. banking, insurance, travel agency, etc.)
- Nonstop operation
2. Organizational context
- Information resource management (IRM)
- Data is a corporate resource, key to successful management
- More and more functions are computerized → large volumes of data must be available
- Complexity of data grows
- Tendency to consolidate information resources
3. Valuable features of db systems
- Data independence: immunity to physical changes
- Different external views to the same data
- Integration of multiple application areas
- High-level interfaces (like SQL) make it simple to develop new applications
- Ad-hoc queries for casual users; effective routine transactions
4. New trend: decentralization
- Personal computers and database-like software (Excel, Access)
- Distributed and client-server DBMSs → faster local processing
- Web databases
- Information repositories (data dictionary systems) manage the overall metadata (schemas, storage structures, access paths, users, transactions, statistics)
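The catalog idea above exists in any DBMS. As a minimal sketch, SQLite's built-in `sqlite_master` table plays the role of such a metadata repository; the table and index names here are illustrative, not from the slides:

```python
import sqlite3

# In-memory database; sqlite_master is SQLite's built-in metadata catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("CREATE INDEX idx_owner ON account (owner)")

# The catalog records every schema object: its type, name, and original DDL.
for name, type_, sql in conn.execute(
        "SELECT name, type, sql FROM sqlite_master"):
    print(type_, name)
```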
5. Information system life cycle (macro life cycle)
- 1. Feasibility analysis
- 2. Requirements collection and analysis
- 3. Design
- 4. Implementation
- 5. Validation and testing
- 6. Operation and maintenance
6. Database system life cycle (micro life cycle)
- 1. System definition
- 2. Design
- 3. Implementation
- 4. Loading of data
- 5. Application conversion
- 6. Testing and validation
- 7. Operation
- 8. Monitoring and maintenance
7. Database design goals
- Satisfy user needs
- Support processing requirements
- Natural structuring of information
- Satisfy restrictions on
- response time
- processing time
- storage space
8. Main phases of database design
- 1. Requirements analysis
- 2. Conceptual database design
- 3. Choice of DBMS
- 4. Data model mapping
- 5. Physical database design
- 6. Implementation and tuning
9. Important levels
- Conceptual design: independent of DBMS and data model; performance is not an issue
- Logical design: independent of DBMS, but the data model to be used in implementation is decided
- Physical design: tailoring to a specific DBMS, optimization of performance
10. Dichotomy
- (a) Data content and structure
- (b) Data processing and applications
- Closely intertwined
- Different methodologies emphasize one or the other (data-driven vs. process-driven database design).
11. Phase 1: Requirements collection and analysis
- Identify users and applications
- Study existing documents (reports, forms, manuals)
- Analyse transaction types, frequencies and flow of information
- Find out users' desires and priorities
- Results are represented with text, diagrams, tables, charts, formal specification methods.
12. Viewpoints on requirements
- Requirements collection is a crucial step for the success of database design.
- Initial requirements are often incomplete.
- Data content should include both current and potential future needs.
- Qualitative requirements guide logical design.
- Quantitative requirements guide physical design.
13. User participation in requirements collection
- Increases customer satisfaction
- JAD: Joint Application Design
- Contextual design: designers get immersed in the workplace
- Workflow scenarios
- Quick prototypes
14. Processing needs
- processes and their dependencies
- internal structure of processes
- required input data
- ways of processing
- quantities of data to be processed
- execution frequencies
- triggering conditions
- inter-process communication and parallelism
15. Techniques for requirements collection
- Top-down: hierarchical decomposition
- Bottom-up: combine existing processes and tasks stepwise into larger subsystems
- Backward-forward: from known results, derive the processes and input data; apply in a top-down fashion, starting from high-level output and processes.
16. Results of requirements collection
- List of data elements and their definitions
- Description of processes and work tasks (frequencies, target data types, data volumes)
- List of steering and planning functions
- High-level guidelines and policies (connections within the organization)
- Anticipated changes and their effects on processes
17. Automated tools for requirements collection
- Mainly for documentation
- CASE: Computer Aided Software (Systems) Engineering
- Data dictionary: repository of all meta-information of the enterprise, larger than the database system catalog.
18. Phase 2: Conceptual db design
- (a) Conceptual schema design
- Entities
- Attributes
- Relationships
- Generalization/specialization hierarchies
- (b) Conceptual transaction design
- Important transaction types (retrieval/update)
- Characteristics (access pattern, control flow,
I/O, frequency, response time limit)
19. Conceptual schema characteristics
- DBMS-independent
- High-level description of semantic content
- Expressive
- Simple (understandable)
- Minimal (no overlaps)
- Diagrammatic representation
- Formal (accurate, unambiguous)
20. Specification of entities
- Categories
- concrete, functional, and abstract entities
- Information to be recorded
- name, synonyms, description
- estimated number of occurrences
- integrity constraints
- entity life cycle
- usage authorisation
21. Specification of attributes
- Categories
- identifiers, descriptors
- Information to be recorded
- name, synonyms, description, purpose of use
- related entity/relationship
- form and structure
- dependency on other attributes
- life cycle: rules of change
- usage authorisation
22. Specification of relationships
- Categories
- Structural, ordinal, containment
- Information to be recorded
- name, synonyms, description, life cycle
- degree: component entity types and their roles
- cardinality ratios, totality/partiality
- Attributes
- Integrity constraints
- Usage authority
23. Specification of specialisation/generalisation hierarchies
- As general classes as possible
- A subclass should possess clearly distinctive attributes (relationships)
- Common attributes as high as possible
- Define a subclass if its key differs from that of the superclass
- Try to achieve total relationships (minimum ratio > 0).
24. Schema design strategies
- 1) Centralized (one-shot) approach
- 2) Schema integration: piecewise development
- Incremental strategies:
- 1) Top-down: start with high-level abstractions
- 2) Bottom-up: start with basic abstractions
- 3) Inside-out: start with the central concepts
- 4) Mixed strategy
25. Schema (view) integration
- Develop first subschemas of different subsystems, and then integrate them.
- Find the semantically best compromise
- Software tools help, but automatic integration is not possible
- No performance optimization
- Controlled redundancy
26. Reasons for differing subschemas
- Different user views
- Defective conceptions about the state of affairs
- Relativism: alternative (valid) ways of describing the same thing
- Errors may occur, leading to inconsistencies
- Naming is not unique
27. Conflicts between views
- Levels of correspondence
- Identical, equivalent, compatible, incompatible
- Examples of conflicts
- Entity conflicts with entity attribute
- Entity conflicts with relationship attribute
- Entity conflicts with relationship
28. View integration methodology
- 1) Preparation of integration: decide the order, gather extra information
- 2) Find correspondences and conflicts: name, type, domain, constraint conflicts
- 3) Solve the conflicts: find the best compromises
- 4) Merge the compatible views
- 5) Tune/restructure
29. Evaluation of integration result
- Completeness
- The result is a superset of the views
- Minimality
- The same concept appears only once
- Transitive relationships and generalizations eliminated
- Understandability
- The best naming and modelling alternatives
30. Phase 2b: Transaction design
- Parallel with conceptual schema design
- The important transactions must be known
- Dictate the schema contents
- Specification of functional behavior and I/O
- Process modelling tools becoming popular (UML state transition diagrams, activity diagrams, sequence diagrams, collaboration diagrams)
31. Phase 3: Choice of DBMS
- Technical factors
- Type of DBMS, storage structures, access paths, user interfaces, API, query languages, browsers, report generators, graphics, etc.
- Economic and organizational factors
- Software, hardware, maintenance costs
- Database creation/conversion costs
- Personnel, training, operating costs
32. Phase 4: Logical database design
- Two stages:
- DBMS-independent mapping: EER schema → logical schema
- Tailoring the logical schema to the special features of the selected DBMS
- Result: DDL (Data Definition Language) descriptions of the logical and external database schemas
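As a small sketch of this phase's result, the DDL below creates an illustrative logical schema plus one external schema (a view) in SQLite; all table and column names are assumptions, not from the slides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical schema: DDL for the base tables.
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssn     TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL,
    dno     INTEGER REFERENCES department(dnumber)
);
-- External schema: a view giving one user group a restricted window
-- onto the logical schema (salary is hidden).
CREATE VIEW emp_public AS
    SELECT name, dno FROM employee;
""")
```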
33. Notes on logical design
- Can be (partially) automated
- Additional constraints are often needed to represent the full semantics of concepts
- For relational databases, the mapping result is usually quite well normalised. But functional dependencies between attributes should be detected (→ BCNF). Also multi-valued dependencies may be found, resulting in further normalisation.
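The FD-detection point can be illustrated with a small helper. `fd_holds` is a hypothetical function (not from the slides) that tests whether a functional dependency holds in a concrete relation instance; note that data can only refute an FD, never prove it in general:

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds
    in a relation instance given as a list of dict tuples."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False        # same lhs value, different rhs value
    return True

# Illustrative relation: supplier city is determined by the supplier alone.
supplies = [
    {"supplier": "S1", "city": "Oulu",     "part": "P1"},
    {"supplier": "S1", "city": "Oulu",     "part": "P2"},
    {"supplier": "S2", "city": "Helsinki", "part": "P1"},
]
# supplier -> city holds, but supplier is not a key of this relation:
# a BCNF decomposition would split (supplier, city) into its own relation.
print(fd_holds(supplies, ["supplier"], ["city"]))   # True on this instance
```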
34. Phase 5: Physical database design
- Physical internal storage schema
- Optimisation problem
- Basis: transactions
- Target function: e.g. overall weighted processing time, or number of I/O operations
- Additional constraints for response time
- Combinatorial optimization, hard to solve
- No general methodology exists.
35. Approaches to physical design
- Analytical solution
- For each alternative storage structure, construct a model (cost formula) to find the best ways to execute the processing tasks.
- Experimental solution
- With an existing DBMS, test the performance of different storage alternatives. A query optimiser may be utilised to avoid loading the db.
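A toy model in the spirit of the analytical approach: assumed cost formulas counting page I/Os for a full scan versus a B-tree lookup. All numbers and formulas here are simplifying assumptions, not any real optimizer's model:

```python
import math

def scan_cost(pages):
    """Full scan: read every data page once."""
    return pages

def index_cost(levels, matching_rows, clustered, pages, rows):
    """B-tree lookup: traverse the index levels, then fetch data pages.
    Clustered index: matching rows sit on consecutive pages.
    Non-clustered (worst case): one page I/O per matching row."""
    if clustered:
        data_io = math.ceil(matching_rows * pages / rows)
    else:
        data_io = matching_rows
    return levels + data_io

# Illustrative parameters (assumptions, not measurements):
pages, rows, levels = 10_000, 1_000_000, 3
print(scan_cost(pages))                                # 10000 page I/Os
print(index_cost(levels, 100, True, pages, rows))      # 4 page I/Os
print(index_cost(levels, 50_000, False, pages, rows))  # 50003 page I/Os
```

Note how the non-clustering index loses to a plain scan once the selection matches a large fraction of the rows; comparing such formulas per transaction is exactly the analytical approach.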
36. DBMS features affecting the physical design
- 1. Storage structures and access methods
- 2. Query optimization
- 3. Locking method
- 4. Catalog management
- 5. Buffer management
- 6. Logging costs
- 7. Other factors hardware, operating system,
network, etc.
37. Application features affecting the physical database design
- Data contents
- Primary/candidate keys (→ unique index)
- Queries and transactions
- Target relations and attributes
- Search key (selection)
- Type of operation (selection, join, ...)
- Frequency of invocation
- Time constraints
38. Decisions about indexing
- Which attributes to index? (Logical keys → access keys)
- What multi-attribute indexes to create?
- Should an index be clustering or not? (Only one index per relation can be clustering)
- Type of index? (B-tree, hash)
- Dynamic hash? (No explicit index)
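Whether a query would actually use a candidate index can be checked experimentally. A sketch with SQLite's `EXPLAIN QUERY PLAN`; the table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT PRIMARY KEY, dno INTEGER, salary REAL)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM emp WHERE dno = 5"
before = plan(query)   # without an index on dno: a full table scan
conn.execute("CREATE INDEX idx_dno ON emp (dno)")
after = plan(query)    # with the index: an index search
print(before)
print(after)
```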
39. Heuristic index selection algorithm
- 1. Assign indexes for all attributes.
- 2. Randomize the order of relations.
- 3. Find the best clustering attribute for each relation.
- 4. Store the current solution.
- 5. Repeat 2-4 a few times and take the best solution.
- 6. Randomize the order of relations.
- 7. Test dropping one index at a time (accept if profit).
- 8. Store the current solution.
- 9. Repeat 6-8 a few times and take the best solution.
- 10. Repeat 1-9 a few times and take the best solution.
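The index-dropping phase (steps 6-9) can be sketched as follows. The cost model here is a pure toy assumption standing in for a real estimate of transaction processing cost:

```python
import random

def improve_by_dropping(indexes, cost, rounds=3, seed=0):
    """Steps 6-9 of the heuristic: repeatedly test dropping one index
    at a time, accepting a drop whenever it does not increase total cost.
    `cost` is an assumed black-box model: set of indexes -> cost number."""
    rng = random.Random(seed)
    best = set(indexes)
    for _ in range(rounds):
        current = set(best)
        order = list(current)
        rng.shuffle(order)             # step 6: randomize the order
        for idx in order:              # step 7: test dropping each index
            trial = current - {idx}
            if cost(trial) <= cost(current):
                current = trial        # accept if there is a profit
        if cost(current) < cost(best):
            best = current             # step 8: store the current solution
    return best

# Toy cost model (pure assumption): each index costs 1 to maintain,
# and a query on attribute q costs 10 unless q is indexed.
queries = ["dno", "salary"]
def cost(ix):
    return len(ix) + sum(1 if q in ix else 10 for q in queries)

print(improve_by_dropping({"dno", "salary", "name"}, cost))
```

With this model the unused index on `name` is dropped, while the two query-supporting indexes survive every trial drop.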
40. Other physical design decisions
- Page size (if not fixed)
- Different types of clustering
- Horizontal partitioning
- Vertical partitioning
- Replication
- Denormalization
- Allocation of relations and indexes to different disk units; declustering a single relation on several disk units.
41. Phase 6: Implementation and tuning
- Responsibility of DBA (db administrator)
- DDL / SDL statements are compiled and executed
- Loading of database: enter new values or convert from the old (conversion routines need to be written)
- Transactions are programmed using DML
- Database usage can start.
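A minimal sketch of such a conversion routine for the loading step, assuming a hypothetical legacy export with semicolon-separated fields and balances stored in cents:

```python
import csv, io, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Legacy export (illustrative data): semicolon-separated, balance in cents.
legacy = io.StringIO("1;Smith;125000\n2;Jones;9900\n")

def convert(fields):
    """Conversion routine: map one legacy record to the new schema."""
    acc_id, owner, cents = fields
    return int(acc_id), owner, int(cents) / 100.0

rows = [convert(f) for f in csv.reader(legacy, delimiter=";")]
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
conn.commit()
```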
42. Goals of database tuning
- Make applications run faster
- Lower the response times
- Improve overall throughput of transactions
- Avoid excessive lock contention
- Minimize logging overhead
- Optimize buffer size
- Optimize scheduling of processes
- Allocate resources efficiently
43. Basis of tuning: statistics
- Sizes of tables
- Distinct values per table column
- Execution frequencies of transactions
- Storage allocation statistics
- Page I/O of disks: hot spots
- Locking/logging rates
- Number of levels and leaf pages in indexes
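Several of these statistics can be gathered with plain SQL. A sketch using SQLite; the table and column names are illustrative (and the helper trusts its identifier arguments, so it is for DBA use only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT, dno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("s%d" % i, i % 4) for i in range(100)])

def column_stats(conn, table, columns):
    """Gather basic tuning statistics: table size and the
    number of distinct values per column."""
    stats = {"rows": conn.execute(
        "SELECT COUNT(*) FROM %s" % table).fetchone()[0]}
    for col in columns:
        stats[col] = conn.execute(
            "SELECT COUNT(DISTINCT %s) FROM %s" % (col, table)).fetchone()[0]
    return stats

print(column_stats(conn, "emp", ["ssn", "dno"]))
```

Such counts feed directly into index selection: a column with very few distinct values (like `dno` here) is a poor candidate for a non-clustering index.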
44. Tuning of indexes
- Possible problems
- Some queries lack an index
- Some indexes are not utilized
- Some indexes are subject to frequent updates
- Options to change indexing
- Drop/create an index dynamically
- Rebuild an existing (deteriorated) index
- Switch between dense/non-dense index
45. Tuning of logical schema
- Denormalization: put together attributes which are frequently needed together
- Alternative 3NF/BCNF solutions
- Vertical partitioning: separate frequently used attributes from seldom used ones
- Replication of attributes
- Horizontal partitioning: subsets of tuples based on some classifying attribute
46. Tuning of queries
- Ensure that existing indexes are really used
- Arithmetic expressions, substring comparisons and nested queries may prevent index usage
- Rephrase complex WHERE-conditions so that indexes can be better utilized
- Test replacing OR-conditions with UNION
- Avoid unnecessary DISTINCT keywords (causing duplicate elimination by sorting)
- Choose join conditions to use a clustering index
47. Tuning of queries (cont.)
- Avoid unnecessary temporary result tables.
- Use temporary tables to prevent repeated evaluation of nested subqueries.
- Try replacing nested (IN, ALL, ANY, SOME) queries by unnested ones (joins).
- Avoid excessive use of views.
- Transform NOT-conditions into positive expressions, if possible.
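The nested-to-join rewrite can be checked for equivalence on a toy instance (the schema and data below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dno INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE emp  (name TEXT, dno INTEGER);
INSERT INTO dept VALUES (1, 'Oulu'), (2, 'Helsinki');
INSERT INTO emp  VALUES ('Smith', 1), ('Jones', 2), ('Brown', 1);
""")

nested = """SELECT name FROM emp
            WHERE dno IN (SELECT dno FROM dept WHERE city = 'Oulu')"""
joined = """SELECT name FROM emp
            JOIN dept ON emp.dno = dept.dno WHERE dept.city = 'Oulu'"""

# Both forms return the same rows; the unnested form often gives the
# optimizer more freedom in choosing join order and index usage.
print(sorted(conn.execute(nested).fetchall()))
print(sorted(conn.execute(joined).fetchall()))
```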
48. Automated design tools
- Typical CASE-tool facilities
- Diagramming (e.g. EER)
- Model mapping (EER → relational DDL)
- Normalization
- Desirable characteristics
- Simple interface
- Both analytical and heuristic tools
- Trade-off analysis, design verification
- Visual design results