Title: 1. Database design process
1. Database design process
- Assumed environment
- Large database (big volume and large number of data types)
- Multiple users (various requirements and views)
- Transaction processing (e.g. banking, insurance, travel agency, etc.)
- Nonstop operation
2. Organizational context
- Information resource management (IRM)
- Data is a corporate resource, key to successful management
- More and more functions are computerized → large volumes of data must be available
- Complexity of data grows
- Tendency to consolidate information resources
3. Valuable features of db systems
- Data independence: immunity to physical changes
- Different external views to the same data
- Integration of multiple application areas
- High-level interfaces (like SQL) make it simple to develop new applications
- Ad-hoc queries for casual users; effective routine transactions
4. New trend: decentralization
- Personal computers and database-like software (Excel, Access)
- Distributed and client-server DBMSs → faster local processing
- Web databases
- Information repositories (data dictionary systems) manage the overall metadata (schemas, storage structures, access paths, users, transactions, statistics)
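The catalog idea above exists in any DBMS. As a minimal sketch, SQLite's built-in `sqlite_master` table plays the role of such a metadata repository; the table and index names here are illustrative, not from the slides:

```python
import sqlite3

# In-memory database; sqlite_master is SQLite's built-in metadata catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("CREATE INDEX idx_owner ON account (owner)")

# The catalog records every schema object: its type, name, and original DDL.
for name, type_, sql in conn.execute(
        "SELECT name, type, sql FROM sqlite_master"):
    print(type_, name)
```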
5. Information system life cycle (macro life cycle)
- 1. Feasibility analysis
- 2. Requirements collection and analysis
- 3. Design
- 4. Implementation
- 5. Validation and testing
- 6. Operation and maintenance
6. Database system life cycle (micro life cycle)
- 1. System definition
- 2. Design
- 3. Implementation
- 4. Loading of data
- 5. Application conversion
- 6. Testing and validation
- 7. Operation
- 8. Monitoring and maintenance
7. Database design goals
- Satisfy user needs
- Support processing requirements
- Natural structuring of information
- Satisfy restrictions on
- response time
- processing time
- storage space
8. Main phases of database design
- 1. Requirements analysis
- 2. Conceptual database design
- 3. Choice of DBMS
- 4. Data model mapping
- 5. Physical database design
- 6. Implementation and tuning
9. Important levels
- Conceptual design: independent of DBMS and data model; performance is not an issue
- Logical design: independent of DBMS, but the data model to be used in implementation is decided
- Physical design: tailoring to a specific DBMS, optimization of performance
10. Dichotomy
- (a) Data content and structure
- (b) Data processing and applications
- Closely intertwined
- Different methodologies emphasize one or the other (data-driven vs. process-driven database design).
11. Phase 1: Requirements collection and analysis
- Identify users and applications
- Study existing documents (reports, forms, manuals)
- Analyse transaction types, frequencies and flow of information
- Find out users' desires and priorities
- Results are represented with text, diagrams, tables, charts, formal specification methods.
12. Viewpoints on requirements
- Requirements collection is a crucial step for the success of database design.
- Initial requirements are often incomplete.
- Data content should include both current and potential future needs.
- Qualitative requirements guide logical design.
- Quantitative requirements guide physical design.
13. User participation in requirements collection
- Increases customer satisfaction
- JAD: Joint Application Design
- Contextual design: designers get immersed in the workplace
- Workflow scenarios
- Quick prototypes
14. Processing needs
- processes and their dependencies
- internal structure of processes
- required input data
- ways of processing
- quantities of data to be processed
- execution frequencies
- triggering conditions
- inter-process communication and parallelism
15. Techniques for requirements collection
- Top-down: hierarchical decomposition
- Bottom-up: combine existing processes and tasks stepwise into larger subsystems
- Backward-forward: from known results, derive the processes and input data; apply in a top-down fashion, starting from high-level output and processes.
16. Results of requirements collection
- List of data elements and their definitions
- Description of processes and work tasks (frequencies, target data types, data volumes)
- List of steering and planning functions
- High-level guidelines and policies (connections within the organization)
- Anticipated changes and their effects on processes
17. Automated tools for requirements collection
- Mainly for documentation
- CASE: Computer Aided Software (Systems) Engineering
- Data dictionary: repository of all meta-information of the enterprise, larger than the database system catalog.
18. Phase 2: Conceptual db design
- (a) Conceptual schema design
- Entities
- Attributes
- Relationships
- Generalization/specialization hierarchies
- (b) Conceptual transaction design
- Important transaction types (retrieval/update)
- Characteristics (access pattern, control flow,
I/O, frequency, response time limit)
19. Conceptual schema characteristics
- DBMS-independent
- High-level description of semantic content
- Expressive
- Simple (understandable)
- Minimal (no overlaps)
- Diagrammatic representation
- Formal (accurate, unambiguous)
20. Specification of entities
- Categories
- concrete, functional, and abstract entities
- Information to be recorded
- name, synonyms, description
- estimated number of occurrences
- integrity constraints
- entity life cycle
- usage authorisation
21. Specification of attributes
- Categories
- identifiers, descriptors
- Information to be recorded
- name, synonyms, description, purpose of use
- related entity/relationship
- form and structure
- dependency on other attributes
- life cycle: rules of change
- usage authorisation
22. Specification of relationships
- Categories
- Structural, ordinal, containment
- Information to be recorded
- name, synonyms, description, life cycle
- degree: component entity types and their roles
- cardinality ratios, totality/partiality
- Attributes
- Integrity constraints
- Usage authority
23. Specification of specialisation/generalisation hierarchies
- As general classes as possible
- A subclass should possess clearly distinctive attributes (relationships)
- Common attributes as high as possible
- Define a subclass if its key differs from that of the superclass
- Try to achieve total relationships (minimum ratio > 0).
24. Schema design strategies
- 1) Centralized (one-shot) approach
- 2) Schema integration: piecewise development
- Incremental strategies:
- 1) Top-down: start with high-level abstractions
- 2) Bottom-up: start with basic abstractions
- 3) Inside-out: start with the central concepts
- 4) Mixed strategy
25. Schema (view) integration
- Develop first subschemas of different subsystems, and then integrate them.
- Find the semantically best compromise
- Software tools help, but automatic integration is not possible
- No performance optimization
- Controlled redundancy
26. Reasons for differing subschemas
- Different user views
- Defective conceptions about the state of affairs
- Relativism: alternative (valid) ways of describing the same thing
- Errors may occur, leading to inconsistencies
- Naming is not unique
27. Conflicts between views
- Levels of correspondence
- Identical, equivalent, compatible, incompatible
- Examples of conflicts
- Entity conflicts with entity attribute
- Entity conflicts with relationship attribute
- Entity conflicts with relationship
28. View integration methodology
- 1) Preparation of integration: decide the order, gather extra information
- 2) Find correspondences and conflicts: name, type, domain, constraint conflicts
- 3) Solve the conflicts: find the best compromises
- 4) Merge the compatible views
- 5) Tune/restructure
29. Evaluation of integration result
- Completeness
- The result is a superset of the views
- Minimality
- The same concept appears only once
- Transitive relationships and generalizations eliminated
- Understandability
- The best naming and modelling alternatives
30. Phase 2b: Transaction design
- Parallel with conceptual schema design
- The important transactions must be known
- Dictate the schema contents
- Specification of functional behavior and I/O
- Process modelling tools becoming popular (UML state transition diagrams, activity diagrams, sequence diagrams, collaboration diagrams)
31. Phase 3: Choice of DBMS
- Technical factors
- Type of DBMS, storage structures, access paths, user interfaces, API, query languages, browsers, report generators, graphics, etc.
- Economic and organizational factors
- Software, hardware, maintenance costs
- Database creation/conversion costs
- Personnel, training, operating costs
32. Phase 4: Logical database design
- Two stages:
- DBMS-independent mapping: EER schema → logical schema
- Tailoring the logical schema to the special features of the selected DBMS
- Result: DDL (Data Definition Language) descriptions of the logical and external database schemas
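As a small sketch of this phase's result, the DDL below creates an illustrative logical schema plus one external schema (a view) in SQLite; all table and column names are assumptions, not from the slides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical schema: DDL for the base tables.
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssn     TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL,
    dno     INTEGER REFERENCES department(dnumber)
);
-- External schema: a view giving one user group a restricted window
-- onto the logical schema (salary is hidden).
CREATE VIEW emp_public AS
    SELECT name, dno FROM employee;
""")
```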
33. Notes on logical design
- Can be (partially) automated
- Additional constraints are often needed to represent the full semantics of concepts
- For relational databases, the mapping result is usually quite well normalised. But functional dependencies between attributes should be detected (→ BCNF). Also multi-valued dependencies may be found, resulting in further normalisation.
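The FD-detection point can be illustrated with a small helper. `fd_holds` is a hypothetical function (not from the slides) that tests whether a functional dependency holds in a concrete relation instance; note that data can only refute an FD, never prove it in general:

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds
    in a relation instance given as a list of dict tuples."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False        # same lhs value, different rhs value
    return True

# Illustrative relation: supplier city is determined by the supplier alone.
supplies = [
    {"supplier": "S1", "city": "Oulu",     "part": "P1"},
    {"supplier": "S1", "city": "Oulu",     "part": "P2"},
    {"supplier": "S2", "city": "Helsinki", "part": "P1"},
]
# supplier -> city holds, but supplier is not a key of this relation:
# a BCNF decomposition would split (supplier, city) into its own relation.
print(fd_holds(supplies, ["supplier"], ["city"]))   # True on this instance
```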
34. Phase 5: Physical database design
- Physical internal storage schema
- Optimisation problem
- Basis: transactions
- Target function: e.g. overall weighted processing time, or number of I/O operations
- Additional constraints for response time
- Combinatorial optimization, hard to solve
- No general methodology exists.
35. Approaches to physical design
- Analytical solution
- For each alternative storage structure, construct a model (cost formula) to find the best ways to execute the processing tasks.
- Experimental solution
- With an existing DBMS, test the performance of different storage alternatives. A query optimiser may be utilised to avoid loading the db.
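A toy model in the spirit of the analytical approach: assumed cost formulas counting page I/Os for a full scan versus a B-tree lookup. All numbers and formulas here are simplifying assumptions, not any real optimizer's model:

```python
import math

def scan_cost(pages):
    """Full scan: read every data page once."""
    return pages

def index_cost(levels, matching_rows, clustered, pages, rows):
    """B-tree lookup: traverse the index levels, then fetch data pages.
    Clustered index: matching rows sit on consecutive pages.
    Non-clustered (worst case): one page I/O per matching row."""
    if clustered:
        data_io = math.ceil(matching_rows * pages / rows)
    else:
        data_io = matching_rows
    return levels + data_io

# Illustrative parameters (assumptions, not measurements):
pages, rows, levels = 10_000, 1_000_000, 3
print(scan_cost(pages))                                # 10000 page I/Os
print(index_cost(levels, 100, True, pages, rows))      # 4 page I/Os
print(index_cost(levels, 50_000, False, pages, rows))  # 50003 page I/Os
```

Note how the non-clustering index loses to a plain scan once the selection matches a large fraction of the rows; comparing such formulas per transaction is exactly the analytical approach.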
36. DBMS features affecting the physical design
- 1. Storage structures and access methods
- 2. Query optimization
- 3. Locking method
- 4. Catalog management
- 5. Buffer management
- 6. Logging costs
- 7. Other factors hardware, operating system,
network, etc.
37. Application features affecting the physical database design
- Data contents
- Primary/candidate keys (→ unique index)
- Queries and transactions
- Target relations and attributes
- Search key (selection)
- Type of operation (selection, join, ...)
- Frequency of invocation
- Time constraints
38. Decisions about indexing
- Which attributes to index? (Logical keys → access keys)
- What multi-attribute indexes to create?
- Should an index be clustering or not? (Only one index per relation can be clustering)
- Type of index? (B-tree, hash)
- Dynamic hash? (No explicit index)
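Whether a query would actually use a candidate index can be checked experimentally. A sketch with SQLite's `EXPLAIN QUERY PLAN`; the table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT PRIMARY KEY, dno INTEGER, salary REAL)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM emp WHERE dno = 5"
before = plan(query)   # without an index on dno: a full table scan
conn.execute("CREATE INDEX idx_dno ON emp (dno)")
after = plan(query)    # with the index: an index search
print(before)
print(after)
```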
39. Heuristic index selection algorithm
- 1. Assign indexes for all attributes.
- 2. Randomize the order of relations.
- 3. Find the best clustering attribute for each relation.
- 4. Store the current solution.
- 5. Repeat 2-4 a few times and take the best solution.
- 6. Randomize the order of relations.
- 7. Test dropping one index at a time (accept if profit).
- 8. Store the current solution.
- 9. Repeat 6-8 a few times and take the best solution.
- 10. Repeat 1-9 a few times and take the best solution.
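The index-dropping phase (steps 6-9) can be sketched as follows. The cost model here is a pure toy assumption standing in for a real estimate of transaction processing cost:

```python
import random

def improve_by_dropping(indexes, cost, rounds=3, seed=0):
    """Steps 6-9 of the heuristic: repeatedly test dropping one index
    at a time, accepting a drop whenever it does not increase total cost.
    `cost` is an assumed black-box model: set of indexes -> cost number."""
    rng = random.Random(seed)
    best = set(indexes)
    for _ in range(rounds):
        current = set(best)
        order = list(current)
        rng.shuffle(order)             # step 6: randomize the order
        for idx in order:              # step 7: test dropping each index
            trial = current - {idx}
            if cost(trial) <= cost(current):
                current = trial        # accept if there is a profit
        if cost(current) < cost(best):
            best = current             # step 8: store the current solution
    return best

# Toy cost model (pure assumption): each index costs 1 to maintain,
# and a query on attribute q costs 10 unless q is indexed.
queries = ["dno", "salary"]
def cost(ix):
    return len(ix) + sum(1 if q in ix else 10 for q in queries)

print(improve_by_dropping({"dno", "salary", "name"}, cost))
```

With this model the unused index on `name` is dropped, while the two query-supporting indexes survive every trial drop.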
40. Other physical design decisions
- Page size (if not fixed)
- Different types of clustering
- Horizontal partitioning
- Vertical partitioning
- Replication
- Denormalization
- Allocation of relations and indexes to different disk units; declustering a single relation on several disk units.
41. Phase 6: Implementation and tuning
- Responsibility of DBA (db administrator)
- DDL / SDL statements are compiled and executed
- Loading of database: enter new values or convert from the old (conversion routines need to be written)
- Transactions are programmed using DML
- Database usage can start.
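A minimal sketch of such a conversion routine for the loading step, assuming a hypothetical legacy export with semicolon-separated fields and balances stored in cents:

```python
import csv, io, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Legacy export (illustrative data): semicolon-separated, balance in cents.
legacy = io.StringIO("1;Smith;125000\n2;Jones;9900\n")

def convert(fields):
    """Conversion routine: map one legacy record to the new schema."""
    acc_id, owner, cents = fields
    return int(acc_id), owner, int(cents) / 100.0

rows = [convert(f) for f in csv.reader(legacy, delimiter=";")]
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
conn.commit()
```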
42. Goals of database tuning
- Make applications run faster
- Lower the response times
- Improve overall throughput of transactions
- Avoid excessive lock contention
- Minimize logging overhead
- Optimize buffer size
- Optimize scheduling of processes
- Allocate resources efficiently
43. Basis of tuning: statistics
- Sizes of tables
- Distinct values per table column
- Execution frequencies of transactions
- Storage allocation statistics
- Page I/O of disks: hot spots
- Locking/logging rates
- Number of levels and leaf pages in indexes
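Several of these statistics can be gathered with plain SQL. A sketch using SQLite; the table and column names are illustrative (and the helper trusts its identifier arguments, so it is for DBA use only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT, dno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("s%d" % i, i % 4) for i in range(100)])

def column_stats(conn, table, columns):
    """Gather basic tuning statistics: table size and the
    number of distinct values per column."""
    stats = {"rows": conn.execute(
        "SELECT COUNT(*) FROM %s" % table).fetchone()[0]}
    for col in columns:
        stats[col] = conn.execute(
            "SELECT COUNT(DISTINCT %s) FROM %s" % (col, table)).fetchone()[0]
    return stats

print(column_stats(conn, "emp", ["ssn", "dno"]))
```

Such counts feed directly into index selection: a column with very few distinct values (like `dno` here) is a poor candidate for a non-clustering index.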
44. Tuning of indexes
- Possible problems
- Some queries lack an index
- Some indexes are not utilized
- Some indexes are subject to frequent updates
- Options to change indexing
- Drop/create an index dynamically
- Rebuild an existing (deteriorated) index
- Switch between dense/non-dense index
45. Tuning of logical schema
- Denormalization: put together attributes which are frequently needed together
- Alternative 3NF/BCNF solutions
- Vertical partitioning: separate frequently used attributes from seldom used ones
- Replication of attributes
- Horizontal partitioning: subsets of tuples based on some classifying attribute
46. Tuning of queries
- Ensure that existing indexes are really used
- Arithmetic expressions, substring comparisons and nested queries may prevent index usage
- Rephrase complex WHERE-conditions so that indexes can be better utilized
- Test replacing OR-conditions with UNION
- Avoid unnecessary DISTINCT keywords (causing duplicate elimination by sorting)
- Choose join conditions to use a clustering index
47. Tuning of queries (cont.)
- Avoid unnecessary temporary result tables.
- Use temporary tables to prevent repeated evaluation of nested subqueries.
- Try replacing nested (IN, ALL, ANY, SOME) queries by unnested ones (joins).
- Avoid excessive use of views.
- Transform NOT-conditions into positive expressions, if possible.
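The nested-to-join rewrite can be checked for equivalence on a toy instance (the schema and data below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dno INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE emp  (name TEXT, dno INTEGER);
INSERT INTO dept VALUES (1, 'Oulu'), (2, 'Helsinki');
INSERT INTO emp  VALUES ('Smith', 1), ('Jones', 2), ('Brown', 1);
""")

nested = """SELECT name FROM emp
            WHERE dno IN (SELECT dno FROM dept WHERE city = 'Oulu')"""
joined = """SELECT name FROM emp
            JOIN dept ON emp.dno = dept.dno WHERE dept.city = 'Oulu'"""

# Both forms return the same rows; the unnested form often gives the
# optimizer more freedom in choosing join order and index usage.
print(sorted(conn.execute(nested).fetchall()))
print(sorted(conn.execute(joined).fetchall()))
```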
48. Automated design tools
- Typical CASE-tool facilities
- Diagramming (e.g. EER)
- Model mapping (EER → relational DDL)
- Normalization
- Desirable characteristics
- Simple interface
- Both analytical and heuristic tools
- Trade-off analysis, design verification
- Visual design results