Title: CHAPTER 5 MANAGING ORGANIZATIONAL DATA AND INFORMATION
1CHAPTER 5MANAGING ORGANIZATIONAL DATA AND
INFORMATION
2Data Hierarchy and Database Concepts
- Field - a logical grouping of characters into a
word, a small group of words, or a complete
number - Record - a logical grouping of related fields
- File - a logical grouping of related records
- Database - a logical grouping of related files
- Entity - a person, place, thing, or event about
which information is maintained - Attribute - each characteristic or quality
describing a particular entity - Key - field that has some identifying information
- Primary Key - field that uniquely identifies the
record
3Storing and Accessing Records
- Indexed Sequential Access Method (ISAM)
- uses an index of key fields to locate individual
records - index - lists the key field of each record and
where that record is physically located in
storage - track index - shows the highest value of the key
field that can be found on a specific track - Direct File Access Method
- uses the key field to locate the physical address
of a record - transform algorithm - translates the key field
directly into the records storage location on
disk
4Traditional File Environment
4
Introduction to Information Technology Turban,
Rainer and Potter Chapter 4 Computer Software
- The organization has multiple applications with
related data files
Each application has a specific data file related
to it, containing all the data records needed by
the application
5Problems with the File Environment
- data redundancy - the same piece of information
could be duplicated in several places - data inconsistency - the various copies of the
data no longer agree - Violation of data a integrity data isolation -
difficulty in accessing data from different
applications - Violation of application/data independence
6DBMS History
- Generalized Update Access Method (at the
beginning of 60s) - Integrated Data store (at the beginning of 60s)
- GUAM is transformed into Information Management
system (in the middle of 60s) - Data Base Task Group (1965)
7DBMS History (continued)
- Relational data model had been introduced by Codd
(1970) second-generation DBMSs - Third-generation DBMSs Object-Oriented and
Object-Relational - Multidimensional data model
- Hypermedia data model
8Database The Modern Approach
- Database architecture
- Centralized database
- Distributed database
- replicated database - complete copies of the
entire database are delivered to many locations - partitioned databases - these are subdivided, a
portion of the entire database in each location
9Centralized vs. Distributed Databases
User New York
User Los Angeles
Centralized Database
Distributed Database
10Creating a Database
- Conceptual design - description of the database
from the user or business perspective - Logical design abstract model of a database
showing how the data should be structured and
arranged including relationships between data - Physical design - shows the way a database is
actually arranged on storage devices
11Entity-relationship (ER) modeling
- process of planning the database design
- ER diagram graphical presentation of logical
data model - Entity classes ? Instance ? Identifiers ?
Relationships - Degree of a relationship set is a number of
participating entity sets - Binary relationships
- One-to-one (11)
- One-to-many (1N)
- Many-to-many (NN)
12Logical Data Models
- In a logically structuring database, businesses
need to consider the characteristics of the data
and how the data will be accessed. - Three common data models
- Hierarchical,
- Network,
- Relational
13Hierarchical Database Model
- structures data into an inverted tree in which
each record contains two elements rigidly - a single root or master field, often called a
key, which identifies the type location or
ordering of the records - a variable number of subordinate fields,which
defines the rest of the data within a record - all fields have only one parent, each parent
may have many children - Advantage speed and efficiency
- Problem access to data is predefined before the
programs and each relationship must be
explicitly defined when the database is created
14Network Database Model
- creates relationship among data through a
linked-list structure in which subordinate
records (members) can be linked to more than one
data element (owner) - pointer - explicit link, storage addresses that
contain the location of a related record - many-to-many relationships are possible
- complexity for every set of linked data
elements, a pair of pointers must be maintained
15Relational Database Model
- based on a simple concept of tables called
relations in order to capitalize on
characteristics of rows and columns of data - Each table corresponds to Entity class
- Each row (or tuple) contains an instance of a
given Entity a record - A cell contains an attribute value of a
particular instance of entity and is called a
field or data element
16Rules to create a relation
- Each relation (table) must have a unique name
- Each tuple(record) in the relation (table) must
have a primary key - Records may be added or stored in a relation in
any physical order - Links among the data are implicit.
17Basic Data Manipulations (Relational Databases)
- Support retrieving only necessary data
- Selecting eliminating rows according to certain
criteria - Projecting eliminating columns according to
certain criteria - Joining combining columns from two or more
tables
18Normalization
- method for data arrangement to reduce data
redundancy and to maximize data integrity - First Normal Form each field must contain
atomic value - Second Normal Form each record of 1NF table
must be identify uniquely - Third Normal Form a table in 2NF should not
contain extra non-key data related to non-key
data from another table.
19Database Management Systems
- A software program (or group of programs) that
provides access to a databases - Provides mechanisms for maintaining the integrity
of stored information, managing security and user
access, recovering information when the system
fails, and accessing various database functions
form an application
20Logical versus Physical View
- Physical view - deals with the actual, physical
arrangement and location of data in the direct
access storage devices - Logical view - represents data in a format that
is meaningful to a user and to the software
programs that process that data - A schema a description of logical structure of
all data - A subschema a description of logical view for a
user
21DBMS Components
- Data model - defines the way data are structured
- Data definition language (DDL)
- Data manipulation language (DML)
- Data Dictionary
22Data Definition Language
- Describes data (type, length, constrains)
- Relationships among data, keys
- Schema and Subschema are developed based on DDL
23Data Dictionary
- Data definitions are stored in the Data
Dictionary - Data Dictionary provides
- Standard definitions of data and relationships
among them - Assists in designing and programming
- Reduce data redundancy
- Increase data reliability
24Data Manipulation Language
- Collection of instructions and allowable
parameters for data manipulation - DML helps to
- Retrieve data
- Update data
- Add data
- Delete data
25Structured Query Language (SQL)
- Standard DDL and DML to query relational
databases - Modern DBMSs have SQL dialects
- Many of 4GLs support embedded SQL to query
database from applications. -
26SQL (Example)
- SELECT Course code, Cname, Day, Time, Location
- FROM Sections, Courses
- WHERE Courses.Course code
- (SELECT Course Code
- From Sections
- WHERE Instructor EN 12345)
27Trends in DBMS Development
- Object-oriented databases
- Object-relational databases
- Hypermedia databases
- Data warehouses
- Knowledge databases
28Object-oriented database model
- An object - a small amount of data put together
with all the data needed in order to perform an
operation with that data - An object - similar to an entity is an instance
of a Class - Attributes - characteristics that describe the
state of that object - Method - an operation, action, or a behavior the
object may undergo - Messages the form by which objects interact
with each other.
29Object-relational Databases
- Applications require to manipulate with audio,
video and graphical data - Georeferenced databases
- Multimedia databases
30Hypermedia databases
- stores chunks of information in a form of nodes
- Nodes contain multimedia data
- Nodes are connected by links established by the
user
31Knowledge Databases
- Consists of 2 parts
- Database
- Rules
- heuristics to evaluate a situation
- Analyzed the data to produce new knowledge
32Data Warehouses
- A data warehouse is multidimensional database
management system designed to support strategic
management decision making. - The data mart is a subset of data warehouse for a
single aspect of business
33Data WarehousesFramework and View
34Data Warehouses (definition)
- Is a database that contains the following
characteristics - Subject oriented
- Nonvolatile
- Integrated
- Time variant
35Data stored in a data warehouses
- Current detailed data
- Historical Data
- Lightly summarized data
- Highly summarized data
- Metadata
36Metadata
- Describes the content of database
- Is divided into
- Technical metadata
- Business metadata
37Multidimensional Database
- the data are intimately related and can be viewed
and analyzed from different perspectives, which
are called dimensions - core of data warehouses
- data are stored in arrays
- consists of at least three dimensions
38Example
Fall 2001
39Example (continue)
Winter 2002
40Example (continue)
Summer 2002
41Data Mining
- provides a means of extracting previously
unknown, predictive information from the base of
accessible data in data warehouses - discovers hidden patterns, correlations, and
relationships among organizational data - functions of data mining
- Classification forecasting
- clustering association
- sequencing