Title: CSC228H File Structures and Data Management
CSC228H - File Structures and Data Management
What we have covered so far...
- The principles behind the permanent storage of information
- Disk organization, access time, and the cost of disk I/O
- Methods for organizing information within a file (record and field structure)
- Simple indexing to provide direct access to information
- Enhanced indexing methods for faster searches on large files (B-trees, B+ trees, hashing, and extendible hashing)
Introduction to Database Management Systems
- Databases are large accumulations of data. The data may represent anything that a user may be interested in, but the information must be maintained over long periods and must be stored in some format that can be manipulated by a computer.
- Regardless of the contents of the Database, the information it contains must be organized in such a way that it can be accessed (either for retrieval or for modification) efficiently.
- A Database Management System (DBMS) is a software package designed to administer the contents of a Database, retrieve and update information, organize and store information into physical files, and ensure that the contents of the Database are consistent and up to date.
Examples of the use of Databases
- Storing the information required to run a company
- As a knowledge repository (or knowledge base) in Expert Systems
- Storing and managing scientific, literary, and artistic information (e.g. medical databases, repositories of electronic texts, on-line collections of music, images, sounds, and video)
- Storing and managing information about people (the University's Student Database, Citizenship and Immigration, FBI Most Wanted)
- Helping the user to find and acquire a copy of a favorite CD
What's the point?
- Real Databases have to be powerful enough to deal with millions of records of information.
- They must do so efficiently.
- All through the course we have been working on ways to make access to files efficient.
A (very general) scheme of a Database
[Diagram with the following labeled components]
USER
DBMS
- Query processing
- Update management
- Access strategy planning
- Maintaining consistency
- Making access secure
- Physical storage management
Indexing Methods
Data files, potentially containing millions of records
What we have covered so far...
We have built programs that can process simple and compound queries, manage information on disk, and have limited interaction capabilities.
We know how to organize information within files (record and field structure) and when to use different data formats (binary vs. text). We know how files are managed by the Operating System, and what the costs of I/O are.
We know how to access information efficiently: we have developed methods for indexing files, either using simple indexes or through more complicated structures such as B-trees or hash tables. We have analyzed their performance and have found that they allow us to access large files efficiently.
We have interacted with the Users (a.k.a. your TA and Instructor), who are always asking that the code perform efficiently and be written nicely, and who want a nice-looking report.
What we need to cover...
- Methods for modeling the contents of a Database
- Methods for expressing, analyzing, and processing queries
- How the design of our file structures will influence the final performance of our DBMS
Modeling of data within a Database
- We will cover the Relational Model, which is widely used for defining the elements within a Database.
- Entity-Relationship diagrams, which are used for modeling the elements that make up our database (entities) and the way they relate to each other (relationships)
- Relational algebra, which is the basis for expressing queries in the relational model
- SQL, a widely used language for data definition and manipulation
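As a rough illustration of what relational algebra operations do, the sketch below applies a selection and a projection to a small relation held in memory. The function names `select` and `project`, the sample `students` relation, and its attribute names are assumptions made for this example; they are not taken from the course material.

```python
# A relation is modeled here as a list of rows, where each row is a dict
# mapping attribute names to values (an assumption for illustration).
students = [
    {"student_no": 1, "name": "Ana", "college": "North"},
    {"student_no": 2, "name": "Ben", "college": "South"},
    {"student_no": 3, "name": "Cho", "college": "North"},
]


def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


# Names of the students whose college is "North".
north = select(students, lambda row: row["college"] == "North")
names = project(north, ["name"])
print(names)
```

A query is built by composing such operations: here the selection narrows the rows, and the projection then narrows the attributes.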
Entity-Relationship diagrams
- Models the real world as a set of entities and relationships.
- Entities are the elements about which the Database will store information. They refer to concepts such as people, things, places, organizations, or abstract ideas that have properties for which data is available and may be stored and processed.
- An entity is a generalization: it describes a set of items that share common properties. For example, the entity "student" describes the characteristics that will be kept in our database for each particular instance of a student.
E-R Diagrams
- Relationships describe the interactions and associations between entities. For example, if we have two entities, student and course, we can establish a relationship between both as follows:
  student "is enrolled in" course
- The "is enrolled in" phrase describes the relationship between these two entities.
- Attributes are values that describe the entities to which they belong. For example, last name, first name, and student # are attributes of the student entity.
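One possible way to picture entities, attributes, and a relationship in code is sketched below. The class names, the `Course` attributes (`code`, `title`), and the sample values are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """The student entity; each field is one attribute."""
    student_no: int
    last_name: str
    first_name: str


@dataclass(frozen=True)
class Course:
    """The course entity (its attributes are assumed for this sketch)."""
    code: str
    title: str


@dataclass(frozen=True)
class Enrollment:
    """One instance of the relationship: student "is enrolled in" course."""
    student_no: int
    course_code: str


# Particular instances of each entity, linked by one relationship instance.
ana = Student(student_no=1, last_name="Lopez", first_name="Ana")
csc228 = Course(code="CSC228H", title="File Structures and Data Management")
enrolled = Enrollment(student_no=ana.student_no, course_code=csc228.code)
print(enrolled)
```

The classes play the role of entities (generalizations), while `ana` and `csc228` are particular instances of them.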
E-R Diagrams
- Relationships establish a link between particular instances of different entities. They are used to ensure that the consistency of the database is maintained, and that the correct attributes from each entity are retrieved and combined during queries.
- Attributes are also used to identify particular instances of an entity. An attribute or set of attributes that can be used to uniquely identify a particular instance of an entity is called a key.
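The idea of a key can be sketched as a table that refuses to store two instances with the same key value. This is a minimal in-memory sketch; the `StudentTable` class, its method names, and the choice of `student_no` as the key are assumptions made for illustration.

```python
class StudentTable:
    """Stores student instances, identified uniquely by student_no (the key)."""

    def __init__(self):
        self._rows = {}  # maps key value -> attribute dict

    def insert(self, student_no, last_name, first_name):
        # A key must identify exactly one instance, so duplicates are rejected.
        if student_no in self._rows:
            raise KeyError(f"duplicate key: {student_no}")
        self._rows[student_no] = {
            "last_name": last_name,
            "first_name": first_name,
        }

    def lookup(self, student_no):
        """Return the attributes for the given key, or None if absent."""
        return self._rows.get(student_no)


table = StudentTable()
table.insert(1, "Lopez", "Ana")
table.insert(2, "Lopez", "Ben")  # same last name is fine; it is not the key

try:
    table.insert(1, "Smith", "Cho")  # same key as an existing instance
except KeyError as err:
    print("rejected:", err)
```

Note that `last_name` alone could not serve as a key here, since two instances share the same value.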
Classification of relationships
- Degree of a relationship: the number of entities associated through the relationship. The most common type is the binary relationship (for example, "student is enrolled in course" is a binary relationship).
- Connectivity: describes how entities map to each other through the relationship. There are three possible values for connectivity:
  - One to one: One instance of entity X is associated with only one instance of entity Y (e.g. each student is given a single email account).
  - One to many: One instance of entity X is associated with one or many instances of entity Y, but each instance of entity Y is associated with only one instance of entity X (e.g. each college has many students).
Classification of relationships
- Connectivity (continued)
  - Many to many: Many instances of entity X are associated with many instances of entity Y, and vice versa (e.g. a student is enrolled in many courses; a course has many students).
- There are other aspects of the classification, such as Direction, Type, and Existence, that we will not cover here.
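The three connectivity values can be made concrete with a small sketch that inspects a set of (X instance, Y instance) pairs and reports which pattern they exhibit. Connectivity is really a property of the design rather than of any particular data set, so this is only an illustration; the function name `connectivity` and the sample pairs are assumptions.

```python
from collections import defaultdict


def connectivity(pairs):
    """Classify the pattern shown by a collection of (x, y) instance pairs."""
    ys_per_x = defaultdict(set)
    xs_per_y = defaultdict(set)
    for x, y in pairs:
        ys_per_x[x].add(y)
        xs_per_y[y].add(x)

    # Does some instance on one side link to several on the other side?
    x_has_many = any(len(ys) > 1 for ys in ys_per_x.values())
    y_has_many = any(len(xs) > 1 for xs in xs_per_y.values())

    if x_has_many and y_has_many:
        return "many to many"
    if x_has_many or y_has_many:
        return "one to many"
    return "one to one"


# Each student is given a single email account.
email = [("ana", "ana@mail"), ("ben", "ben@mail")]
# Each college has many students.
college = [("north", "ana"), ("north", "ben"), ("south", "cho")]
# A student takes many courses, and a course has many students.
enrolment = [("ana", "csc228"), ("ana", "demo100"), ("ben", "csc228")]

print(connectivity(email))
print(connectivity(college))
print(connectivity(enrolment))
```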
Notation
- Please note that there is no official standard for representing data in E-R diagrams, and the notation used here may change in other domains and applications. For the present, we will use the following:
  - We represent entities with rectangles. The name of the entity is the label of the rectangle, and it should be a singular noun.
  - Relationships are represented as lines connecting two entities. The description of the relationship is written above the line, and should contain a verb.
  - Attributes for each entity are listed within the rectangle. They should be singular nouns, and key attributes are underlined.
Notation (continued)
- Connectivity is represented as the presence or absence of a crow's foot next to each of the entities in the relationship. If there is a crow's foot, then the connectivity is "many" on that side of the relationship; otherwise it is "one".
- Examples:
[Diagram: student (student #, name) "is assigned an" email account (address, quota)]
A one to one relationship between student and email account.
[Diagram: college (name, address) "has" student (student #, name)]
A one to many relationship between college and student. Notice the crow's foot on the side of student.
NOTE: The relational model does not permit many to many relationships. This type of relationship must be modeled as a set of one to one and one to many relationships.
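A common way to follow this rule is to introduce an intermediate entity, so that the many to many relationship between student and course becomes two one to many relationships. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names, column names, the course code `DEMO100H`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the references below

# student 1-to-many enrollment, and course 1-to-many enrollment.
conn.executescript("""
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    code  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_no INTEGER NOT NULL REFERENCES student(student_no),
    code       TEXT    NOT NULL REFERENCES course(code),
    PRIMARY KEY (student_no, code)
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("CSC228H", "File Structures"), ("DEMO100H", "Demo Course")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, "CSC228H"), (1, "DEMO100H"), (2, "CSC228H")])

# Which students are enrolled in CSC228H?
rows = conn.execute(
    """
    SELECT s.name
    FROM student AS s
    JOIN enrollment AS e ON e.student_no = s.student_no
    WHERE e.code = ?
    ORDER BY s.name
    """,
    ("CSC228H",),
).fetchall()
print(rows)
```

Each row of `enrollment` links exactly one student to exactly one course, and the composite key prevents the same pair from being recorded twice.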
Fine, so where do we get these entities and
relationships?
- We must perform Requirements Analysis. This brings us to the field of Requirements Engineering.
- R.E. is in turn part of the broader discipline of Software Engineering, which deals with the process of building software systems given a multitude of constraints (such as deadlines, budget limits, technical constraints, performance and quality considerations, etc.).
- Requirements Engineering deals with the process of identifying the user's needs, and generating a consistent and complete list of requirements that can be used for implementing a software system that will satisfy the user's needs.
Why is this important?
- Before we can answer queries, we need to appropriately model the data we shall be using. Modeling data requires that we analyze the domain in which it will be used and the way in which data is generated, processed, and used.
- It brings to light an important point: most of the time, it is not possible to simply sit down and code a software system. Interaction with the user (or customer) will be essential, and will present its own problems.
- You have already done a bit of Software Engineering for your previous assignments!
Back to the E-R diagrams
- From the analysis of customer requirements, the application domain, and any other available source of information, we will determine:
  - Which entities are to be integrated into the database, and which are their corresponding attributes
  - What relationships exist between entities in the database
  - The type of transactions that will be carried out on the database
  - Rules to keep the contents of the database consistent and up to date