Provided by: sciencela
Title: CSC228H File Structures and Data Management


1
CSC228H - File Structures and Data Management
What we have covered so far...
  • The principles behind the permanent storage of
    information
  • Disk organization, access time, and the cost of
    disk I/O
  • Methods for organizing information within a file
    (record and field structure)
  • Simple indexing to provide direct access to
    information
  • Enhanced indexing methods for faster searches on
    large files (B-trees, B+ trees, hashing, and
    extendible hashing)

2
Introduction to Database Management Systems
  • Databases are large accumulations of data. The
    data may represent anything that a user is
    interested in, but the information must be
    maintained over long periods and must be stored
    in a format that can be manipulated by a computer.
  • Regardless of the contents of the Database, the
    information it contains must be organized so that
    it can be accessed efficiently, whether for
    retrieval or for modification.
  • A Database Management System (DBMS) is a software
    package designed to administer the contents of a
    Database: it retrieves and updates information,
    organizes and stores information in physical
    files, and ensures that the contents of the
    Database are consistent and up to date.

3
Examples of the use of Databases
  • Storing the information required to run a
    company
  • As a knowledge repository (or knowledge base) in
    Expert Systems
  • Storing and managing scientific, literary, and
    artistic information (e.g. medical databases,
    repositories of electronic texts, on-line
    collections of music, images, sounds, and video)
  • Storing and managing information about people
    (the University's Student Database, Citizenship
    and Immigration, FBI Most Wanted)
  • Helping users find and acquire a copy of their
    favorite CD

4
What's the point?
  • Real Databases have to be powerful enough to
    deal with millions of records of information.
  • They must do so efficiently.
  • All through the course we have been working on
    ways to make access to files efficient.

5
A (very general) scheme of a Database
[Diagram with the following labeled components:
USER, DBMS, Indexing Methods, Data files]
  • DBMS: query processing, update management, access
    strategy planning, maintaining consistency,
    making access secure, physical storage management
  • Indexing Methods
  • Data files, potentially containing millions of
    records

6
What we have covered so far...
We have built programs that can process simple
and compound queries, manage information on disk,
and have limited interaction capabilities.
We know how to organize information within files
(record and field structure) and when to use
different data formats (binary vs. text). We know
how files are managed by the Operating System and
what the costs of I/O are.
We know how to access information efficiently. We
have developed methods for indexing files, either
using simple indexes or through more complicated
structures such as B-trees or hash tables. We
have analyzed their performance and have found
that they allow us to access large files
efficiently.
We have interacted with the users (a.k.a. your
TA and instructor), who always ask that the code
perform efficiently and be written nicely, and who
want a nice-looking report.
7
What we need to cover...
  • Methods for modeling the contents of a Database
  • Methods for expressing, analyzing, and
    processing queries
  • How the design of our file structures will
    influence the final performance of our DBMS

8
Modeling of data within a Database
We will cover:
  • The Relational Model, which is widely used for
    defining the elements within a Database
  • Entity-Relationship diagrams, which are used for
    modeling the elements that make up our database
    (entities) and the way they relate to each other
    (relationships)
  • Relational algebra, which is the basis for
    expressing queries in the relational model
  • SQL, a widely used language for data definition
    and manipulation
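
To give a feel for what relational algebra looks like in practice, here is a minimal Python sketch of two of its basic operators, selection and projection. It is an illustration only, not the course's implementation: the relation is modeled as a list of dictionaries, and the sample rows, attribute names, and college names are made up for this example. The same query in SQL would be roughly SELECT DISTINCT last_name FROM students WHERE college = 'North'.

```python
def select(relation, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:  # a relation is a set, so no repeated rows
            result.append(reduced)
    return result


# Hypothetical sample relation (all values invented for illustration).
students = [
    {"student_number": 1001, "last_name": "Ng", "college": "North"},
    {"student_number": 1002, "last_name": "Ortiz", "college": "South"},
    {"student_number": 1003, "last_name": "Ng", "college": "North"},
]

# "Last names of the students in North college":
# a selection followed by a projection.
north = select(students, lambda row: row["college"] == "North")
north_names = project(north, ["last_name"])
print(north_names)
```

Note that the projection collapses the two "Ng" rows into one, which is why the SQL version needs DISTINCT to match.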

9
Entity-Relationship diagrams
  • An E-R diagram models the real world as a set of
    entities and relationships.
  • Entities: the elements about which the Database
    will store information. They refer to concepts
    such as people, things, places, organizations, or
    abstract ideas that have properties for which
    data is available and may be stored and processed.
  • An entity is a generalization: it describes a
    set of items that share common properties. For
    example, the entity "student" describes the
    characteristics that will be kept in our database
    for each particular instance of a student.
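
The distinction between an entity and its instances maps closely onto the distinction between a class and its objects. The sketch below is one possible way to picture it in Python; the attribute names and the sample students are assumptions made for this example, not part of the course material.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Student:
    """The entity: which properties are kept for every student."""
    student_number: int  # hypothetical attribute names
    last_name: str
    first_name: str


# Two particular instances of the entity (invented data).
alice = Student(1001, "Ng", "Alice")
bob = Student(1002, "Ortiz", "Bob")

# The entity itself only describes the shared properties.
attribute_names = [f.name for f in fields(Student)]
print(attribute_names)
```

Every instance carries its own values, but all instances share the same list of properties defined by the entity.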

10
E-R Diagrams
  • Relationships: describe the interactions and
    associations between entities. For example, if we
    have two entities, student and course, we can
    establish a relationship between them as follows:
        student "is enrolled in" course
    The "is enrolled in" phrase describes the
    relationship between these two entities.
  • Attributes: values that describe the entities to
    which they belong. For example, last name, first
    name, and student number are attributes of the
    student entity.

11
E-R Diagrams
  • Relationships establish a link between
    particular instances of different entities. They
    are used to ensure that the consistency of the
    database is maintained and that the correct
    attributes from each entity are retrieved and
    combined during queries.
  • Attributes are also used to identify particular
    instances of an entity. An attribute or set of
    attributes that can be used to uniquely identify
    a particular instance of an entity is called a
    key.
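
As a rough illustration of what "uniquely identify" means, the following Python sketch checks whether a set of attributes takes a different combination of values for every instance in a sample. The helper name is_key and the sample data are hypothetical. Keep in mind that uniqueness in a sample is only a necessary condition: whether an attribute really is a key is a design decision about all the instances the database may ever hold.

```python
def is_key(rows, attributes):
    """Return True if no two rows share the same values for `attributes`."""
    seen = set()
    for row in rows:
        value = tuple(row[name] for name in attributes)
        if value in seen:
            return False  # two instances cannot be told apart
        seen.add(value)
    return True


# Hypothetical instances of the student entity.
students = [
    {"student_number": 1001, "last_name": "Ng", "first_name": "Alice"},
    {"student_number": 1002, "last_name": "Ng", "first_name": "Brian"},
    {"student_number": 1003, "last_name": "Ortiz", "first_name": "Alice"},
]

print(is_key(students, ["student_number"]))
print(is_key(students, ["last_name"]))
print(is_key(students, ["last_name", "first_name"]))
```

In this sample the last name alone cannot serve as a key, because two students share it. The pair (last name, first name) happens to be unique here, but a larger student body could easily break that, which is why an assigned number is the usual choice.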

12
Classification of relationships
  • Degree of a relationship: the number of entities
    associated through the relationship. The most
    common type is the binary relationship (for
    example, student "is enrolled in" course is a
    binary relationship).
  • Connectivity: describes how entities map to each
    other through the relationship. There are three
    possible values for connectivity:
    - One to one: one instance of entity X is
      associated with only one instance of entity Y
      (e.g. each student is given a single email
      account).
    - One to many: one instance of entity X is
      associated with one or many instances of entity
      Y, but each instance of entity Y is associated
      with only one instance of entity X (e.g. each
      college has many students).

13
Classification of relationships
  • Connectivity (continued)
    - Many to many: many instances of entity X are
      associated with many instances of entity Y, and
      vice versa (e.g. a student is enrolled in many
      courses, and a course has many students).
  • There are other aspects of the classification,
    such as Direction, Type, and Existence, that we
    will not cover here.
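
The three connectivity values can be made concrete with a small Python sketch that inspects a list of (X, Y) pairs and reports which pattern they exhibit. The function name and the sample pairs are assumptions for illustration. The function also reports "many to one", which is simply a one to many relationship read from the Y side. As with keys, observed data can only suggest the connectivity; the real value comes from the requirements.

```python
from collections import defaultdict


def connectivity(pairs):
    """Classify a binary relationship given as (x, y) instance pairs."""
    x_to_y = defaultdict(set)
    y_to_x = defaultdict(set)
    for x, y in pairs:
        x_to_y[x].add(y)
        y_to_x[y].add(x)
    # Does some X instance map to several Y instances, and vice versa?
    x_has_many = any(len(ys) > 1 for ys in x_to_y.values())
    y_has_many = any(len(xs) > 1 for xs in y_to_x.values())
    if x_has_many and y_has_many:
        return "many to many"
    if x_has_many:
        return "one to many"
    if y_has_many:
        return "many to one"
    return "one to one"


# Hypothetical instance pairs for the three examples in the slides.
email = [("alice", "alice@example.org"), ("bob", "bob@example.org")]
colleges = [("North", "alice"), ("North", "bob"), ("South", "carol")]
enrolment = [("alice", "CSC228H"), ("alice", "XYZ100H"), ("bob", "CSC228H")]

print(connectivity(email))
print(connectivity(colleges))
print(connectivity(enrolment))
```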

14
Notation
  • Please note that there is no official standard
    for representing data in E-R diagrams, and the
    notation used here may differ in other domains
    and applications. For the present, we will use
    the following:
    - Entities are represented with rectangles. The
      name of the entity is the label of the
      rectangle, and it should be a singular noun.
    - Relationships are represented as lines
      connecting two entities. The description of the
      relationship is written above the line and
      should contain a verb.
    - Attributes for each entity are listed within
      the rectangle. They should be singular nouns,
      and key attributes are underlined.

15
Notation (continued)
  • Connectivity is represented by the presence or
    absence of a crow's foot next to each of the
    entities in the relationship. If there is a
    crow's foot, the connectivity is "many" on that
    side of the relationship; otherwise it is "one".
  • Examples

[Diagram 1]
student (student number, name)
    "is assigned an"
email account (address, quota)
A one to one relationship between student and
email account.

[Diagram 2]
college (name, address)
    "has"
student (student number, name)
A one to many relationship between college and
student. Notice the crow's foot on the side of
student.
NOTE: The relational model does not permit many
to many relationships. These types of
relationships must be modeled as a set of one to
one and one to many relationships.
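
One common way to carry out the decomposition described in the note is to introduce an intermediate table, so that the many to many relationship student "is enrolled in" course becomes two one to many relationships. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are assumptions made for this example (the course code XYZ100H is invented), and this is only one possible design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces these only on request

conn.executescript("""
CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE course (
    course_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
-- Intermediate table: one row per (student, course) pair.
-- student -> enrolment and course -> enrolment are both one to many.
CREATE TABLE enrolment (
    student_number INTEGER NOT NULL REFERENCES student(student_number),
    course_code    TEXT    NOT NULL REFERENCES course(course_code),
    PRIMARY KEY (student_number, course_code)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1001, "Alice"), (1002, "Bob")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("CSC228H", "File Structures and Data Management"),
                  ("XYZ100H", "A made-up course")])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [(1001, "CSC228H"), (1001, "XYZ100H"), (1002, "CSC228H")])


def courses_of(student_number):
    """Course codes a given student is enrolled in, in alphabetical order."""
    rows = conn.execute(
        "SELECT course_code FROM enrolment "
        "WHERE student_number = ? ORDER BY course_code",
        (student_number,))
    return [code for (code,) in rows]


def students_in(course_code):
    """Student numbers enrolled in a given course, in ascending order."""
    rows = conn.execute(
        "SELECT student_number FROM enrolment "
        "WHERE course_code = ? ORDER BY student_number",
        (course_code,))
    return [number for (number,) in rows]


print(courses_of(1001))
print(students_in("CSC228H"))
```

The composite key on the intermediate table prevents the same enrolment from being recorded twice, and the foreign keys reject an enrolment that refers to a student or course that does not exist.
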
16
Fine, so where do we get these entities and
relationships?
  • We must perform Requirements Analysis. This
    brings us to the field of Requirements
    Engineering.
  • R.E. is in turn part of the broader discipline
    of Software Engineering, which deals with the
    process of building software systems given a
    multitude of constraints (such as deadlines,
    budget limits, technical constraints, and
    performance and quality considerations).
  • Requirements Engineering deals with the process
    of identifying the user's needs and generating a
    consistent and complete list of requirements
    that can be used for implementing a software
    system that will satisfy those needs.

17
Why is this important?
  • Before we can answer queries, we need to model
    the data we will be using appropriately.
    Modeling data requires that we analyze the
    domain in which it will be used and the way in
    which data is generated, processed, and used.
  • It brings to light an important point: most of
    the time, it is not possible to simply sit down
    and code a software system. Interaction with the
    user (or customer) will be essential, and will
    present its own problems.
  • You have already done a bit of Software
    Engineering for your previous assignments!

18
Back to the E-R diagrams
  • From the analysis of customer requirements, the
    application domain, and any other available
    source of information, we will determine:
    - Which entities are to be integrated into the
      database, and what their corresponding
      attributes are
    - What relationships exist between entities in
      the database
    - The types of transactions that will be carried
      out on the database
    - Rules to keep the contents of the database
      consistent and up to date