Title: Introduction to Information Systems Analysis Database Design
1Introduction to InformationSystems
AnalysisDatabase Design Output Design
2Database Design
- Information systems Create, Read, Update, and
Delete data (CRUD) - Data is stored in records which describe one
instance of an entity (person, order, etc.) - Each file contains many similar records
- A database contains collections of interrelated
files
3Conventional Files Pros Cons
- Conventional (non-relational or flat) files are
used in many legacy systems e.g. Cobol and
FileMaker Pro use them - Fast and easy to develop
- Generally used for one specific application
- Results in duplication of data across files,
causing maintenance and integrity risks
4Relational Database Pros Cons
- Provides more flexible format for storage
- Allows multiple applications to access data
- Slightly slower than conventional files, but
improving - Higher learning curve to overcome
- Manage using a database management system (DBMS)
5Field Types
- A field is the smallest useful piece of data
- Four types
- Primary key - unique identifier
- Alternate key - alternative to the primary key
- Foreign key - links to a primary key elsewhere
- Descriptive field every attribute not a key!
- Every field should be used at least once
6Records
- A record is round piece of bumpy vinyl which
rotates on a turntable oh, wait a minute
blush - A record is a collection of fields in a specific
order or format - Each record is generally fixed in length
- Some systems can compress unused fields
- COBOL can use variable length records
7Records
- Records are read in units called blocks or pages
- A block is sometimes called a physical page
- The blocking factor describes how many logical
records are in each block - The blocking factor may be set manually for
performance reasons
8Files and Tables
- A file contains all records of a particular
structure - A table is a relational databases file
- There are many many kinds of files (see next
slide) - File organization and access may be tuned for
performance enhancement
9Types of Files
- Master files contain permanent information
- Transaction files describe business events
- Document files contain copies of historic data
- Archival files contain master and transaction
file records offline which are no longer needed - Table look-up files contain static data for
reference calculations (tax tables, Zip codes) - Audit files record changes to other files
10Data Architecture
- An organizations data architecture includes
- The files, databases, and database technology
used to store their data, and - The administrative structure used to manage the
data resources - The architecture may include legacy information
systems, local department databases, and shared
data warehouses
11Database Management
- Database administrators (DBAs) try to plan,
define, and structure this mess, so people can
use it - Database management systems (DBMS) are the tools
used to create and manage a database - The heart of the DBMS is called the database
engine, which controls access to the data
12SQL
- Structured Query Language (SQL) is used for both
data definition (DDL) and manipulation (DML) - DDL creates and allows views of the data
structures - DML is used to access and use data from those
structures (CRUD) - SQL is used by nearly all relational databases
- Oracle, DB2, SQL Server, Sybase, Informix,
Access, FoxPro, Paradox, dBase, etc.
13Relational Database Tools
- A relational database models the data in a
schema - Tables can embed programs
- Triggers are used to run programs based on some
event - e.g. updating records in another table - Stored procedures are called by an application -
e.g. validate new records before theyre stored
14Data Modeling
- A good data model should be
- Simple, describing only one entity
- Non-redundant, except for foreign keys
- Flexible, for future improvement
- Normalization helps achieve all of these
15Flat File Design
- Design of conventional flat files is dictated
primarily by the output and input needs of a
particular application - We wont say any more, in order to discourage
such barbaric practices )
16Database Design
- Detailed design of databases is heavily dependent
on the DBMS used to implement them - CASE tools can often generate SQL, based on the
database model, which will even generate the
tables and relationships needed - Otherwise you have to do it manually
17Database Design
- Each entity is implemented as a table
- Primary and alternate (secondary) keys are
generally indexed, to make searches faster - Foreign keys are marked as such
- Attributes become fields
- Supertypes and subtypes may not be implemented as
such by CASE tools may make them one table w/
stored procedures
18Database Design
- Keys and other fields need to be defined in
detail - Data type (from list of allowable types in this
particular DBMS) - Field size (length or number of characters)
- NULL allowable (no entry)
- Cant allow NULL for a primary key
- Domain and default values
19Integrity
- Integrity refers to knowing that the data is
trustworthy, correct and can be predictably
found - Three kinds of integrity
- Key integrity
- Domain integrity
- Referential integrity
20Key Integrity
- Key integrity can be achieved when every table
has a unique, non-NULL primary key - Primary key must be non-NULL because there must
be an entry for every record - If the DBMS does not directly support these
features, then other controls must be taken to
enforce them
21Domain Integrity
- Domain integrity is achieved by ensuring every
field is within its allowable domain (legal
values) - e.g. dont allow a 7-digit credit card number
- Note that this does not preclude incorrect
values, only those values which are impossible
22Referential Integrity
- Referential integrity is violated when a foreign
keys value does not match up to a primary key
record - To prevent this, must consider side effects of
deleting any record which contains a foreign key - Might have to wait for all of a customers
invoices are deleted before deleting customer
23Referential Integrity
- Rules for deletion should exist for every table
to help maintain referential integrity - No restriction delete freely
- DeleteCascade when you need to delete all
associated records from the foreign keys - DeleteRestrict when you hold off deleting
records until foreign key records are deleted - DeleteSet null when foreign keys are set null
24Role Names
- Some organizations prefer to have all unique
field names, to prevent confusion - When naming foreign keys, each one can get a role
name to distinguish its role differently from the
primary keys name - Primary key product_number in the Products table
may become foreign key ordered_product_number in
the Orders table
25Database Distribution
- Four options for database distribution
- Centralization there is only one data server
- Horiz. Distrib. tables or records (rows) are
assigned different locations, but hard to analyze - Vert. Distrib. attributes (columns) are assigned
different locations, also hard to analyze - Replication duplicate entire tables at different
locations, but more complex and expensive
26Database Capacity Planning
- The size of a database can be estimated by
- Record size In a table, add the field sizes to
get the size of one record (one character1 byte) - Table size Multiple the record size by the
number of records expected in 3 yrs - Database size Repeat first two steps for all
tables and add them up - Add a big factor for unexpected growth
27The Database Future
- Object-oriented databases have become more common
in the last few years - Some purely object-oriented databases are
available, but rarely used - Since many organizations are heavily invested in
relational databases (20 years),
object-relational databases are a popular hybrid
transition technology (e.g. Oracle 9)
28Output Design
- An information system generally has both internal
and external outputs - Internal outputs are used for management of the
system and for reference by the users they
rarely leave the organization - External outputs include any printed or displayed
information which is used outside of the system
and its organization
29Internal Outputs
- Detailed reports are those which use little
filtering to produce a comprehensive statement,
such as all inventory on hand, or every
transaction during some time period - Some detailed reports are historical or
regulatory in nature
30Internal Outputs
- Summary reports provide a synopsis (maybe
graphic) of the information, often for managers
to look for trends or problems - Exception reports describe when something is
wrong outside of predefined limits (low stock)
31External Outputs
- External outputs leave the system and its
immediate users, generally to go to a vendor,
customer, or external system - Might include invoices, paychecks, tickets,
passes, bills, purchase orders, etc. - Turnaround outputs are external outputs which
later become inputs for the same system (invoices
returned by customer)
32Output Media for Implementation
- Media used for outputs may include
- Paper (preprinted or not)
- Microfilm (?)
- Magnetic media (e.g. CD-ROM, DVD, video)
- Text files (e.g. append log entries)
- Posted online (Internet or intranet)
- Displayed on screen
- Client screen, projector, or POS terminal
33Output Media for Implementation
- Consider different options not just how you
first think of the output being presented - Automated phone call to a pager or cell phone
- E-mail messages
- Multimedia (sound, pictures, video, etc.)
- Flash or Shockwave presentations
- Hyperlinks, or even entire web pages
34Output Media
- Printed output is often tabular and/or zoned
- Tabular format looks like a table (rows and
columns of text data) - Zoned output is closer to a GUI input, or the
information at the top of a spreadsheet
collections of related data are grouped together - Tabular and zoned output formats are often used
together - On-screen output may be graphic (charts)
35Graphic Output
- Graphic output uses pictures to express
information and help look for trends, but cant
replace narrative output to explain what is being
shown - Need to be clear about the scope and source of
the data shown in a graph, so it is used
appropriately by its audience - Might have a hyperlink for definitions of the
terms used
36Graphic Output
- For every graphic output, need to consider
- How often will the output be generated?
- Does the output cover one moment in time or many
repeated assessments? - How many data points are to be displayed?
- What medium will be used to present and
distribute the output? - A 32-bit color plot is really boring after it has
been re-copied in grayscale and faxed three times
37Graphic Output Chart Types
- Bar charts use horizontal bars, primarily for
visual comparison rather than tracking - Column charts use vertical bars to compare
different items or track data over time - Pie charts are good for showing a limited number
of data points at one moment
38Graphic Output Chart Types
- Line charts show trends over time for one or
more variables - Scatter charts are used to compare two parameters
to each other often a correlation between them,
or relationship, is sought - Other types exist, but are used less often
- Donut, area, radar, and control charts
39Output Design Tools
- Printer spacing charts were used to design
text-based outputs - Now CASE tools and most database environments
(Access, Crystal Reports, Oracle Designer) allow
easy drag-and-drop layout of outputs
40Design Considerations
- Output should be easy to read and interpret
- Every output should have a title
- Every output should be date and time stamped
- Section headings should be used to identify
groups of data - In forms, fields should be clearly labeled
- Columns in tabular outputs need column headings
41Design Considerations
- Legends should be used to explain column headings
and field names - Only required information should be shown (hide
irrelevant details) - Output should be usable in its presented form
- Outputs should be balanced on the page
- Users need to navigate easily and freely, or exit
- Outputs shouldnt have jargon, error messages
42Design Considerations
- Timing of output is very important a beautiful
report a week late is worthless - Distribution of outputs must be complete enough
to reach all relevant (affected) users - Outputs must contain all information needed by
its user - even if its requirements forgot
something!
43Output Design
- There are four steps to designing outputs
- 1. Identify system outputs
- 2. Select physical output requirements
- 3. If needed, design preprinted external forms
- 4. Design, validate, and test the outputs (which
includes obtaining feedback and refining the
design)
441. Identify System Outputs
- Outputs should have been defined in the
requirements for each design unit in the DFD,
look for data which leaves the system - Content for each output should be defined in the
data dictionary - Consider the audience for each output, and when
it will be generated (periodically and/or on an
event-driven basis)
452. Select Physical Output Requirements
- Choose how the output will be presented based on
its type and purpose, and on the feasibility
limitations - Pick the medium (paper, screen, kiosk, etc.)
- Choose the output format (8.5 x 11 paper, 800 x
600 pixel screen, etc.)
462. Select Physical Output Requirements
- How often is this output generated?
- Does its generation need to be scheduled?
- Review the frequency the output will be needed
for the output volume requirements (amount of
paper, number of forms) - Is duplication needed? (Xerox, burn CDs)
- How will the output be distributed? How is that
distribution controlled?
473. Design Preprinted Forms (opt.)
- Preprinted forms may need long lead time
- What information will be preprinted?
- Will the form be mailed?
- What size will the form be? Perforated?
- How much volume of printing is expected?
- What legends and instructions need to be on the
form? What colors will be needed?
484. Design, Validate, and Test the Outputs
- Key tabular report concerns
- Page size, orientation, and headings
- Report legend column row headings
- Alignment and spacing of labels and headings
- Data formatting, including field masks
- Control breaks, make rows of data into sections
- Clearly define the end of report
494. Design, Validate, and Test the Outputs
- Key screen (and web) report concerns
- Size (screen resolution) and page size (kB)
- Scrolling can headings be frozen?
- Navigation scroll bars, tabs, or buttons
- Partitioning (zoning of data within a form)
- Information Hiding how do users get more info
- Highlighting how when get users attention?
- Printing always provide an option to do so