Title: Database and the Internet
1Database and the Internet
2Review of Last Class
- Object Oriented technology provides an
alternative to the Relational database model. - OO combines the data and the operations in one
package. - UML provides an alternative modeling language for
OO and Relational databases.
3Class Goals
- Understand the role of databases in the Internet
world. - Understand the technology that support the use of
databases over the Internet. - Internet protocols
- Database technology
- Design issues
- Understand the fundamentals of XML
- Understand why XML is such a rapidly growing
technology.
4Characteristic and Benefits of Internet
Technologies
- Hardware and Software Independence.
- Common and simple network protocols and user
interface. - Location independence.
- Rapid deployment and manageable costs.
5Independence
- Common network protocols let disparate equipment
and programs communicate as though they were the
same. - Separating the network into layers lets the
appropriate equipment or programs handle the each
job.
6TCP/IP Foundation of the Internet
- Every device (node) has an address.
- Dotted syntax 255.255.255.255
- Assigned name cottageland.net
- The lowest level of interaction is the Socket.
- Sockets have a port number
- Ports are assigned to protocols
- Higher level protocols use sockets to communicate
via formatted messages. - Most of the messages are simple English.
- SMTP (E-mail) HELLO
7Common Protocols
- SMTP - Simple Mail Transport Protocol
- FTP - File Transfer Protocol
- Telnet Terminal Emulation
- LDAP - Lightweight Directory Access Protocol
- HTTP - Hypertext Transport Protocol
- The World Wide Web
- SOAP
8Rapid Deployment
- Location Independence
- One world wide network means that in theory you
can put your applications anywhere and run it
from anywhere. - Standards mean
- Software tool cost remain low.
- Programmers and managers are easier to find.
- Tools do a lot more work for the programmer and
manager. - No network specific costs.
- Hardware and software can be optimized
9A Sample Of Applications That Use Internet
Technology
10Intranets and Extranets
- Intranets
- An Intranet is a locally owned and operated
Internet whose access is carefully controlled.
Its objective is to enhance company operations
through improved data access management. - Intranets provide a platform for the development
of new systems in a timely and cost-effective
manner. - Intranets are relatively easy to set up and to
implement at the technical level. Once
implemented, intranet services tend to grow
exponentially. - Extranets
- If an intranet extends beyond a single corporate
entity, it is known as an extranet. - An extranet extends the intranets to the
corporations value chain.
11Intranet/Extranet Components
12Intranet Architecture
Common Intranet Services
- Web server
- Electronic mail
- Document search
- File Transfer Protocol (FTP)
- News or discussion groups
- Workflow and team collaboration
- Web-to-database services
- Transaction processing
- Directory, security, and authentication services
- Firewalls and proxy servers
- Load balancing and caching
- Web-to-host access
13Basic Components Of The World Wide Web
14Web Building Blocks
- HTTP The communications protocol that Web uses.
- HTML The mechanism for coding the display pages.
- URL An address for every web page.
- Hyperlink A way to link an addresses to a Web
page. - Web Server A system to format and present Web
pages on request from a - Web Browser A program for converting HTML to a
graphical display on a local device.
15Web Building Blocks
- Web pages can be
- Static, coded once and stored on the server.
- Dynamic, created on the fly each time a client
asks for one. - In either case the place to store content is in a
database!
16Common Gateway Interface
- Web pages are bi-directional
- HTTP Get fetches a page at a specified address.
- HTTP Put sends a formatted page back to the
server. - CGI the Common Gateway Interface
- A standard way of sending formatted requests to
the server for action. - Runs programs on the server end to do work.
17Intranet Architecture
- Server-Side Extensions Web-to-Database
Middleware - A server-side extension is a program that
interacts directly with the Web server to handle
specific types of requests. - A database server-side extension program is also
known as Web-to-database middleware.
18Web-To-Database Middleware (ColdFusion)
19Two Web Server Interfaces
- Common Gateway Interface (CGI)
- The CGI uses script files that perform specific
functions based on the clients parameters that
are passed to the Web servers. - The script file is a small program containing
commands written in some programming language
(e.g., PERL, C, or Visual Basic) - Application programming interfaces (APIs)
- APIs are a newer Web server interface standard
that is much more efficient and faster than CGI
scripts. - APIs are implemented as shared code or as
dynamic-link libraries (DLLs) in the Windoz
environment.
20The API And CGI Web Server Interfaces
21Intranet Architecture
- Open Database Connectivity (ODBC)
- ODBC is Microsofts implementation of a superset
of the SQL Access Group Call-Level-Interface
(CLI) standard for database access.
22(No Transcript)
23Intranet Architecture
- The Web Browser
- The Web browser is located in the client computer
and it is the end user interface to the Web. - The Web browsers job is to interpret the HTML or
XML code that it receives from the Web server and
to present the different page components in a
standard way. - The browsers interpretation and presentation
capabilities are not sufficient to develop
Web-based applications, requiring plug-ins and
other client-side extensions.
24HTML
- ltHTMLgt
- ltHEADgt
- ltTITLEgt The Title lt/TITLEgt
- lt/HEADgt
- ltBODYgt
- Lots of HTML
- lt/BODYgt
- lt/HTMLgt
25Intranet Architecture
- Client-Side Extensions
- Plug-ins
- A plug-in is an external application that is
automatically invoked by the browser when needed.
- The plug-in is OS specific.
- The plug-in is associated with a data object to
allow the Web server to properly handle data that
are not originally supported.
26Java JavaScript
- Java
- Java is an object-oriented programming language
developed by Sun Microsystems that run on top of
the Web browser software. - Java applications are compiled and stored in the
Web server. - Calls to Java routines are embedded inside the
HTML page. - JavaScript
- JavaScript, developed by Netscape, is a scripting
language that allows Web authors to design
interactive sites. - JavaScript code is embedded in the Web pages.
- The embedded JavaScript is downloaded with the
Web page and is activated when a specific event
takes place.
27Active X VB Script
- Active X is Microsofts alternative to Java. It
is a specification for writing programs that will
run inside the Microsoft client browser. - Active X extends the browser by adding controls
to Web pages. These controls can be downloaded
from the Web server and let the user manipulate
data inside the browser. - VBScript is another Microsoft product that is
used to extend the browser's functionality. - VBScript is derived from Visual Basic.
- VBScript code is embedded inside an HTML page and
this code is activated by triggering events such
as clicking on a link.
28So Lets Look at Some Web Pages
- First impressions
- No one SQL statement can generate all of this
data. - There are implied hierarchies in the display
- Images are a major component
- More detailed
- Tables arent very Normalized
29How Can We Execute SQL Queries From a Web Page?
30Web-to-DB - ColdFusion
- An example of a Web-to-DB tool is Cold Fusion
- A product from Adobe (the PDF folks)
- ColdFusion application middleware can be used to
- Connect to and query a database from a Web page.
- Present database data in a Web page, using
various formats. - Create dynamic Web search pages.
- Create Web pages to insert, update, and delete
database data.
31Web-to-DB - ColdFusion
- Define required and optional relationships.
- Define required and optional form fields.
- Enforce referential integrity in form fields.
- Use simple and nested queries and form select
fields to represent business rules.
32How ColdFusion Works
33The RobCor Databases Relational Schema
34Web-to-DB - ColdFusion
- Creating a Simple Query with CFQuery and CFOutput
- Tasks
- Query the database, using standard SQL to
retrieve a data set that contains all records
found in the VENDOR table. - Format all of the records generated in Step 1 in
HTML to let them be included in the page that is
returned to the client browser. - See Script 14.1 for the script and Figure 14.10
for the output.
35A Simple Query Using CFQUERY And CFOUTPUT
36The CH14-1.CFM Script Output
37CFQUERY With Tabular CFOUTPUT
38The CH14-2.CFM Script Output
39Web-to-DB - ColdFusion
- Creating a Simple Query with CFQuery and CFTable
- Task
- Perform the same query with the result presented
in tabular format. - See Script 14-3 and Figure 14.12
40CFQUERY With CFTABLE
41The CH14-3.CFM Script Output
42Internet DB SystemsSpecial Considerations
- What Data Types Are Supported?
- How does one store and extract data objects such
as documents, pictures, and movies through a Web
browser? - How much overhead will be created by the storage
of binary objects in the database? How robust
must the DBMS be to handle binary object
transactions? What are the limitations for
extended or OLE data types? How many extended or
OLE data type fields can tables have?
43Internet DB SystemsSpecial Considerations
- Does the client browser support the data type of
the object you are trying to access? Are the
necessary plug-ins available? Is there a way to
automatically translate documents from their
native format to HTML? - Does the DBMS support Very Large Databases? What
about transaction speed? How many users are going
to access the database? How often?
44Internet DB SystemsSpecial Considerations
- Data Security
- Security can be implemented in the Web server,
the database and in the networking
infrastructure. - At the Web server level, most Web clients and
servers can perform secure transactions by using
encryption routines at the TCP/IP protocol level. - At the SQL level, administrators can use the
GRANT and REVOKE commands to assign access
restrictions to tables and/or to specific SQL
commands. - Web-to-database middleware vendors usually have
several security mechanisms available to
interface with databases.
45Internet DB SystemsSpecial Considerations
- Transaction Management
- The designers must ensure proper transaction
management support at the database server level
since the Web does not support the concept of
database transaction - The Web cannot maintain an open line between the
client and the database server. - The mechanics of a recovery from incomplete or
corrupted database transactions require that the
client must maintain an open communications line
with the database server.
46Internet DB SystemsSpecial Considerations
- De-normalization of Database Tables
- The Web environment does not support multitable
(parent-child) data entry. - Although implementing the parent/child data entry
is not impossible in a Web environment, its final
outcome is less than optimum, counterintuitive,
less user-friendly, and prone to errors. - Web programming languages such as Java,
JavaScript, or VBScript can be used to create the
required Web interfaces.
47XMLExtensible Markup Language
MIS 304 Winter 2006
48Where to start?
- Well whats missing from HTML?
- What do we want or need to add?
49Missing Components
- Document Type
- Structure
- Validation
50Document Type
- Well first there is only one kind of document
in HTML a Web Page and a cool as the web is it
should be easy to see that there is more than one
type of document out there and they all have
slightly different requirements. Ideally you
would like to have a way to specify and encode
different document types.
51Structure
- Second, HTML has a very limited Structure.
The ideal systems would be able to represent tree
and graph structures as well as forms.
52Validation
- Third, there is no way to validate HTML at the
client end. This is usually not a problem
because HTML is simple, but validation becomes a
bigger issue when the documents become more
complex. The ideal client application can check
for structural validity of the documents.
53The new markup language must
- Be usable over the Internet
- Support a wide variety of applications
- Be SGML compatible
- Be easy to write
- Be easy to process by programs
- Have no optional features
- Be human-legible and clear
- Be designed quickly
- Have a formal and concise design
54XML documents contain
- Character data
- Unicode (ISO 10646)
- Captures all of the Western and Eastern character
sets. - Elements and escapements
- lt! ---- Sample XML ---- gt
- Processing instructions
- Special instructions to the client processor
- lt?XML VERSION1.0 ?gt
- Elements
- Start tag, body, end tag
- The author controls the tag names
- ltpgt lt/pgt
- ltpgtltNamegtMickey Mouse lt/Namegtlt/pgt
- Document Type Definition Markup
55Tags continued
- Tags can be arranged in lots of different
combinations but in particular you probably want
to arrange them in either hierarchical or
relational style (or both).
56Attributes
- Optional Attribute, Value pairs
- Example
- ltperson firstnameMickey surnameMousegt
- ltaddressgt Disneylandlt/addressgt
- lte-mailgtmickey.mouse_at_disney.comlt/e-mailgt
- lt/persongt
57Document Type Declaration
- lt!DOCTYPE person SYSTEM person.dtd
- lt!ENTITY mickey mickey.mouse_at_disney.com
58Document Type Definition
- Identifies the syntax of the XML flavor being
used. - Meta-information about document contents
- Valid element names
- Valid attribute names and values
- How elements nest with one another
- The DTD is typically stored in a separate
document - The DTD says nothing about document semantics
59XML
- Well-Formed
- Conforms to basic XML syntax
- Can be parsed without regard to DTD
- Valid
- Well formed
- Conforms to DTD
60DTD Element Declarations
- Specifies valid element and its contents
- Uses regular expressions to define valid contents
- lt!ELEMENT br EMPTYgt //empty element
- lt!ELEMENT p ANYgt //allows everything
- lt!ELEMENT person
- (firstname surname address e-mail) gt
61A Huge Side Benefit
- A structured document is very much like a
database file! - One problem with Groupware has been that the
documents tend to be BLOBS. - Binary Large Objects
- MS Word Structure
- PDF Structure
- Now instead of using BLOBS you can reach into the
documents and pull out data.
62Applications of XML
- As a Database substitute
- Resource Description Framework (RDF)
- Content Description Format (CDF)
- XSL Extensible Style Languages
- XML/EDI (Electronic Data Interchange)
63Database Substitute
- Many databases are small sets of information that
only needs to be organized. - Data interchange between databases is a
particular problem. - Census Data Example
64Two Ways of Organizing Data
Relational
Hierarchical
Attribute1 Attribute2
Entity 1
Entity 2
Entity 3
Entity 4
Entity 5
65So what about XML?
- XML is just a way of organizing data based on
marking up the logical structures in the data or
document. - A Database is nothing if not a way of logically
organizing data so it seems a simple step to go
from a document markup to a Database. - For a while people were calling XML documents
Data Islands
66RDB vs XML Terminology
- The Table itself is an Entity.
- The Rows are Instances or Entity Instances.
- The Columns are Attributes.
- Each table has a primary Key
- Foreign keys can be used to link tables
- XML Tags are also called Elements.
- XML also uses the Attributes name to describe tag
values.
67Example of Primary and Foreign Keys
PartNo PartName Quantity Cost VendorNum
147589 Bolt 1066 .03 100002
148995 Washer 1550 .01 100003
151000 Spring 5034 1.00 100001
164601 Bolt 2507 .75 100002
199472 Wingnut 1952 .25 100002
VendorNum VendorName City StPr
100001 ABC Spring Troy MI
100002 XYZZY Fastener Toledo OH
100003 Boom Stamping Guelph ON
Primary Key
Foreign Key
68Entity Relationship (ER) Diagram
Part PartNo PartName Quantity Cost VendorNum
Vendor VendorNum VendorName City StPr
1
M
69RDBMS Normalization
- To be properly normalized all of the tables in
a relational database must be organized so that
all of the attributes of the table are facts
about the primary key. - Attributes in the table should not be facts about
any attribute other than the primary key
attribute(s). - A highly normalized database can assure the
designer that he/she has removed all of the
redundancy possible from the tables.
70Hierarchical XML
- ltRootgt
- ltLevel-1gt Identifier 1
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- ltLevel-1gt Identifier 2
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- ltLevel-1gt Identifier 3
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- lt/Rootgt
71Transform
- ltRootgt
- ltLevel-1gt Identifier 1
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- ltLevel-1gt Identifier 2
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- ltLevel-1gt Identifier 3
- ltLevel-2gt Data Element lt/Level-2gt
- lt/Level-1gt
- lt/Rootgt
72Relational Style XML
- ltEntitygt
- ltInstancegt
- ltAttribute-1gt Information lt/Attribute-1gt
- ltAttribute-2gt More info lt/Attribute-2gt
- ltAttribute-3gt Even more lt/Attribute-3gt
- lt/Instancegt
- ltInstancegt
- ltAttribute-1gt Information lt/Attribute-1gt
- ltAttribute-2gt More info lt/Attribute-2gt
- ltAttribute-3gt Even more lt/Attribute-3gt
- lt/Instancegt
- lt/Entitygt
73Transform
Attribute-1 Attribute-2
Instance 1
Instance 2
Instance 3
Instance 4
Instance 5
- ltEntitygt
- ltInstancegt
- ltAttribute-1gt Information lt/Attribute-1gt
- ltAttribute-2gt More info lt/Attribute-2gt
- ltAttribute-3gt Even more lt/Attribute-3gt
- lt/Instancegt
- ltInstancegt
- ltAttribute-1gt Information lt/Attribute-1gt
- ltAttribute-2gt More info lt/Attribute-2gt
- ltAttribute-3gt Even more lt/Attribute-3gt
- lt/Instancegt
- lt/Entitygt
74Mixed Style
- ltRootgt
- ltLevel-0gt
- ltLevel-1gt Identifier 1
- ltLevel-2.1gt Data Element 2.1 lt/Level-2.1gt
- ltLevel-2.2gt Data Element 2.2 lt/Level-2.2gt
- ltLevel-2.3gt Data Element 2.3lt/Level-2.3gt
- lt/Level-1gt
- ltLevel-1gt Identifier 2
- ltLevel-2.1gt Data Element 2.1 lt/Level-2.1gt
- ltLevel-2.2gt Data Element 2.2 lt/Level-2.2gt
- ltLevel-2.3gt Data Element 2.3 lt/Level-2.3gt
- lt/Level-1gt
- lt/Level-0gt
- lt/Rootgt
75What does this transform into?
76Linked Entities
Level-0 PrimaryKey
Level-1 PrimaryKey ForeignKey Level-2.1 Level-2.2
Level-2.3
M
1
77Impact Relational Database Development
- This little trickery has not been lost on the XML
community or the relational database developers. - Many traditional RDBMS vendors now support the
import and export of transactions and tables in
XML Format. - Many XML tool vendors now have RDBMS table export
functions in their tools. - There is a lively exchange of ideas between the
camps.
78XML Spy Demo
- XML Spy is a product of Altova Corp.
- http//www.altova.com
79RDBMS XML Issues
- One of the major issues is the type of
identifiers that each system uses. A legal XML
tag name may not be a legal RDBMS Entity or
Attribute name and vice versa. - Complex XML documents may not normalize well.
80Queries
- A major impetus to relational database
development has been the need to query the
contents of the database for reporting and
analysis. - This has led to the continued development of the
Structured Query Language (SQL) for relational
databases. - Traditional hierarchical databases like IMS
relied on the programming language to implement
the data manipulation tools. - Since XML is hierarchical it looks like we are
going backwards!
81The Example Two Table RDB
PartNo PartName Quantity Cost VendorNum
147589 Bolt 1066 .03 100002
148995 Washer 1550 .01 100003
151000 Spring 5034 1.00 100001
164601 Bolt 2507 .75 100002
199472 Wingnut 1952 .25 100002
VendorNum VendorName City StPr
100001 ABC Spring Troy MI
100002 XYZZY Fastener Toledo OH
100003 Boom Stamping Guelph ON
Primary Key
Foreign Key
82Relational SQL
- SQLgt SELECT PartNum, Cost FROM Part WHERE Cost
gt .25 -
- SQLgt SELECT PartNum, VendorName, CostFROM Part,
VendorWHERE Cost gt .25 ANDPart.VendorNum
Vendor.VendorNum
PartNo Cost
151000 1.00
164601 .75
PartNo VendorName Cost
151000 ABC Spring 1.00
164601 XYZZY Fastener .75
83XML Queries
- If you had an XML formatted document you would
like to be able to do the same thing. - This may not be so easy as there is no XML
equivalent to SQL. - Worse, the things that make SQL work like Primary
and Foreign Keys arent present in XML.(In
database speak XML has no underlying Relational
Algebra)
84Navigating the wild XML document
- An XML document is just that, a document.
- We know what to do with documents you open them
with an Editor - Editors have text manipulation tools like Find
and Replace. - Editors can Cut and Paste parts of the document
from one location to another in the document or
into another document. - The problem is that Editors are manually operated
and to be useful a database needs to run on
automatic.
85XPath
- Xpath is an XML enabling technology that allows
us to navigate the XML document Tree and to
discover and manipulate its contents. - Xpath consists of three parts
- A transversal direction called an Axis
- A node test which selects nodes along the path.
- An algorithmic predicate used to refine the the
exact node selected. -
86Axis
- Think of an ant Sauntering about the XML
document Tree. - Xpath treats the XML document as a tree of nodes.
- The ant visits the various nodes and reports on
its location when it finds what it is looking
for. - Xpath has two types of location paths relative
and absolute. - Relative paths proceed from another node.
- Absolute paths proceed from the root.
87Node Positions
- Child
- Descendent
- Parent
- Ancestor
- Following-sibling
- Proceeding-sibling
- Following
- Preceding
- Attribute
- Self
- Descendent-or-self
- Ancestor-or-self
88Relative vs Absolute
- Relativechildtitle/childparagraphselects
all the children of the division wherever you
happen to be. - Absolute/childbook/childtitle/childparagra
phselects the same thing except only if they
are immediate children of the book document.
89XML Snippet
ltchaptergt lttitlegt I Discover Moses and the
Bulrushers lt/titlegt ltparagraph
langen-Twaingt ltquotegt You dont
know about me without you have read a book
name of The Adventures of Tom Sawyer but that
aint no matter. lt/quotegt
lt/paragraphgt ltparagraphgt The above
quote is from Mark Twain lt/paragraphgtlt/chapte
rgt
90XPath Examples
- You can select all of the child elements of the
fragment, assuming Chapter is the current
nodechild - If you want the first paragraph element you could
use - childparagraphposition()1
- Select elementchildelement-name element-name
91XPath Core Functions
- Node Set Functions
- Number last()
- Number posiiton ()
- Number count(node-set)
- Node-set id(object)
- String local-name()
- String namespace-uri
- String name()
- Boolean Functions
- Boolean Boolean()
- Boolean not()
- Boolean true()
- Boolean false()
- Boolean lang()
- String Functions
- String string(object)
- String concat()
- Boolean starts-with()
- Boolean contains()
- String substring-before()
- String substring-after()
- String substring()
- Number string-length()
- String normalize-space()
- String translate()
- Number Functions
- Number object()
- Number sum()
- Number floor()
- Number ceiling()
- Number round()
92Mix and match
- XPath can be combined with other W3C XML
standards like XSL, XSLT, Xpointer and
programming languages like - Java
- Java script
- Visual Basic
- ASP
- C/C
- C
93XPath Issues
- For those of you who started their careers
writing Cobol this looks all too familiar. - You are required to Navigate the document using
the XPath API. - XPath statements can be very verbose so the
standard allows lots of abbreviations. - The RDBMS was supposed to free us from all of
this!
94XML DBMS
- Another approach to this problem is to build a
complete DBMS around an XML document(s). - When used a mid-tier applications between
traditional a RDBMS and the end application they
can offer improved performance. - Possible candidate applications for XML DBMS
include - Employee Portals
- Catalog Repository
- Membership databases
95XML DBMS
- Three approaches
- Map XML onto a traditional RDBMS.
- Native XML storage.
- Unstructured data management.
- Several companies offer such products.
- Ipedo
- Tamino (Sofware AG)
- Coherity
- Most use unique tools as well as the standard W3C
tool standards.
96XML DBMS Value Proposition
- Pluses
- Less translation to and from the traditional
RDBMS. - Data is highly structured.
- More flexible programming.
- Minuses
- Native XML is a redundancy Nightmare.
- Traditional database indexing is gone.
- Lower performance.
- Less mature management tools.
97The Future
- XQuery
- An XML based Query language from the W3C with
XPath as a subset. - SQLX sometimes called SQL/XML is a proposed
standard from ISO. - Allows you to query XML documents using an SQL
like syntax.
98XSL Extensible Style Sheets
- Maps XML to HTML
- Rule based mappings
- XML XSL HTML4.0 Industrial strength
publishing - Different devices require different formats.
(Cell Phones, PDA, Car Dashboard)
99Example XML EDI
- lt?XML version1.0 encodingUTF-8 ?gt
- ltPurchaseOrder Version4010gt
- ltPurchaseOrderHeadergt
- ltTransactionSetHeader X12.ID850gt
- ltTransactionSetIDCode code1234/gt
- ltTransactionSetNumbergt00001lt/TransactionSetNumbe
rgt - ltConventionRefgt123456lt/ConventionRefgt
- lt/TransactionSetHeadergt
- ltBeginningSegmentgt
- ltPurposeTypeCode CodeSU Status Update/gt
- ltOrderTypeCode CodeKN Purchase Order/gt
- ltReleaseNumbergtMTB 98765lt/ReleaseNumbergt
- lt/BeginningSegmentgt