Title: Standards For Building Web Sites
1 Standards For Building Web Sites
- Brian Kelly, UK Web Focus
- Email address: B.Kelly_at_ukoln.ac.uk
- UKOLN
- University of Bath
- http://www.ukoln.ac.uk/
UKOLN is funded by Resource: The Council for
Museums, Archives and Libraries, the Joint
Information Systems Committee (JISC) of the
Higher Education Funding Councils, as well as by
project funding from the JISC and the European
Union. UKOLN also receives support from the
University of Bath where it is based.
2 Contents
- Introduction
- Web Standards Overview
- Web Standards
- Data Formats
- Transport
- Addressing
- Metadata
- Deployment Issues
- Questions
- Aims of Talk
- To describe standards bodies involved with the Web
- To review key Web standards
- To report on developments to Web standards
- To briefly address implementation models
3 UK Web Focus / W3C
- UK Web Focus
- JISC funded post based at UKOLN (Bath Univ)
- Advises UK HE community on web issues
- Represents JISC on W3C
- UKOLN
- UK Office for Library and Information Networking
- Applied research (e.g. JISC and EU-funded projects) and dissemination
- W3C (World Wide Web Consortium)
- International consortium, with headquarters at MIT, INRIA and Keio University (Japan)
- Coordinates development of web protocols and file formats
4 Standards, Architectures, Applications, Resources
- This talk is concerned primarily with the standards used to develop web services
- Architectures: models for implementing systems (e.g. NT / Unix, file system / database application, HTML tools / content management)
- Standards: concerned with protocols and file formats (e.g. open standards vs. proprietary, HTML / XML vs. PDF, CSS / XSL vs. HTML)
- Applications: software products used to implement systems (e.g. Apache / IIS, FrontPage / Dreamweaver, Oracle / SQLServer)
- Resources: financial and staff costs needed to implement systems (e.g. development vs. migration costs, use of in-house expertise, in-house vs. out-sourced, licensed vs. open source)
5 Standards
- Need for standards to provide
- Platform independence
- Application independence
- Avoidance of patented technologies
- Flexibility ("evolvability" - Tim Berners-Lee)
- Architectural integrity
- Long-term access to data
- Ideally look at standards first, then find applications which support the standards
- Difficult to achieve this ideal!
6 Standardisation
- Other
- Standards bodies such as ECMA
- Community groups which can agree on, say, profiles
HTML extensions PDF and Java?
- Proprietary
- De facto standards
- Often initially appealing (cf PowerPoint)
- May emerge as standards
- W3C
- Produces W3C Recommendations on Web protocols
- Managed approach to developments
- Protocols initially developed by W3C members
- Decisions made by W3C, influenced by member and
public review
PNG HTML Z39.50 Java?
- ISO
- Produces ISO Standards
- Can be slow moving and bureaucratic
- Produce robust standards
- IETF
- Produces Internet Drafts on Internet protocols
- Bottom-up approach to developments
- Protocols developed by interested individuals
- "Rough consensus and working code"
HTTP URN whois
PNG HTML HTTP
7 The Web Vision
- Tim Berners-Lee's vision for the Web
- Automation of information management: if a decision can be made by machine, it should
- All structured data formats should be based on XML
- Migrate HTML to XML
- All logical assertions to map onto RDF model
- All metadata to use RDF
A useful overview of Tim Berners-Lee's vision for
the Web is given in his book Weaving The Web.
8 Web Protocols
- Web initially based on three simple protocols
- Data Formats: HTML (HyperText Markup Language) provides the data format for native documents
- Addressing: URLs (Uniform Resource Locators) provide an addressing mechanism for web resources
- Transport: HTTP (HyperText Transfer Protocol) defines transfer of resources between client and server
9 HTML History
- HTML 1.0: Unpublished specification. DTD developed by Tim Berners-Lee (CERN).
- HTML 2.0: Spec. based on innovations from NCSA (forms and inline images!)
- HTML 3.0: Proposed spec. (renamed from HTML+). Very comprehensive. Failed to complete IETF standardisation. Little implementation experience.
- Proprietary: Introduction of proprietary HTML elements by Netscape and Microsoft (browser wars)
- HTML 3.2: Spec. based on description of mainstream innovations in marketplace
- HTML 4.0: Current recommendation
10 Problems with Extensions
- Device Dependency
- Resources are dependent on a particular browser
- Platform dependency
- Costs
- Real costs in supporting multiple architectures
- Potential costs in re-engineering
- Architecture
- Proprietary innovations have been flawed
- Merging content and appearance
- Maintenance of resources
- Accessibility problems
- Poor support for access by disabled
- But
- Experiments are needed
11 HTML 4.0, CSS 2.0 and DOM
- HTML 4.0 used in conjunction with CSS 2.0
(Cascading Style Sheets) and DOM 1.0 provides an
architecturally pure, yet functionally rich
environment
- HTML 4.0
- Improved forms
- Hooks for stylesheets
- Hooks for scripting languages
- Table enhancements
- Better printing
- CSS 2.0
- Support for all HTML formatting
- Positioning of HTML elements
- Multiple media support
- DOM 1.0
- Document Object Model
- Hooks for scripting languages
- Permits changes to HTML and CSS properties and content
- CSS Problems
- Changes during CSS development
- Netscape / IE incompatibilities
- Continued use of browsers with known bugs
12 HTML Limitations
- HTML 4.0 / CSS 2.0 have limitations
- Difficulties in introducing new elements
- Time-consuming standardisation process (<ABBREV>)
- Dictated by browser vendor (<BLINK>, <MARQUEE>)
- Area may be inappropriate for standardisation
- Covers specialist area (maths, music, ...)
- Application-specific (<STUD-NUM>)
- HTML is a display (output) format
- HTML's lack of arbitrary structure limits functionality
- Find all memos copied to John Smith
- How many unique tracks on Jackson Browne CDs
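The "memos copied to John Smith" query becomes straightforward once documents carry their own structure. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the <memos>, <memo>, <to> and <cc> element names and the sample data are hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo markup: element names and data are invented for illustration.
MEMOS = """
<memos>
  <memo id="m1"><to>Ann Jones</to><cc>John Smith</cc><subject>Budget</subject></memo>
  <memo id="m2"><to>John Smith</to><subject>Web strategy</subject></memo>
  <memo id="m3"><to>Ann Jones</to><cc>Mary Brown</cc><cc>John Smith</cc><subject>Servers</subject></memo>
</memos>
"""

def memos_copied_to(xml_text, name):
    """Return the ids of memos that have a <cc> element naming `name`."""
    root = ET.fromstring(xml_text)
    return [
        memo.get("id")
        for memo in root.findall("memo")
        if any(cc.text == name for cc in memo.findall("cc"))
    ]

print(memos_copied_to(MEMOS, "John Smith"))
```

The same question asked of HTML pages would need text matching against display markup, since HTML has no element that distinguishes a recipient from a copied party.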
13 XML
- XML
- Extensible Markup Language
- A lightweight SGML designed for network use
- Addresses HTML's lack of evolvability
- Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc.)
- Agreement achieved quickly: XML 1.0 became W3C Recommendation in Feb 1998
- Support from industry (SGML vendors, Microsoft, etc.)
- Support in Netscape 6 (?) and IE 5
14 XML Concepts
- Well-formed XML resources
- Make end-tags explicit <li>...</li>
- Make empty elements explicit <img ... />
- Quote attributes <img src="logo.gif" height="20"
- Use consistent upper/lower case
- Valid XML resources
- Need DTD
- XML Namespaces
- Mechanism for ensuring unique XML elements
- <?xml:namespace ns="http://foo.org/1998-001" prefix="i">
- <p>Insert <i:PART>M-471</i:PART></p>
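The well-formedness rules listed above can be checked mechanically. This is a minimal sketch using Python's standard xml.etree.ElementTree parser; the sample fragments are illustrative only, and the parser checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if `fragment` parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Illustrative fragments, one per rule on the slide.
samples = {
    "explicit end-tag": "<ul><li>one</li></ul>",
    "missing end-tag": "<ul><li>one</ul>",
    "closed empty element": '<p><img src="logo.gif" height="20" /></p>',
    "unquoted attribute": "<p><img src=logo.gif /></p>",
    "inconsistent case": "<P>text</p>",
}

for label, fragment in samples.items():
    print(f"{label}: {is_well_formed(fragment)}")
```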
15 XLink, XPointer and XSL
- XLink will provide sophisticated hyperlinking missing in HTML
- Links that lead user to multiple destinations
- Bidirectional links
- Links with special behaviors
- Expand-in-place / Replace / Create new window
- Link on load / Link on user action
- Link databases
- XPointer will provide access to arbitrary portions of XML resource
- XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents)
<commentary xml:link="extended" inline="false">
<locator href="smith2.1" role="Essay"/>
<locator href="jones1.4" role="Rebuttal"/>
<locator href="robin3.2" role="Comparison"/>
</commentary>
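As a rough illustration of a link with multiple destinations, the sketch below reads the extended link from the slide and lists each destination by role. It follows the draft xml:link syntax shown on the slide, not any later version of XLink, and the function name destinations is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is predefined and always maps to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

LINK = """<commentary xml:link="extended" inline="false">
  <locator href="smith2.1" role="Essay"/>
  <locator href="jones1.4" role="Rebuttal"/>
  <locator href="robin3.2" role="Comparison"/>
</commentary>"""

def destinations(xml_text):
    """Return {role: href} for an extended link, or {} for any other element."""
    element = ET.fromstring(xml_text)
    if element.get(f"{{{XML_NS}}}link") != "extended":
        return {}
    return {loc.get("role"): loc.get("href") for loc in element.findall("locator")}

for role, href in destinations(LINK).items():
    print(f"{role}: {href}")
```

A browser or intermediary holding this table could offer the reader a choice of destinations instead of the single target that an HTML anchor allows.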
16 More XML Developments
- Momentum behind XML is driving additional standardisation developments
- XML Path: A language for addressing parts of an XML document, designed to be used by XSLT and XPointer
- XML Schemas (I): Defining the nature of XML schemas and their component parts
- XML Schemas (II): Facilities for defining datatypes to be used in XML Schemas and other XML specifications
- XSLT: A language for transforming XML documents into other XML documents
- XML Information Set (Infoset): An abstract data set containing the information available from an XML document
17 XHTML
- XHTML
- Extensible Hypertext Markup Language
- HTML represented in XML
- Some small changes to HTML
- Elements in lowercase (<p> not <P>)
- Attributes must be quoted (<img src="logo" height="50">)
- Elements must be closed (<p>..</p>)
- Empty elements must be closed (<img src="logo" .. />)
- Gain benefits from XML
- Tools available (e.g. HTML-Kit from http://www.chami.com/html-kit/)
- See <http://www.webreference.com/xml/column6/> and <http://www.builder.com/Authoring/Xhtml/>
18 Addressing
- URLs (e.g. http://www.bristol-poly.ac.uk/depts/music/) have limitations
- Lack of long-term persistency
- Organisation changes name
- Department scrapped
- Directory structure reorganised
- Inability to support multiple versions of resources (mirroring)
- URNs (Uniform Resource Names)
- Proposed as solution
- Difficult to implement (no W3C activity in this area)
19 Addressing - Solutions
- DOIs (Digital Object Identifiers)
- Proposed by publishing industry as a solution
- Aimed at supporting rights ownership
- Business model needed
- PURLs (Persistent URLs)
- Provide single level of redirection
- Cache support
- National caches could provide simple URN support
- For further information see
- <URL: http://www.ukoln.ac.uk/metadata/resources/urn/>
- <URL: http://hosted.ukoln.ac.uk/biblink/wp2/links.html>
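The "single level of redirection" behind a persistent URL can be sketched as a lookup table that maps a stable name to the resource's current location. This is an illustration of the idea only, not how any real PURL service is implemented; the table contents, paths and example.ac.uk addresses are hypothetical.

```python
# Hypothetical redirection table: persistent path -> current location.
PURL_TABLE = {
    "/net/music-dept": "http://www.example.ac.uk/depts/music/",
}

def resolve(purl_path, table):
    """Return an HTTP-style (status, location) pair for a persistent path."""
    target = table.get(purl_path)
    if target is None:
        return 404, None
    return 302, target

print(resolve("/net/music-dept", PURL_TABLE))

# The department moves: only the table entry changes, not the published name.
PURL_TABLE["/net/music-dept"] = "http://www.example.ac.uk/faculties/arts/music/"
print(resolve("/net/music-dept", PURL_TABLE))
```

Links published with the persistent path keep working after a reorganisation, at the cost of maintaining the table.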
20 Transport
- HTTP/0.9 and HTTP/1.0
- Made the Web popular
- Design flaws and implementation problems caused poor performance
- HTTP/1.1
- Addresses some of these problems
- 60% server support, client proxy support beginning
- Performance benefits! (optimised implementation reduces packet traffic by 2/3)
- Is acting as fire-fighter
- Poor usage counting
- Not sufficiently flexible or extensible
21 HTTP/NG
- HTTP/NG
- Ideas for next generation of HTTP
- Produced various studies and reports
- No longer being developed within W3C
- Work now being coordinated by the IETF
22 Metadata
- Metadata - the missing architectural component
from the initial implementation of the web
- Metadata Needs
- Resource discovery
- Content filtering
- Authentication
- Improved navigation
- Multiple format support
- Rights management
23 Privacy
- P3P (Platform for Privacy Preferences)
- Example of a metadata application
- Privacy concerns are a current barrier to Web development (esp. in US)
- P3P project developing methods for exchanging Privacy Practices of Web sites and user
- Documents on architecture and vocabulary available
- See <URL: http://www.w3.org/TR/P3P/>
24 Digital Signatures
- DSig (Digital Signatures initiative)
- Key component for providing trust on the web
- DSig 1.0 is based on PICS
- DSig 2.0 will be based on RDF and will support signed assertions, e.g.
- This page is from the University of Bath
- This page is a legally-binding list of courses provided by the University
- See <http://www.w3.org/DSig/>
25 RDF
- RDF (Resource Description Framework)
- Highlight of WWW 7 conference
- Provides a metadata framework ("machine understandable metadata for the web")
- Based on ideas from content rating (PICS), resource discovery (Dublin Core) and site mapping (MCF)
- Applications include
- cataloging resources
- resource discovery
- electronic commerce
- intelligent agents
- digital signatures
- content rating
- intellectual property rights
- privacy
- See <URL: http://www.w3.org/RDF/>
26 RDF Model
- RDF
- Based on a formal data model (directed labelled graphs)
- Syntax for interchange of data
- Schema model
[Diagram: RDF Data Model. Labels: Resource, PropertyType, Property, Value, PropObj, PropName, InstanceOf. Example values: the resource page.html with a Cost of 0.05 and a ValidUntil of 11-May-98.]
27 RDF Example
- Example of Dublin Core metadata in RDF
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:DC="http://purl.org/dc/elements/1.0/">
  <Description about="http://www.w3.org/folio.html">
    <DC:title>The W3C Folio 1999</DC:title>
    <DC:creator>W3C Communications Team</DC:creator>
    <DC:date>1999-03-10</DC:date>
    <DC:subject>Web development, World Wide Web Consortium, Interoperability of the Web</DC:subject>
  </Description>
</RDF>
See <http://www.w3.org/Metadata/Activity/>
RDF has been used to express data about the W3C
Folio. The basic concept is that metadata about
this item on the Web is described through a
collection of properties called an RDF
Description. Notice that RDF uses the familiar
XML syntax. This example also illustrates XML
Namespaces.
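Because the description is ordinary XML with namespaces, generic XML tools can pull out the metadata. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than a dedicated RDF parser; it repeats a shortened copy of the Folio description (the subject property is omitted for brevity), and the function name dublin_core_properties is an assumption for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.0/"

# Shortened copy of the W3C Folio description from the slide.
DOC = """<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:DC="http://purl.org/dc/elements/1.0/">
  <Description about="http://www.w3.org/folio.html">
    <DC:title>The W3C Folio 1999</DC:title>
    <DC:creator>W3C Communications Team</DC:creator>
    <DC:date>1999-03-10</DC:date>
  </Description>
</RDF>"""

def dublin_core_properties(xml_text):
    """Map each described resource to a dict of its Dublin Core properties."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        props = {}
        for child in desc:
            # ElementTree expands prefixes to "{namespace}localname".
            if child.tag.startswith(f"{{{DC_NS}}}"):
                props[child.tag.split("}", 1)[1]] = child.text
        result[desc.get("about")] = props
    return result

for resource, props in dublin_core_properties(DOC).items():
    print(resource, props)
```

Matching on the namespace URI, not on the DC prefix, is what lets independently developed vocabularies coexist in one description.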
28 RDF Conclusion
- RDF is a general-purpose framework
- RDF provides structured, machine-understandable metadata for the Web
- Metadata vocabularies can be developed without central coordination
- RDF Schemas describe the meaning of each property name
- Signed RDF is the basis for trust
29 Deployment Issues
- What part of the spectrum are you closest to?
Must support standards
Go with the marketplace
30 I Support Standards
- But
- You probably use PowerPoint, don't you?
- Software vendors will subtly suck you into use of proprietary features
- Home-grown solutions can be expensive (where are all the good Perl / C programmers willing to work on short-term contracts for a pittance in Universities?)
- Standards may not take off: remember Coloured Book network protocols?
- Proprietary solutions may become standardised
- Standards may not yet be available (or finalised)
- Do users want standards? Will "We support standards" conflict with "Our services are based on user requirements"?
31 I Follow The Marketplace
- Good New Labour philosophy, but
- Can you trust your software vendor?
- Will your software vendor be around in a few years' time ("I only buy Rover")?
- Will your system be interoperable?
- What happens when you want to interwork with partners or your organisation merges / is taken over?
- What happens when you want to extend your system beyond the limits set by your software vendor?
32 Some Difficulties
- We should acknowledge some difficulties in a standards-based approach
- Keeping up-to-date (look at nos. of documents at http://www.w3c.org/TR/ and size of http://www.diffuse.org/standards.html)
- Spotting the winning standards
- Implementing the standard in a timely way
- Dealing with the problems of the software vendor
- Resources!
33 Is It Worth It?
- Has the Web stabilised?
- Are you thinking about WAP services?
- Will you want to (be forced to) make your web service accessible?
- Will you want to deploy personalised interfaces (e.g. My.Oxford.ac.uk)?
- Will your web service move from information provision to e-business?
- Do you want your University web site to use business-to-business (B2B) protocols to automate transfer of link and news items to HERO (née HE Mall)?
34 What Should I Do?
- What approaches should I use?
- Storing information in a structured format makes subsequent redevelopment easier
- Be driven initially by standards and architectural considerations, not by applications
- Consider use of more sophisticated web management tools, rather than HTML authoring tools
- An organisational standards guidelines document (part of a Web Strategy document) may be useful
- Don't work in isolation
- Monitor standards development (e.g. W3C)
- Listen to others in your community
- Talk and discuss issues within your community
35 Architectural Models
- There is a need for more intelligent software which can process structured resources or reformat unstructured ones
- Web server simply sends file to client. File contains redundant information (for old browsers) plus client interrogation support
[Diagram labels: HTML resource, Web server; HTML / XML / database resource, intelligent Web server; client proxy, server proxy.]
- Intermediaries can provide functionality not available at client
- DOI support
- XML support
- Format conversion
36 Architectural Models e.g. XML Deployment
- Ariadne issue 15 has article on "What Is XML?"
- Describes how XML support can be provided
- Natively by new browsers
- Back end conversion of XML to HTML
- Client-side conversion of XML to HTML / CSS
- Java rendering of XML
- Examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/
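Back-end conversion can be sketched as a small function that reads a structured XML resource and emits HTML for browsers without XML support. This is a minimal illustration under stated assumptions, not the approach described in the Ariadne article: the <course>, <title> and <tutor> element names, the sample data and the function name course_to_html are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured source; element names are invented for illustration.
SOURCE = """<course code="MU101">
  <title>Introduction to Music &amp; Technology</title>
  <tutor>A. Lecturer</tutor>
</course>"""

def course_to_html(xml_text):
    """Convert one <course> record into a simple HTML page."""
    course = ET.fromstring(xml_text)
    # Escape every value so that characters such as & and < stay valid in HTML.
    code = escape(course.get("code", ""))
    title = escape(course.findtext("title", default=""))
    tutor = escape(course.findtext("tutor", default=""))
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{code}: {title}</h1><p>Tutor: {tutor}</p></body></html>"
    )

print(course_to_html(SOURCE))
```

Keeping the structured record as the master copy means the HTML layout can be changed, or a second output format added, without touching the source data.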
37 Conclusions
- To conclude
- Standards are important, especially for large organisations and national initiatives
- Proprietary solutions are often tempting because
- They are available
- They are often well-marketed and well-supported
- They may become standardised
- Solutions based on standards may not be properly supported by applications
- Intermediaries may have a role to play in deploying standards-based solutions
38 Further Information
- W3C web site <http://www.w3.org/>
- W3C Tech Reports <http://www.w3.org/TR/>
- "The Development Of Web Protocols And Formats", Exploit Interactive issue 1, <http://www.exploit-lib.org/issue1/web/>
- "Wilde's WWW: Technical Foundations of the World Wide Web", Erik Wilde, ISBN 3-540-64285-4
- Diffuse Project web site <http://www.diffuse.org/>
- "On Julius Caesar, Queen Eanfleda, and the lessons from time past", Brian Meek, KCL <http://www.kcl.ac.uk/kis/support/cc/staff/brian/caesar.html>
39 Community Information
- Discuss standards, architectures and applications on various mailing lists
- website-info-mgt Mailbase list
- web-support Mailbase list
- See <http://www.mailbase.ac.uk/>
- Participate in the Institutional Web Management workshop (Bath University, 7-9th Sept): details will be announced on website-info-mgt Mailbase list
40 Question Time