Title: An Update On Web Standards
1. An Update On Web Standards
- Brian Kelly
- University of Bath
- Bath, BA2 7AY
Email: B.Kelly_at_ukoln.ac.uk URL: http://www.ukoln.ac.
- Introduction
- Standards
- The Original Web Architecture
- Architectural Developments
- Deployment Issues
- Discussion
- Aims of Talk
- To give brief overview of Web architecture
- To describe developments to Web standards
- To briefly address implementation issues
Please feel free to ask questions at any time,
especially to clarify any unexplained TLAs or
3. About Me
- Brian Kelly
- UK Web Focus: a JISC-funded post to advise the HE and FE communities on Web developments
- Based in UKOLN, a national focus of expertise in digital information management based at the University of Bath
- Involved in the Web since 1993, while working in the Computing Service at the University of Leeds
- Represents JISC on the World Wide Web Consortium
4. Standards in HE/FE Context
- Standards are important in the HE and FE sector to:
- Ensure widespread access to resources
- Enable resources to be reused and repurposed
- Ensure scholarly resources can be preserved
- Address accountability of public funding
- Minimise resource costs for upgrading systems
- Provide universal access to resources (cf. disability legislation)
Before the Web: access to resources typically required a software vendor's software, which was only available on a limited number of platforms. Often the software had to be licensed. The goal of the Web was to provide universal access to resources. Who could argue with this goal?
- Need for standards to provide
- Platform and application independence
- Avoidance of patented technologies
- Flexibility and architectural integrity
- Long-term access to data
- Ideally look at standards first, then find
applications which support the standards.
However it can be difficult to achieve this ideal!
6. Standards and the Web
- Proprietary
- De facto standards
- Often initially appealing (cf. PowerPoint, PDF)
- May emerge as standards
HTML extensions PDF and Java?
PNG HTML Z39.50 Java
- W3C
- Produces W3C Recommendations on Web protocols
- Managed approach to developments
- Protocols initially developed by W3C members
- Decisions made by W3C, informed by member and public review
- ISO
- Produces ISO Standards
- Can be slow moving and bureaucratic
- Produces robust standards
- IETF
- Produces Internet Drafts on Internet protocols
- Bottom-up approach to developments
- Protocols may be developed by interested individuals
- "Rough consensus and working code"
7. The Case For W3C Standards
- Why use open standards developed by the W3C? Why not leave it to the marketplace?
- W3C's open standards have been developed in an open environment, with the aim of achieving platform and application independence
- Commercial companies develop proprietary formats in order to maximise their profits and dividends to shareholders
- W3C's open standards have been developed to interoperate with each other according to W3C's design vision
- Commercial companies typically develop proprietary formats in isolation, or along the lines of a company vision
8. Challenges For The W3C
- W3C can be regarded as a force for good
- Neutral trusted body
- Have a vision
- But W3C are facing a number of challenges
- The increasing complexity of the Web
- Ambiguities in specs, overlapping areas,
- Reaching consensus
- Developing solutions now or developing the best solution
- Patents
9. Standards, Architectures, Applications, Resources
- This talk touches on several areas
- Architectures: models for implementing systems
- NT / Unix; file system / database application; HTML tools / content management
- Standards: concerned with protocols and file
- Which standards are applicable; open standards vs. proprietary; HTML / XML vs.
- Applications: software products used to implement
- Apache / IIS; FrontPage / Dreamweaver; Oracle / SQLServer; ColdFusion vs ASP
- Resources: financial and staff costs needed to implement systems
- Development vs. migration costs; use of in-house expertise; in-house vs. out-sourced; licensed vs. open source
- As an example of the dangers of using proprietary solutions, consider the GIF file format
- Unisys announce that they hold the patent to the compression algorithm used in GIF images and that users of GIF will have to pay
- Following much debate, Unisys require payment for a licence from software developers, and also from end users of unlicensed software (5,000!)
- Web community responds with PNG format
- See <http://burnallgifs.org/>
- There is no guarantee that payment will not be required for proprietary file formats which are currently free
11. How Does The Web Work?
- The Web has three fundamental concepts
- URLs: addresses of resources
- HTTP: dialogue between client and server
- HTML: format of resources
1 User clicks on link to the address (Web browser)
2 Browser converts link to HTTP command (METHOD): connect to computer at www.netsoft.com, GET /hello.html
3 Remote computer (Web server) sends file:
<HTML> <TITLE>Welcome</TITLE>.. <P>The <A HREF>Netsoft</A> home page</P>
4 Local computer displays HTML file: "The Netsoft home page"
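Steps 1 and 2 above can be sketched in Python. This is a minimal illustration, assuming a plain http URL; the function name build_get_request is hypothetical, and the request text is only constructed, not sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Split a URL into the host to connect to and the HTTP command to send."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    # Request line, one header, then a blank line to end the request.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, request


host, request = build_get_request("http://www.netsoft.com/hello.html")
print("Connect to:", host)
print(request)
```

The browser would then open a connection to the host, send the request text, and render the HTML that comes back (steps 3 and 4).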
12. Approaches To HTML
- Emphasis on managing HTML resources is inappropriate
- HTML is an output format, which cannot easily be reused (e.g. WAP, e-Books, etc.)
- Need to manage HTML fragments (only partly achievable with SSIs)
- Need to manage collections of resources
- Need to have a single master source of data
- Need to support new developments such as personalisation
- Difficult to integrate with new formats
- Issues
- Should we stop giving HTML courses?
- Should we stop buying HTML authoring tools?
- Extensible Markup Language
- A lightweight SGML designed for network use
- Addresses HTML's lack of evolvability
- Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc.)
- Agreement achieved quickly: XML 1.0 became a W3C Recommendation in Feb 1998
- Support from industry (SGML vendors, Microsoft, etc.)
- Support in latest versions of Web browsers
14. XML Concepts (1)
- Well-formed XML resources
- Make end-tags explicit: <li>...</li>
- Make empty elements explicit: <img ... />
- Quote attributes: <img src="logo.gif" height="20"
- Use consistent upper/lower case: <p> and <P> are different
- XML Namespaces
- Mechanism for ensuring unique XML elements
- <?xml:namespace ns="http://foo.org/1998-001" prefix="i"?>
- <p>Insert <i:PART>M-471</i:PART></p>
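The well-formedness rules above can be checked mechanically with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are illustrative assumptions. Note that the namespace example uses the xmlns:i attribute syntax of the final Namespaces in XML Recommendation, not the processing-instruction form shown on the slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = {
    "explicit end-tag": "<ul><li>item</li></ul>",
    "missing end-tag": "<ul><li>item</ul>",
    "explicit empty element": '<p><img src="logo.gif" height="20" /></p>',
    "unquoted attribute": "<p><img src=logo.gif /></p>",
    "mismatched case": "<p>text</P>",
}
for label, fragment in samples.items():
    print(f"{label}: {is_well_formed(fragment)}")

# Namespaces: the prefix "i" is bound to a URI, so PART cannot clash with
# an element of the same name from another vocabulary.
doc = '<p xmlns:i="http://foo.org/1998-001">Insert <i:PART>M-471</i:PART></p>'
part = ET.fromstring(doc).find("{http://foo.org/1998-001}PART")
print(part.tag, part.text)
```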
15. XML Concepts (2)
- XML Schemas
- Allow constraints to be applied on XML attributes
- Express shared vocabularies and allow machines to carry out rules made by people
- Richer than DTDs
- See <http://www.w3.org/XML/Schema>
- XSLT
- A language for transforming XML from one DTD to another, or to another format (e.g. PDF)
- Written in XML
- Knows about XML (e.g. tree structures, etc.)
- See <http://www.xslt.com/>
16. XML Concepts (3)
- XLink provides sophisticated hyperlinking
- Links that allow you to choose multiple destinations
- Bidirectional links
- Links with special behaviours
- Expand-in-place / Replace / Create new window
- Link on load / Link on user action
- Link databases
- See <http://www.xml.com/pub/a/2000/09/xlink/>
- XPointer
- Provides access to arbitrary portions of XML resource
- See <http://www.devshed.com/Server_Side/XML/XPoin
17. Getting to XML With XHTML
- HTML represented in XML
- Some small changes to HTML
- Elements in lowercase: <p> not <P>
- Attributes must be quoted: <img src="logo" height="50">
- Elements must be closed
- <p>...</p>
- <img src="logo" ... />
- Gain benefits from XML
- Tools available (e.g. HTML-Kit from http://www.chami.com/html-kit/)
- See <http://www.webreference.com/xml/column6/>, <http://groups.yahoo.com/group/XHTML-L/> and
- Cascading Style Sheets
- XHTML/XML defines structure, CSS describes the appearance
- CSS 1.0 and 2.0 now W3C recommendations
- CSS 3.0 in preparation (modularised)
- We should be using CSS
- Part of architecture
- Ease of maintenance
- Becoming much richer
- Accessibility
- See <http://www.w3c.org/Style/CSS/>
- Scalable Vector Graphics
- A language for describing two-dimensional graphics in XML
- See <http://www.w3.org/Graphics/SVG/Overview.htm8>
- Also see presentation on XML written in SVG at nTheWorldslide.svgz>
- WWW 2002 talk at <http://www.w3c.org/2002/Talks/w
21. SVG Example
22. SVG and XSLT
- This example
- Originally written in Java
- Author realised that XSLT would be easier
- Uses SVG for chess board and pieces
- Uses XSLT to move pieces
- A molecule described in CML can be transformed
using XSLT into SVG, allowing it to be displayed
and manipulated
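Because SVG is plain XML, graphics such as the chess board mentioned above can be generated by any program that writes text. The following Python sketch is not the example from the talk (which used XSLT); the function name chessboard_svg, the square size and the colours are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def chessboard_svg(square=20):
    """Return an SVG document describing an 8x8 board of coloured squares."""
    size = 8 * square
    parts = [f'<svg xmlns="{SVG_NS}" width="{size}" height="{size}">']
    for row in range(8):
        for col in range(8):
            # Alternate light and dark squares.
            fill = "#ffffff" if (row + col) % 2 == 0 else "#666666"
            parts.append(
                f'<rect x="{col * square}" y="{row * square}" '
                f'width="{square}" height="{square}" fill="{fill}"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


svg = chessboard_svg()
root = ET.fromstring(svg)   # parsing confirms the output is well-formed XML
print(root.tag, len(root))
```

Saving the string to a file with an .svg extension should let an SVG-capable viewer display the board.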
- Synchronized Multimedia Integration Language
- A language for authoring of interactive audiovisual presentations
- Allows you to synchronize text, images, audio and video in a document
- An XML application
- See <http://www.w3c.org/AudioVideo/>
25. SMIL Example
- MathML
- An XML application for maths
- Various plugins, dedicated readers, etc.
- Mozilla renders natively
- See <http://www.mozilla.org/projects/mathml/>
- How can you
- Include XML resources such as MathML, ChemML, etc. in XHTML documents?
- Provide a subset of XHTML features in browsers on devices such as mobile phones, PDAs, etc.?
- The answer is
- XHTML modularisation (modularization)
- See <http://www.w3.org/TR/xhtml-modularization/>
28. Addressing (1)
- URLs have limitations
- Lack of long-term persistency
- University changes name, or department shut down or merged
- Directory structure reorganised
- Inability to support multiple versions (mirroring)
- URIs
- Were an address of a resource, and moving a resource was annoying but not critical
- With the development of Web services, structured resources, B2B communications, etc. the availability of URIs will be of great
29. Addressing (2)
- Solutions
- Unique identifiers possible, but resolution difficult
- Solutions include DOIs, PURLs, OpenURLs, etc.
- Interest mostly in publishing sector
- "URIs don't break - people break them"
- Think about URL persistency naming
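The idea behind resolver-based identifiers such as PURLs can be sketched as a lookup table that maps a stable identifier to the resource's current location. The Python below is an illustration only: the Resolver class, the identifier and the example.ac.uk URLs are hypothetical and do not reflect how any particular resolution service is implemented.

```python
class Resolver:
    """Maps persistent identifiers to the current location of a resource."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, url):
        """Create or update the location recorded for an identifier."""
        self._targets[identifier] = url

    def resolve(self, identifier):
        """Return (status, location), in the style of an HTTP redirect."""
        url = self._targets.get(identifier)
        return (302, url) if url is not None else (404, None)


resolver = Resolver()
resolver.register("/net/example/web-standards",
                  "http://www.example.ac.uk/old-dept/talk.html")

# The department is merged and the file moves: only the table entry changes,
# while the identifier that other people have cited stays the same.
resolver.register("/net/example/web-standards",
                  "http://www.example.ac.uk/new-dept/talk.html")

status, location = resolver.resolve("/net/example/web-standards")
print(status, location)
```

The hard part noted on the slide remains: someone has to run the resolver and keep the table up to date.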
30. Transport - The Original Roadmap
- HTTP/0.9 and HTTP/1.0
- Design flaws and implementation problems
- HTTP/1.1
- Addresses some of these problems
- 60% server support
- Performance benefits! (60% packet traffic reduction)
- Is acting as fire-fighter
- Not sufficiently flexible or extensible
- HTTP/NG
- Radical redesign using object-oriented technologies
- Undergoing trials
- Gradual transition (using proxies)
31. Transport - Today
- Today
- Responsibility for development moved from W3C to IETF
- Little progress with HTTP/NG
- Problems with HTTP/1.1
- Lengthy (176-page) specification without much explicit rationale for design decisions
- Environment has become more complex
- Lack of a clean underlying data model
- See "Clarifying the Fundamentals of HTTP"
- Simple Object Access Protocol
- Facilitates development of machine-to-machine communications using Web protocols by providing a richer XML-based messaging mechanism
- A protocol for invoking methods on servers, services, components and objects
- Codifies existing practice of using XML and HTTP as a method invocation mechanism
- See FAQ at <http://www.develop.com/soap/soapfaq.h
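To make "XML and HTTP as a method invocation mechanism" concrete, the sketch below builds a SOAP 1.1 style envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace, the GetCourseList method and its department parameter are hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/courses"               # hypothetical service namespace


def build_soap_call(method, params):
    """Wrap a method name and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_soap_call("GetCourseList", {"department": "UKOLN"})
print(message)
```

A client would typically send this text as the body of an HTTP POST and receive a similar envelope containing the result.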
- Metadata - the missing architectural component
from the initial implementation of the web
- Metadata Needs
- Resource discovery
- Content filtering
- Authentication
- Improved navigation
- Multiple format support
- Rights management
34. Metadata Examples
- DSig (Digital Signatures initiative)
- Key component for providing trust on the web
- DSig 2.0 will be based on RDF and will support signed assertions
- This page is from the University of Bath
- This page is a legally-binding list of courses provided by the University
- P3P (Platform for Privacy Preferences)
- Developing methods for exchanging Privacy Practices of Web sites and user
- Note that discussions about additional rights management metadata are currently taking place
- RDF (Resource Description Framework)
- Highlight of WWW 7 conference
- Provides a metadata framework ("machine understandable metadata for the web")
- Based on ideas from content rating (PICS), resource discovery (Dublin Core) and site mapping (MCF)
- Applications include
- cataloguing resources
- resource discovery
- electronic commerce
- intelligent agents
- digital signatures
- content rating
- intellectual property rights
- privacy
- See <URL: http://www.w3.org/Talks/1998/0417-WWW7-
36. RDF Model
RDF Data Model
- Based on a formal data model (directed labelled graphs)
- Syntax for interchange of data
- Schema model
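A directed labelled graph can be represented as a set of (subject, predicate, object) statements, where each statement is one labelled arc. The Python sketch below is an informal illustration rather than an RDF implementation: the Triple type, the objects helper and the example page URL are assumptions, while the property names come from the Dublin Core element set.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One arc of the graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core element set namespace
page = "http://www.example.org/courses.html"  # hypothetical resource

graph = [
    Triple(page, DC + "title", "List of courses"),
    Triple(page, DC + "creator", "University of Bath"),
    Triple(page, DC + "format", "text/html"),
]


def objects(graph, subject, predicate):
    """Follow every arc labelled `predicate` leaving `subject`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects(graph, page, DC + "creator"))
```

Because predicates are identified by URIs, statements from independently developed vocabularies can sit in the same graph without clashing.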
37. Browser Support for RDF
Trusted 3rd Party Metadata
- Mozilla (Netscape's source code release) provides support for RDF.
- Mozilla supports site maps in RDF, as well as bookmarks and history lists
- See Netscape's or HotWired home page for a link to the RDF file.
Embedded Metadata e.g. sitemaps
Image from http://purl.oclc.org/net/eric/talks/www
38. RDF Conclusion
- RDF is a general-purpose framework
- RDF provides structured, machine-understandable metadata for the Web
- Metadata vocabularies can be developed without central coordination
- RDF Schemas describe the meaning of each property name
- Signed RDF is the basis for trust
- But
- Is it too complex?
- Is it the right approach?
39. RSS: An XML/RDF Application
- RSS (Rich / RDF Site Summary)
- Initially XML, now an RDF application
- Used for news feeds
- Lightweight approach that we should be investigating (e.g. see news page on IWMW 2002 Web site)
See example of an RSS authoring tool and parser at <http://rssxpress.ukoln.ac.uk/>
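Reading a news feed takes very little code, which is part of the appeal of RSS as a lightweight approach. The Python sketch below parses a small, made-up feed in the plain-XML RSS 0.91 style; the feed content and the read_items helper are illustrative assumptions, and an RDF-based RSS 1.0 feed would additionally need namespace-aware element names.

```python
import xml.etree.ElementTree as ET

# A hypothetical feed, written out inline so the example is self-contained.
FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example news</title>
    <link>http://www.example.org/news/</link>
    <description>Hypothetical institutional news feed</description>
    <item>
      <title>Workshop announced</title>
      <link>http://www.example.org/news/1</link>
    </item>
    <item>
      <title>New service launched</title>
      <link>http://www.example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a (title, link) pair for each news item in the feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]


for title, link in read_items(FEED):
    print(f"{title}: {link}")
```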
40. Model For News Feeds
Community (e.g. MIMAS)
Institution (e.g. Bath)
Zope CMS outputs to RSS XHTML
XHTML converted to RSS
External (e.g. BBC)
Structured database converted to RSS
- Good For User
- The end user can choose her news feeds, including local news, news from JISC services and news from third parties
- Good For Service
- The service can choose its own information flow model. Its news is disseminated
41. What About Tomorrow?
- Two interesting areas
- The Semantic Web
- Will allow intelligent agents to know about resources
- AI and ontologists meet the Web
- Uses RDF (Resource Description Framework), W3C's framework for metadata
- Some concerns over scale of problem
- See <http://www.w3.org/2001/sw/>
- Web Services
- Highlight of the WWW 10 and WWW 2002 conferences
42. Web Services
- The Web
- Initially used for viewing static resources
- Then interactive services built (e.g. e-learning)
- We now want
- Programmable Web services which can be used by other Web services using standard Web protocols
We have experience of the first generation of externally-hosted Web services (stats services, voting systems, etc.) - see <http://www.ariadne.ac.uk/issue23/web-focus/>. The next generation will be programmable and machine-understandable. Note that concerns over outsourcing may be an issue.
- Some examples at gotdotnet.com
- Mailsender
- Thumbnail Generator
- Concepts have been around for some time (see Auditing Evaluating Web Sites workshop)
- Now being standardised (UDDI, WSDL, SOAP, ...)
44. We've Been Here Before
- Reusable components available on the network
- Sounds like COM/DCOM, CORBA, etc. for reusable program components
- Network services for use within a community
- Sounds like JISCmail, RDN, EDINA, MIMAS, BIDS, Mirror Service and other JISC Services
- It's outsourcing but it's OK!
- Web Services And UK HE / FE Communities
- Sounds like a great idea
- We've the organisational framework to develop national services (JISC, etc.)
- We've got the network
- We've a community which is willing to exploit centrally-provided services and wants to avoid reinventing the wheel (haven't we?)
We should be moving away from providing separate Web services with their own interfaces, and separate metadata repositories and access services (which are sometimes centralised), and move to Web-accessible, machine-understandable Web services as well as seamless access to content.
[Diagrams: an end user accessing local, national and international content; metadata services (collection description, e.g. Agora; user profile, e.g. Headline; authentication, Athens), access (Web) services and application services?; brokered access provided by institutional portal (MLE, ...)]
Agora and Headline are eLib Hybrid libraries
48. Other W3C Areas
- See
- W3C site map at <http://www.w3c.org/Help/siteindex>
- TimBL's Web Design Issues at <http://www.w3c.org/DesignIssues>
- Web Architecture from 50,000 feet at
- Let us consider the following areas
- Content Management
- Systems Architecture
- Access (Browser support)
50. Position Today
- What should we be doing today?
- Move away from creating new content in HTML
- Move to XHTML as part of the migration
- Deploying XML applications
- Storing structured information in a neutral database
- Using a CMS to manage our content
- Deploying B2B applications (such as RSS) to avoid human bottleneck
Note that these are aspirations. We will, of course, be constrained by existing systems, resource implications, vested interests, inertia,
51. The CMS To The Rescue
- HTML authoring tools have limitations (as has HTML).
- A CMS (Content Management System)
- Allows fragments to be managed
- Allows collections to be managed
- Allows resources to be stored in a neutral format (backend database)
- Allows resources to be reused
- Often provides access control
- Often provides workflow processes and project
- Issues
- CMS can be expensive
- CMS can be free but have support implications
- Which one to choose?
52. Content Management
- Storing resources in HTML and GIF/JPEG is
- Easy to do and is a low cost solution
- Makes reuse and management of resources difficult
53. Systems Architecture
- Issues for you to consider
- Operating System: Should you go for a Unix OS or Windows NT? If Unix, should you go for Linux?
- Open Source vs Licensed Solution: Should you go for an open source solution or buy a licensed application?
- Package vs Do It Yourself: Should you make use of a pre-packaged solution or develop your own solution based on a toolkit (e.g. database, scripting language, ...)?
There are no global solutions: your choice should be based on expertise available locally, resourcing issues, discussions with partners, solutions providers, etc.
54. Browser Issues
- Which approach to browser issues should you take?
Web sites should be usable with old browsers as these are still in use and we aim to maximise access. Therefore you should deliver HTML 2.0 / 3.2 and avoid technologies such as JavaScript and
Old browsers are broken and fail to implement technologies which provide (a) richer functionality, (b) support for new devices and (c) better support for people with disabilities. Therefore you should use the latest stable versions of XHTML, CSS, etc.
- To conclude
- Standards are important
- HTML won't do the job
- XHTML is a useful transition
- Many new standards being developed
- Need to keep up to date and avoid developing systems with built-in obsolescence
- We'll need a CMS to manage richly functional institutional Web services
- Web services should be important and we shouldn't be too concerned about using remote