Title: Fixing The Web
1Fixing The Web
- Brian Kelly, UK Web Focus, UKOLN, University of Bath
- Email Address: B.Kelly_at_ukoln.ac.uk
- URL: http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research
and Innovation Centre, the Joint Information
Systems Committee of the Higher Education Funding
Councils, as well as by project funding from the
JISC's Electronic Libraries Programme and the
European Union. UKOLN also receives support from
the University of Bath where it is based.
2Contents
- Introduction
- Web Standards Overview
- Web Standards
- Data Formats
- Transport
- Addressing
- Metadata
- Distributed Searching
- Authentication
- Deployment Issues
- Questions
- Aims of Talk
- To give a brief overview of web architecture
- To identify problem areas in the Web
- To describe developments to the web which are addressing these problems
- To provide an opportunity for discussion
3About UK Web Focus
- UK Web Focus
- JISC funded post based at UKOLN (Bath Univ)
- Advises UK HE community on web issues
- Represents JISC on W3C
4What Is The Web?
- What is your definition of the Web?
- A definition of the Web from Tim Berners-Lee, the father of the web, is available at <URL: http://www.w3.org/WWW/>
- The World Wide Web (known as "WWW", "Web" or "W3") is the universe of network-accessible information, the embodiment of human knowledge.
5How Does The Web Work?
- The web is based on three core standards
- URLs: addresses of resources
- HTTP: dialogue between client and server
- HTML: format of resources
1 User clicks on the link "The Netsoft home page", whose address (URL) is http://www.netsoft.com/hello.html
2 Web browser (client) converts the link to an HTTP command (METHOD): connect to the computer at www.netsoft.com and send GET /hello.html
3 Remote computer (web server) sends the file: <HTML> <TITLE>Welcome</TITLE> .. <P>Welcome to <B>Netsoft</B>
4 Local computer displays the HTML file: "Welcome to Netsoft"
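The first two steps above can be sketched in a few lines of Python. This is only an illustration: the function name `build_get_request` is made up for this example, and the HTTP/1.0 request line and Host header are an assumption about what a simple browser of the period might send.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Turn a URL into the text of a simple HTTP GET request (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    # Request line, Host header, then a blank line to end the headers.
    return f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"

request = build_get_request("http://www.netsoft.com/hello.html")
print(request)
```

The browser would then open a connection to www.netsoft.com, send this text, and display the HTML that comes back.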
6What's It Used For?
- The simplicity of the web has turned a little-used academic tool (50 servers in early 1994) into one that is now used in a wide range of applications.
7Web Problem Areas
- What difficulties have you encountered when using the web?
- What other problem areas can you envisage?
8Hands-On Exercise
- Now spend some time accessing web resources.
- Make notes of any problem areas you come across.
9Review
10Fixing The Web
- We will now look at future developments to the web
- Much of the development work is being coordinated by the W3C
- W3C (World Wide Web Consortium)
- International consortium, with headquarters at MIT, INRIA and Keio University (Japan)
- Coordinates development of web protocols
- Four domains
- Architecture
- Technology & Society
- User Interface
- Web Accessibility
11Standardisation
- Proprietary
- De facto standards
- Often initially appealing (cf PowerPoint, PDF)
- May emerge as standards
HTML extensions, PDF and Java?
- W3C
- Produces W3C Recommendations on Web protocols
- Managed approach to developments
- Protocols initially developed by W3C members
- Decisions made by W3C, influenced by member and
public review
PNG, HTML, Z39.50, Java?
- ISO
- Produces ISO Standards
- Can be slow moving and bureaucratic
- Produce robust standards
- IETF
- Produces Internet Drafts on Internet protocols
- Bottom-up approach to developments
- Protocols developed by interested individuals
- "Rough consensus and working code"
HTTP, URN, whois++
PNG, HTML, HTTP
12The Web Vision
- Tim Berners-Lee's (and W3C's) vision for the Web
- Evolvability is critical
- Automation of information management: if a decision can be made by machine, it should
- All structured data formats should be based on XML
- Migrate HTML to XML
- All logical assertions to map onto RDF model
- All metadata to use RDF
- See keynote talk at WWW 7 conference at <URL: http://www.w3.org/Talks/1998/0415-Evolvability/slide1-1.htm>
13HTML 4.0, CSS 2.0 and DOM
- HTML 4.0 used in conjunction with CSS 2.0
(Cascading Style Sheets) and the DOM provides an
architecturally pure, yet functionally rich
environment
- HTML 4.0 - W3C-Rec
- Improved forms
- Hooks for stylesheets
- Hooks for scripting languages
- Table enhancements
- Better printing
- CSS 2.0 - W3C-Rec
- Support for all HTML formatting
- Positioning of HTML elements
- Multiple media support
- DOM - W3C-Rec
- Document Object Model
- Hooks for scripting languages
- Permits changes to HTML and CSS properties and content
- Problems
- Changes during CSS development
- Netscape and IE incompatibilities
- Continued use of browsers with known bugs
14HTML 4.0
- HTML 4.0 extends HTML with mechanisms for
- style sheets
- scripting
- frames
- embedding objects
- improved support for right to left and mixed direction text
- richer tables, and enhancements to forms
- offering improved accessibility for people with disabilities.
- See <URL: http://www.w3.org/TR/REC-html40/>
15Style Sheets
http://www.w3.org/Style/
- Cascading Style Sheets (CSS)
- CSS 2.0 is a W3C Recommendation
- External CSS files can minimise maintenance
- All HTML elements can be positioned anywhere on page (cf PowerPoint)
- Transition effects planned for CSS 3.0
- Support for accessibility
16WAI
http://www.cast.org/bobby/
- WAI
- Web Accessibility Initiative
- Universal access to web resources
- Accessibility checkers being developed, such as Bobby
- See <URL: http://www.w3.org/WAI/>
17HTML Limitations
- HTML 4.0 / CSS 2.0 have limitations
- Difficulties in introducing new elements
- Time-consuming standardisation process (<ABBREV>)
- Dictated by browser vendor (<BLINK>, <MARQUEE>)
- Area may be inappropriate for standardisation
- Covers specialist area (maths, music, ...)
- Application-specific (<STUD-NUM>)
- HTML is a display (output) format
- HTML's lack of arbitrary structure limits functionality
- Find all memos copied to John Smith
- How many unique tracks on Jackson Browne CDs
18XML
- XML
- Extensible Markup Language
- A lightweight SGML designed for network use
- Addresses HTML's lack of evolvability
- Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc)
- Agreement achieved quickly - XML 1.0 became W3C Recommendation in Feb 1998
- Support from industry (SGML vendors, Microsoft, etc.)
- Support in Netscape 5 and IE 5
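To illustrate why arbitrary elements matter, here is a minimal Python sketch of the "find all memos copied to John Smith" query from the previous slide. The element names (`memo`, `to`, `cc`, `subject`) and the sample data are invented for this example; they do not come from any real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo collection; element names are assumptions for illustration.
MEMOS = """
<memos>
  <memo><to>Ann Lee</to><cc>John Smith</cc><subject>Budget</subject></memo>
  <memo><to>John Smith</to><subject>Timetable</subject></memo>
  <memo><to>Raj Patel</to><cc>John Smith</cc><cc>Ann Lee</cc><subject>Web server</subject></memo>
</memos>
"""

def memos_copied_to(xml_text, name):
    """Return the subjects of all memos that list `name` in a <cc> element."""
    root = ET.fromstring(xml_text)
    return [
        memo.findtext("subject")
        for memo in root.iter("memo")
        if any(cc.text == name for cc in memo.findall("cc"))
    ]

print(memos_copied_to(MEMOS, "John Smith"))
```

With HTML alone the same information would be buried in presentational markup, so a query like this would have to rely on guesswork about page layout.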
19XML Concepts
- Well-formed XML resources
- Make end-tags explicit: <LI>...</LI>
- Make empty elements explicit: <IMG .../>
- Quote attributes: <IMG SRC="logo" HEIGHT="20">
- Use consistent upper/lower case
- Valid XML resources
- Need DTD
- XML Namespaces
- Mechanism for ensuring unique XML elements
- <?xmlns:FOO="http://foo.org/1998-001" prefix="i">
- <P>Insert <i:PART>M-471</i:PART></P>
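The well-formedness rules above can be checked mechanically. The following Python sketch uses the standard library XML parser, which rejects anything that is not well-formed; the helper name `is_well_formed` and the sample fragments are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    '<UL><LI>one</LI><LI>two</LI></UL>',      # explicit end-tags
    '<UL><LI>one<LI>two</UL>',                # missing end-tags
    '<P><IMG SRC="logo" HEIGHT="20"/></P>',   # explicit empty element, quoted attributes
    '<P><IMG SRC=logo HEIGHT=20/></P>',       # unquoted attributes
    '<P>text</p>',                            # inconsistent upper/lower case
]
for sample in samples:
    print(is_well_formed(sample), sample)
```

Note that this only tests well-formedness. Checking validity would additionally need a DTD and a validating parser, which the standard library parser used here does not provide.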
20XML Deployment
- Ariadne issue 15 has article on "What Is XML?"
- Describes how XML support can be provided
- Natively by new browsers
- Back end conversion of XML to HTML
- Client-side conversion of XML to HTML / CSS
- Java rendering of XML
- Examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/
21XLink, XPointer and XSL
- XLink will provide sophisticated hyperlinking missing in HTML
- Links that lead user to multiple destinations
- Bidirectional links
- Links with special behaviors
- Expand-in-place / Replace / Create new window
- Link on load / Link on user action
- Link databases
- XPointer will provide access to arbitrary portions of XML resource
- XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents)
<commentary xml:link="extended" inline="false">
<locator href="smith2.1" role="Essay"/>
<locator href="jones1.4" role="Rebuttal"/>
<locator href="robin3.2" role="Comparison"/>
</commentary>
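As a rough illustration of what software could do with the extended link shown above, this Python sketch reads the locators and groups the destinations by role. It only demonstrates "one link, multiple destinations"; it is not an XLink implementation, and the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The extended link from the slide, with its markup restored.
COMMENTARY = """
<commentary xml:link="extended" inline="false">
  <locator href="smith2.1" role="Essay"/>
  <locator href="jones1.4" role="Rebuttal"/>
  <locator href="robin3.2" role="Comparison"/>
</commentary>
"""

root = ET.fromstring(COMMENTARY)

# One link element, several destinations: map each role to its target.
destinations = {loc.get("role"): loc.get("href") for loc in root.findall("locator")}

for role, href in destinations.items():
    print(f"{role}: {href}")
```

A browser supporting such links could present the three roles as a menu when the user activates the link.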
22XML Update
- Data / Schemas
- XML-Data: Submitted to W3C, Jan 98 (Obsolete?)
- Document Content Description: Submitted, Aug 98
- XSchema: Independent effort
- Programming Interface
- DOM level 1: W3C Recommendation, May 98
- Style and Presentation
- CSS level 2: W3C Recommendation, May 98
- Extensible Style Language: Working Draft, Aug 98
- Relationship to Other Resources
- XLink, XPointer: Working Drafts, Mar 98
- XML Namespaces: Working Draft, Aug 98
- Query Languages
- XML Query Language: Submitted to W3C, Aug 98
- XQL: Independent effort
23Addressing
- URLs (e.g. http://www.bristol-poly.ac.uk/depts/music/) have limitations
- Lack of long-term persistency
- Organisation changes name
- Department shut down or merged
- Directory structure reorganised
- Inability to support multiple versions of resources (mirroring)
- URNs (Uniform Resource Names)
- Proposed as solution
- Difficult to implement (no W3C activity in this area)
24Addressing - Solutions
- DOIs (Digital Object Identifiers)
- Proposed by publishing industry as a solution
- Aimed at supporting rights ownership
- Business model needed
- PURLs (Persistent URLs)
- Provide single level of redirection
- Pragmatic Solution
- URLs don't break - people break them
- Design URLs to have long life-span
- Further information
- <URL: http://www.ukoln.ac.uk/metadata/resources/urn/>
- <URL: http://hosted.ukoln.ac.uk/biblink/wp2/links.html>
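The "single level of redirection" behind PURLs can be sketched as a lookup table. This is a toy model, not the real PURL resolver: the table contents, the path `/net/music-dept` and the replacement address are all hypothetical.

```python
# Hypothetical PURL table: persistent path -> current location of the resource.
purl_table = {
    "/net/music-dept": "http://www.bristol-poly.ac.uk/depts/music/",
}

def resolve(purl_path, table):
    """Return an (HTTP status, location) pair for a persistent path."""
    target = table.get(purl_path)
    if target is None:
        return 404, None
    return 302, target   # redirect the client to wherever the resource lives now

print(resolve("/net/music-dept", purl_table))

# The organisation changes name: only the table entry is updated.
# The persistent address that people cite and bookmark stays the same.
purl_table["/net/music-dept"] = "http://www.example.ac.uk/music/"  # hypothetical new home
print(resolve("/net/music-dept", purl_table))
```

The cost of this approach is that someone must keep the table up to date, which is why URLs designed for a long life-span remain the pragmatic first step.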
25Transport
- HTTP/0.9 and HTTP/1.0
- Design flaws and implementation problems
- HTTP/1.1
- Addresses some of these problems
- 60% server support
- Performance benefits! (60% packet traffic reduction)
- Is acting as fire-fighter
- Not sufficiently flexible or extensible
- HTTP/NG
- Radical redesign using object-oriented technologies
- Undergoing trials
- Gradual transition (using proxies)
- Integration of application (distributed searching?)
26Metadata
- Metadata - the missing architectural component
from the initial implementation of the web
- Metadata Needs
- Resource discovery
- Content filtering
- Authentication
- Improved navigation
- Multiple format support
- Rights management
27Dublin Core
- The international Library and Information Science community have developed a core set of attributes for finding Internet resources known as Dublin Core
- Title, Subject, Description
- Source, Language, Relation
- Coverage, Creator, Publisher
- Contributor, Rights, Date
- Type, Format, Identifier
- See <URL: http://purl.org/metadata/dublin_core>
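One common way of attaching Dublin Core attributes to a web page is to embed them in HTML META elements. The Python sketch below generates such elements from a record; the `DC.` name prefix is an assumption about the embedding convention, and the function name `dublin_core_meta` is made up for this example.

```python
from html import escape

def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML META tags (illustrative)."""
    lines = []
    for element, value in record.items():
        # Escape quotes and angle brackets so the values are safe inside attributes.
        lines.append(
            f'<META NAME="DC.{escape(element, quote=True)}" '
            f'CONTENT="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)

record = {
    "Title": "Fixing The Web",
    "Creator": "Brian Kelly",
    "Publisher": "UKOLN",
}
print(dublin_core_meta(record))
```

A search engine that understands these elements can then offer fielded searches, for example "Creator contains Kelly".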
28Dublin Core Applications
- An increasing number of search engines are
supporting Dublin Core.
The HotMeta service at <URL: http://www.dstc.edu.au/RDU/HotMeta/qld/> enables fielded searches to be carried out.
29Metadata Examples
- DSig (Digital Signatures initiative)
- Key component for providing trust on the web
- DSig 2.0 will be based on RDF and will support signed assertions
- This page is from the University of Bath
- This page is a legally-binding list of courses provided by the University
- P3P (Platform for Privacy Preferences)
- Developing methods for exchanging Privacy Practices of Web sites and users
- Note that discussions about additional rights management metadata are currently taking place
30RDF
- RDF (Resource Description Framework)
- Highlight of WWW 7 conference
- Provides a metadata framework ("machine understandable metadata for the web")
- Based on ideas from content rating (PICS), resource discovery (Dublin Core) and site mapping (MCF)
- Based on a formal data model (directed labelled graphs)
- Applications include
- cataloging resources
- resource discovery
- electronic commerce
- intelligent agents
- intellectual property rights
- privacy
- See <URL: http://www.w3.org/Talks/1998/0417-WWW7-RDF>
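The directed labelled graph behind RDF can be pictured as a set of (resource, property, value) statements. The Python sketch below is a toy model of that idea, not an RDF parser or any real RDF library; the class name `Graph`, the property names and the course-list URL are assumptions made for illustration.

```python
from collections import defaultdict

class Graph:
    """Toy directed labelled graph: nodes are resources, edge labels are properties."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, resource, prop, value):
        """Record the statement 'resource has property prop with this value'."""
        self._edges[resource].append((prop, value))

    def values(self, resource, prop):
        """Return every value attached to `resource` by an edge labelled `prop`."""
        return [v for p, v in self._edges.get(resource, []) if p == prop]

g = Graph()
page = "http://www.bath.ac.uk/courses/"   # hypothetical course list page
g.add(page, "Publisher", "University of Bath")
g.add(page, "Type", "Course list")
g.add(page, "Language", "en")

print(g.values(page, "Publisher"))
```

Because every statement has the same three-part shape, metadata from different vocabularies (Dublin Core, PICS ratings, site maps) can be merged into one graph and queried in the same way.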
31Browser Support for RDF
- Mozilla (Netscape's source code release) provides support for RDF.
- Mozilla supports site maps in RDF, as well as bookmarks and history lists
- See Netscape's or HotWired home page for a link to the RDF file.
[Figure labels: Trusted 3rd Party Metadata; Embedded Metadata e.g. sitemaps. Image from http://purl.oclc.org/net/eric/talks/www7/devday/]
32RDF Conclusion
- RDF is a general-purpose framework
- RDF provides structured, machine-understandable metadata for the Web
- Metadata vocabularies can be developed without central coordination
- RDF Schemas describe the meaning of each property name
- Signed RDF is the basis for trust
33Distributed Searching
- Distributed searching important for the DNER
(Distributed National Electronic Resource)
http://prospero.ahds.ac.uk:8080/ahds_live/
AHDS prototype provides cross-searching using Z39.50
ROADS prototype provides cross-searching using whois++
34How Metadata Could Be Used
- Issues
- Loss of visibility
- Performance, ..
- Database Description
- Music resources, including ...
- Policy (Terms & Conditions / Resource and Service)
- For licensing reasons, access is restricted to authorised HEIs
- For performance reasons, access restricted between 9-17.00
- The service logo must be included in results set, unless results only come from service
- Permission for cross-searching restricted to other eLib projects
- You're only allowed to link to the main entry point
- Individual
- Give me HTML or PDF resources, not Word
- I'm blind. Include ACSS in results and deliver a sitemap
- Client Software
- My browser doesn't support XML, so send me HTML
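As a small illustration of how individual preferences such as "give me HTML or PDF resources, not Word" could be applied to a results set, here is a Python sketch. The result records, their titles and the use of MIME types as the format metadata are assumptions made for this example.

```python
# Hypothetical cross-search results, each carrying format metadata.
RESULTS = [
    {"title": "Course list", "format": "text/html"},
    {"title": "Annual report", "format": "application/pdf"},
    {"title": "Committee minutes", "format": "application/msword"},
]

def filter_by_format(results, accepted_formats):
    """Keep only the results whose format the user has said they can accept."""
    return [r for r in results if r["format"] in accepted_formats]

user_preferences = {"text/html", "application/pdf"}   # HTML or PDF, not Word
for result in filter_by_format(RESULTS, user_preferences):
    print(result["title"], "-", result["format"])
```

The same pattern would apply to service policies: a machine-readable statement of the rule, plus metadata on each resource for the rule to test.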
35Technologies
- A number of formats and protocols could be used to implement distributed searching: XML and RDF plus
- Z39.50: ISO standard. Well-known in library world, but heavy-weight
- whois++: Lightweight IETF standard. Used in ANR gateways, but not widely deployed
- LDAP: Lightweight version of X.500 directory service.
- HTTP/NG?: Opportunity to develop new solution using OO technologies
- IETF WebDav
- Requirements for distributed authoring include author metadata and collection definitions. See <URL: http://www.ietf.org/html.charters/webdav-charter.html> and <URL: http://www.ietf.org/ids.by.wg/webdav.html>
36Authentication
- Deployment of an open, scalable, flexible authentication system is difficult and expensive
- Current solutions include
- Server-based username and password schemes
- IP-based schemes
- Athens - Based on replicated Sybase application. See <URL: http://www.athens.ac.uk/>
- W3C DSig work - Digital Signatures Initiative. See <URL: http://www.w3.org/DSig/>
- Other Public Key developments - e.g. reports of Post Office involvement, statements from Tony Blair, EU, ..
- "In May 1998 the Commission published its proposal for a "European Parliament and Council Directive on a Common Framework for Electronic Signatures" (COM(1998)297)."
37Certificates
- Commercially-supported digital ids, such as Verisign's, could be used to authenticate services
- Can purchase server ID for 349
- End user certificates available
Browser Support
Use certificates to positively identify yourself, certificate authorities and publishers
Need for a certification infrastructure
38Deployment Issues
- More sophisticated deployment techniques can be
adopted to overcome deficiencies in simple model
Original Model
Web server simply sends file to client. File contains redundant information (for old browsers) plus client interrogation support
[Diagram labels: HTML resource; Web server]
Sophisticated Model
[Diagram labels: HTML / XML / database resource; Intelligent Web server; Client proxy; Server proxy]
- Intermediaries can provide functionality not available at client
- DOI support
- XML support / format conversion
- Authentication
[Figure: Example of an intermediary]
39Conclusions
- To conclude
- The Web will continue to develop
- Standards are important
- Proprietary solutions are often tempting because
- They are available
- They are often well-marketed and well-supported
- They may become standardised
- Solutions based on standards may not be properly supported by applications
- Metadata is a big growth area
- Intermediaries may have a role to play in deploying standards-based solutions
- Intelligent servers likely to be important