The Next Generation Web
- "You see things and you say 'Why?' But I dream things that never were and I say 'Why not?'" - G. B. Shaw (Back to Methuselah)
- Michael B. Spring
- Department of Information Science and Telecommunications
- University of Pittsburgh
- spring_at_imap.pitt.edu
- http://www.sis.pitt.edu/spring
Prelude
- "When I try to explain the architecture now for the semantic web, I get the same distant look in people's eyes as I did in 1989, when I tried to explain how global hypertext, the World Wide Web, would work." - Tim Berners-Lee, Weaving the Web, pp. 194-195
Overview
Simple Concepts
- Context
  - Changing nature of documents
  - The current state of the WWW
- The next generation web
  - Conceptually
  - Infrastructure
  - Proposed architectures
- Issues and conclusions
Ideas for a Technical Solution
Context
- The Web is a critical environment for information exchange and is increasingly used by business
- Documents are changing in form and function: computer mediation is making them more dynamic, more interactive, and more personal
- Powerful existing computing tools and infrastructure that support collaboration, distributed services, and agents are being retooled for a second-generation web
Artifacts and process
- Humans use documents as tools to accomplish different things:
  - Explanation of ideas
  - Communication of design
  - Specification of agreements
- In addition to their main message, documents provide ancillary information about work processes (feedback and feedthrough)
- The next generation web will embrace computer-mediated and manipulable document forms
Types of documents
- Classifying documents by scope captures many of the important distinctions between different types
- Consider the differences among documents that are:
  - Personal
  - Group
  - Organizational
  - Enterprise
  - Archival
Document Process Matrix
Digital Document Characteristics
- Digital documents have new features
  - Components (charts, data sheets)
  - Active links
  - Versioning and commentary
- We work with digital documents differently
  - More revisions, with more copying and editing
  - Outline-and-fill composition
- Computers can work with digital documents
  - Automatic generation
  - Agent-based processing
The Current State of the Web
- The web has begun to lose coherence: locational association is not enough
  - Disintermediated junk
  - Unlinkable processes
  - Insufficient inverted indices
- Classification is needed
  - It requires a shared ontology
  - It needs to be distributed
  - It must be decentralized
[Figure: timeline marked 300 BC, 1990, and 2000, contrasting Classification with Linking]
What the WWW is
- The web is a distributed, decentralized, and scalable system with:
  - A robust, simple, idempotent protocol (a single request and response)
  - A locational infrastructure (URLs and DNS)
  - An effective but costly envelope (TCP)
  - A growing ad hoc superstructure (on both server and client sides)
What the WWW is not
- Stateful
  - Cookies and server-side session management struggle to provide a semblance of state
- Organized
  - Search engines and portals are veneers that endeavor to provide organization
- Designed for processes
  - HTTP is connectionless and stateless, so it cannot support sessions on its own
  - HTML is inadequate for data interchange
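The cookie workaround mentioned above can be sketched in a few lines. This is a hypothetical, in-memory illustration, not any particular server's implementation. Each call to `handle_request` stands for one independent HTTP request. The only continuity comes from a session id that the client echoes back in its Cookie header. `SESSIONS`, `handle_request`, and the cookie name `sid` are all illustrative assumptions. Only `SimpleCookie` and `secrets` are real standard-library APIs.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical in-memory session store: session id -> per-client state.
SESSIONS: Dict[str, Dict[str, int]] = {}


def handle_request(cookie_header: Optional[str]) -> Tuple[str, int]:
    """Serve one stateless request; return (Set-Cookie value, visit count).

    The server keeps no connection to the client. State survives between
    requests only because the client sends the session id back.
    """
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)

    sid = incoming["sid"].value if "sid" in incoming else None
    if sid not in SESSIONS:
        # Missing or unknown id: start a fresh session.
        sid = secrets.token_hex(8)
        SESSIONS[sid] = {"visits": 0}

    SESSIONS[sid]["visits"] += 1

    outgoing = SimpleCookie()
    outgoing["sid"] = sid
    return outgoing["sid"].OutputString(), SESSIONS[sid]["visits"]


set_cookie, visits = handle_request(None)      # first request, no cookie: visits == 1
_, visits_again = handle_request(set_cookie)   # cookie echoed back: visits_again == 2
```

If the client drops the cookie, the server cannot tell it is the same client. That is the "semblance" of state the slide refers to.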
The Next Generation Is Here
- When a CGI program takes data from a DBMS to compose a web page with a form that is processed by another CGI program, that is in essence a web service
- When a plug-in such as RealPlayer accesses and plays streaming media, we are in essence using a web service
- Instant messaging and file-sharing services such as Gnutella are better examples of services; as they become more fully integrated into browsers, they become web services
A Once and Future King
- The next generation is not really new: distributed computing based on a client-server model has been with us for years
- RPC and RMI approaches to distributed computing are multi-tiered, involve registration and lookup, and make use of shared interfaces
- CORBA and DCOM provide process wrappers and distributed lookups
- The next generation web is distributed computing with dynamic, late-binding, peer-to-peer architectures
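The registration, lookup, and shared-interface pattern of RPC can be sketched with Python's standard XML-RPC modules. XML-RPC is a stand-in here, chosen because it ships with Python; it is not RMI, CORBA, or DCOM. The server registers a function under a well-known name. The client looks up the available methods and then calls one remotely over HTTP. The function `add` and the helper `call_remote_add` are illustrative names.

```python
import threading
from typing import List, Tuple
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """The shared interface: both sides agree on the name 'add' and two ints."""
    return a + b


def call_remote_add(a: int, b: int) -> Tuple[List[str], int]:
    # Registration: the server publishes functions under well-known names.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_introspection_functions()  # enables system.listMethods
    server.register_function(add, "add")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            methods = proxy.system.listMethods()  # lookup
            result = proxy.add(a, b)              # remote call via HTTP POST
        return methods, result
    finally:
        server.shutdown()
        server.server_close()


methods, total = call_remote_add(2, 3)  # total == 5; "add" appears in methods
```

In this model the binding is fixed and client-server. The slide's point is that the next generation makes the same steps dynamic, late-binding, and peer-to-peer.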
The Next Generation Web: Conceptually
- Use the vast infrastructure of the web (millions of clients and servers using HTTP) to access programs as well as static pages
- Develop additional infrastructure that will allow highly distributed programs to operate securely and productively over the web
- Build marketplaces that work
Infrastructure and Architecture
- The underlying infrastructure includes
  - Component transaction monitors
  - Abstract syntax
  - Distributed directory services
  - Security and authentication
- The architectural approaches are varied
  - HP's e-services architecture (now absorbed)
  - Sun's JXTA (still emerging)
  - Microsoft's .NET (growing strong)
  - The W3C's semantic web, with its agent architectures (status uncertain)
CTM Infrastructure
- Component Transaction Monitors (CTMs) are CGIs on steroids
- Early efforts moved beyond CGI to servlet runners and ASP/JSP with components and beans
- CTMs move beyond managing threads to advertising, federation, transaction management, and security management
- To deliver messages to components through HTTP, an envelope is needed; the Simple Object Access Protocol (SOAP) is emerging as the protocol for this
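One core CTM job is to open an incoming SOAP envelope and route the call to the right component. A minimal sketch of that step follows. It assumes a hypothetical registry (`COMPONENTS`), a hypothetical `GetStock` operation, and the made-up namespace `urn:example:inventory`. Only the SOAP 1.1 envelope namespace is real. Transactions, security, and federation are left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace

# Hypothetical component registry maintained by the monitor.
COMPONENTS: Dict[str, Callable[..., object]] = {}


def component(operation: str):
    """Register a function as the handler for a named SOAP operation."""
    def register(fn: Callable[..., object]) -> Callable[..., object]:
        COMPONENTS[operation] = fn
        return fn
    return register


@component("GetStock")
def get_stock(item: str) -> int:
    # Stand-in business logic; a real component might query a DBMS.
    return {"widget": 12, "gadget": 0}.get(item, 0)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def dispatch(message: str) -> object:
    """Open the envelope, find the operation in the Body, and call it."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("message has no SOAP Body content")
    call = body[0]
    handler = COMPONENTS[local_name(call.tag)]
    args = {local_name(child.tag): child.text for child in call}
    return handler(**args)


REQUEST = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <inv:GetStock xmlns:inv="urn:example:inventory">
      <item>widget</item>
    </inv:GetStock>
  </soap:Body>
</soap:Envelope>"""

print(dispatch(REQUEST))  # prints 12
```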
XML Infrastructure
- Data interchange
  - To build documents that can be machine-processed, we need more precise control over the structure of the document (we need to know there is an author element)
  - We also need control over the content of the document (we need to control what goes in the author element)
  - XML, together with DTDs or schemas, provides both of these capabilities
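The two kinds of control can be sketched side by side: structure (is there an author element?) and content (what may go inside it?). Python's ElementTree checks only well-formedness. In practice a DTD or XML Schema would declare these rules. The hand-written checks below are a stand-in for that validation. The `paper` and `author` element names and the "Family, Given" pattern are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical content rule: "Family, Given" using letters, spaces, hyphens, periods.
AUTHOR_PATTERN = re.compile(r"[A-Z][A-Za-z.\- ]*, [A-Z][A-Za-z.\- ]*")


def check_document(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    root = ET.fromstring(xml_text)
    authors = root.findall("author")
    # Structure: we need to know there IS an author element.
    if not authors:
        problems.append("missing <author> element")
    # Content: we need to control what goes IN the author element.
    for author in authors:
        text = (author.text or "").strip()
        if not AUTHOR_PATTERN.fullmatch(text):
            problems.append(f"bad author content: {text!r}")
    return problems


print(check_document("<paper><title>T</title><author>Spring, Michael B.</author></paper>"))
# prints []
```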
XML Infrastructure (continued)
- Given a clear model of XML documents
  - XPath provides a way to select any set of nodes in a document
  - XSLT allows the transformation of document trees
  - DOM and SAX provide application programming interfaces to these models
  - Schemas allow definitions to be extended in a modular way
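The three access styles named above can be compared on one small document using Python's standard library. Each function selects the same author nodes. The first uses an XPath-style path (ElementTree supports only a limited XPath subset). The second uses the DOM, which loads the whole tree into memory. The third uses SAX, which streams parse events and keeps only the state it needs. The sample document and function names are illustrative.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax
from typing import List

DOC = """<library>
  <book><title>Weaving the Web</title><author>Tim Berners-Lee</author></book>
  <book><title>What Will Be</title><author>Michael Dertouzos</author></book>
</library>"""


def authors_xpath(text: str) -> List[str]:
    # ElementTree's XPath subset: every author element under any book.
    return [a.text for a in ET.fromstring(text).findall(".//book/author")]


def authors_dom(text: str) -> List[str]:
    # DOM: the whole tree is in memory and navigated as objects.
    dom = xml.dom.minidom.parseString(text)
    return [a.firstChild.data for a in dom.getElementsByTagName("author")]


class AuthorHandler(xml.sax.ContentHandler):
    # SAX: the parser pushes events; we accumulate text inside <author>.
    def __init__(self) -> None:
        super().__init__()
        self.authors: List[str] = []
        self._buffer: List[str] = []
        self._inside = False

    def startElement(self, name, attrs):
        if name == "author":
            self._inside = True
            self._buffer = []

    def characters(self, content):
        if self._inside:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "author":
            self.authors.append("".join(self._buffer))
            self._inside = False


def authors_sax(text: str) -> List[str]:
    handler = AuthorHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.authors
```

All three return `['Tim Berners-Lee', 'Michael Dertouzos']`. DOM is convenient for random access. SAX suits large documents that should not be held in memory.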
Directory Infrastructure
- Directories of objects are key to the infrastructure; they will be distributed and object-oriented
- The X.500 Directory Service is being used via LDAP (Lightweight Directory Access Protocol)
- Directories will allow resources to advertise themselves so that they can be discovered; the Universal Description, Discovery and Integration (UDDI) protocol is emerging as a leader here
- There are a number of models for how directories will be related to one another
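The X.500/LDAP idea of hierarchically named entries, searched by subtree and attribute, can be sketched with an in-memory dictionary. This is a hypothetical illustration, not an LDAP client. Real DNs are compared case-insensitively after normalization, and LDAP filters are far richer than the equality tests here. `objectClass` is a real LDAP attribute name. `webService`, the `o=example` tree, and the functions are assumptions.

```python
from typing import Dict, List

# Hypothetical in-memory directory keyed by X.500-style distinguished names.
DIRECTORY: Dict[str, Dict[str, str]] = {}


def advertise(dn: str, **attrs: str) -> None:
    """A resource publishes (advertises) itself under its DN."""
    DIRECTORY[dn] = dict(attrs)


def in_subtree(dn: str, base: str) -> bool:
    """True if dn is the base entry itself or lies somewhere beneath it."""
    return dn == base or dn.endswith("," + base)


def search(base: str, **filters: str) -> List[str]:
    """Subtree search: DNs under `base` whose attributes equal every filter."""
    return sorted(
        dn
        for dn, attrs in DIRECTORY.items()
        if in_subtree(dn, base)
        and all(attrs.get(key) == value for key, value in filters.items())
    )


advertise("cn=quote-service,ou=finance,o=example", objectClass="webService", protocol="soap")
advertise("cn=print-service,ou=facilities,o=example", objectClass="webService", protocol="ipp")
advertise("cn=alice,ou=finance,o=example", objectClass="person")

print(search("ou=finance,o=example", objectClass="webService"))
# prints ['cn=quote-service,ou=finance,o=example']
```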
Security and Authentication
- CTMs will implement secure pipes of various types between monitors, allowing transactions that begin and end behind firewalls to cross the unprotected Internet
- Certificates will be the basis for a web of trust that allows for authentication and non-repudiation
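TLS is one common way to build such a "secure pipe". Using it here is an assumption of this sketch; the slide does not name a protocol. The Python `ssl` module sets up a client-side pipe that refuses to proceed unless the peer presents a certificate chaining to a trusted authority. Host names and ports are placeholders. Mutual authentication would also need a client certificate via `load_cert_chain`. Non-repudiation additionally requires signed messages, which TLS alone does not provide.

```python
import socket
import ssl


def make_secure_pipe_context() -> ssl.SSLContext:
    """Client-side TLS settings for a pipe between two monitors (a sketch)."""
    # create_default_context loads the system's trusted CA certificates and
    # turns on certificate verification and host-name checking.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse older protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_secure_pipe(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to a (hypothetical) remote monitor and complete a TLS handshake.

    The handshake fails unless the peer's certificate chains to a trusted
    authority and names `host`.
    """
    context = make_secure_pipe_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```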
Approaches to Marketplaces
- Michael Dertouzos suggested that the environment we are moving toward most closely resembles an information marketplace
- HP's e-speak was an early contender
- Sun's JXTA has many similar characteristics, a tight coupling to Java, and a simple, Unix-like approach
- Microsoft's .NET is focused on bringing Microsoft's extensive resources to bear in easy-to-use ways
- The Semantic Web is a personal, agent-based view of how this new web might be used
What is e-speak (1)
- e-speak is a component transaction monitor
[Figure: e-speak services and a consumer]
What is e-speak (2)
- e-speak is a distributed service
What is e-speak (3)
- e-speak is a system that provides support for
  - Advertising and finding based on vocabularies
  - Authentication
  - Multiple access methods
  - Security
e-speak Advertising and Authentication
e-speak Access and Security
What is e-speak (4)
- e-speak is an extensible system with decentralized control
  - Vocabularies
    - Attributes and values
    - Combined and advertised
  - Services
    - Basic services
    - Infrastructure services
    - Support services
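Vocabulary-based advertising can be illustrated with a toy marketplace. A vocabulary names the attributes that providers and consumers agree on. Advertisements may use only those attributes, and queries are phrased against a vocabulary. The sketch is a hypothetical model of the idea, not HP's e-speak API. `Vocabulary`, `Marketplace`, and the printing example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Vocabulary:
    """A named set of attributes that providers and consumers agree on."""
    name: str
    attributes: FrozenSet[str]


@dataclass
class Advertisement:
    service: str
    vocabulary: str
    values: Dict[str, object]


class Marketplace:
    def __init__(self) -> None:
        self.vocabularies: Dict[str, Vocabulary] = {}
        self.ads: List[Advertisement] = []

    def define(self, vocabulary: Vocabulary) -> None:
        self.vocabularies[vocabulary.name] = vocabulary

    def advertise(self, service: str, vocabulary: str, **values: object) -> None:
        # Only attributes defined in the vocabulary may be used.
        unknown = set(values) - self.vocabularies[vocabulary].attributes
        if unknown:
            raise ValueError(f"not in vocabulary {vocabulary!r}: {sorted(unknown)}")
        self.ads.append(Advertisement(service, vocabulary, dict(values)))

    def find(self, vocabulary: str, **constraints: object) -> List[str]:
        # A query is only meaningful within the vocabulary it is phrased in.
        return [
            ad.service
            for ad in self.ads
            if ad.vocabulary == vocabulary
            and all(ad.values.get(k) == v for k, v in constraints.items())
        ]


market = Marketplace()
market.define(Vocabulary("printing", frozenset({"color", "ppm", "location"})))
market.advertise("laser-3f", "printing", color=False, ppm=40, location="3rd floor")
market.advertise("inkjet-2f", "printing", color=True, ppm=12, location="2nd floor")
print(market.find("printing", color=True))  # prints ['inkjet-2f']
```

Decentralized control would let independent parties define and combine their own vocabularies; that part is not modeled here.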
Vocabularies and Metaservices
A Mind Set
- Millions of machines
- Marketplace makers
- Competing service providers
- Dynamic offerings
- Distributed certificate authorities
- Intermediaries
- Smart devices
- Smart consumers
E-Marketplaces
JXTA
- Peer-to-peer (P2P) computing model
- The name JXTA comes from "juxtapose": the peer-to-peer model is juxtaposed against the traditional client-server model
- It consists of a federation of peers running JXTA cores
- The focus of this implementation is on truly ad hoc peers
JXTA Architecture
[Figure: JXTA architecture diagram; labels include JXTA Application, JXTA Service, Pipes, Network Services, Peers, Peer Groups, Services, Discovery, Pipe Binding, Access, Propagation, Rendezvous, Relay, and Routing]
JXTA Components
- Peers are the fundamental entities in JXTA
  - They offer and/or consume services
  - They have a unique ID but may be mobile
- Pipes are used to connect peers
  - They are late-binding and dynamic
  - They are one-way and are considered unreliable
- Services consist of both peer services and peer-group services; the latter may be offered redundantly by all members of a peer group
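The two pipe properties above can be sketched with a toy model. Late binding means the pipe ID is resolved to a receiving peer only at send time. One-way and unreliable means fire-and-forget: no acknowledgment and no retry. This is a hypothetical model, not the JXTA pipe binding protocol. `Peer`, `PipeService`, and `loss_rate` are invented names. Simulated loss stands in for whatever the transport might drop.

```python
import random
from typing import Dict, List, Optional


class Peer:
    def __init__(self, peer_id: str) -> None:
        self.peer_id = peer_id          # unique ID
        self.inbox: List[str] = []


class PipeService:
    """Hypothetical pipe binder: pipe IDs are resolved to peers at send time."""

    def __init__(self, loss_rate: float = 0.0, seed: Optional[int] = None) -> None:
        self.bindings: Dict[str, Peer] = {}
        self.loss_rate = loss_rate      # simulated unreliability
        self.rng = random.Random(seed)

    def bind(self, pipe_id: str, peer: Peer) -> None:
        # Rebinding lets another peer (or a moved peer) take over the pipe.
        self.bindings[pipe_id] = peer

    def send(self, pipe_id: str, message: str) -> None:
        """One-way, fire-and-forget: the sender gets no delivery report."""
        peer = self.bindings.get(pipe_id)   # late binding: resolved now
        if peer is None or self.rng.random() < self.loss_rate:
            return                          # silently lost; no retry
        peer.inbox.append(message)


pipes = PipeService()
a, b = Peer("peer-a"), Peer("peer-b")
pipes.bind("quotes", a)
pipes.send("quotes", "m1")   # delivered to peer-a
pipes.bind("quotes", b)
pipes.send("quotes", "m2")   # same pipe ID, now delivered to peer-b
```

Any reliability or replies must be layered on top, for example with a second pipe in the opposite direction.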
JXTA Services
- Discovery, along with propagation and rendezvous, concerns the processes by which entities and services become known
- Participants make advertisements, which are propagated and cached by rendezvous services so that they can be discovered
- Once discovered, services may be engaged through a series of messages
JXTA Services (continued)
- The rendezvous service keeps track of known advertisements and of other rendezvous services; discovered services are cached, and queries that cannot be answered locally are propagated
- The router and relay services maintain information about routes to peers and translate between the various transport services
- There are also security and authentication services
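The rendezvous behavior described above can be sketched as a small recursive search. A node answers from its cache if it can. Otherwise it propagates the query to neighboring rendezvous nodes and caches whatever comes back. A hop limit (`ttl`) and a visited set keep propagation bounded and stop loops. This is a simplified model, not the JXTA Rendezvous Protocol. `Rendezvous`, `publish`, `query`, and the advertisement strings are hypothetical.

```python
from typing import Dict, List, Optional, Set


class Rendezvous:
    """Hypothetical rendezvous node: caches advertisements, forwards misses."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}          # service name -> advertisement
        self.neighbors: List["Rendezvous"] = []  # other known rendezvous nodes

    def publish(self, service: str, advertisement: str) -> None:
        self.cache[service] = advertisement

    def query(self, service: str, ttl: int = 3,
              visited: Optional[Set[str]] = None) -> Optional[str]:
        # 1. Answer locally if we can.
        if service in self.cache:
            return self.cache[service]
        # 2. Otherwise propagate, limited by ttl and a visited set (cycles).
        if ttl <= 0:
            return None
        if visited is None:
            visited = set()
        visited.add(self.name)
        for neighbor in self.neighbors:
            if neighbor.name in visited:
                continue
            found = neighbor.query(service, ttl - 1, visited)
            if found is not None:
                self.cache[service] = found      # cache what we discovered
                return found
        return None


a, b, c = Rendezvous("a"), Rendezvous("b"), Rendezvous("c")
a.neighbors, b.neighbors, c.neighbors = [b], [c], [a]   # a ring of three
c.publish("printing", "<ad printing@peer-7>")
```

Here `a.query("printing", ttl=1)` returns `None` because the hop limit runs out at `b`. With the default `ttl=3`, the query reaches `c`, and both `b` and `a` cache the advertisement on the way back.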
.NET
- Microsoft views the next generation web as a challenge to its control of the desktop
- .NET is Microsoft's response
- .NET is a distributed, peer-to-peer service environment built on top of a series of interfaces
- Microsoft is aggressively upgrading its existing software to include support for services
- Microsoft is very conscious of the need to promote this new environment via killer applications
Microsoft's .NET
- .NET is built by extending existing tools and services, and creating new ones, in three areas
  - A set of servers, both modified existing ones (SQL Server and Exchange) and new ones (BizTalk)
  - A set of tools: the wizards in VB and C are extended to encompass web services
  - New foundation services for advertising, notification, and security (Passport) that are part of the CTM
The .NET Interfaces
- XML is the language, or abstract syntax, for protocols in .NET
- SOAP is the envelope used to transmit messages; it is carried within an HTTP POST request, for example to send a UDDI request or to get a WSDL interface
- UDDI is the set of protocols used to advertise, find, and combine services; it is written in XML
- WSDL (Web Services Description Language) is used to define web service interfaces; it is written in XML
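The SOAP-over-HTTP arrangement above can be sketched with the standard library. The code builds an envelope and wraps it in an HTTP POST request. The request is prepared but not sent. The endpoint `http://example.com/inventory`, the namespace `urn:example:inventory`, the `GetStock` operation, and the SOAPAction value are hypothetical. The envelope namespace and the `text/xml` content type follow SOAP 1.1. Real UDDI and WSDL messages have their own specified schemas, which are not reproduced here.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
ET.register_namespace("soap", SOAP_ENV)


def build_envelope(operation: str, namespace: str, **params: object) -> bytes:
    """Wrap one operation call and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def soap_post(endpoint: str, action: str, envelope: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that carries the envelope."""
    return urllib.request.Request(
        endpoint,
        data=envelope,
        method="POST",
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{action}"',  # SOAP 1.1 sends a quoted URI here
        },
    )


request = soap_post(
    "http://example.com/inventory",        # hypothetical endpoint
    "urn:example:inventory#GetStock",      # hypothetical action URI
    build_envelope("GetStock", "urn:example:inventory", item="widget"),
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```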
.NET Seduction
- Microsoft will help users see how this new environment will work by building new devices and services
- New devices that consume services
  - Xbox
  - Tablet PC
  - Smart phone
- User experiences that promote the new environment
  - Office will be .NET-sensitive
  - Microsoft Project will be .NET-sensitive
  - Windows XP will be .NET-sensitive
The Semantic Web
- Tim Berners-Lee has described the next generation web as the semantic web
- The semantic web would be manipulable by programs
- Like the current web, his vision is one that
  - Is distributed and decentralized
  - Must scale well and allow for imperfect contributions
  - Must be such that the vast majority of contributors to the web can participate easily
The Components
- XML provides an extensible language for document and data interchange, including
  - Query
  - Linking
- RDF provides a mechanism for developing
  - Descriptions of resources
  - Descriptions of the descriptions (schema)
- Inference engines (agents) would be able to discover resources based on schema analysis
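Schema-based discovery can be sketched with plain triples. Instance descriptions use `rdf:type`, and the schema uses `rdfs:subClassOf`. The agent applies one RDFS-style rule: an instance of a subclass is also an instance of its superclasses. This is a tiny subset of RDFS inference written by hand, not an RDF library. The prefixed names are shown as plain strings, and every `ex:` resource and class is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Descriptions of resources (rdf:type) and descriptions of the descriptions
# (rdfs:subClassOf). All ex: names are hypothetical.
TRIPLES: Set[Triple] = {
    ("ex:Report", SUBCLASS_OF, "ex:Document"),
    ("ex:Presentation", SUBCLASS_OF, "ex:Document"),
    ("ex:SlideDeck", SUBCLASS_OF, "ex:Presentation"),
    ("ex:nextGenWeb", RDF_TYPE, "ex:SlideDeck"),
    ("ex:q3Sales", RDF_TYPE, "ex:Report"),
    ("ex:logo", RDF_TYPE, "ex:Image"),
}


def subclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """cls plus every class that is (transitively) a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls: str, triples: Set[Triple]) -> Set[str]:
    """Resources typed as cls or as any of its subclasses."""
    classes = subclasses(cls, triples)
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


print(sorted(instances_of("ex:Document", TRIPLES)))
# prints ['ex:nextGenWeb', 'ex:q3Sales']
```

No resource is directly typed `ex:Document`. Both results are found only by reading the schema. That reliance on the schema is what the slide means by discovery through schema analysis.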
The Web Today
[Figure: 1. Location lookup, then 2. Object request]
The Web with Descriptions
Semantic Web
Issues (1)
- A simple request could involve hundreds, rather than tens, of network hops
  - This would greatly increase bandwidth use and the impact of network delays
- Preprocessing (DNS lookup prior to LDAP) could reduce the number of steps
- A component view could ease the burden by dynamically passing requests off
- Caching of information could greatly ease the burden
  - Clients could cache schema and schema heads
  - Schema servers could cache schema structures
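Client-side schema caching, as suggested above, can be sketched as a time-limited cache. A schema fetched once, at the cost of many network hops, is reused until it is `ttl` seconds old. After that it is fetched again so that changes eventually propagate. `SchemaCache` and the `fetch` callable are hypothetical. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class SchemaCache:
    """Reuse a fetched schema until it is `ttl` seconds old."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch      # performs the expensive, multi-hop lookup
        self.ttl = ttl
        self.clock = clock      # injectable for testing
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, uri: str) -> str:
        now = self.clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]     # cache hit: no network hops
        schema = self.fetch(uri)
        self._entries[uri] = (schema, now)
        return schema


fetches = []
fake_now = [0.0]


def fake_fetch(uri: str) -> str:
    fetches.append(uri)         # record each simulated network lookup
    return f"<schema for {uri}>"


cache = SchemaCache(fake_fetch, ttl=60, clock=lambda: fake_now[0])
cache.get("http://example.org/schema/doc")   # miss: fetched
cache.get("http://example.org/schema/doc")   # hit: served from cache
fake_now[0] = 61.0
cache.get("http://example.org/schema/doc")   # expired: fetched again
```

After these three calls, `fetches` holds two entries. Choosing the `ttl` trades staleness against the hop count the slide is worried about.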
Issues (2)
- The appropriate level of logic and the distribution of processing will be critical
- What expressive power should be allowed, and at what computational cost?
  - Disjunction
  - Negation
- What should the schema servers do?
- What can the directory servers do?
- What can the client do?
Conclusions
- Infrastructure exists that can be used
- There are at least two visions operating
  - A business service view: .NET
  - A personal agent view: the semantic web
- JXTA appears capable of serving both visions
- It is not clear whether
  - Any solution can be made simple enough
  - A critical-mass payoff can be achieved in reasonable time