1 Site Explorer Server: an integrated client-server query system for Web sites
Giancarlo Bongiovanni, Flavio Fontana, Stefano Borghetti
Dept. of Computer Science, University of Rome "La Sapienza"
ENEA's Usability Lab
2 Summary
- Introduction
- Information Retrieval Systems and keyword score
- Search engines
- Internet now and the future
- Java
- Site Explorer Server v2.0
- Conclusion and experimental results
- Future work
3 Information Search on the Internet
The Internet is the largest and most widespread network
Internet
4 Internet
Issue: could searching for information on the Internet be a problem for particular types of users?
Today a better scenario
- User problems related to information search
- Many users don't know the Web information model
- Users have trouble finding valid tools able to locate the relevant information
- Users have trouble describing the information they are looking for in correct and concise terms
- Users have trouble using advanced search tools (e.g. Site Explorer Server is more difficult to use than a browser)
5 New search and exploration tools
A new Web approach, alternative to the traditional browser
Implementation of a client/server tool able to perform Web IR using Java, experimented with and tested at ENEA
Tool integrated with browser
Network service
6 Gerard Salton, Introduction to Modern Information Retrieval, 1983, McGraw-Hill, Inc.
IRS
- Query formulation by user
7 A query formulation is a list of terms that expresses and summarizes the topic being searched
IRS
- Boolean systems combine the terms using Boolean operators
- and
- or
- andnot
Examples
Information and retrieval
Information or retrieval
Information andnot retrieval
Boolean operators
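The three Boolean operators map naturally onto set operations over the documents that contain each term. The sketch below is only an illustration of that idea: the POSTINGS table and the boolean_query function are hypothetical names, not part of any system described in these slides.

```python
# Hypothetical postings: term -> set of ids of the documents containing it.
POSTINGS = {
    "information": {1, 2, 3},
    "retrieval": {2, 3, 4},
}


def boolean_query(left, operator, right, postings=POSTINGS):
    """Evaluate 'left OPERATOR right' as a set operation on document ids."""
    a = postings.get(left.lower(), set())
    b = postings.get(right.lower(), set())
    if operator == "and":
        return a & b          # documents containing both terms
    if operator == "or":
        return a | b          # documents containing at least one term
    if operator == "andnot":
        return a - b          # documents containing the first term only
    raise ValueError(f"unknown operator: {operator}")


print(sorted(boolean_query("Information", "and", "retrieval")))     # [2, 3]
print(sorted(boolean_query("Information", "or", "retrieval")))      # [1, 2, 3, 4]
print(sorted(boolean_query("Information", "andnot", "retrieval")))  # [1]
```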
- Extended Boolean systems use additional operators
- proximity of terms
- truncation of terms
- search within a particular field
Examples
Information adj retrieval
Inform
Information in title
Extended operators
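A minimal sketch of the three extended operators (adjacency, truncation and field search) may help. It assumes documents are small records with a title and a body; DOCS and the function names adj, truncated and in_field are hypothetical and chosen only for this illustration.

```python
import re

# Hypothetical documents, each with a title field and a body field.
DOCS = {
    1: {"title": "Information retrieval systems",
        "body": "Indexing and ranking of documents"},
    2: {"title": "Web usability",
        "body": "Retrieval of information from informal sources"},
}


def tokens(text):
    """Lower-case word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def adj(first, second, text):
    """Proximity: True if `second` immediately follows `first`."""
    words = tokens(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))


def truncated(prefix, text):
    """Truncation: the words of the text that start with `prefix`."""
    return [w for w in tokens(text) if w.startswith(prefix)]


def in_field(term, field, doc):
    """Field search: True if `term` occurs in the given field of the document."""
    return term in tokens(doc[field])


print(adj("information", "retrieval", DOCS[1]["title"]))  # True
print(truncated("inform", DOCS[2]["body"]))               # ['information', 'informal']
print(in_field("information", "title", DOCS[2]))          # False
```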
In ranking systems the query is formulated using natural-language phrases
Examples
Human influence in Information Retrieval systems
Ranking
8 Indexing is the process of analysing documents to produce a short representation of their contents.
IRS
The representation is based on a keyword vector.
These keywords are chosen manually or extracted by an automatic process
Example
Information Retrieval: Data Structures and Algorithms
Term vector
<information, retrieval, data-structure, algorithms>
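As a rough illustration of automatic keyword extraction, the sketch below lower-cases a title, drops a few stop words and keeps the distinct remaining words. The stop-word list and the term_vector function are assumptions made for this example; unlike the slide's vector, this simple version does not merge compound terms such as "data-structure".

```python
import re

# Illustrative stop-word list, deliberately not exhaustive.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for"}


def term_vector(text):
    """Return the distinct keywords of a text, in order of first appearance."""
    keywords = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords


print(term_vector("Information Retrieval: Data Structures and Algorithms"))
# ['information', 'retrieval', 'data', 'structures', 'algorithms']
```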
Example
List, tree, index file, etc.
Data structures that hold the document representation
Data structures
Example
A file in which every record lists the documents related to a particular term
Inverted index
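An inverted index can be sketched as a mapping from each term to the documents that contain it. The example below is a minimal in-memory version; DOCUMENTS and build_inverted_index are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map every term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document collection.
DOCUMENTS = {
    1: "Information retrieval",
    2: "Data structures for retrieval",
    3: "Information systems",
}
INDEX = build_inverted_index(DOCUMENTS)

print(INDEX["retrieval"])    # [1, 2]
print(INDEX["information"])  # [1, 3]
```

Looking a term up is then a single dictionary access, which is why this structure suits keyword queries.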
9 In a traditional IRS the result is a list of potentially relevant documents
Gerard Salton, Introduction to Modern Information Retrieval, 1983, McGraw-Hill, Inc.
IRS
William B. Frakes, Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms, 1992, Prentice Hall, Inc.
Documents ordered by relevance level
Result ordering
Explicit measure of relevance level (score)
Dynamic presentation (manipulation of results)
Graphical and direct presentation methods
New features
Multimedia integration
Use of windows (different ways to present the results)
10 Information Retrieval Systems
Score computation
Score computation aims to measure the relevance of specific terms in specific documents
IRS
Key points in score computation
Example
A method to weight the relevance of a term across the whole document collection
(Sparck Jones, 1972)
(Dennis, 1967)
Example
Frequency normalization for a particular document collection
(Croft, 1983)
(Harman, 1986)
Computing a term weight for a document: the term frequency in the document is combined with the term's relevance weight in the collection
- Computing the score
- Boolean systems use the SOP method
- Ranking systems use a specific formula.
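One common way to combine the two quantities named above is to multiply the term frequency by an inverse document frequency. The sketch below assumes that choice, with the collection weight taken as log(N / df); the slides do not state the exact formula, so term_weight, score and COLLECTION are illustrative assumptions only.

```python
import math


def term_weight(term, document, collection):
    """Assumed weight: (frequency in the document) * log(N / document frequency)."""
    tf = document.count(term)
    df = sum(1 for doc in collection if term in doc)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(collection) / df)


def score(query_terms, document, collection):
    """Document score for a query: sum of the weights of the query terms."""
    return sum(term_weight(t, document, collection) for t in query_terms)


# Hypothetical collection: each document is a list of already extracted terms.
COLLECTION = [
    ["information", "retrieval", "information"],
    ["information", "systems"],
    ["web", "sites"],
    ["data", "structures"],
]

QUERY = ["information", "retrieval"]
ranked = sorted(COLLECTION, key=lambda d: score(QUERY, d, COLLECTION), reverse=True)
print(ranked[0])  # ['information', 'retrieval', 'information']
```

A term that occurs in every document gets weight 0 under this assumption, which reflects the idea that such a term does not help to discriminate between documents.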
11 Web interface (Query and results)
Index DB
SIMILAR
Web pages
Automatic indexing system
- New functionality in the most popular search engines
- Site classification
- Integration of new advanced search services for finding information in particular formats (pictures, sounds, MP3, e-mail, etc.)
- Few search engines provide a document score
- Migration from search service to on-line seller guides
Media Matrix - June 1999
12 Internet
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
13 Internet
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
14 Internet
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
15 Internet
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
16 Internet
- Future Tracks
- Research and technologies
- Educational
- The Public Administration
- E-commerce
17 Main features
Technologies
Applet
Oriented to the implementation of graphical user interfaces
Multithreaded
Client
Object-oriented
Site Explorer Server v2.0
Oriented to the implementation of client/server systems
Dynamic
Portable
Rich networking functionality
Platform independence
Server
18 Goals
- To implement a new system
able to work directly on the Web
able to help the user find interesting documents on the Web
with a high degree of usability
- able to integrate
- search functions
- an alternative approach to the browser
- management functions
- a single, uniform way for the user to access heterogeneous Web data.
19 Site Explorer Server v2.0: a client/server system, implemented in Java, that automatically analyses a Web site and returns, as its result, the site's tree structure, in which the root node represents the site's home page.
- Focused on information search and retrieval through a keyword-search approach
- an easy information-filtering service
- a score computation service
- user management
Additional features
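The idea of turning a site's links into a tree rooted at the home page can be sketched with a breadth-first walk. This is not the Site Explorer Server implementation (which is written in Java and not shown in the slides); LINKS is a hypothetical in-memory link map standing in for real HTTP fetches, and attaching each page to the first page that links to it is an assumption of this example.

```python
from collections import deque

# Hypothetical link map: page -> pages it links to.
LINKS = {
    "/index.html": ["/about.html", "/docs/index.html"],
    "/about.html": ["/index.html"],
    "/docs/index.html": ["/docs/a.html", "/about.html"],
    "/docs/a.html": [],
}


def build_site_tree(home, links):
    """Breadth-first walk from the home page; each page becomes a child
    of the first page found linking to it, so cycles cannot occur."""
    tree = {home: []}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in tree:  # skip pages already placed in the tree
                tree[target] = []
                tree[page].append(target)
                queue.append(target)
    return tree


def render(tree, node, depth=0):
    """Return the tree as indented lines, one page per line."""
    lines = ["  " * depth + node]
    for child in tree[node]:
        lines.extend(render(tree, child, depth + 1))
    return lines


TREE = build_site_tree("/index.html", LINKS)
print("\n".join(render(TREE, "/index.html")))
```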
User
Client
A network service
An accessible (open to everybody), open, multi-platform service
Interface
INTERNET
Web site
Site Explorer Server
20
- Client/Server system
- The Server (SES) is a Java application
- The Client (SEJA) is a Java applet
- SES and SEJA communicate using a dedicated application-layer protocol (SEP)
Technical features
21 Query selector process
Query
USER
Web sites
HTTP connection process
Links extraction process
Contents extraction process
Keyword analysis process
Score process
Result builder
Next sites page
Client user interface
Result-display process
Result
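As an illustration of the link extraction step in the pipeline above, the sketch below collects the href targets of anchor tags and resolves them against the page address. It uses Python's standard HTML parser rather than the Java code of the actual system; LinkExtractor, extract_links, PAGE and the example.org address are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without href (e.g. named anchors) are ignored
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content and address.
PAGE = ('<html><body><a href="docs/a.html">A</a> '
        '<a href="/about.html">About</a> '
        '<a name="top">no link</a></body></html>')
LINKS_FOUND = extract_links(PAGE, "http://www.example.org/site/index.html")
print(LINKS_FOUND)
```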
22 Site Explorer Server v2.0
- Full-text document analysis
- Link checking using connection requests
- HTML 4 oriented
Features
23 Three score levels
- Level 1 score. It is based only on the keyword occurrences inside the Web page.
- Level 2 score. It is also based on the keyword distribution across the whole Web site.
- Level 3 score. It is also based on the position of the keyword occurrences inside the Web page structure.
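The slides do not give the formulas behind the three levels, so the sketch below is only one possible reading: level 1 counts occurrences in the page, level 2 scales that count by how rare the keyword is across the site's pages, and level 3 additionally weights each occurrence by where it appears. The page layout (title, heading, body), POSITION_WEIGHTS and the logarithmic factor are all assumptions made for illustration.

```python
import math

# Hypothetical weights for the part of the page in which a keyword occurs.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def level1(keyword, page):
    """Level 1: number of occurrences of the keyword anywhere in the page."""
    return sum(words.count(keyword) for words in page.values())


def site_factor(keyword, site):
    """Assumed site-wide factor: larger when fewer pages contain the keyword."""
    pages_with_keyword = sum(1 for p in site if level1(keyword, p) > 0)
    if pages_with_keyword == 0:
        return 0.0
    return math.log(1 + len(site) / pages_with_keyword)


def level2(keyword, page, site):
    """Level 2: level 1 scaled by the keyword's distribution over the site."""
    return level1(keyword, page) * site_factor(keyword, site)


def level3(keyword, page, site):
    """Level 3: as level 2, with occurrences weighted by their position."""
    weighted = sum(POSITION_WEIGHTS[part] * words.count(keyword)
                   for part, words in page.items())
    return weighted * site_factor(keyword, site)


# Hypothetical three-page site; each page part is a list of words.
HOME = {"title": ["java", "home"], "heading": ["welcome"], "body": ["java", "applet", "java"]}
DOCS = {"title": ["documents"], "heading": ["java"], "body": ["server"]}
NEWS = {"title": ["news"], "heading": [], "body": ["events"]}
SITE = [HOME, DOCS, NEWS]

print(level1("java", HOME))  # 3
```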
24 Menu bar
Toolbar
Displayed result
Tree structure area
Retrieved object in Web site
Textual area
Multimedia area
Status bar
Status indicator
26 Connection to the server
27 Active connection indicator
28 New site analysis request
29 Site analysis request using a favorite site
30 Site analysis request using a pre-defined site
31 Receiving the result
32 Results navigation
33 Results browsing
35 The Usability Lab (Ulab) was established in 1992 at the pilot center of the ESPRIT III VENUS project and carries out research and development in the field of advanced visual interfaces to databases and networked multimedia information systems.
- Development and test machines
- Intel Pentium II 350 MHz / Windows 98 (Netlab)
- Intel Pentium MMX 166 MHz / Windows 95 (Fontanaulab)
- AMD K6 300 MHz / Windows 98 (Ulab)
- Sun SPARCstation 5 / Unix Solaris 2.5 (Venus)
- Sun SPARCstation 10 / Unix Solaris 2.5 (Dafne)
- Software tools
- JDK v1.1.6, JDK v1.1.7, JDK v1.1.7a, JDK v1.1.7b, JDK 1.1.8
- Edit, Netbeans
- Java Swing v1.0.3, Java Media Framework v1.1
36
- A robust system
- A good/excellent degree of usability
- A good response time (analysis and result building)
- 50 users selected using the ENEA/VENUS methodology
- Casual users: occasional use of the system.
- Professional users: use of the system related to their work.
- Expert users.
37
- G7 Global-Inventory project
- A project data card collection
- Site search engine vs Site Explorer Server
Plus - Prosoma LinkUp Service: a multimedia data card collection
Experimental sites: ULAB sites
- Future testing
- Virtual Lab Site
- FAD
38 Link exploration
LinkBot - Link analysis
Site Explorer - Builds a tree for a single site
SurfMap JavaNavigator - Applet for map-based navigation
Search within a site
PersonalSearch - applet acting as a search engine for a site
Virgilio - search function within a site
Exploration and representation of a site
HyperSystem Net40 - explores a site and produces a tree representation of it, allowing navigation
Map-based navigation and search function
MerzeScope - graph-navigation applet with a search function for a single site
39 A fully modular internal architecture, so that new modules and new functions can be added in the simplest and most dynamic way.
The implementation of a user profile system based on the user's interests, continuously updated through a feedback technique.
The addition of a new system agent that performs automatic off-line Web site analysis and, using the user's profile information, suggests a set of queries on specific themes.