1
Using the WWW
  • Miroslav Milinovic
  • Croatian Academic and Research Network - CARNet
  • Zagreb, Croatia

5th CEENet Workshop on Network Technology,
Budapest, Hungary, August 1999.
2
Using the WWW (Part 1): Introduction to the WWW information service
3
Content
  • Computer network → Information network
  • WWW - important concepts (HTML, URL, HTTP)
  • How WWW works?
  • WWW client - browser
  • HTML file → Web page
  • Netscape browser
  • Client - server communication
  • Server status codes
  • Active Web pages
  • Internationalization
  • Security

4
Computer network
5
Information network
6
WWW - World Wide Web
  • Distributed, multimedia information service based
    on hypertext
      • Distributed
          • information is located on hosts around the world
      • Multimedia
          • information includes text, graphics, sound, and video
      • Hypertext
          • hypertext techniques are used to provide access to the
            information

7
WWW - important concepts
  • Uses a client/server architecture
      • client programs are known as browsers
          • Netscape, MS IE, Amaya, HotJava, Tango, Lynx,
            ...
  • Provides access to Internet resources
      • WWW resources as well as FTP, News, Gopher, ...
      • brings together a whole range of services

8
WWW - important concepts
  • WWW resources (documents) are prepared using a
    simple, standard markup language that defines the
    document's
      • content, presentation, and links to other
        documents
  • Each document has a unique identifier
      • which depends on its location on a particular host
  • Clients can communicate with any server
      • using the correct protocol

9
WWW - important concepts
  • HTML - HyperText Markup Language
      • language for preparing WWW documents
  • URL - Uniform Resource Locator
      • unique identifier (address) of a resource
  • HTTP - HyperText Transfer Protocol
      • defines communication between WWW client and
        server

10
HTML
  • HTML is the native language of the WWW
  • HTML file → Web page
  • standards
      • HTML 1.0, 2.0, 3.0, 3.2, 4.0
  • browser extensions (Netscape, MS IE, ...)
  • other (VRML, DHTML, SMIL, MathML, CSS, XML, XSL,
    ...)
  • XHTML 1.0 (in draft)

11
URL - locating Internet resources
  • URL is unique identifier for Internet resources
  • indicates
  • means of access
  • location
  • simple syntax
  • protocol//host_nameport_num/path/file_name
  • example
  • http//www.ceenet.org/constitution.html
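The URL syntax on this slide maps directly onto the parts a program extracts before it can contact a server. The sketch below uses Python's standard urllib.parse module. The table of default ports for URLs without an explicit port_num is an assumption for illustration (80 for http, 443 for https, 21 for ftp).

```python
from urllib.parse import urlsplit

# Assumed well-known ports, used only when the URL omits port_num.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe_url(url: str) -> dict:
    """Split a URL into protocol, host_name, port_num and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host_name": parts.hostname,
        # Fall back to the protocol's default port if none is given.
        "port_num": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
    }

print(describe_url("http://www.ceenet.org/constitution.html"))
```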

12
Internet resources identification
  • URI - Uniform Resource Identifier
      • URL - Uniform Resource Locator
      • PURL - Persistent URL
      • URN - Uniform Resource Name
      • URC - Uniform Resource Characteristics
          • data about the networked resource
          • metadata: data about data

13
HTTP
  • application-level protocol
  • stateless
  • supports
      • use of URLs
      • Internet media types (MIME types, RFC 2045-2049)
          • allows access to different data formats
  • standards
      • HTTP 1.0 (RFC 1945), HTTP 1.1 (RFC 2068, January 1997)
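A server labels each document with an Internet media type, which is how a browser knows how to handle the data format. As a rough sketch, Python's standard mimetypes module can guess the type from a file name, much as a server maps extensions to types. Real servers use their own configuration, so treat the results as illustrative. The file names are hypothetical.

```python
import mimetypes

def content_type_for(file_name: str) -> str:
    """Guess the MIME type a server would announce in its Content-type header."""
    media_type, _encoding = mimetypes.guess_type(file_name)
    # Unknown extensions fall back to a generic binary type.
    return media_type or "application/octet-stream"

for name in ("constitution.html", "logo.gif", "photo.jpeg", "notes.txt"):
    print(f"{name:18} -> {content_type_for(name)}")
```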

14
How WWW works?
15
How WWW works?
16
WWW client - browser
  • retrieves (and displays, if possible) various resources
  • can be
      • text-only (Lynx, ...)
      • graphical (Netscape, ...)
  • different clients display the same HTML document
    somewhat differently
  • can display a variety of formats
      • text, GIF, JPEG, ...

17
WWW client - browser
  • has multiprotocol support
      • HTTP, FTP, Gopher, NNTP, SMTP, POP, ...
  • can automatically launch a helper application
    (viewer) to handle some data formats (sound,
    video, PostScript, MS applications, ...)
  • plug-in extensions can be used to extend
    browser capabilities (3D animation, various
    graphics formats, ...)

18
HTML file → Web page
[Figure: HTML source; Web page displayed by browser]
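The browser reads the HTML source and finds the document's title and hyperlinks before it renders the page. A minimal sketch using Python's standard html.parser module is shown below. The sample page is hypothetical, and a real browser does far more, such as layout and image loading.

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    """Collect the <title> text and the href targets of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical HTML source of a small Web page.
source = """<html><head><title>CEENet</title></head>
<body><h1>Welcome</h1>
<p>Read the <a href="constitution.html">constitution</a>.</p>
</body></html>"""

page = PageSummary()
page.feed(source)
print(page.title, page.links)
```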
19
Netscape browser
[Figure: Netscape browser window with labeled parts: title line, menus, icons, URL, HTML document, hyperlink, status line]
20
Client - server communication
  • Simple client request (entered manually)
  • telnet www.srce.hr 80
  • Trying 161.53.2.69...
  • Connected to regoc.srce.hr.
  • Escape character is '^]'.
  • GET /index.html HTTP/1.0
  • ACCEPT: */*
  • USER-AGENT: manually entered HTTP
  • (blank line)
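The manual telnet session can be reproduced with a plain TCP socket. This is a sketch only: the Host header is an addition not present in the original session (HTTP/1.0 does not require it, but servers hosting several sites rely on it), and www.srce.hr may no longer answer the same way today.

```python
import socket

def build_request(path: str, host: str) -> bytes:
    """Build the HTTP/1.0 GET request that was typed into telnet."""
    lines = [
        f"GET {path} HTTP/1.0",
        "Accept: */*",
        "User-Agent: manually entered HTTP",
        f"Host: {host}",  # added; not part of the original session
    ]
    # Each header line ends with CRLF; an empty line ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def fetch(host: str, path: str = "/index.html", port: int = 80) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, `fetch("www.example.com", "/")` returns the raw reply bytes. With HTTP/1.0 the server closes the connection after one response, so reading until end-of-file collects the whole reply.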

21
Client - server communication
  • Server reply
  • HTTP/1.0 200 OK
  • Date: Tue, 29 Jul 1997 12:56:15 GMT
  • Server: Apache/1.1.3
  • Content-type: text/html
  • Content-length: 2320
  • Last-modified: Fri, 22 Nov 1996 10:07:27 GMT
  • (blank line)
  • (content - document source)
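A client splits such a reply into a status line, header lines and the document body, using the first blank line as the separator. The parser below is a simplified sketch. It assumes CRLF line endings and a reason phrase on the status line, and the sample body is shortened.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x reply into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

reply = (
    b"HTTP/1.0 200 OK\r\n"
    b"Date: Tue, 29 Jul 1997 12:56:15 GMT\r\n"
    b"Server: Apache/1.1.3\r\n"
    b"Content-type: text/html\r\n"
    b"\r\n"
    b"<html>...</html>"
)
version, code, reason, headers, body = parse_response(reply)
print(code, headers["content-type"])
```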

22
Server status codes
  • Status codes are three-digit numbers, grouped as
    follows
      • 1xx - informational
      • 2xx - client request successful
          • 200 - OK
      • 3xx - request redirected
      • 4xx - client errors (the request cannot be fulfilled as sent)
          • 403 - Forbidden
          • 404 - Not found
      • 5xx - server errors
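Because the first digit determines the group, a client can classify a status code it has never seen before. A small sketch follows, with category names taken from the slide.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "client request successful",
    3: "request redirected",
    4: "client error",
    5: "server error",
}

def status_class(code: int) -> str:
    """Classify a three-digit HTTP status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]

for code in (200, 301, 403, 404, 500):
    print(code, status_class(code))
```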

23
Active Web pages
  • enhanced Web
      • two-way interaction
      • page animation
      • browser intelligence
      • desktop integration
      • better multimedia
      • access to other systems
  • common examples
      • forms (feedback processing)
      • active maps (clickable maps)
      • (database) gateways

24
Active Web pages
  • techniques
      • CGI - Common Gateway Interface
          • the WWW server communicates with other programs
            (CGI scripts)
      • SSI - Server Side Includes (.shtml)
      • API - Application Programming Interface
      • Cookies (making a browser remember)
      • scripting languages (embedded in HTML documents)
          • JavaScript, VBScript, ...
      • DHTML
      • Java (applets, servlets)
      • ActiveX
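Cookies let a browser "remember" by storing a value the server sends in a Set-Cookie header and returning it in a Cookie header on later requests. The sketch below uses Python's standard http.cookies module. The cookie name "visits" and its value are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to store a value (hypothetical name/value).
outgoing = SimpleCookie()
outgoing["visits"] = "3"
outgoing["visits"]["path"] = "/"
print(outgoing.output())  # the Set-Cookie response header line

# Next request: the browser returns the value in its Cookie header,
# and the server parses it back.
incoming = SimpleCookie()
incoming.load("visits=3")
print(int(incoming["visits"].value) + 1)
```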

25
Active Web pages
  • Who is doing the job?
      • the browser downloads and automatically executes
        a program (Java applet)
      • OR
      • the HTML document is generated on the server machine
        (by a CGI script)
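In the server-side case, the Web server runs a CGI script and sends whatever the script prints back to the browser. For a GET request, the form data arrives in the QUERY_STRING environment variable. The script below is a minimal sketch that assumes a hypothetical form field called `name`. Its output is header lines, an empty line, and then the generated HTML document.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def cgi_response(environ: dict) -> str:
    """Generate a complete CGI response from the request's environment."""
    # For GET requests the server passes form data in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["visitor"])[0]
    page = (
        "<html><head><title>Greeting</title></head>\n"
        f"<body><p>Hello, {html.escape(name)}!</p></body></html>\n"
    )
    # CGI output: header lines, one empty line, then the document.
    return "Content-type: text/html\n\n" + page

if __name__ == "__main__":
    # The Web server describes the request in os.environ.
    sys.stdout.write(cgi_response(dict(os.environ)))
```

Escaping the user-supplied value before inserting it into the page keeps the browser from interpreting it as markup.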

26
Active Web pages
27
Internationalization
  • originally
      • plain ASCII / ISO Latin-1 (ISO 8859-1) text, English language
  • HTML internationalization (RFC 2070)
      • Unicode (ISO 10646) as document character set, LANG attribute in HTML
  • HTTP 1.1
      • enables charset and language negotiation
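Two small sketches illustrate these points. First, Croatian text cannot be represented in ASCII or Latin-1, but it can be represented in ISO 8859-2 (Latin-2) or UTF-8. Second, an HTTP/1.1 client lists its preferred languages in an Accept-Language header with optional q-values, and the server picks the best match it has. The negotiation function is deliberately simplified: it ignores wildcards and does not match "en-us" against "en".

```python
from typing import Optional

def encodable_in(text: str, charsets: list) -> list:
    """Return the charsets that can represent every character of text."""
    ok = []
    for charset in charsets:
        try:
            text.encode(charset)
            ok.append(charset)
        except UnicodeEncodeError:
            pass
    return ok

def pick_language(accept_language: str, available: list) -> Optional[str]:
    """Choose the available language the client rates highest (q-values)."""
    prefs = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs.append((q, tag.strip().lower()))
    for q, tag in sorted(prefs, reverse=True):
        if q > 0 and tag in available:
            return tag
    return None

name = "Hrvatska akademska i istraživačka mreža"
print(encodable_in(name, ["ascii", "iso-8859-1", "iso-8859-2", "utf-8"]))
print(pick_language("en;q=0.5, hr", ["en", "hr"]))
```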

28
Security
  • plain WWW is not secure!
  • security on the
      • content level
          • PGP, data encryption
      • channel level
          • SSL (Secure Sockets Layer)
      • message level
          • S-HTTP, PEP, ...
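Channel-level security wraps the ordinary HTTP conversation in an encrypted, authenticated connection. Below is a sketch using Python's standard ssl module, which today negotiates TLS (SSL's successor) rather than the original SSL versions. The host name and path are whatever the caller supplies.

```python
import socket
import ssl

def fetch_secure(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send a GET request over an encrypted, server-authenticated channel."""
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(request)  # same HTTP text, now encrypted
            chunks = []
            while chunk := tls_sock.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

The HTTP request itself is unchanged. Only the channel underneath it is protected.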

29
Questions?
30
Using the WWW (Part 2): Searching the Internet
31
Content
  • Internet information space
  • Searching with the WWW
  • Searching the WWW
      • Search Engines
      • Metasearch Engines
      • Subject Catalogs
      • Other tools
  • Conclusion on search tools
  • Selecting a tool
  • A Strategy?

32
Internet information space
  • is NOT unified
      • many subjects
      • different formats
      • different resources (information services)
      • various tools and techniques for searching and
        information retrieval
  • some information is not (yet)
      • published electronically
      • available on the Net

[Figure: Internet, printed information, WWW]
33
Searching with the WWW
  • searching tools
      • many different tools
      • various concepts
      • specialized for chosen resources
          • WWW, Gopher, Netnews, FTP, databases, ...
      • global or local scope
  • main problems: quality and currency
  • there is NO perfect tool
      • the user needs a strategy

34
Searching the WWW
  • Search Engines
      • Search Engines
      • Metasearch Engines (Unified Search Interfaces)
  • Subject Catalogs (Virtual Libraries)
  • Other tools
      • Multiple Search Interfaces
      • Information Gateways
      • Portals

35
Search Engines
  • automated systems
  • specially designed programs
      • robots, crawlers, spiders
      • fetch WWW documents
      • index those documents to build a database
  • provide an interface for the user to search the database
      • query syntax
      • searching features
      • presentation of the results - hits (format,
        ranking)
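The core of a search engine's database is an inverted index, which maps each word to the documents that contain it. Answering a query then means intersecting those sets. This toy sketch assumes the pages have already been fetched by a crawler. The URLs and texts are hypothetical, and the hits are simply sorted, whereas real engines rank them.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list:
    """Lower-case words only; real engines also handle stemming, stop words, etc."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents: dict) -> dict:
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index: dict, query: str) -> list:
    """Return URLs containing every query word (an implicit AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical pages, as if fetched by a crawler.
pages = {
    "http://www.example.org/a.html": "Internet training courses in Zagreb",
    "http://www.example.org/b.html": "Network training for beginners",
    "http://www.example.org/c.html": "Internet history",
}
index = build_index(pages)
print(search(index, "internet training"))
```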

36
Search Engines
37
Search Engines
  • examples
      • AltaVista - http://altavista.digital.com/
      • excite! NetSearch - http://www.excite.com/
      • Infoseek - http://www.infoseek.com/
      • HotBot - http://www.hotbot.com/
      • Lycos Search - http://www.lycos.com/
      • WebCrawler - http://www.webcrawler.com/
      • Northern Light Search - http://www.northernlight.com/
      • Google - http://www.google.com/
      • Ask Jeeves! - http://www.ask.com/
  • local (regional) search engines

38
Search Engines
  • query syntax and searching features
      • upper- and lower-case letters
          • John December
          • island
      • phrases (text in quotes: "...")
          • "NASA space shuttle program"
          • "John December"
      • Boolean operators (AND, OR, NOT) and parentheses
        (...)
          • vegetable AND green
          • fruit NOT apple
      • keyword control (+, -)
          • film noir -pinot noir
          • python -monty
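Keyword control and phrases can be modeled as filters on each page. In this sketch, a "+" term must appear, a "-" term must not appear, and a quoted phrase must appear as consecutive words. Plain terms are optional and only raise the page's score. This behavior is an assumption loosely modeled on the engines of the time, not any engine's documented rules. Unlike the case-sensitive matching described above, this hypothetical evaluator ignores case.

```python
import re
from typing import Optional

# One query term: optional +/- sign, then a "quoted phrase" or a single word.
TERM = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def normalize(text: str) -> str:
    """Lower-case the text and keep only words, separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))

def score(query: str, text: str) -> Optional[int]:
    """Return None if the page is rejected, else how many plain terms it contains."""
    doc = normalize(text)
    hits = 0
    for sign, phrase, word in TERM.findall(query):
        term = normalize(phrase or word)
        if not term:
            continue
        present = re.search(rf"\b{re.escape(term)}\b", doc) is not None
        if sign == "+" and not present:
            return None  # required term missing
        if sign == "-" and present:
            return None  # excluded term found
        if sign == "" and present:
            hits += 1  # plain terms only raise the rank
    return hits

print(score("+python -monty", "The python is a large snake"))
print(score("+python -monty", "Monty Python's Flying Circus"))
```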

39
Search Engines
  • query syntax and searching features
      • proximity search (NEAR)
          • Internet NEAR training (AltaVista)
      • keyword truncation (*)
          • alumi*um
          • comput*
      • cascade search (Infoseek)
      • resource control (AltaVista, HotBot, Infoseek)
          • title:"Internet training"
      • natural language searching (Ask Jeeves!)
      • ...
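Truncation and proximity can both be expressed over the list of words in a page. In this sketch, truncation uses shell-style "*" matching from Python's fnmatch module, where "*" stands for any run of characters. NEAR is modeled as "both words within a window of N words". The 10-word default is an assumption for illustration.

```python
import fnmatch
import re

def tokens(text: str) -> list:
    return re.findall(r"\w+", text.lower())

def truncation_matches(pattern: str, text: str) -> list:
    """Words in text matching a pattern in which '*' stands for any characters."""
    return [w for w in tokens(text) if fnmatch.fnmatchcase(w, pattern.lower())]

def near(a: str, b: str, text: str, window: int = 10) -> bool:
    """True if words a and b occur within `window` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

print(truncation_matches("alumi*um", "Aluminium or aluminum foil"))
print(near("Internet", "training", "Internet access and staff training"))
```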

40
Search Engines
  • important characteristics
      • database (quantity and quality)
      • query language
      • response time
      • ranking (hits)
      • output (format, available info)
      • additional features (cascade search, refine, ...)
      • ...

41
Search Engines
  • advantages
      • vast number of documents (over 100 million)
      • highly efficient searching and retrieval
      • automated production
  • disadvantages
      • no quality control
      • no classification
      • hits can be out of context
      • dead or out-of-date links, junk

42
Metasearch Engines
  • Unified Search Interfaces
  • automated systems
      • DO NOT build databases of their own
      • query other search engines
      • provide a unified interface for the user to search a
        number of databases (search engines) with one
        query
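A metasearch engine has to merge the ranked hit lists returned by several engines into one list. One simple approach, shown here as an illustrative choice rather than the method of any particular service, is to take each engine's first hit, then each engine's second hit, and so on, while skipping URLs already seen. The engine names and URLs are hypothetical.

```python
def merge_results(result_lists: dict, limit: int = 10) -> list:
    """Interleave ranked hit lists from several engines, dropping duplicates."""
    merged, seen = [], set()
    longest = max((len(hits) for hits in result_lists.values()), default=0)
    for rank in range(longest):
        for engine in sorted(result_lists):  # fixed engine order
            hits = result_lists[engine]
            if rank < len(hits) and hits[rank] not in seen:
                seen.add(hits[rank])
                merged.append(hits[rank])
                if len(merged) == limit:
                    return merged
    return merged

print(merge_results({"engine_a": ["u1", "u2", "u3"], "engine_b": ["u2", "u4"]}))
```

Round-robin merging ignores each engine's internal scores, which are not comparable across engines anyway. Its weakness is that a poor engine gets the same weight as a good one.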

43
Metasearch Engines
  • examples
      • All4one - http://all4one.com/
      • Mamma - http://www.mamma.com/
      • MetaCrawler - http://www.metacrawler.com/
      • SavvySearch - http://www.savvysearch.com/

44
Metasearch Engines
  • important characteristics
      • number and selection of search engines covered
      • query language
      • response time
      • ranking (hits)
      • results (hits) merging
      • output (format, available info)
      • additional features
      • ...

45
Metasearch Engines
  • advantages
      • same as search engines
      • make search engines easier to use
  • disadvantages
      • same as search engines
      • a unified query for all search engines means losing
        the additional capabilities of a particular search
        engine
      • searching is slower

46
Subject Catalogs
  • Virtual Libraries, Subject Directories
  • collections of descriptions of Internet resources
      • names, URLs, abstracts, ratings, ...
  • organized within a hierarchical subject scheme
      • heuristic (subject-based)
      • UDC, Dewey, ...
  • manually maintained
  • internal search
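A subject catalog is essentially a tree of categories with resource descriptions at the leaves. An internal search walks that tree and reports where each match is filed. The categories below are hypothetical, and the two entries reuse URLs mentioned elsewhere in these slides.

```python
# Hypothetical hierarchical subject scheme: nested categories, leaf lists
# of (name, URL) descriptions.
CATALOG = {
    "Science": {
        "Computer Science": {
            "Networking": [("CEENet", "http://www.ceenet.org/")],
        },
    },
    "Regional": {
        "Croatia": [("WWW.HR", "http://www.hr/wwwhr/")],
    },
}

def find(catalog: dict, keyword: str, path: tuple = ()):
    """Yield (category path, name, URL) for entries whose name contains keyword."""
    for key, value in catalog.items():
        if isinstance(value, dict):
            yield from find(value, keyword, path + (key,))
        else:
            for name, url in value:
                if keyword.lower() in name.lower():
                    yield " > ".join(path + (key,)), name, url

print(list(find(CATALOG, "ceenet")))
```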

47
Subject Catalogs
  • examples
      • Yahoo - http://www.yahoo.com/
      • EINet Galaxy - http://galaxy.einet.net/
      • Magellan - http://magellan.excite.com/
      • NetGuide - http://www.netguide.com/
      • BUBL - http://bubl.ac.uk/link/
      • WWlib - http://www.scit.wlv.ac.uk/wwlib/
      • WWW.HR - http://www.hr/wwwhr/

48
Subject Catalogs
  • important characteristics
      • size
      • classification method
      • available info (about classified resources)
      • ranking
      • (internal) searching
      • additional features
      • ...

49
Subject Catalogs
  • advantages
      • classified into subject areas
      • manually reviewed resources (no junk)
      • internal search
  • disadvantages
      • manual maintenance
      • out-of-date information
      • some parts of the catalog are not professionally compiled

50
Other tools
  • Multiple Search Interfaces
      • simple Web pages with interfaces to a number of
        search tools
      • enable the user to choose among the listed tools
      • DO NOT build databases of their own
      • DO NOT act as Metasearch Engines
  • examples
      • All-in-One - http://www.allonesearch.com/
      • Easy Searcher - http://www.easysearcher.com/

51
Other tools
  • Information Gateways
      • dedicated to one subject (e.g. Social Sciences)
      • examples
          • SOSIG - http://sosig.ac.uk/
          • OMNI - http://www.omni.ac.uk/
  • And ...
      • electronic dictionaries, encyclopedias, guides,
        software collections, map collections,
        databases, tools for searching non-WWW
        resources, ...

52
Conclusion on search tools
  • each tool has advantages and disadvantages
  • new systems appear, old ones stagnate
  • CAUTION: tools are text-oriented
  • non-WWW resources are also covered
  • quality and currency
  • precision vs. recall
  • cooperation between tools is a necessity
  • winner: the portal
      • hybrid tool (Search Engine with Catalog)
      • brings together many (all) network services
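Precision and recall measure the trade-off between getting only relevant hits and getting all relevant hits. Precision is |retrieved ∩ relevant| / |retrieved|, and recall is |retrieved ∩ relevant| / |relevant|. In the hypothetical example below, a query returns 10 pages, 4 of which are relevant, and 8 relevant pages exist in total. That gives a precision of 4/10 = 0.4 and a recall of 4/8 = 0.5.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """precision = |retrieved ∩ relevant| / |retrieved|,
    recall = |retrieved ∩ relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {f"p{i}" for i in range(10)}                   # 10 hits returned
relevant = {"p0", "p1", "p2", "p3", "q1", "q2", "q3", "q4"}  # 8 relevant pages
print(precision_recall(retrieved, relevant))
```

Broadening a query usually raises recall at the cost of precision, and narrowing it does the opposite.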

53
Selecting a tool
  • Search Engines
      • when you have good (precise) keywords (narrow
        topic)
  • Subject Catalogs
      • to get the look and feel of a subject area
      • when you don't have good keywords (broad topic)
  • Information Gateways or other specialized tools
      • for quality (if you can find one)
  • Multiple Search Interfaces
      • useful to see what is available
  • Portals
      • tools for the future

54
A Strategy?
  • no search system is perfect
  • be flexible and try different tools
  • compare results and gain experience
  • learn the vocabulary, read HELP and FAQ
  • be focused (don't wander around)
  • concentrate on the problem, not on the tool (query)
  • use a stepwise approach
      • refine the query (keywords)

55
Questions?