WIRED Week 2

Provided by: bert189

1
WIRED Week 2
  • Syllabus Update
  • Readings Overview

2
Why IR?
  • IR originally mostly for systems, not people
  • IR in the last 25 years
  • classification and categorization
  • systems and languages
  • user interfaces and visualization
  • A small world of concern
  • The Web changed everything
  • Huge amount of accessible information
  • Varied information sources
  • Relatively easy to look for information
  • Improving IR means improving learning
  • Digital technology changes everything (again)

3
WIRED Focus
  • Information Retrieval: representation, storage,
    organization of, and access to information items
  • Focus is on the user information need
  • User information need
  • Find all docs containing information on Austin
    which
  • Are hosted by utexas.edu
  • Discuss restaurants
  • Emphasis is on the retrieval of information (not
    data, not just a keyword match)
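
The Austin example above can be read, in its most literal form, as a Boolean filter over a document collection. The sketch below shows only that naive reading, which is exactly the "keyword match" the slide says is not enough on its own. The `Doc` type, the helper names, and the sample URLs are all hypothetical and are not part of the course material.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Doc:
    """Hypothetical document record: a URL plus its extracted text."""
    url: str
    text: str


def hosted_by(doc: Doc, domain: str) -> bool:
    """True if the document's host is the domain or one of its subdomains."""
    host = urlparse(doc.url).hostname or ""
    return host == domain or host.endswith("." + domain)


def mentions(doc: Doc, *terms: str) -> bool:
    """True if every term occurs as a substring of the lowercased text."""
    text = doc.text.lower()
    return all(term in text for term in terms)


# Made-up sample collection, for illustration only.
docs = [
    Doc("https://www.utexas.edu/dining", "Restaurants near campus in Austin"),
    Doc("https://example.com/austin", "Austin restaurants guide"),
    Doc("https://www.utexas.edu/history", "History of Austin"),
]

matches = [
    d.url
    for d in docs
    if hosted_by(d, "utexas.edu") and mentions(d, "austin", "restaurant")
]
print(matches)
```

A filter like this cannot tell a page that discusses restaurants from one that merely contains the word, which is one way to see the gap between matching data and meeting an information need.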

4
The Search
  • Who is John Battelle?
  • Magazine editor: WIRED, The Industry Standard
  • Web 2.0 conference organizer
  • Business 2.0 magazine columnist
  • Federated Media Publishing
  • Boingboing.net manager

5
Database of Intentions
  • What do you think the database of intentions is?
  • Is it more than Google's Zeitgeist?
  • What we're thinking about and interested in.
  • Everything we want to know and when we want to
    know it.
  • "the aggregate results of every search ever
    entered, every result list ever tendered, and
    every path taken as a result" (Battelle, p 6)
  • "a real time history of post-Web culture" (p 6)
  • What other databases like this are there?
  • How is this possible?

6
Searchiness?
  • The tasking of search?
  • Everything could be a search task?
  • Every task has an ad associated with it?
  • Our expectations are met and made with search.
  • How would the Web work without search?
  • Yahoo and email links, LOTS of email links
  • You are your clickstream?
  • Products & services based on it
  • "marketing, media, technology, pop culture,
    international law, and civil liberties" (p 13)

7
Elements of Search
  • Crawl
  • Index
  • Runtime system (query processor)
  • Segments the data
  • Analyzes the Crawl
  • Optimizes everything
  • Interface
  • Query
  • Results
  • Users
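
The crawl, index, and query stages listed above can be sketched with a toy inverted index. This is a minimal illustration under simplifying assumptions: the "crawl" is a hard-coded dictionary of made-up pages, queries are treated as AND over all terms, and there is no ranking. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the sorted pages containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Stand-in for the output of a crawl (made-up pages).
crawled = {
    "a.html": "Web search engines crawl and index pages",
    "b.html": "An index maps terms to pages",
    "c.html": "Users type a query and read results",
}

index = build_index(crawled)
results = search(index, "index pages")
print(results)
```

The index is built once, ahead of time, so that the runtime system only has to intersect small posting sets per query instead of scanning every page.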

8
Search before Google
  • Traditional systems: SMART (Salton)
  • Strongly typed information (traditional
    databases)
  • Not always interactive or easy to use
  • Library Catalogs online
  • Controlled vocabulary & limited records
  • Internet: Archie & Veronica
  • Titles only (mostly) over text
  • Web: WWW Wanderer, WebCrawler
  • Full text, HTML & links

9
AltaVista gets serious
  • Web now large enough to be a challenge
  • Now enough content that you'd want to search it
  • Costs of hardware & bandwidth falling
  • Parallel crawlers
  • Significant CPU resources
  • 1995: 16 million documents
  • Why didn't people get it?

10
The Web goes Pro
  • Lycos
  • Anchor text content location context
  • Yahoo
  • Directory & clean interface for browsing links
  • Advertising & user (logs) analysis
  • AOL
  • Gateway to the internet for many
  • Excite
  • Consumer-driven, word relationships
  • Acquisitions of Magellan, WebCrawler
  • MyExcite - the Portal
  • @Home (compete with AOL)

11
Google is Born
  • Larry Page & Sergey Brin
  • Links are the key (Bibliometrics)
  • Impact factor (link it if you like it)
  • Patterns of citation (links) expand the text
  • Defending & setting the context of your work by
    associating it with others
  • BackRub
  • Crawl pages, store links, analyze them, publish
  • Large computing challenges
  • PageRank
  • Link counts with a recipe for deriving (relative)
    value
  • Value depends on who links to you, and on their
    rank too
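
The idea that a page's value depends on who links to it, and on their rank in turn, can be sketched as a power iteration over a small link graph. This follows the commonly taught textbook form of PageRank rather than anything specified in the slides: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the four-page graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph: D links out but nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph C collects links from three pages and ends up ranked highest, and A ranks above B even though each has a single incoming link, because A's link comes from the highly ranked C.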

12
Google goes Pro
  • More resources for more data
  • Help with (significant) analysis & design
  • Lack of commercial approach may have been a
    strength
  • Not ads, but just good search
  • Simple (non-existent) design of interface had an
    impact
  • More people getting online
  • Broadband adoption & stabilizing browsers
  • Growing content (to say the least)

13
Assignments
  • Read weekly Primary Readings & participate in
    class discussions: 10%
  • Re-design Search Results interface: 10%
  • Web (log) analytics: 25%
  • Google 2010 (5-page paper): 10%
  • Class Topic Presentation: 15%
  • Main Project: 30%

14
Projects and/or Papers Overview
  • How can (Web) IR be better?
  • Better IR models
  • Better User Interfaces
  • More to find vs. easier to find
  • Scriptable applications
  • New interfaces for applications
  • New datasets for applications