Informed Search and Applications

Provided by: Kathleen268

1
Informed Search and Applications
  • Reading: recommended paper
  • Focused crawling: a new approach to
    topic-specific Web resource discovery,
  • Soumen Chakrabarti, Martin van den Berg, Byron
    Dom
  • http://www.cs.berkeley.edu/soumen/doc/www1999f/pdf/www1999f.pdf
  • Next class: Chapter 5

2
Homework Notes
  • What is a bigram frequency?
  • Bigram: a pair of adjacent letters (a, b)
  • The bigram list gives a probability for each
    letter pair: the ratio of that pair's frequency
    to the total frequency of all possible pairs
  • How many random initial states have solutions?
    Not very many. Just let your program run until
    you get between 10 and 20 solvable ones and base
    your averages on those.
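
The bigram ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name bigram_probabilities is hypothetical, and dropping all non-letters (so pairs can span word boundaries) is a simplifying assumption that the homework may not share.

```python
from collections import Counter


def bigram_probabilities(text):
    """Return {(a, b): relative frequency} for adjacent letter pairs in text."""
    # Assumption: keep letters only and ignore case.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))
    total = sum(pairs.values())
    # Each probability is the pair's count divided by the count of all pairs.
    return {pair: count / total for pair, count in pairs.items()}


probs = bigram_probabilities("banana")
print(probs[("a", "n")])  # "an" occurs 2 times out of 5 pairs
```

In practice the table would be built from a large text sample rather than a single word.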

3
Local Search Algorithms
  • Operate using a single current state
  • Move only to neighbors of the state
  • Paths followed by search are not retained
  • Iterative improvement: keep a single current
    state and try to improve it

4
Advantages to local search
  • Uses very little memory, usually a constant
    amount
  • Can often find reasonable solutions in large or
    infinite (e.g., continuous) state spaces that
    are unsuitable for systematic search
  • Useful for pure optimization problems
  • Find the best state according to an objective
    function
  • Example: traveling salesman

7
Steepest Ascent
9
Problems for hill climbing
  • When higher values of the evaluation function
    are better, search for maxima (objective
    functions); when lower values are better, search
    for minima (cost functions)
  • Local maxima: a local maximum is a peak that is
    higher than each of its neighboring states, but
    lower than the global maximum
  • Ridges: a sequence of local maxima
  • Plateaux: an area of the state space landscape
    where the evaluation function is flat
10
Some solutions
  • Stochastic hill climbing: choose at random from
    among the uphill moves
  • First-choice hill climbing: generate successors
    randomly until one is generated that is better
    than the current state
  • Random-restart hill climbing: keep restarting
    from randomly generated initial states, stopping
    when a goal is found
  • Simulated annealing: generate a random move.
    Accept it if it is an improvement; otherwise
    accept it with a continually decreasing
    probability.
  • Local beam search: keep track of k states rather
    than just 1
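
As one illustration of these ideas, here is a minimal simulated-annealing sketch on a 1-D list landscape. Several things are assumptions rather than anything stated on the slide: the acceptance probability exp(delta / T) is a common choice, and the geometric cooling schedule, parameter defaults, and function names are hypothetical.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Always accept an improvement; accept a worse move with prob. exp(delta / T)."""
    if delta > 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def simulated_annealing(values, start, steps=1000, t0=2.0, cooling=0.995, seed=0):
    """Maximize values[i] over indices i, moving one step left or right at a time."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = current + rng.choice((-1, 1))  # generate a random move
        if 0 <= candidate < len(values):
            if accept(values[candidate] - values[current], temperature, rng):
                current = candidate
                if values[current] > values[best]:
                    best = current  # remember the best state seen so far
        temperature *= cooling  # acceptance of bad moves shrinks over time
    return best


landscape = [1, 3, 2, 2, 2, 5, 9, 4]
result = simulated_annealing(landscape, start=0)
print(result, landscape[result])
```

Because the sketch returns the best state seen, the result is never worse than the start; how often it reaches the global maximum depends on the schedule.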

11
Online Search
  • Agent operates by interleaving computation and
    action
  • No time for thinking
  • The agent knows only:
  • Actions(s), the actions available in state s
  • The step-cost function c(s, a, s')
  • Goal-test(s)
  • Cannot access the successors of a state without
    trying all actions

12
Assumptions
  • Agent recognizes a state it has seen before
  • Actions are deterministic
  • Admissible heuristics
  • Competitive ratio: compare the cost of the path
    the agent actually travels with the cost of the
    actual shortest path

13
What properties of search are desirable?
  • Will A* work?
  • Expand nodes in a local order
  • Depth first
  • Variant of greedy search
  • Difference from offline search
  • Agent must physically backtrack
  • Record states to which agent can backtrack and
    has not yet explored

14
Online DFS - setup
  • Input: s', a percept that identifies the current
    state
  • Static:
  • result, a table indexed by action and state,
    initially empty
  • unexplored, a table that lists, for each visited
    state, the actions not yet tried
  • unbacktracked, a table that lists, for each
    visited state, the backtracks not yet tried
  • s, a, the previous state and action, initially
    null

15
Online DFS - the algorithm
  • If Goal-test(s') then return stop
  • If s' is a new state then unexplored[s'] ←
    Actions(s')
  • If s is not null then do
  • result[a, s] ← s', result[reverse-a, s'] ← s
  • Add s to the front of unbacktracked[s']
  • If unexplored[s'] is empty
  • If unbacktracked[s'] is empty then return stop
  • Else a ← an action b such that result[b, s'] =
    pop(unbacktracked[s'])
  • Else a ← pop(unexplored[s'])
  • s ← s'
  • Return a
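
The agent above can be sketched in Python as follows. This is an illustrative reading, not the presenter's code, and it departs from the slide in two labeled ways: it records a backtrack entry only when the agent arrived by exploring (otherwise two states can bounce back and forth forever when no goal is reachable), and it skips the reverse-action bookkeeping because every action from a state has been tried before backtracking starts there. The class name, the toy graph, and the convention that an action is named after its destination are all assumptions.

```python
class OnlineDFSAgent:
    """Online depth-first exploration; call with the current percept s'."""

    def __init__(self, actions, goal_test):
        self.actions = actions      # actions(state) -> list of actions
        self.goal_test = goal_test
        self.result = {}            # (action, state) -> observed next state
        self.unexplored = {}        # state -> actions not yet tried
        self.unbacktracked = {}     # state -> states to back up to
        self.s = None               # previous state
        self.a = None               # previous action
        self.backtracking = False   # assumption: guard not on the slide

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return None  # stop
        if s_prime not in self.unexplored:  # s' is a new state
            self.unexplored[s_prime] = list(self.actions(s_prime))
            self.unbacktracked[s_prime] = []
        if self.s is not None:
            self.result[(self.a, self.s)] = s_prime
            if not self.backtracking:
                self.unbacktracked[s_prime].insert(0, self.s)
        if not self.unexplored[s_prime]:
            if not self.unbacktracked[s_prime]:
                return None  # nothing left to try: stop
            target = self.unbacktracked[s_prime].pop(0)
            # Find an already-tried action b with result[b, s'] == target.
            self.a = next((b for b in self.actions(s_prime)
                           if self.result.get((b, s_prime)) == target), None)
            self.backtracking = True
        else:
            self.a = self.unexplored[s_prime].pop(0)
            self.backtracking = False
        self.s = s_prime
        return self.a


# Toy world (hypothetical): an action is the name of the neighbor to move to.
graph = {"A": ["B", "D"], "B": ["A"], "D": ["A", "G"], "G": []}
agent = OnlineDFSAgent(lambda s: graph[s], lambda s: s == "G")
state, path = "A", ["A"]
for _ in range(50):  # step limit as a safety net
    action = agent(state)
    if action is None:
        break
    state = action
    path.append(state)
print(path)
```

The printed path shows the physical backtracking: the agent walks into the dead end at B and has to walk back through A before it can reach G.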

16
Back to Online Search
  • Would hill-climbing be appropriate?

17
Learning Real-Time A* (LRTA*)
  • Augment hill climbing with memory
  • Store the current best estimate of the cost from
    each node to the goal, H(s)
  • Initially, H(s) = h(s)
  • Update H(s) through experience
  • Estimated cost to reach the goal through neighbor
    s':
  • H(s) = c(s, a, s') + H(s')
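
The update can be illustrated with a simplified sketch on a line of states with unit step costs. It assumes the agent can see both neighbors of its current state, which a true online agent would have to learn first; the function name lrta_star and the heuristic values are made up. At each step the agent sets H(s) to the smallest c(s, a, s') + H(s') over its neighbors and moves to that neighbor.

```python
def lrta_star(h, start, goal, max_steps=100):
    """Simplified LRTA* on states 0..len(h)-1 with unit step costs."""
    H = list(h)  # initially H(s) = h(s)
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            break
        neighbors = [n for n in (s - 1, s + 1) if 0 <= n < len(H)]
        # Estimated cost to the goal through neighbor n: c + H(n), with c = 1.
        best = min(neighbors, key=lambda n: 1 + H[n])
        H[s] = 1 + H[best]  # update H(s) through experience
        s = best
        path.append(s)
    return path, H


# State 2 is a local minimum of h: plain hill climbing would stay there.
path, H = lrta_star(h=[3, 2, 1, 2, 1, 0], start=2, goal=5)
print(path)  # [2, 1, 0, 1, 2, 3, 4, 5]
print(H)     # [5, 4, 3, 2, 1, 0]
```

In this run the raised estimates eventually push the agent out of the valley, and the learned H values end up equal to the true distances to the goal.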

18
Real-World Application for AI Search
  • The Problem
  • World Wide Web is a vast resource
  • Google reports indexing 3.3 billion web pages as
    of 1/2004
  • About 600GB of text changes every month
  • A search engine requires continuous crawls to
    index all pages
  • Inktomi
  • Cluster of 100s of Sun Sparc stations
  • 75GB RAM each
  • 1 TB disk
  • Crawls greater than 10 million pages/day

19
Alternative: focused crawl
  • Start with a user-specified topic hierarchy (like
    Yahoo, but user-specific)
  • Used for training the crawler to distinguish
    relevant pages from irrelevant pages
  • Classifier: a learned function that predicts
    relevance
  • Resource discovery: starting from a node in the
    hierarchy, the crawler branches out to find other
    relevant pages
  • Must determine which outgoing links are good ones
  • Simultaneously, the crawler runs a topic
    distillation algorithm to identify hubs
  • Hubs: pages with large numbers of links to
    relevant documents

20
Why is this an AI Search Problem?
  • What is the search space?
  • What is the goal test?
  • What is the heuristic function?

21
System Architecture
  • Classifier: makes relevance judgments on pages
    crawled to decide on link expansion
  • Distiller: determines a measure of centrality of
    crawled pages to determine visit priorities
  • Crawler: dynamically reconfigurable priority
    controls governed by the classifier and distiller
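
One way to picture the crawler's priority control is as best-first search over a frontier of unvisited links. The sketch below is a toy under stated assumptions and not the architecture of the recommended paper: a dictionary of made-up scores stands in for the classifier, the link graph is in memory, the distiller is left out, and the rule that a link inherits its parent page's relevance as priority is just one plausible choice.

```python
import heapq


def focused_crawl(links, relevance, seeds, budget, threshold=0.5):
    """Visit pages best-first, expanding links only from relevant pages."""
    frontier = [(-1.0, url) for url in seeds]  # negate scores: heapq is a min-heap
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        score = relevance.get(url, 0.0)  # stand-in for the classifier
        if score < threshold:
            continue  # irrelevant page: do not expand its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # Assumption: a link's priority is its parent's relevance.
                heapq.heappush(frontier, (-score, target))
    return visited


links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
relevance = {"seed": 0.9, "a": 0.8, "b": 0.2, "c": 0.7, "d": 0.9}
crawled = focused_crawl(links, relevance, seeds=["seed"], budget=10)
print(crawled)  # ['seed', 'a', 'b', 'c']
```

Page "d" is never fetched because the only page linking to it was judged irrelevant, which shows both the savings and the risk of pruning on relevance.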

22
Distillation
  • A good strategy for the crawler is to identify
    hubs: pages that are almost exclusively
    collections of links to authoritative resources
    that are relevant to the topic
  • Authority: a web page with many incoming links,
    particularly from high-prestige, relevant pages

23
Identifying hubs and authorities
  • Each node has two scores, iteratively determined
  • a(v): number of incoming edges from relevant
    nodes
  • h(v): number of outgoing edges to relevant nodes
  • Weight these scores by the relevance scores of
    the pages they point to (a probability between 0
    and 1)
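
The iterative scoring can be sketched as a relevance-weighted variant of the hubs-and-authorities iteration. The details here are assumptions: each edge u → v is weighted by the relevance of the target page v, scores are normalized to sum to 1 after every round, and the function names, graph, and relevance values are invented for illustration.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values()) or 1.0
    return {node: value / total for node, value in scores.items()}


def hubs_and_authorities(edges, relevance, iterations=20):
    """Return (hub, authority) score dicts for a list of (u, v) links."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # a(v): weighted sum of hub scores of pages linking to v.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += relevance.get(v, 0.0) * hub[u]
        # h(u): weighted sum of authority scores of pages u links to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += relevance.get(v, 0.0) * new_auth[v]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


edges = [("hub1", "p1"), ("hub1", "p2"), ("hub1", "p3"),
         ("hub2", "p1"), ("p1", "p2")]
relevance = {"hub1": 0.5, "hub2": 0.5, "p1": 0.9, "p2": 0.8, "p3": 0.1}
hub, auth = hubs_and_authorities(edges, relevance)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

With these made-up numbers, hub1 scores highest as a hub because it links to several relevant pages, while the low-relevance page p3 receives little authority despite having an incoming link from it.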