Informed Search and Applications


Slides: 24

Provided by: Kathleen268

Learn more at: http://www1.cs.columbia.edu

1
Informed Search and Applications

Reading: recommended paper
"Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery,"
Soumen Chakrabarti, Martin van den Berg, Byron Dom
http://www.cs.berkeley.edu/soumen/doc/www1999f/pdf/www1999f.pdf
Next class: Chapter 5

2
Homework Notes

What is a bigram frequency?
Bigram: a letter pair (a, b)
The bigram list gives a probability for each letter pair: the ratio of that
pair's frequency to the total frequency of all letter pairs
How many random initial states have solutions?
Not very many. Let your program run until you have 10 to 20 solvable states
and base your averages on those.
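
As a minimal sketch of the bigram definition above, the following Python counts adjacent letter pairs inside words and divides each count by the total number of pairs observed. The function name `bigram_probabilities` and the choice to normalize by observed pairs (rather than by all 26 x 26 possible pairs) are assumptions for illustration, not part of the homework specification.

```python
from collections import Counter


def bigram_probabilities(text):
    """Return {(a, b): probability} for adjacent letter pairs within words."""
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        # zip pairs each letter with its successor: "the" -> (t,h), (h,e)
        counts.update(zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


probs = bigram_probabilities("the then")
print(probs[("t", "h")])  # 2 of the 5 observed pairs
```

In practice the counts would come from a large corpus rather than a two-word string.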

3
Local Search Algorithms

Operate using a single current state
Move only to neighbors of the state
Paths followed by search are not retained
Iterative improvement
Keep a single current state and try to improve it

4
Advantages to local search

Use very little memory, usually a constant amount
Can often find reasonable solutions in large or infinite (e.g., continuous)
state spaces where systematic search is unsuitable
Useful for pure optimization problems
Find the best state according to an objective function
Example: traveling salesman

7
Steepest Ascent
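
The slide body did not survive in the transcript, so here is a minimal sketch of steepest-ascent hill climbing as it is usually defined: from the current state, move to the best neighbor and stop when no neighbor is better. The names `steepest_ascent`, `neighbors`, and `value`, and the toy one-dimensional objective, are assumptions made for illustration.

```python
def steepest_ascent(start, neighbors, value):
    """Move to the best neighbor until no neighbor improves on the current state."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current  # local (possibly global) maximum
        current = best


def value(x):
    # Toy objective with a single peak at x = 3.
    return -(x - 3) ** 2


def neighbors(x):
    return [x - 1, x + 1]


peak = steepest_ascent(0, neighbors, value)
print(peak)
```

Only the current state is kept, which is why the memory use is constant.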
9
Problems for hill climbing

When higher values of the heuristic function are better, we look for maxima
(objective functions); when lower values are better, we look for minima
(cost functions)
Local maxima: a local maximum is a peak that is higher than each of its
neighboring states but lower than the global maximum
Ridges: a sequence of local maxima
Plateaux: areas of the state space landscape where the evaluation function
is flat

10
Some solutions

Stochastic hill-climbing
Choose at random from among the uphill moves
First-choice hill climbing
Generates successors randomly until one is
generated that is better than current state
Random-restart hill climbing
Keep restarting from randomly generated initial
states, stopping when goal is found
Simulated annealing
Generate a random move. Accept if improvement.
Otherwise accept with continually decreasing
probability.
Local beam search
Keep track of k states rather than just 1
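
Of the variants listed above, simulated annealing is the one whose acceptance rule is easiest to misread, so a minimal sketch follows. The geometric cooling schedule, the parameter names (`t0`, `cooling`, `steps`), and the habit of returning the best state seen are assumptions for illustration; the slide only specifies "accept if improvement, otherwise accept with continually decreasing probability."

```python
import math
import random


def simulated_annealing(start, neighbor, value, t0=1.0, cooling=0.95,
                        steps=500, rng=None):
    """Maximize value(); worse moves are accepted with probability exp(delta / t)."""
    rng = rng or random.Random(0)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = value(candidate) - value(current)
        # Always accept improvements; accept worse moves less often as t falls.
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = candidate
            if value(current) > value(best):
                best = current
        t = max(t * cooling, 1e-9)  # assumed geometric cooling schedule
    return best


def value(x):
    return -(x - 3) ** 2


def random_step(x, rng):
    return x + rng.choice([-1, 1])


best_state = simulated_annealing(0, random_step, value)
print(best_state, value(best_state))
```

Because the best state seen is tracked separately, the returned state is never worse than the start state, even if the walk ends somewhere poor.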

11
Online Search

Agent operates by interleaving computation and
action
No time for thinking
The agent knows only:
Actions(s)
The step-cost function c(s, a, s')
Goal-Test(s)
Cannot access the successors of a state without
trying all actions

12
Assumptions

Agent recognizes a state it has seen before
Actions are deterministic
Admissible heuristics
Competitive ratio: compare the cost of the path the agent actually travels
with the cost of the actual shortest path

13
What properties of search are desirable?

Will A* work?
Expand nodes in a local order
Depth first
Variant of greedy search
Difference from offline search
Agent must physically backtrack
Record states to which agent can backtrack and
has not yet explored

14
Online DFS - setup

Input: s', a percept that identifies the current state
Static:
result: a table indexed by action and state, initially empty
unexplored: a table that lists, for each visited state, the actions not yet
tried
unbacktracked: a table that lists, for each visited state, the backtracks
not yet tried
s, a: the previous state and action, initially null

15
Online DFS: the algorithm

If Goal-Test(s') then return stop
If s' is a new state then unexplored[s'] ← Actions(s')
If s is not null then do
  result[a, s] ← s', result[reverse-a, s'] ← s
  Add s to the front of unbacktracked[s']
If unexplored[s'] is empty then
  If unbacktracked[s'] is empty then return stop
  Else a ← an action b such that result[b, s'] = Pop(unbacktracked[s'])
Else a ← Pop(unexplored[s'])
s ← s'
Return a
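
The agent above can be sketched in Python roughly as follows. The class name, the compass-style action names, the `REVERSE` table, and the toy `MAZE` are all assumptions for illustration. One deliberate departure from the slide: a `backtracking` flag skips the "add s to unbacktracked" step when the move just taken was itself a backtrack, since otherwise the agent can bounce between two exhausted states.

```python
REVERSE = {"N": "S", "S": "N", "E": "W", "W": "E"}  # assumed reversible actions


class OnlineDFSAgent:
    def __init__(self, actions, goal_test):
        self.actions = actions
        self.goal_test = goal_test
        self.result = {}         # (action, state) -> resulting state
        self.unexplored = {}     # state -> actions not yet tried
        self.unbacktracked = {}  # state -> predecessors not yet returned to
        self.s = self.a = None   # previous state and action
        self.backtracking = False

    def __call__(self, s1):
        """Take the current percept s1 and return an action, or None to stop."""
        if self.goal_test(s1):
            return None
        if s1 not in self.unexplored:
            self.unexplored[s1] = list(self.actions(s1))
            self.unbacktracked[s1] = []
        if self.s is not None:
            self.result[(self.a, self.s)] = s1
            self.result[(REVERSE[self.a], s1)] = self.s
            if not self.backtracking:  # departure from the slide, see lead-in
                self.unbacktracked[s1].insert(0, self.s)
        if not self.unexplored[s1]:
            if not self.unbacktracked[s1]:
                return None
            target = self.unbacktracked[s1].pop(0)
            self.a = next(b for (b, s), r in self.result.items()
                          if s == s1 and r == target)
            self.backtracking = True
        else:
            self.a = self.unexplored[s1].pop()
            self.backtracking = False
        self.s = s1
        return self.a


# Hypothetical maze: state -> {action: next state}; the goal is "C".
MAZE = {"A": {"E": "B"}, "B": {"W": "A", "E": "C", "N": "D"},
        "D": {"S": "B"}, "C": {"W": "B"}}

agent = OnlineDFSAgent(lambda s: MAZE[s], lambda s: s == "C")
state, path = "A", ["A"]
while True:
    action = agent(state)
    if action is None:
        break
    state = MAZE[state][action]  # the agent must physically move
    path.append(state)
print(path)
```

The printed path includes the detour into the dead end D, which illustrates the "agent must physically backtrack" point from the earlier slide.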

16
Back to Online Search

Would hill-climbing be appropriate?

17
Learning Real-Time A* (LRTA*)

Augment hill-climbing with memory
Store the current best estimate of the cost from each node to the goal, H(s)
Initially, H(s) = h(s)
Update H(s) through experience
Estimated cost to reach the goal through neighbor s':
H(s) = c(s, a, s') + H(s')
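
A minimal sketch of how this update can drive an agent is given below. It follows the common textbook formulation, in which H(s) is set to the minimum estimated cost over all actions from s and untried actions are scored optimistically with h(s); that minimum and the optimism rule are not stated on the slide and are assumptions here, as are the class name and the toy corridor world.

```python
class LRTAStarAgent:
    def __init__(self, actions, goal_test, h, cost):
        self.actions, self.goal_test = actions, goal_test
        self.h, self.cost = h, cost
        self.H = {}        # learned cost-to-goal estimates
        self.result = {}   # (state, action) -> observed next state
        self.s = self.a = None

    def _estimate(self, s, a):
        """Estimated cost to the goal through action a; optimistic if untried."""
        s2 = self.result.get((s, a))
        if s2 is None:
            return self.h(s)
        return self.cost(s, a, s2) + self.H[s2]

    def __call__(self, s1):
        if self.goal_test(s1):
            return None
        self.H.setdefault(s1, self.h(s1))  # initially H(s) = h(s)
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            # Update H for the state just left, using what was learned.
            self.H[self.s] = min(self._estimate(self.s, b)
                                 for b in self.actions(self.s))
        self.a = min(self.actions(s1), key=lambda b: self._estimate(s1, b))
        self.s = s1
        return self.a


# Hypothetical corridor of states 0..3 with the goal at 3 and unit step costs.
def actions(s):
    return ["R"] if s == 0 else ["L", "R"]


def move(s, a):
    return s + 1 if a == "R" else s - 1


agent = LRTAStarAgent(actions, lambda s: s == 3,
                      h=lambda s: 3 - s, cost=lambda s, a, s2: 1)
state, path = 1, [1]
for _ in range(50):  # safety bound for the sketch
    action = agent(state)
    if action is None:
        break
    state = move(state, action)
    path.append(state)
print(path)
```

The agent may wander before reaching the goal, because untried actions look as cheap as h(s) until experience raises the stored estimates.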

18
Real-World Application for AI Search

The Problem
World Wide Web is a vast resource
Google reports indexing 3.3 billion web pages as
of 1/2004
About 600GB of text changes every month
A search engine requires continuous crawls to
index all pages
Inktomi:
Cluster of hundreds of Sun Sparc stations
75GB RAM each
1 TB disk
Crawls more than 10 million pages per day

19
Alternative: focused crawl

Start with a user-specified topic hierarchy (like Yahoo, but user-specific)
Used for training the crawler to distinguish relevant pages from irrelevant
pages
Classifier: a learned function that predicts relevance
Resource discovery: starting from a node in the hierarchy, the crawler
branches out to find other relevant pages
Must determine which outgoing links are good ones
Simultaneously, the crawler runs a topic distillation algorithm to identify
hubs
Hubs: pages with large numbers of links to relevant documents

20
Why is this an AI Search Problem?

What is the search space?
What is the goal test?
What is the heuristic function?

21
System Architecture

Classifier: makes relevance judgments on crawled pages to decide on link
expansion
Distiller: determines a measure of centrality of crawled pages to set visit
priorities
Crawler: has dynamically reconfigurable priority controls governed by the
classifier and distiller
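
One way to read this architecture is as best-first search over the link graph, with the classifier supplying the priority. The sketch below assumes that a link's priority is the relevance score of the page it was found on; the function name `focused_crawl`, the in-memory `WEB` graph, and the `REL` scores standing in for a trained classifier are all hypothetical, and the distiller is left out.

```python
import heapq


def focused_crawl(seeds, links, relevance, budget):
    """Visit up to `budget` pages, most promising links first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        score = relevance(url)  # classifier judgment on the fetched page
        for target in links(url):
            if target not in seen:
                seen.add(target)
                # Assumption: a link inherits the relevance of its source page.
                heapq.heappush(frontier, (-score, target))
    return visited


# Hypothetical toy web and stand-in classifier scores.
WEB = {"seed": ["bikes", "news"], "bikes": ["gears", "tires"],
       "news": ["sports"], "gears": [], "tires": [], "sports": []}
REL = {"seed": 1.0, "bikes": 0.9, "news": 0.1,
       "gears": 0.8, "tires": 0.7, "sports": 0.2}

crawled = focused_crawl(["seed"], lambda u: WEB[u], lambda u: REL[u], budget=5)
print(crawled)
```

In this toy run the off-topic page "news" is still fetched, because its parent is relevant, but the link it contains is ranked last and is not reached within the budget.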

22
Distillation

A good strategy for the crawler is to identify hubs: pages that are almost
exclusively collections of links to authoritative resources relevant to the
topic
Authority: a web page with many incoming links, particularly from
high-prestige, relevant pages

23
Identifying hubs and authorities

Each node has two scores, determined iteratively:
a(v): number of incoming edges from relevant nodes
h(v): number of outgoing edges to relevant nodes
Weight these scores by the relevance scores of the pages they point to (a
probability between 0 and 1)
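
The iteration can be sketched as a relevance-weighted variant of the hubs-and-authorities computation. The exact weighting used in the paper is not given on the slide, so the version below makes an assumption: each edge u -> v is weighted by the relevance of the target page v in both updates, and scores are normalized to sum to 1 after every round. The function name, the toy `LINKS` graph, and the `RELEVANCE` values are hypothetical.

```python
def hubs_and_authorities(links, relevance, iterations=20):
    """Return (hub, auth) score dicts for a graph given as {page: [targets]}."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        total = sum(scores.values())
        return {n: (x / total if total else 0.0) for n, x in scores.items()}

    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for u, targets in links.items():
            for v in targets:
                # Authority: weighted sum of hub scores of pages linking in.
                new_auth[v] += relevance[v] * hub[u]
        new_hub = {n: 0.0 for n in nodes}
        for u, targets in links.items():
            for v in targets:
                # Hub: weighted sum of authority scores of pages linked to.
                new_hub[u] += relevance[v] * new_auth[v]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


# Hypothetical link graph and relevance probabilities.
LINKS = {"hub": ["a", "b"], "stray": ["b"], "a": [], "b": []}
RELEVANCE = {"hub": 0.5, "stray": 0.1, "a": 0.9, "b": 0.8}

hub, auth = hubs_and_authorities(LINKS, RELEVANCE)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Pages with no outgoing links end up with a hub score of 0, and pages with no incoming links with an authority score of 0, which matches the edge-counting intuition on the slide.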
