My Projects Yahoo - PowerPoint PPT Presentation


Title: My Projects Yahoo


1
My Projects at Yahoo!
  • Afsaneh Shirazi
  • ScUBA team

2
Outline
  • Unit Strength: assign a strength score to units
  • Query Disambiguation: identify ambiguous queries
    (queries that have multiple different meanings)

3
Unit Strength Motivation
  • Unit: a sequence of words that represents a
    distinct concept
  • Examples:
  • new york
  • times square
  • beyonce lyrics
  • harry potter and the prisoner of azkaban
  • Question: how strong is a unit? (A strong unit
    cannot be decomposed into two.)

Candidate decompositions: new / york, beyonce / lyrics
4
Score for Multi-Word Units
  • Remove u from the list of units
  • Decompose unit u into sub-units: u = u1 u2 ... un
  • Example:
  • u = new york, u1 = new, u2 = york

  • Score(u) = freq(u) / (K + freq(u1, ..., un))

5
Example: Score(new york)
  • Score(u) = freq(u) / (K + freq(u1, ..., un))
  • K is a smoothing constant
  • freq(u1, ..., un) is the frequency of the bag of
    words u1, ..., un in any order

Example queries: new york; mayor of new york;
new city of york; university of york new building;
york new
freq(u) counts the queries that contain the unit new york
freq(u1, u2) counts the queries that contain both new
and york, in any order
Both numbers can be extracted in one pass through
all queries
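
The one-pass counting described above can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the function names, the whitespace tokenization, and the reading of the lost operator in the score as K + freq(u1, ..., un) are all assumptions.

```python
def count_unit_and_bag(queries, unit):
    """Return (freq(u), freq(u1, ..., un)) from one pass over the queries."""
    words = unit.split()
    n = len(words)
    freq_unit = 0  # queries containing the words contiguously and in order
    freq_bag = 0   # queries containing all of the words, in any order
    for query in queries:
        tokens = query.split()
        if set(words) <= set(tokens):
            freq_bag += 1
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            freq_unit += 1
    return freq_unit, freq_bag


def unit_score(freq_unit, freq_bag, k=8):
    # Assumed form of the slide's formula: freq(u) / (K + freq(u1, ..., un)).
    return freq_unit / (k + freq_bag)


# Toy query list built from the example queries on the slide.
queries = [
    "new york",
    "mayor of new york",
    "new city of york",
    "university of york new building",
    "york new",
]
freq_u, freq_bag = count_unit_and_bag(queries, "new york")
score_new_york = unit_score(freq_u, freq_bag)
```

On this toy list, two queries contain the unit and all five contain both words, so the score is low only because the list is tiny; a real query log would supply the counts.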
6
Score for Single-Word Units
  • Based on unit frequency
  • Score(u) = freq(u) / (K + freq(u))
  • u = beyonce
  • K = 8, freq(u) = 37286, Score = 0.9997
  • u = zzzzz
  • K = 8, freq(u) = 7, Score = 0.5333

Threshold is around 0.8
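
A short sketch of the single-word score and the threshold test follows. It assumes the operator lost from the transcript is addition, i.e. Score(u) = freq(u) / (K + freq(u)); the names unit_score and THRESHOLD are illustrative. Under that assumption beyonce scores about 0.9998 and zzzzz about 0.47, while the slide lists 0.9997 and 0.5333, so the original smoothing may have differed slightly.

```python
THRESHOLD = 0.8  # "around 0.8" per the slide


def unit_score(freq_unit, k=8):
    # Assumed reading of the slide's formula: freq(u) / (K + freq(u)).
    return freq_unit / (k + freq_unit)


def is_strong(freq_unit, k=8, threshold=THRESHOLD):
    # A unit counts as strong when its score reaches the threshold.
    return unit_score(freq_unit, k) >= threshold


beyonce_score = unit_score(37286)  # frequent word, score close to 1
zzzzz_score = unit_score(7)        # rare word, score pulled down by K
```

The smoothing constant K mainly penalizes rare strings: with K = 8, a word needs a frequency of at least 32 to reach 0.8 under this reading.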
7
Results
  Weak                        Score   Strong                Score
  online dictionary           0.73    real estate           0.9916
  free online games           0.5277  los angeles           0.9954
  beyonce lyrics              0.2413  bank of america       0.9744
  oops i did it again lyrics  0.75    yellow pages          0.997
  lyrics to let go            0.1333  home depot            0.9969
  atlantic city hotels        0.6298  jessica simpson       0.9975
  hotels in singapore         0.3417  american express      0.9169
  katie couric pictures       0.7603  music videos          0.979
  top 20 songs                0.6095  best western hotels   0.9926
  breakfast recipes           0.6872  federal credit union  0.9901
  new york subway map         0.3894  days of our lives     0.9972
  best love songs             0.5063  country love songs    0.9755
  kidney infection symptoms   0.765   kelley blue book      0.9892
  chicago suburbs             0.5417  myspace layouts       0.857

10
Outline
  • Unit Strength: assign a strength score to units
  • Query Disambiguation: identify ambiguous queries
    (queries that have multiple different meanings)

11
Motivation
  • Question: when a user enters jaguar, what are
    the possible meanings? Jaguar the cat, Jaguar
    the car, or something else?
12
Applications (Search)
  • If the user is looking for the jaguar cat, a page
    full of car-related results will be disappointing
  • The aim:
  • Identify ambiguous queries
  • Show some links for each meaning to satisfy
    all users

13
Overview of Method
  • Extracting meanings from query pairs
  • Adding named-entity information
  • Adding Wikipedia disambiguation information
  • Result: classify a query as ambiguous or not, and
    give some suggestions about possible meanings

14
Extracting Possible Meanings
  • Query pairs: jaguar → jaguar animal
  • The user was not happy with the results for jaguar
    and changed the query to jaguar animal

Query pairs (q1 → q2, q1 → q3, ...):
jaguar → jaguar animal
jaguar → jaguar car
jaguar → jaguar insurance
jaguar → jaguar auto
Retrieve possible replacements based on
the frequency of query pairs
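
The query-pair step can be sketched as follows. The session format (an ordered list of queries per user session), the min_count cut-off, and the function names are assumptions made for illustration; the slides do not say how pairs were collected or thresholded.

```python
from collections import Counter


def count_rewrites(sessions):
    """Count (q1, q2) pairs where q2 directly follows q1 in a session."""
    pairs = Counter()
    for session in sessions:
        for q1, q2 in zip(session, session[1:]):
            if q1 != q2:
                pairs[(q1, q2)] += 1
    return pairs


def replacements(pairs, query, min_count=2):
    """Return replacements of `query` seen at least `min_count` times."""
    found = [(q2, c) for (q1, q2), c in pairs.items()
             if q1 == query and c >= min_count]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(found, key=lambda item: (-item[1], item[0]))


# Hypothetical sessions, invented for this example.
sessions = [
    ["jaguar", "jaguar animal"],
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar animal"],
    ["jaguar", "jaguar xyz"],
]
pairs = count_rewrites(sessions)
jaguar_replacements = replacements(pairs, "jaguar")
```

The frequency cut-off drops one-off rewrites such as the invented "jaguar xyz", leaving only replacements that several users made.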
15
Merging Similar Meanings
  • Use units/associations/extensions to identify
    similar meanings
  • Example: car and insurance appear together
    frequently → they are correlated

Before merging: jaguar → jaguar animal, jaguar car,
jaguar insurance, jaguar auto
After merging: jaguar → jaguar animal, jaguar car
16
Units/Associations
  • Units: distinct concepts
  • Associations: units that appear together
  • freq(car, auto) > K · freq(car) · freq(auto)
  • → car and auto are correlated
  • → merge into category car auto
[Diagram relating units and associations; labels:
jaguar animal, auto insurance, car auto, animal cat]
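
The association test and the merging of correlated meanings can be sketched together. The comparison operator in the test was lost in the transcript, so reading it as freq(a, b) > K · freq(a) · freq(b) is an assumption, as are the value of K, the counts, and the greedy grouping strategy below.

```python
def associated(pair_freq, freq_a, freq_b, k):
    # Assumed test: co-occurrence exceeds K times the product of frequencies.
    return pair_freq > k * freq_a * freq_b


def merge_meanings(meanings, is_associated):
    """Greedily group meanings linked, directly or transitively, by association."""
    groups = []
    for meaning in meanings:
        linked = [g for g in groups
                  if any(is_associated(meaning, other) for other in g)]
        merged = [meaning]
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


# Hypothetical counts, invented for this example.
freq = {"car": 100, "auto": 50, "insurance": 40, "animal": 30}
pair_freq = {
    frozenset(("car", "auto")): 20,
    frozenset(("car", "insurance")): 15,
    frozenset(("car", "animal")): 1,
}
K = 0.001


def linked(a, b):
    return associated(pair_freq.get(frozenset((a, b)), 0), freq[a], freq[b], K)


groups = merge_meanings(["animal", "car", "insurance", "auto"], linked)
```

With these counts, car, insurance, and auto collapse into one category while animal stays separate, which matches the two jaguar meanings shown on the merging slide.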
17
Results
  • mercury: (marine, insurance, car); (planet);
    (metal, element, chemical)
  • apple: (ipod, computers); (fruit)
  • mac: (tools, computer); (cosmetics)

18
Adding Named-Entity Information
  • Identify queries that have multiple meanings
    because they receive different named-entity
    tags (q1 → PER, q1 → LOC, q1 → COMP)
  • Armstrong → Louis Armstrong (PERSON)
  • Armstrong → Lance Armstrong (PERSON)
  • Add named-entity information to the different
    meanings
  • Caterpillar → COMPANY

19
Adding Wikipedia Information
  • Identify important meanings by comparing
    Wikipedia meanings with query pairs (remove
    meanings that no one searches for)

Wikipedia meanings (q1 → meaning1, q1 → meaning2, ...):
jaguar → jaguar (car)
jaguar → fender jaguar
jaguar → mac os jaguar
jaguar → sepecat jaguar
jaguar → jaguar (rocket)
jaguar → jacksonville jaguar
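
Filtering Wikipedia meanings against query pairs might look like the sketch below. The pair counts are invented, and both the parenthesis-stripping normalization and the min_count cut-off are assumptions; the slides only say that meanings nobody searches for are removed.

```python
def normalize(meaning):
    # Drop Wikipedia-style parentheses: "jaguar (car)" becomes "jaguar car".
    return " ".join(meaning.replace("(", " ").replace(")", " ").split())


def supported_meanings(wiki_meanings, pair_counts, query, min_count=1):
    """Keep the Wikipedia meanings that users actually rewrite `query` into."""
    kept = []
    for meaning in wiki_meanings:
        if pair_counts.get((query, normalize(meaning)), 0) >= min_count:
            kept.append(meaning)
    return kept


wiki_meanings = [
    "jaguar (car)", "fender jaguar", "mac os jaguar",
    "sepecat jaguar", "jaguar (rocket)", "jacksonville jaguar",
]
# Hypothetical query-pair counts, invented for this example.
pair_counts = {
    ("jaguar", "jaguar car"): 120,
    ("jaguar", "jacksonville jaguar"): 30,
    ("jaguar", "fender jaguar"): 8,
    ("jaguar", "mac os jaguar"): 5,
}
important = supported_meanings(wiki_meanings, pair_counts, "jaguar")
```

With these invented counts, sepecat jaguar and jaguar (rocket) are dropped because no query pair supports them.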
20
Conclusions
  • According to our approach, 10% of queries are
    ambiguous
  • Editorial investigation shows that 4% of those are
    false positives
  • Ran our method on 100 ambiguous queries → 85%
    classified correctly
  • Ran our method on 500 non-ambiguous (random)
    queries → 95% classified correctly
  • Overall: 94% correct classification
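
One reading that reproduces the overall figure is to treat the numbers as percentages and weight the two accuracies by the share of ambiguous queries. This is an interpretation that fits the numbers on the slide, not a calculation the slides state; the variable names are illustrative.

```python
ambiguous_share = 0.10   # share of queries flagged as ambiguous
acc_ambiguous = 0.85     # accuracy on the 100 ambiguous queries
acc_unambiguous = 0.95   # accuracy on the 500 non-ambiguous queries

# Prevalence-weighted accuracy: 0.10 * 0.85 + 0.90 * 0.95
overall = (ambiguous_share * acc_ambiguous
           + (1 - ambiguous_share) * acc_unambiguous)

# Pooling the two test sets directly gives a slightly different figure.
pooled = (0.85 * 100 + 0.95 * 500) / 600
```

The weighted value comes to 0.94, while simple pooling of the 600 test queries gives about 0.933, which suggests the weighted reading is the one behind the slide.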

21
Thank you!