BCS Powerpoint template white

About This Presentation
Title:

BCS Powerpoint template white

Description:

'Searching for the Music You Like' ... Constant battle with SEOs. Enterprise search is a different proposition... As is desktop search ...

Slides: 22
Provided by: sad566


Transcript and Presenter's Notes


1
Information Search & Retrieval: Problems,
solutions, trends
Tony Rose, PhD MBCS CEng
Vice-Chair, BCS IRSG
going further together
2
Contents
  • The BCS Information Retrieval SG
  • What is IR anyway?
  • How search engines work
  • Why search is hard
  • Where's it all going?

3
Information Retrieval SG
  • Growing rapidly
  • 750 members
  • Annual conference (ECIR)
  • FDIA
  • Various 1-day events
  • Search Solutions
  • Informer
  • Discounts for various events, e.g. SIGIR
  • It is free to join!

4
Information Retrieval SG
  • Traditional focus on search (text retrieval)
  • Knowledge management, Multimedia retrieval, User
    experience, Information visualisation,
    extraction, summarisation, etc.
  • Latest issue of Informer
  • Searching for the Music You Like
  • Exploring Maps through Geo-referenced Images and
    RDF Shared Metadata
  • Using Semantic Relations to improve Question
    Answering
  • Modeling Annotation of Dance Media Semantics

5
What is IR?
  • Science of searching for
    • information in documents
    • documents themselves
    • metadata which describe documents,
  • within databases
    • whether relational stand-alone databases or
      hypertextually-networked databases such as the
      World Wide Web

6
The Need for IR
  • In a word: Infoglut
  • 800 MB of recorded information is produced per
    person per year (Computing magazine)
  • Up to 80% of corporate information is
    unstructured
  • Documents, emails, images, voicemail, etc.
  • So can't we just use Google?

7
How do Search Engines Work?
  • On the surface
  • Understand what the user wants
  • Find documents about that topic
  • In reality
  • Count words
  • Apply a simple equation
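
To make "count words, apply a simple equation" concrete, here is a minimal sketch that assumes the equation is a basic TF-IDF sum. The slides do not name a formula, so the weighting, the function names and the three toy documents are all illustrative assumptions.

```python
import math
from collections import Counter

# Toy collection (hypothetical, not from the slides).
DOCS = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat on the log",
    "d3": "cats and dogs",
}


def tokenize(text):
    """Lower-case and split on whitespace; no stemming, no stop list."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document as the sum over query terms of tf * log(N / df)."""
    n_docs = len(docs)
    # Step 1: count words in every document.
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    # Step 2: apply the (assumed) equation.
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                score += counts[term] * math.log(n_docs / doc_freq[term])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = tf_idf_scores("cat mat", DOCS)
print(ranking)
```

Note that "d3" scores zero for the query "cat" even though it contains "cats": the engine is matching strings, not concepts.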

8
How do Search Engines Work?
  • Measure the conceptual distance between your
    query and each document in the DB
  • Return the best matches

Source: Maristella Agosti, University of Padova
9
The Central Problem in IR
(Diagram: the information seeker turns concepts into query terms; the author turns concepts into document terms.)
Do these represent the same concepts?
Source: Jimmy Lin, University of Maryland
10
The Central Problem in IR
  • How do you represent the concepts?
  • Documents and queries: bag of words
  • Unordered set of terms + numeric weights
  • How do you calculate similarity?
  • Set theory (e.g. Boolean)
  • Algebraic (e.g. vector space)
  • Probabilistic
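
As a rough illustration of two of these similarity families, the sketch below builds bag-of-words vectors with raw term counts as weights, then compares a query and a document with a Boolean AND match (set theory) and with cosine similarity (vector space). The function names and sample strings are assumptions for illustration; real engines use more refined weights.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Unordered terms with numeric weights (here: raw counts)."""
    return Counter(text.lower().split())


def boolean_and(query, doc):
    """Set-theoretic match: every query term must occur in the document."""
    return set(query) <= set(doc)


def cosine_similarity(a, b):
    """Algebraic match: cosine of the angle between two term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_words("venetian blind")
doc_same_terms = bag_of_words("blind venetian")
doc_partial = bag_of_words("window blind shop")

print(boolean_and(query, doc_same_terms), cosine_similarity(query, doc_same_terms))
print(boolean_and(query, doc_partial), cosine_similarity(query, doc_partial))
```

The Boolean model gives a yes/no answer, while the vector space model gives a graded score that can be used to rank results.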

11
IR models
Source: Wikipedia
12
How do we Evaluate Search?
  • Assume that results are either relevant or
    non-relevant
  • Precision
  • Proportion of retrieved documents that are
    relevant
  • Recall
  • Proportion of known-relevant documents that were
    actually retrieved
  • But what about indexing / retrieval speed, query
    language, user experience, etc?
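
The two measures above can be computed directly from the retrieved set and the known-relevant set. This is a minimal sketch under the slide's binary-relevance assumption; the document IDs are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for binary relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # retrieved documents that are relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents retrieved, 5 known to be relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5", "d6", "d7"]

print(precision_recall(retrieved_docs, relevant_docs))  # (0.5, 0.4)
```

Retrieving more documents can only keep recall the same or raise it, but it usually lowers precision, which is why the two are reported together.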

(Figure: the relevant and retrieved document sets)
13
Why Search is Hard
  • Document representation
  • Keywords are not enough
  • Blind Venetian ≠ Venetian blind
  • Terms are not independent
  • Structural discourse dependencies,
    co-references, etc.
  • Imperfect stop lists
  • the, and, of
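
A small sketch of why keywords alone fall short: once word order is discarded, "Blind Venetian" and "Venetian blind" index to the same terms, and a stop list can strip a query of part of its meaning. The stop list is the one on the slide; the "The Who" query is an added illustrative example, not from the talk.

```python
STOP_WORDS = {"the", "and", "of"}


def index_terms(text):
    """Bag-of-words index terms: lower-cased, stop words removed, order lost."""
    return sorted({word for word in text.lower().split() if word not in STOP_WORDS})


# Different meanings, identical representation.
print(index_terms("Blind Venetian"))
print(index_terms("Venetian blind"))

# Imperfect stop list: the band name loses half its words.
print(index_terms("The Who"))
```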

14
Why Search is Hard
  • Morphological relationships
  • Computer, computing, compute, computed
  • Index documents using word stems
  • False positives
  • organization, organ → organ
  • police, policy → polic
  • arm, army → arm
  • False negatives
  • cylinder, cylindrical
  • create, creation
  • Europe, European
  • Prefixes are particularly difficult
  • Un, dis
  • Delegate → de-leg-ate
  • Ratify → rat-ify
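
The stemming errors listed above can be reproduced with a deliberately naive suffix stripper. This is a toy written for illustration, with a hypothetical four-rule suffix list; it is not the Porter stemmer or any other published algorithm, and real stemmers make different mistakes on different words.

```python
# Hypothetical suffix rules, tried in order (longest first).
SUFFIXES = ("ization", "ical", "e", "y")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# False positives: unrelated words collapse to one stem.
for pair in [("organization", "organ"), ("police", "policy"), ("arm", "army")]:
    print(pair, "->", [naive_stem(w) for w in pair])

# False negatives: related words keep different stems.
for pair in [("cylinder", "cylindrical"), ("create", "creation"), ("Europe", "European")]:
    print(pair, "->", [naive_stem(w) for w in pair])
```

Adding more rules fixes some pairs and breaks others, which is the underlying difficulty: suffix rules operate on spelling, not on meaning.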

15
Why Search is Hard
  • Named entity recognition
  • Companies in New York
  • New companies in York
  • NEs are highly discriminatory
  • People
  • Places
  • Organisations
  • Many vertical applications
  • e.g. bioscience

16
Why Search is Hard
  • Semantic relationships
  • Car = automobile
  • Buy = purchase
  • Sick = ill
  • Synonym rings
  • Car, automobile, truck, bus, taxi...
  • Appropriate level of abstraction depends on user
    task
  • Development of subject-specific taxonomies
  • concept matching
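
One simple way to use synonym rings is query expansion: each query term is replaced by the whole ring it belongs to, and the ring is treated as an OR group. The sketch below assumes the three narrow rings from the slide; the data structure and function name are illustrative.

```python
# Each ring is a set of terms treated as interchangeable (taken from the slide).
SYNONYM_RINGS = [
    {"car", "automobile"},
    {"buy", "purchase"},
    {"sick", "ill"},
]


def expand_query(query):
    """Replace each query term with its sorted synonym ring (an OR group)."""
    expanded = []
    for term in query.lower().split():
        ring = next((r for r in SYNONYM_RINGS if term in r), {term})
        expanded.append(sorted(ring))
    return expanded


print(expand_query("buy car"))
print(expand_query("buy bicycle"))
```

Widening the car ring to include truck, bus and taxi would raise recall for a query about vehicles in general but hurt precision for someone shopping for a car, which is the level-of-abstraction trade-off noted above.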

17
Why Search is Hard
  • Word sense disambiguation
  • Bank
  • Financial institution?
  • Part of a river?
  • An aerial manoeuvre?
  • Active research area
  • Categorisation & clustering of results

18
Google's Insight
  • Exploit the link structure inherent in the web
  • Calculate a measure of a document's value
  • Independent of any query
  • PageRank
  • Overall relevance based on 100 parameters
  • Constant battle with SEOs
  • Enterprise search is a different proposition
  • As is desktop search
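
The query-independent link score can be sketched with power iteration over a tiny link graph. This is a textbook-style simplification, not Google's production ranking; the damping factor of 0.85, the iteration count and the four-page graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}.

    Every link target must also appear as a key in `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share ("random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

The score depends only on the link graph, so it can be computed once, ahead of any query, and then combined with query-dependent signals at search time.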

19
Where's it all going?
  • Vertical search
  • Jobs, travel, health, people, etc.
  • Rich media search
  • Audio, video, TV, images
  • Specialised content search
  • blogs, news, classifieds
  • Social search
  • Personalisation

20
Where's it all going?
  • Mobile search
  • Answer engines
  • Active research community in Question Answering
  • Multi / cross-lingual search
  • Search agents
  • Human UI

21
Further Information
  • www.irsg.bcs.org
  • Informer
  • ECIR (March 2008, Glasgow)
  • Search Solutions 2008 (Sept 2008, London)