Information Retrieval - PowerPoint PPT Presentation

Title: Information Retrieval


1
Information Retrieval and projects we have done.
  • Group Members:
  • Aditya Tiwari (08005036)
  • Harshit Mittal (08005032)
  • Rohit Kumar Saraf (08005040)
  • Vinay Surana (08005031)

Guided by Prof. Pushpak Bhattacharyya
2
Motivation
  • The Web, documents, and encyclopedias all hold a
    tremendous amount of data and information. As
    stored, that information serves only the intent
    of the creator or collector of the data.
  • However, the same data/information can have other
    uses as well. The need is to mine the right
    information from the data and use it
    appropriately.

3
Information Retrieval
4
Applications
  • Web search: Google, Yahoo
  • Querying/QA systems like Watson (developed by
    IBM)
  • Spam filtering
  • Automatic Summarization
  • Cross-lingual retrieval

5
Information Retrieval
  • IR is the area of study concerned with searching
    for documents and for metadata about documents,
    as well as with searching relational databases
    and the WWW.
  • The data objects that are collected can be
    images, documents, videos, mind maps, or music.

6
Wiki Mind Mapping
  • Harshit Mittal (IIT-B)
  • h.mittal83_at_gmail.com
  • Aditya Tiwari (IIT-B)
  • adi.tiwari27_at_gmail.com
  • Akhil Bhiwal (VIT University)
  • bhiwalakhil_at_gmail.com

7
Project Idea
  • Represent textual information in a graphical
    form that is easier to understand and more
    intuitive to read. The visual representation
    should also summarize the text.

8
Research Goal
  • Use of phrases to represent semantic information.
  • Hierarchical representation of the information in
    a given text.

9
Mind maps
  • A mind map is a diagram used to
    represent words, ideas, tasks, or other items
    linked to and arranged around a central key word
    or idea.
  • An example mind map is shown on the next slide.

http://en.wikipedia.org/wiki/Mind_maps
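
The definition above describes a tree: a central idea with items arranged around it. A minimal sketch of such a structure follows. The class name `MindMapNode`, its methods, and the sample labels are assumptions made for illustration; the slides do not say how the project stored its mind maps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MindMapNode:
    """One node of a mind map: a short label plus its child nodes."""
    label: str
    children: List["MindMapNode"] = field(default_factory=list)

    def add(self, label: str) -> "MindMapNode":
        """Attach a new child with the given label and return it."""
        child = MindMapNode(label)
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of levels in the subtree rooted at this node."""
        return 1 + max((c.depth() for c in self.children), default=0)

    def render(self, indent: int = 0) -> str:
        """Return an indented text outline of the subtree."""
        lines = ["  " * indent + self.label]
        for child in self.children:
            lines.append(child.render(indent + 1))
        return "\n".join(lines)


# Hypothetical content: central idea -> section -> extracted phrase.
root = MindMapNode("India")
etymology = root.add("Etymology")
etymology.add("name derived from Indus")
root.add("History")

print(root.render())
print("levels:", root.depth())
```

Keeping the outline in a plain tree makes it easy to check how many levels a generated map has before it is drawn.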
10
Mind map
11
What's the difficult part?
  • We can't represent the information from an article
    in a mind map as it is. That would make the map
    incoherent and clumsy.
  • Phrase extraction
  • General rules of grammar don't apply here.

12
Possible Solution
  • Develop new linguistic rules for representing
    text in visual form.
  • Use existing summarization tools to generate a
    summary and try to represent that in a mind map.

13
How we did it.
  • Pulling out the article section by section from
    the Wikipedia page.
  • Parsing each section sentence by sentence using
    the Stanford parser.
  • Extracting relevant phrases using Tregex
    (another Stanford tool).
  • Putting these phrases into a mind map, section
    by section.
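
The four steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: it reads section headings from MediaWiki-style `== Heading ==` lines, splits sentences with a naive regular expression, and takes the phrase extractor as a parameter, since the Stanford parser and Tregex are not called here. The function names and the sample text are hypothetical.

```python
import re
from typing import Callable, Dict, List

# A heading line such as "== History ==" or "=== Early period ===".
HEADING = re.compile(r"={2,}\s*(.+?)\s*={2,}")


def split_sections(wikitext: str) -> Dict[str, str]:
    """Split wikitext into {heading: body text}, in document order."""
    sections: Dict[str, str] = {}
    heading = "Introduction"  # assumed name for text before any heading
    body: List[str] = []
    for line in wikitext.splitlines():
        match = HEADING.fullmatch(line.strip())
        if match:
            sections[heading] = " ".join(body)
            heading, body = match.group(1), []
        elif line.strip():
            body.append(line.strip())
    sections[heading] = " ".join(body)
    return {h: b for h, b in sections.items() if b}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def build_mind_map(
    wikitext: str, extract_phrases: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Collect the phrases extracted from each sentence, per section."""
    mind_map: Dict[str, List[str]] = {}
    for heading, body in split_sections(wikitext).items():
        phrases: List[str] = []
        for sentence in split_sentences(body):
            phrases.extend(extract_phrases(sentence))
        mind_map[heading] = phrases
    return mind_map


def whole_sentence(sentence: str) -> List[str]:
    """Placeholder for the parse + Tregex step: keeps the sentence."""
    return [sentence.rstrip(".")]


sample = """India is a country in South Asia.
== Etymology ==
The name India is derived from Indus.
== History ==
People settled in the Indus valley. Its cities later declined."""

mind_map = build_mind_map(sample, whole_sentence)
for section, phrases in mind_map.items():
    print(section, "->", phrases)
```

Passing the extractor in as a function keeps the section and sentence handling independent of whichever parser is plugged in.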

14
Extraction of relevant information
  • Identifying the subtrees in the parse tree of a
    sentence that are important.
  • This was done using a few heuristics, such as:
  • Presence of a superlative adjective in a noun
    phrase

15
Extraction of relevant information
  • Presence of a cardinal number in a noun phrase
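
The two noun-phrase heuristics can be illustrated on a bracketed parse tree. The sketch below is plain Python standing in for Tregex: it keeps every NP that has a superlative adjective (Penn Treebank tag JJS) or a cardinal number (tag CD) as a direct child, which is roughly what the Tregex patterns `NP < JJS` and `NP < CD` would match. The actual patterns used in the project are not given in the slides, and the parse string is hand-written for illustration.

```python
from typing import Iterator, List, Union

Tree = Union[str, list]  # a leaf word, or [label, child, child, ...]


def parse_tree(text: str) -> Tree:
    """Parse a Penn Treebank style bracketed string into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token          # a leaf word
        node = [tokens[pos]]      # the constituent or POS label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                  # consume the closing ")"
        return node

    return read()


def subtrees(tree: Tree) -> Iterator[list]:
    """Yield every labelled node, parents before children."""
    if isinstance(tree, list):
        yield tree
        for child in tree[1:]:
            yield from subtrees(child)


def leaves(tree: Tree) -> List[str]:
    """Return the words covered by a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def has_child_tag(node: list, tag: str) -> bool:
    """True if one of the node's direct children carries the given tag."""
    return any(isinstance(c, list) and c[0] == tag for c in node[1:])


def important_noun_phrases(tree: Tree) -> List[str]:
    """Noun phrases that contain a superlative (JJS) or a number (CD)."""
    return [
        " ".join(leaves(node))
        for node in subtrees(tree)
        if node[0] == "NP"
        and (has_child_tag(node, "JJS") or has_child_tag(node, "CD"))
    ]


# Hand-written parse of "The largest city has 12 districts".
parse = (
    "(S (NP (DT The) (JJS largest) (NN city))"
    " (VP (VBZ has) (NP (CD 12) (NNS districts))))"
)
phrases = important_noun_phrases(parse_tree(parse))
print(phrases)
```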

16
Extraction of relevant information
  • Matching of a verb against the bag of verbs
    considered relevant for a particular article
    section. For example, for the history section,
    verbs like find, discover, settle, and decline
    were considered more useful than verbs like
    derive and deduce, which were considered useful
    for some other section.

17
Extraction of relevant information
  • Ex: The name India is derived from Indus.
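
The verb heuristic can be sketched as a lookup of each verb's base form in a per-section bag. The bags below contain only the verbs named in the slides. The section name "Etymology" for derive and deduce, the tiny lemma table, and the function name are assumptions made for illustration. The input is a list of (word, POS tag) pairs of the kind a parser would produce.

```python
from typing import Dict, List, Set, Tuple

# Verbs named in the slides; "Etymology" is an assumed section name.
SECTION_VERBS: Dict[str, Set[str]] = {
    "History": {"find", "discover", "settle", "decline"},
    "Etymology": {"derive", "deduce"},
}

# Hand-written lemma table standing in for a real lemmatizer.
LEMMAS: Dict[str, str] = {
    "found": "find",
    "discovered": "discover",
    "settled": "settle",
    "declined": "decline",
    "derived": "derive",
    "deduced": "deduce",
}


def relevant_verbs(tagged: List[Tuple[str, str]], section: str) -> List[str]:
    """Return the verbs in a tagged sentence that are in the section's bag."""
    bag = SECTION_VERBS.get(section, set())
    matches = []
    for word, tag in tagged:
        if not tag.startswith("VB"):   # Penn Treebank verb tags: VB, VBD, ...
            continue
        lemma = LEMMAS.get(word.lower(), word.lower())
        if lemma in bag:
            matches.append(word)
    return matches


# Hand-tagged version of the example sentence.
sentence = [
    ("The", "DT"), ("name", "NN"), ("India", "NNP"), ("is", "VBZ"),
    ("derived", "VBN"), ("from", "IN"), ("Indus", "NNP"),
]
print(relevant_verbs(sentence, "Etymology"))
print(relevant_verbs(sentence, "History"))
```

Under these bags the example sentence is kept for the section whose bag contains derive and skipped for the history section.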

18
Code Generated Mind Map
19
Evaluation
  • Precision and recall

http://en.wikipedia.org/wiki/Precision_and_recall
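
Precision and recall, as defined at the page linked above, are ratios over two sets: the items a system returned and the items that are actually relevant. The sketch below computes both for extracted phrases. The phrase identifiers are made up, and how the project mapped its survey answers onto these two sets is not stated in the slides.

```python
from typing import Iterable, Tuple


def precision_recall(
    extracted: Iterable[str], relevant: Iterable[str]
) -> Tuple[float, float]:
    """Precision = hits / extracted, recall = hits / relevant."""
    extracted_set, relevant_set = set(extracted), set(relevant)
    hits = len(extracted_set & relevant_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up identifiers: 4 phrases extracted, 5 judged relevant, 2 in common.
extracted = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p2", "q1", "q2", "q3"]

precision, recall = precision_recall(extracted, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A system that extracts few but well-chosen phrases scores high on precision and low on recall, because most relevant phrases are never extracted.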
20
Evaluation
  • Survey-based
  • Asking one person to generate 10 questions from a
    given article.
  • Asking another person to answer those questions
    with the help of the mind map.
  • Repeating the same exercise with the roles
    reversed for another article.

21
Observations
  • Pros
  • Extraction of the right information with high
    accuracy.
  • The concept of phrase extraction works well.
  • High precision values were obtained (between
    0.5 and 0.75).

22
Observations
  • Cons
  • Information presented in a mind map of low depth
    is clumsy.
  • Low recall values (0.2-0.4).
  • Linking of node phrases with their apt
    descriptions.
  • Heuristics defining important phrases need to
    be refined.

23
Limitations
  • The bag of words and the Tregex expressions are
    hand-coded instead of machine-learned.
  • Garbage phrases are being generated.
  • The level of hierarchy is limited to 3.

24
Future work
  • Using machine learning to determine the important
    keywords for a given sentence.
  • We want to explore the possibility of finding
    patterns in subtree expressions using a
    machine-learned approach.
  • Refinement of generated phrases.

25
References
  • http://en.wikipedia.org/wiki/Mind_maps
  • http://en.wikipedia.org/wiki/Precision_and_recall
  • Tool: Stanford Parser and Stanford Tregex Match
    http://nlp.stanford.edu/software/tregex.shtml

26
Vision Based Attribute Segmentation from lists in
Web Pages
-by Rohit Kumar Saraf