1
GRADUATION PROJECT 1
Automatically Building a Stopword List for an Information Retrieval System using Entropy
Advisors: Asst. Prof. Dr. Kamil, M.Sc. Amir Karshenas
Student: Alireza Sadeghi (4ECE151)
2
The aim of my project is to test whether we can obtain better results by applying entropy methods to find a stopword list for IR, thereby producing an improved inverted file.
3
Information Retrieval
4
Vector Space Model
  • The vector space model (or term vector model) is an algebraic model used for information filtering, information retrieval, indexing, and relevancy ranking. It represents natural-language documents (or any objects, in general) in a formal manner as vectors of identifiers (such as index terms) in a multi-dimensional linear space. Its first use was in the SMART Information Retrieval System.
  • Documents are represented as vectors of index terms (keywords). The set of terms is a predefined collection, for example the set of all unique words occurring in the document corpus.
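As an illustration (not part of the original slides), here is a minimal Python sketch of the idea, assuming a tiny made-up corpus: each document becomes a vector of term counts over the vocabulary, and documents are compared by cosine similarity.

    import math
    from collections import Counter

    # Hypothetical toy corpus; each document becomes a vector of term counts.
    docs = [
        "information retrieval with an inverted index",
        "entropy measures the information content of a message",
        "a stopword list improves the inverted file",
    ]

    vocab = sorted({t for d in docs for t in d.split()})            # index terms
    vectors = [[Counter(d.split())[t] for t in vocab] for d in docs]

    def cosine(u, v):
        # Cosine similarity between two term vectors.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Documents 0 and 2 share the term "inverted", so their similarity is > 0.
    print(cosine(vectors[0], vectors[2]))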

5
Inverted Index
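The slide presumably showed a diagram; as a hedged sketch (toy documents invented for illustration), an inverted index simply maps each term to the identifiers of the documents that contain it:

    from collections import defaultdict

    docs = {
        1: "the cat sat on the mat",
        2: "the dog sat on the log",
        3: "cats and dogs",
    }

    # term -> sorted list of IDs of documents containing the term
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)

    inverted_index = {term: sorted(ids) for term, ids in postings.items()}
    print(inverted_index["sat"])   # [1, 2]

Removing stopwords such as "the" and "on" before indexing shrinks these posting lists, which is exactly the improvement to the inverted file that the project targets.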
6
Stopword
  • A stoplist (or stop list), the name commonly given to a set or list of stopwords, is typically language-specific, although it may contain words and other character sequences such as numbers and punctuation. A search engine or other natural-language processing system may contain a variety of stoplists, one per language, or a single multilingual stoplist.
  • Some of the most frequently used stopwords for English include "a", "of", "the", "I", "it", "you", and "and". These are generally regarded as 'functional words' which do not carry meaning (they are not as important for communication). The assumption is that, when assessing the contents of natural language, the meaning can be conveyed more clearly, or interpreted more easily, by ignoring the functional words.
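A minimal sketch of stopword filtering before indexing (the stoplist below is illustrative and is not the list produced by this project):

    # Small illustrative English stoplist (hypothetical, not the project's list).
    STOPLIST = {"a", "of", "the", "i", "it", "you", "and", "is", "on", "in"}

    def remove_stopwords(text):
        # Keep only tokens that are not in the stoplist.
        return [t for t in text.lower().split() if t not in STOPLIST]

    print(remove_stopwords("The entropy of a word in the corpus"))
    # ['entropy', 'word', 'corpus']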

7
Entropy
  • A fundamental problem in information theory is to find the minimum average number of bits needed to represent a particular message selected from a set of possible messages. Shannon solved this problem by using the notion of entropy. The word entropy is borrowed from physics, in which entropy is a measure of the disorder of a group of particles. In information theory, disorder implies uncertainty and, therefore, information content, so in information theory entropy describes the amount of information in a given message. Entropy also describes the average information content of all the potential messages of a source. This value is useful when, as is often the case, some messages from a source are more likely to be transmitted than others.

8
Entropy (formulas)
  • The entropy of X is defined by its average information:
  • H(X) = E[I(X)] = -Σᵢ pᵢ log₂(pᵢ)
  • Entropy can also be called average uncertainty (strictly speaking, the average reduction in uncertainty for a receiver).
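A minimal sketch of this formula applied to word frequencies in a text (the whitespace tokenization here is an assumption, not necessarily what the project's application does):

    import math
    from collections import Counter

    def entropy_bits(tokens):
        # H(X) = -sum_i p_i * log2(p_i), with p_i the relative frequency of token i.
        counts = Counter(tokens)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    text = "the cat sat on the mat and the dog sat on the log"
    print(entropy_bits(text.split()))   # average information per word, in bits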

9
Stopword
  • Chart: entropy of each word relative to the whole file (bits/sentence), for PART 1 through PART 5.
  • After obtaining these values, we see a direct relation between stopwords and entropy, so we can now improve the inverted file.

10
Result of stopwords found
  • 0.2 < Norm < 0.5: for, hi, is, it, on, that, with, and, in
  • 0.2 < Norm < 0.6: for, hi, is, it, on, that, with, and, in, of, to
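A minimal sketch of how such lists could be selected, assuming each word's entropy has already been normalized into [0, 1] (the exact normalization used in the project is not stated on the slides, so the values below are hypothetical):

    def stopwords_in_band(norm_entropy, low, high):
        # Words whose normalized entropy lies strictly between low and high.
        return sorted(w for w, h in norm_entropy.items() if low < h < high)

    # Hypothetical normalized entropy per word.
    norm_entropy = {"for": 0.45, "is": 0.40, "of": 0.55, "to": 0.52,
                    "entropy": 0.05, "retrieval": 0.08}

    print(stopwords_in_band(norm_entropy, 0.2, 0.5))   # ['for', 'is']
    print(stopwords_in_band(norm_entropy, 0.2, 0.6))   # ['for', 'is', 'of', 'to']

Widening the band (raising the upper bound from 0.5 to 0.6) admits more candidate stopwords, matching the longer list above.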

11
  • Project steps performed:
  • Step 1: Understand what Information Retrieval is.
  • Step 2: Understand what entropy is, both by meaning and by formulas.
  • Step 3: Understand the problems faced in IR and the need for optimizing IR.
  • Step 4: Understand the effect of stopwords on IR.
  • Step 5: Start writing the application to calculate the entropy of a big file.

12
  • Step 6: Obtain the calculation output and evaluate whether the results are valid.

13
THANKS FOR YOUR ATTENTION