Title: Text Search System
- Group 10
- Michaela Stadlerova
- Jakub Silhavy
- Gaojie He
- Hanjie Shu
Outline
- 1. System Architecture
- 2. Components
- 3. Demonstration
- 4. Evaluation
System Architecture
Components
Main Components
- Query Processing
- Document Processing
- Document Index
- Search Rank
- Clustering
Query and Document Processing
- Tokenization
- Stemming
- Removal of Stop Words
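The three processing steps can be sketched as a small pipeline. This is a minimal illustration, not the group's implementation: the stop-word list and the crude suffix-stripping `stem` function are assumptions standing in for a real stop list and a real stemmer such as Porter's, and `tokenize`, `stem`, and `preprocess` are hypothetical names.

```python
import re

# Small illustrative stop-word list (an assumption; the project's actual list is not given).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical suffix rules: a crude stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The searching of indexed documents is fast"))
# ['search', 'index', 'document', 'fast']
```

The same function would be applied to both queries and documents, so that a query term and a document term are compared in the same normalized form.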
Document Index
- Step one: Vector Generation
  - Document: document ID, document name, maximum term frequency, and each term it contains with its frequency
- Step two: Inverted Files
  - Term: document ID, term frequency
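Step one can be sketched as building one record per document from its preprocessed terms. The `DocumentVector` class and `build_vector` function are hypothetical names chosen for this sketch; only the stored fields (ID, name, maximum term frequency, term frequencies) come from the slide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DocumentVector:
    """Per-document record from step one: ID, name, max term frequency, term counts."""
    doc_id: int
    name: str
    max_tf: int
    term_freqs: dict[str, int]


def build_vector(doc_id: int, name: str, terms: list[str]) -> DocumentVector:
    """Count each (already preprocessed) term and record the largest count."""
    counts = Counter(terms)
    max_tf = max(counts.values(), default=0)  # 0 for an empty document
    return DocumentVector(doc_id, name, max_tf, dict(counts))


vec = build_vector(1, "doc1.txt", ["search", "index", "search", "rank"])
print(vec.max_tf, vec.term_freqs)  # 2 {'search': 2, 'index': 1, 'rank': 1}
```

Storing the maximum term frequency here means the normalization used later in the document weights does not have to rescan the document.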
Inverted Files
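Step two inverts the per-document counts so that each term points to the documents containing it. A minimal sketch, assuming postings are kept as plain Python lists of `(document ID, term frequency)` pairs; `build_inverted_file` and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-document term counts, as produced by the vector-generation step.
doc_term_freqs = {
    1: {"search": 2, "index": 1},
    2: {"index": 3, "rank": 1},
    3: {"search": 1, "rank": 2},
}


def build_inverted_file(
    term_freqs_by_doc: dict[int, dict[str, int]]
) -> dict[str, list[tuple[int, int]]]:
    """Map each term to a postings list of (document ID, term frequency) pairs."""
    inverted: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(term_freqs_by_doc):  # postings come out sorted by document ID
        for term, tf in term_freqs_by_doc[doc_id].items():
            inverted[term].append((doc_id, tf))
    return dict(inverted)


index = build_inverted_file(doc_term_freqs)
print(index["index"])  # [(1, 1), (2, 3)]
```

The length of a term's postings list is the number of documents containing that term, which is the n_i needed by the weighting formulas in the ranking step.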
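Search Rank
- Construct weight vectors
  - Document: $w_{i,j} = \frac{f_{i,j}}{\max_l f_{l,j}} \cdot \log\frac{N}{n_i}$
  - Query: $w_{i,q} = \left(0.5 + 0.5\,\frac{freq_{i,q}}{\max_l freq_{l,q}}\right) \cdot \log\frac{N}{n_i}$
- Cosine similarity
  - $sim(q,d) = \frac{\vec{q}}{|\vec{q}|} \cdot \frac{\vec{d}}{|\vec{d}|}$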
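The weighting and ranking formulas can be sketched directly, with N the number of documents and n_i the number of documents containing term i. The toy collection and the function names are hypothetical. Skipping query terms that occur in no document is an assumption of this sketch, since log(N/n_i) is undefined when n_i = 0.

```python
import math
from collections import Counter

# Hypothetical toy collection: per-document term frequencies f_ij.
DOCS = {
    1: {"search": 2, "index": 1},
    2: {"index": 3, "rank": 1},
    3: {"search": 1, "rank": 2},
}
N = len(DOCS)  # N: number of documents in the collection
DF = Counter(t for tf in DOCS.values() for t in tf)  # n_i: documents containing term i


def doc_weights(tf: dict[str, int]) -> dict[str, float]:
    """w_ij = f_ij / max_l f_lj * log(N / n_i)."""
    max_tf = max(tf.values())
    return {t: f / max_tf * math.log(N / DF[t]) for t, f in tf.items()}


def query_weights(freq: dict[str, int]) -> dict[str, float]:
    """w_iq = (0.5 + 0.5 * freq_iq / max_l freq_lq) * log(N / n_i); unseen terms skipped."""
    max_freq = max(freq.values())
    return {t: (0.5 + 0.5 * f / max_freq) * math.log(N / DF[t])
            for t, f in freq.items() if t in DF}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """sim(q, d) = (q / |q|) . (d / |d|); 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


q = query_weights({"search": 1})
scores = {doc_id: cosine(q, doc_weights(tf)) for doc_id, tf in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [1, 3, 2]
```

Document 1 ranks first because "search" is its most frequent term, document 3 contains "search" only at half its maximum frequency, and document 2 does not contain it at all.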
Clustering: Bottom-up algorithm
- Build a similarity matrix from the document weight vectors using cosine similarity, e.g. sim(d1, d2) = 0.85
- findMax: the largest entry in the similarity matrix is 0.94, so d2 and d3 are merged
- Repeat the find-and-merge step on the updated similarity matrix as long as findMax > Constant
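A minimal sketch of the bottom-up loop, assuming the similarity between two clusters is the cosine similarity of their average weight vectors (the averaging matches the cluster weight vector used for ranking, but the slides do not state the merge criterion). The `constant` parameter plays the role of Constant; the function names and toy vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise average of the vectors (the cluster weight vector)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bottom_up_cluster(vectors: list[list[float]], constant: float) -> list[list[int]]:
    """Merge the most similar pair of clusters while their similarity exceeds `constant`."""
    clusters = [[i] for i in range(len(vectors))]  # start: one cluster per document
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        # findMax: scan the upper triangle of the similarity matrix
        best, best_pair = -math.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(cents[a], cents[b])
                if s > best:
                    best, best_pair = s, (a, b)
        if best <= constant:
            break  # no pair is similar enough to merge
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical document weight vectors over three terms.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(bottom_up_cluster(docs, constant=0.5))  # [[0], [1, 2], [3]]
```

Recomputing the whole matrix on each pass keeps the sketch short; an implementation over many documents would update only the row and column of the merged cluster.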
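Clustering: Ranking
- Cluster weight vector: the average of the weight vectors of the documents in the cluster (e.g. cluster 1)
- Each cluster is scored by its cosine similarity with the query, e.g. sim(c1, query)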
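Cluster ranking can be sketched by averaging each cluster's document vectors and scoring the result against the query. The cluster names, vectors, and the `rank_clusters` function are hypothetical, and dense vectors over a shared term order are an assumption of this sketch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Cluster weight vector: the average of the member document vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank_clusters(
    clusters: dict[str, list[list[float]]], query: list[float]
) -> list[tuple[str, float]]:
    """Score each cluster by sim(centroid, query), best first."""
    scored = [(name, cosine(centroid(vecs), query)) for name, vecs in clusters.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


clusters = {
    "cluster 1": [[1.0, 0.0], [0.8, 0.2]],
    "cluster 2": [[0.0, 1.0], [0.1, 0.9]],
}
ranked = rank_clusters(clusters, query=[1.0, 0.0])
print([name for name, _ in ranked])  # ['cluster 1', 'cluster 2']
```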
DEMONSTRATION
Evaluation of the System
Basic system (data from the first 20 retrieved documents)
Extended system (input: the first 25 documents)
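Precision and recall over the first k retrieved documents can be computed as below. This is only an illustration of the measures: the ranking and the relevance judgments are made-up toy data, not the group's evaluation results, and `precision_recall_at_k` is a hypothetical name.

```python
def precision_recall_at_k(ranked: list[int], relevant: set[int], k: int) -> tuple[float, float]:
    """Precision and recall computed on the first k documents of a ranking."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    # Divide by the number actually retrieved, in case fewer than k documents came back.
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


ranked = [3, 7, 1, 9, 4, 2]  # hypothetical ranking returned by the system
relevant = {1, 2, 3, 5}      # hypothetical relevance judgments
print(precision_recall_at_k(ranked, relevant, 4))  # (0.5, 0.5)
```

Reordering documents within the top k, as the extended system does, leaves both numbers unchanged; only a change in which documents fall inside the cutoff moves them.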
Extended System vs. Basic System
- Different order of documents
- Improvement of the search experience
  - The documents are grouped according to their similarity
  - Generation of cluster labels
- Precision and recall remain mostly the same
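The slides do not say how cluster labels are generated. One common, simple approach, assumed here, is to label a cluster with the terms that have the highest weight in its average weight vector. The `cluster_label` function and the example weights are hypothetical.

```python
from collections import defaultdict


def cluster_label(doc_weights: list[dict[str, float]], top_n: int = 3) -> list[str]:
    """Return the top_n terms by average weight across the cluster's documents."""
    totals: dict[str, float] = defaultdict(float)
    for weights in doc_weights:
        for term, w in weights.items():
            totals[term] += w
    averages = {t: w / len(doc_weights) for t, w in totals.items()}
    # Highest average weight first; ties broken alphabetically for stable output.
    ordered = sorted(averages.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ordered[:top_n]]


cluster = [
    {"jaguar": 0.9, "car": 0.6, "speed": 0.2},
    {"jaguar": 0.7, "car": 0.8, "engine": 0.4},
]
print(cluster_label(cluster, top_n=2))  # ['jaguar', 'car']
```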
THANK YOU!