ITEC 4020 M - PowerPoint PPT Presentation

Provided by: YOR46

Transcript and Presenter's Notes



1
  • ITEC 4020 M
  • Group 18
  • Amna Al-Omari
  • Divya Love
  • Norbert Megler
  • Omer Saleem
  • Sachin Uppal
  • Shahla Defileh

2
WEB SEARCH SYSTEM
  • Presentation Overview
  • Brief Overview of Assignment Objective
  • Structure and Functionality
  • Search Demonstration
  • Questions

3
WEB SEARCH SYSTEM
  • Introduction
  • Our website can be found at
    http://unix.aml.yorku.ca:8080/w04_g18/search.jsp
  • Our Web Search system is based on inverted file
    indexing of the XML document that was created by
    the crawler supplied to us.
    Our site contains 3 main web pages:
  • The main Search page which is built by JSP and
    contains a text box and 2 buttons (reset and
    submit).
  • The result page which is built by JSP and
    contains all the hyperlinks for all documents
    that hold the keyword.
  • The display page, which is built with XML and
    displays the clicked-on document.

4
WEB SEARCH SYSTEM
  • Logical structure
  • 1- A Java class reads the given XML file and
    splits it into 1139 separate XML documents.
  • - We read the XML file using FileInputStream and
    BufferedReader.
  • - The file is read one line at a time and each
    line is compared to the tag <PubmedArticle>,
    which signals the beginning of a new article.
  • - Upon detection of this tag, a new XML document
    file is created.
  • - The file number is kept track of, and once the
    whole article is written into a file, the file
    counter is incremented by one.
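
The splitting step above can be sketched roughly as follows. This is a minimal illustration, not the group's actual class: the class name `ArticleSplitter`, the output naming scheme `doc0.xml`, `doc1.xml`, ... and the use of the closing tag `</PubmedArticle>` to detect the end of an article are all assumptions. It also assumes each tag sits on its own line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ArticleSplitter {
    // Tag named on the slide as the start of a new article.
    static final String START_TAG = "<PubmedArticle>";
    // Assumption: the matching closing tag marks the end of the article.
    static final String END_TAG = "</PubmedArticle>";

    /**
     * Reads the input one line at a time and writes each article to its own
     * file in outDir (doc0.xml, doc1.xml, ...; hypothetical naming).
     * Returns the number of articles written, i.e. the final file counter.
     */
    static int split(BufferedReader in, Path outDir) throws IOException {
        int fileCounter = 0;
        BufferedWriter out = null;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // A start tag opens a new output file.
                if (out == null && line.contains(START_TAG)) {
                    Path target = outDir.resolve("doc" + fileCounter + ".xml");
                    out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
                }
                if (out != null) {
                    out.write(line);
                    out.newLine();
                    // Once the whole article is written, close the file
                    // and increment the file counter by one.
                    if (line.contains(END_TAG)) {
                        out.close();
                        out = null;
                        fileCounter++;
                    }
                }
            }
        } finally {
            if (out != null) {
                out.close(); // input ended in the middle of an article
            }
        }
        return fileCounter;
    }
}
```

To read from a file as the slide describes, the reader could be built as `new BufferedReader(new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))`, where `path` is the location of the crawler output.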

5
WEB SEARCH SYSTEM
  • CONTINUED
  • 2- Create a temporary (merged) file by going
    through all of the roughly 1100 split documents
    and identifying
  • all terms
  • their document number
  • their frequency
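
As a rough sketch of what the merged records might look like, the code below counts terms per document and emits one record per (term, document) pair. The slides do not describe the tokenization or the file layout, so the lowercase alphanumeric tokenizer, the `term docNumber frequency` line format, and the class name `TermExtractor` are assumptions. The input is assumed to be the plain text of each document, with document numbers starting at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermExtractor {
    /** Counts how often each term occurs in one document's text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        // Assumed tokenizer: lowercase, split on anything that is not a-z or 0-9.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    /**
     * Builds the merged records, one "term docNumber frequency" line per
     * (term, document) pair, sorted by term and then by document number.
     */
    static List<String> mergedRecords(List<String> documents) {
        Map<String, Map<Integer, Integer>> merged = new TreeMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            for (Map.Entry<String, Integer> e : termFrequencies(documents.get(doc)).entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                      .put(doc, e.getValue());
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : merged.entrySet()) {
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                lines.add(term.getKey() + " " + posting.getKey() + " " + posting.getValue());
            }
        }
        return lines;
    }
}
```

Keeping the records sorted by term means the later index-building passes can process all postings of a term in one contiguous run.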

6
WEB SEARCH SYSTEM
  • Continued..
  • 3- Next we create the first-level index.
  • - It uses a simple Java class which reads from
    the temporary file.
  • - This index includes a counter (which
    represents the total number of terms), the terms
    (each appearing only once), the number of the
    document that includes that specific term, and
    the total frequency of each term. All of this is
    then written into a text file.

7
  • Continued
  • 4- Next is the creation of the second-level
    index, built by a simple Java class. Each entry
    includes a counter, term, document number, and
    frequency.
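
One way the two index levels could be derived from the temporary file is sketched below. The line formats are assumptions: the input is taken to be `term docNumber frequency` records sorted by term, the first level is written as `counter term docCount totalFrequency`, and the second level as `counter term docNumber frequency`. Reading the first-level "document" field as a count of documents is an interpretation, and the class name `TwoLevelIndex` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TwoLevelIndex {
    // First level: one line per distinct term.
    final List<String> firstLevel = new ArrayList<>();
    // Second level: one line per (term, document) posting.
    final List<String> secondLevel = new ArrayList<>();

    /** Builds both levels from records that are already sorted by term. */
    static TwoLevelIndex build(List<String> mergedRecords) {
        TwoLevelIndex index = new TwoLevelIndex();
        int counter = 0;          // term number; ends as the total number of terms
        String currentTerm = null;
        int docCount = 0;         // documents containing the current term
        int totalFrequency = 0;   // occurrences of the current term overall

        for (String record : mergedRecords) {
            String[] parts = record.split(" ");
            String term = parts[0];
            int frequency = Integer.parseInt(parts[2]);

            if (!term.equals(currentTerm)) {
                // A new term starts: flush the previous term's summary line.
                if (currentTerm != null) {
                    index.firstLevel.add(counter + " " + currentTerm + " "
                            + docCount + " " + totalFrequency);
                }
                counter++;
                currentTerm = term;
                docCount = 0;
                totalFrequency = 0;
            }
            docCount++;
            totalFrequency += frequency;
            index.secondLevel.add(counter + " " + term + " " + parts[1] + " " + frequency);
        }
        if (currentTerm != null) {
            index.firstLevel.add(counter + " " + currentTerm + " "
                    + docCount + " " + totalFrequency);
        }
        return index;
    }
}
```

The shared counter is what ties the two levels together: a lookup finds the term's counter in the small first-level file and then uses it to select postings in the larger second-level file.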

8
  • Searching Functionality
  • We are using an MVC (Model-View-Controller)
    architecture: a Servlet acts as the controller,
    JSP is used for displaying results, and a Java
    Bean holds the main business logic. Once the user
    submits the keyword, the search functionality
    goes through the first-level index to find the
    counter number for that specific term and then
    matches that counter number against all the XML
    document numbers which appear in the second-level
    index. It then goes through those XML documents
    and grabs the relevant document titles for
    display.
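
The lookup described on this slide could look roughly like the sketch below, which is the kind of logic the Java Bean might hold. It is illustrative only: the class name `IndexSearch`, the assumed line formats (`counter term ...` in the first level, `counter term docNumber frequency` in the second), and the lowercase normalization of the keyword are not taken from the slides. Fetching the titles from the XML documents and the Servlet and JSP wiring are left out.

```java
import java.util.ArrayList;
import java.util.List;

class IndexSearch {
    /** Returns the counter number of the keyword in the first-level index, or -1. */
    static int findCounter(List<String> firstLevel, String keyword) {
        String wanted = keyword.trim().toLowerCase();
        for (String line : firstLevel) {
            String[] parts = line.split(" ");
            if (parts[1].equals(wanted)) {
                return Integer.parseInt(parts[0]);
            }
        }
        return -1;
    }

    /** Collects the document numbers listed under a counter in the second-level index. */
    static List<Integer> findDocuments(List<String> secondLevel, int counter) {
        List<Integer> documents = new ArrayList<>();
        for (String line : secondLevel) {
            String[] parts = line.split(" ");
            if (Integer.parseInt(parts[0]) == counter) {
                documents.add(Integer.parseInt(parts[2]));
            }
        }
        return documents;
    }

    /** Full lookup: keyword -> counter -> document numbers (empty if no match). */
    static List<Integer> search(List<String> firstLevel, List<String> secondLevel, String keyword) {
        int counter = findCounter(firstLevel, keyword);
        if (counter < 0) {
            return new ArrayList<>();
        }
        return findDocuments(secondLevel, counter);
    }
}
```

In an MVC setup, the controller would pass the submitted keyword to `search` and hand the resulting document numbers, resolved to titles, to the result page.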

9
  • Displaying of Results.
  • XSL files take care of this functionality. Once
    the user clicks on any hyperlink, that specific
    XML document is displayed, styled by an XSL file.
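
The slides do not say whether the XSL stylesheet is applied in the browser or on the server. As one possibility, the sketch below applies a stylesheet on the server using the standard JAXP `javax.xml.transform` API. The class name `XslRenderer` and the decision to work on strings rather than files are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslRenderer {
    /** Applies the given XSL stylesheet to the XML document and returns the output. */
    static String render(String xml, String xsl) throws TransformerException {
        // Compile the stylesheet into a transformer.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // Run the transformation and capture the result as text.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet for this use would typically contain a template that matches the article root and emits HTML for the title and other fields. The alternative is to reference the stylesheet from each XML file and let the browser apply it.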

10
  • The End
  • Questions?
  • Thank You