eMichigan eWebEditPro Upgrade - PowerPoint PPT Presentation

About This Presentation
Description:

... for Mi News Wire because the changes to Inktomi were excluding MI News Wire ... Crawl MI News Wire. Department of Information Technology e-Michigan Web Development ... – PowerPoint PPT presentation

Slides: 9
Provided by: johnth8
Learn more at: https://www.michigan.gov
Transcript and Presenter's Notes

1
(No Transcript)
2
Inktomi - A little history.
  • called
  • which is owned by
  • which is now owned by Yahoo.

3
Issues we had with the search
  • Multiple records for the same piece of content.
  • Limited advanced search functionality.
  • Assets are stored in the same document directory.
  • All documents are searched regardless of what
    agency site you are searching from.

4
What we did to fix the problems
  • Changed Inktomi settings to only index one URL
    for content that has the same Title and Body.
  • Added or enhanced an Advanced Search form.
  • Created a new collection for MI News Wire because
    the changes to Inktomi were excluding MI News
    Wire.
  • Planned enhancements for Advanced Search:
  • Separation of documents by agency.
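
The "index one URL for content that has the same Title and Body" rule can be illustrated with a minimal sketch. This is not Inktomi's configuration or API; the page dictionaries, the function names, and the whitespace and case normalization are all assumptions made for illustration.

```python
import hashlib


def content_key(title: str, body: str) -> str:
    """Fingerprint a page by its title and body (normalization is an assumption)."""
    # Collapse whitespace and lowercase so trivial formatting differences match.
    parts = [" ".join(text.split()).lower() for text in (title, body)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def dedupe(pages):
    """Keep only the first URL seen for each distinct (title, body) pair."""
    seen = set()
    unique = []
    for page in pages:
        key = content_key(page["title"], page["body"])
        if key not in seen:
            seen.add(key)
            unique.append(page)
    return unique


# Hypothetical records: the same content published under two URLs.
pages = [
    {"url": "https://www.michigan.gov/a", "title": "Press Release", "body": "Text"},
    {"url": "https://www.michigan.gov/b", "title": "Press Release", "body": "Text"},
    {"url": "https://www.michigan.gov/c", "title": "Other", "body": "Different"},
]
unique_pages = dedupe(pages)
```

Keeping the first URL seen is an arbitrary choice here; the slides do not say which of the duplicate URLs is retained.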

5
Spider collections
  • Spider collections - We have three collections
    that are used when crawling the State of Michigan
    websites.
  • Crawl all sites that do not use queries
  • Crawl all sites that use queries
  • Crawl MI News Wire
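
As a rough illustration of how URLs might be routed to the three collections, the sketch below classifies a single URL. The slides describe the split per site, so this per-URL check is only an approximation; the collection names and the `/newswire` path prefix are hypothetical and do not come from the presentation.

```python
from urllib.parse import urlsplit

# Hypothetical path prefix for MI News Wire; the real location is not given.
NEWS_WIRE_PREFIX = "/newswire"


def pick_collection(url: str) -> str:
    """Return the (hypothetical) collection name a URL would be crawled under."""
    parts = urlsplit(url)
    if parts.path.startswith(NEWS_WIRE_PREFIX):
        return "mi_news_wire"
    if parts.query:
        # The URL carries a query string, e.g. ?item=123
        return "query_sites"
    return "static_sites"
```

For example, `pick_collection("https://www.michigan.gov/dit?item=1")` falls into the query collection, while the same URL without the query string falls into the static one.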

6
Time of Crawl
  • The search engine crawls every night from
    5:00 PM to 7:30 AM the next day.
  • There are times when the crawl does not complete
    in this time period. The crawl then picks up
    where it left off the next evening, when the
    crawl process runs again.
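
Because the crawl window runs past midnight, a simple "start <= now < end" test does not work. A minimal sketch of the check, assuming the 5:00 PM start and 7:30 AM end stated above (the function and constant names are illustrative, not part of the search engine):

```python
from datetime import time

CRAWL_START = time(17, 0)  # 5:00 PM
CRAWL_END = time(7, 30)    # 7:30 AM the next day


def in_crawl_window(now: time, start: time = CRAWL_START, end: time = CRAWL_END) -> bool:
    """Return True if `now` falls inside the crawl window."""
    if start <= end:
        # Window does not cross midnight.
        return start <= now < end
    # Window crosses midnight: late evening OR early morning.
    return now >= start or now < end
```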

7
Crawl Intervals
  • There are 6 revisit queues.
  • The revisit queues, which are based on time
    intervals, determine when to re-crawl a document.
  • Minimum document revisit interval: 2 days
  • After being placed in the search index, a
    document is placed in a queue to be
    revisited in 2 days.
  • If the document has not been changed then it is
    placed in the next revisit queue.
  • Maximum document revisit interval: 10 days
  • A document will never go more than 10 days
    without being revisited.
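
The queue behavior can be sketched as follows. The slides give only the number of queues (6), the minimum interval (2 days), and the maximum (10 days). The even spacing between queues and the reset to the first queue when a document has changed are assumptions made for this example, not documented behavior.

```python
MIN_DAYS = 2.0
MAX_DAYS = 10.0
QUEUE_COUNT = 6


def queue_intervals(count: int = QUEUE_COUNT, lo: float = MIN_DAYS, hi: float = MAX_DAYS):
    """Revisit interval in days for each queue (even spacing is an assumption)."""
    step = (hi - lo) / (count - 1)
    return [round(lo + i * step, 2) for i in range(count)]


def next_queue(current: int, changed: bool, count: int = QUEUE_COUNT) -> int:
    """Pick the queue index a document moves to after a revisit."""
    if changed:
        # Assumption: a changed document goes back to the shortest interval.
        return 0
    # Unchanged documents move to the next queue, capped at the last one,
    # so the interval never exceeds the maximum.
    return min(current + 1, count - 1)


intervals = queue_intervals()
```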

8
Weight of a document
  • Index weights: the importance of text relative
    to the body text of a document.
  • Weights determine where a document appears in the
    search results, based on the text being
    searched for.
  • Title is weighted 8.
  • Keywords are weighted 4.
  • Description is weighted 4.
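
To make the effect of these weights concrete, here is a minimal scoring sketch. It assumes body text has a weight of 1 (implied by "relative to the body text") and that a score is a weighted count of exact word matches. The search engine's real ranking formula is not described in the slides, so this illustrates only the weighting idea.

```python
# Title, keywords, and description weights come from the slide;
# the body weight of 1 is an assumption.
FIELD_WEIGHTS = {"title": 8, "keywords": 4, "description": 4, "body": 1}


def score(document: dict, term: str) -> int:
    """Weighted count of exact (case-insensitive) word matches per field."""
    term = term.lower()
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


# Hypothetical document used only to exercise the function.
doc = {
    "title": "Fishing license",
    "keywords": "fishing permits",
    "description": "How to buy a license",
    "body": "Buy a fishing license online before fishing season",
}
fishing_score = score(doc, "fishing")
```

Under these assumptions, a single match in the title contributes as much as eight matches in the body.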