Title: Making Content Findable
I N F O S E E K S O F T W A R E
- Andy Feit
- Vice President & General Manager
- Infoseek Software
Today's Goals
- Discover ways to make your web-based or intranet content more findable by enterprise search engines
- Share some tools and techniques for improving navigation on your site / intranet
- What to tell your content authors, and ways to help them
Intranets are wonderful things...
- Corporations typically use intranets for
- information sharing
- information publishing
- document management
- email, workflow, corporate directories, etc.
Source IDC doc 19643, July 1999
...except when they aren't.
- Slow access
- Difficulty accessing information
- Hard to find information
- Poor content
- Out of date
- Poor search engine
Source IDC doc 19643, July 1999
Frustrations add up...
Hard to find information
Too much information that's too hard to find
Information Chaos
Source IDC doc 19643, July 1999
Information Chaos: How to solve it?
- Information management
- Publishing standards
- Content-discovery features
Information Chaos: How to solve it?
- Information Management
- Content Management Packages
- Vignette StoryServer
- Allaire Spectra
- NetObjects Authoring Server
- Publishing Workflow
Information Chaos: How to solve it?
- Publishing Standards: Findable Content
- Web-enabled
- Ability to access content from a browser
Information Chaos: How to solve it?
- Publishing Standards: Findable Content
- Web-enabled
- Clickable from other accessible pages
- Allows spiders (crawlers, robots, etc.) to find
content by following links
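To see why "clickable from other accessible pages" matters, here is a minimal sketch of how a spider discovers content, using Python's standard html.parser. It assumes an in-memory "site" (a dict of hypothetical URLs to HTML) instead of real HTTP fetches, and it only follows <a href> links; a production spider also deals with robots rules, redirects, and errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the HREF target of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def crawl(site: dict[str, str], start: str) -> list[str]:
    """Breadth-first crawl of an in-memory site (URL -> HTML); returns URLs reached."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:  # broken link: nothing to fetch
            continue
        reached.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page URL
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return reached


# Hypothetical intranet: orphan.htm is published but never linked.
SITE = {
    "http://intranet/index.htm": '<a href="hr.htm">HR</a> <a href="news/today.htm">News</a>',
    "http://intranet/hr.htm": '<a href="index.htm">Home</a>',
    "http://intranet/news/today.htm": "<p>Today's news, no links</p>",
    "http://intranet/orphan.htm": "<p>Published but unlinked</p>",
}

if __name__ == "__main__":
    for url in crawl(SITE, "http://intranet/index.htm"):
        print(url)
```

A page that nothing links to (orphan.htm here) is never reached, however good its content.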
Information Chaos: How to solve it?
- Publishing Standards: Findable Content
- Web-enabled
- Clickable from other accessible pages
- Have descriptive titles and summaries
- Page titles should be unique and descriptive
- Page summaries should accurately describe page
content
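A content team could check titles and summaries automatically before pages go live. This sketch (the audit function and its report wording are hypothetical) reads each page's <title> and META description with Python's html.parser and flags missing titles, missing summaries, and titles shared by more than one page.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleSummaryParser(HTMLParser):
    """Pulls the <title> text and the META description out of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(pages: dict[str, str]) -> list[str]:
    """Reports pages with no title, no META description, or a non-unique title."""
    problems = []
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleSummaryParser()
        parser.feed(html)
        parser.close()
        title = " ".join(parser.title.split())  # collapse whitespace
        if not title:
            problems.append(f"{url}: no <title>")
        else:
            by_title[title.lower()].append(url)
        if not parser.description:
            problems.append(f"{url}: no META description")
    for title, urls in by_title.items():
        if len(urls) > 1:
            problems.append(f"title {title!r} shared by {len(urls)} pages")
    return problems


# Hypothetical pages: both are titled "Home", and only one has a summary.
PAGES = {
    "/a.htm": '<title>Home</title><meta name="description" content="Intranet front page">',
    "/b.htm": "<title>Home</title><p>Welcome</p>",
}

if __name__ == "__main__":
    for problem in audit(PAGES):
        print(problem)
```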
Information Chaos: How to solve it?
- Publishing Standards: Findable Content
- Web-enabled
- Clickable from other accessible pages
- Have descriptive titles and summaries
- Use META tag information
- And a search engine that indexes them (e.g., search.state.mn.us)
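Field-level search on META data works by keeping each META name/content pair as its own field instead of folding it into the body text. The sketch below shows the idea with Python's html.parser; the "department" field is a hypothetical site-specific META name, and the substring match stands in for a real engine's field query syntax.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Keeps each <meta name=... content=...> pair as its own searchable field."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.fields[name.lower()] = content


def field_search(docs: dict[str, str], field: str, term: str) -> list[str]:
    """Returns URLs whose META `field` contains `term`, ignoring case."""
    hits = []
    for url, html in docs.items():
        parser = MetaFieldParser()
        parser.feed(html)
        if term.lower() in parser.fields.get(field.lower(), "").lower():
            hits.append(url)
    return hits


# Hypothetical pages; "department" is an invented, site-specific META name.
DOCS = {
    "/benefits.htm": '<meta name="department" content="Human Resources">'
                     '<meta name="keywords" content="401k, dental, vision">',
    "/servers.htm": '<meta name="department" content="IT">'
                    '<meta name="keywords" content="unix, backup">',
}

if __name__ == "__main__":
    print(field_search(DOCS, "department", "human"))  # looks only at that field
```

Because fields stay separate, a query on "keywords" never matches text that only appears in "department".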
Information Chaos: How to solve it?
- Publishing Standards: Findable Content
- Web-enabled
- Clickable from other accessible pages
- Have descriptive titles and summaries
- Use META tag information
- Use XML
- Create your own tagging schemas for e-commerce,
content, database records
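As an illustration of a home-grown tagging schema, the sketch below parses a hypothetical product catalog with Python's xml.etree.ElementTree and flattens each record into fields that could be indexed and searched individually. All element and attribute names (catalog, product, sku, name, category, price) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house schema; every element and attribute name is invented.
CATALOG = """
<catalog>
  <product sku="A-100">
    <name>Laser Printer</name>
    <category>Hardware</category>
    <price currency="USD">499.00</price>
  </product>
  <product sku="S-200">
    <name>Backup Suite</name>
    <category>Software</category>
    <price currency="USD">129.00</price>
  </product>
</catalog>
"""


def records(xml_text: str) -> list[dict[str, str]]:
    """Flattens each <product> into field -> value pairs an indexer could store."""
    root = ET.fromstring(xml_text.strip())
    out = []
    for product in root.iter("product"):
        fields = {"sku": product.get("sku", "")}
        for child in product:
            fields[child.tag] = (child.text or "").strip()
        out.append(fields)
    return out


def find(xml_text: str, field: str, value: str) -> list[str]:
    """Field-level lookup: SKUs of products whose `field` equals `value`."""
    return [r["sku"] for r in records(xml_text) if r.get(field) == value]


if __name__ == "__main__":
    print(find(CATALOG, "category", "Software"))
```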
Information Chaos: How to solve it?
- Content-Discovery Features: Navigation Schemas
- Site Maps / Directories
- Automatically or manually created
- Forms-based
- Selectable from pull-downs
- Link-based
- Standard, hand-made or content-management
system-based navigation
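An automatically created site map can be as simple as a generated page of ordinary HREF links, grouped by directory, that both a spider and a person can follow. A minimal sketch, assuming you already have a list of (URL, title) pairs, for example from a crawl or a content-management export; grouping by first path segment is an arbitrary choice.

```python
from collections import defaultdict
from html import escape
from urllib.parse import urlparse


def build_site_map(pages: list[tuple[str, str]]) -> str:
    """Groups (url, title) pairs by first path segment and emits plain HREF links."""
    sections: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for url, title in pages:
        path = urlparse(url).path.strip("/")
        section = path.split("/")[0] if "/" in path else "(top level)"
        sections[section].append((url, title))
    lines = ["<ul>"]
    for section in sorted(sections):
        lines.append(f"  <li>{escape(section)}")
        lines.append("    <ul>")
        for url, title in sorted(sections[section], key=lambda page: page[1].lower()):
            lines.append(f'      <li><a href="{escape(url)}">{escape(title)}</a></li>')
        lines.append("    </ul>")
        lines.append("  </li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical page list, e.g. exported from a crawl or a content-management system.
PAGES = [
    ("http://intranet/hr/holidays.htm", "Company Holidays"),
    ("http://intranet/hr/benefits.htm", "Benefits Overview"),
    ("http://intranet/it/helpdesk.htm", "Help Desk"),
    ("http://intranet/index.htm", "Home"),
]

if __name__ == "__main__":
    print(build_site_map(PAGES))
```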
Information Chaos: How to solve it?
- Content-Discovery Features: Search Engine Software
- Full-text indexing
- Natural Language search syntax
- Simplified Boolean syntax
- Ability to natively index META data (and do field-level search on it!)
- Ability to index XML optimally
- Good relevancy ranking
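Three of these features can be illustrated with a toy engine: an inverted index for full-text search, a simplified Boolean syntax in which every plain word is required and a leading "-" excludes a word, and a tf-idf style relevancy score. This is a sketch of the general technique, not how Infoseek or any other product ranks results; the query syntax and the scoring formula are assumptions.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lower-cases text and splits it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class FullTextIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> list[tuple[str, float]]:
        """Simplified Boolean syntax: plain words are all required, '-word' excludes."""
        words = query.split()
        required = [t for w in words if not w.startswith("-") for t in tokenize(w)]
        excluded = [t for w in words if w.startswith("-") for t in tokenize(w[1:])]
        if not required:
            return []
        matches = set(self.postings.get(required[0], {}))
        for term in required[1:]:
            matches &= set(self.postings.get(term, {}))
        for term in excluded:
            matches -= set(self.postings.get(term, {}))
        # Relevancy: term frequency times a log-scaled inverse document frequency.
        scores = {
            doc: sum(
                self.postings[t][doc] * math.log(1 + self.doc_count / len(self.postings[t]))
                for t in required
            )
            for doc in matches
        }
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical documents.
INDEX = FullTextIndex()
INDEX.add("policy", "Travel policy: book travel through the travel desk")
INDEX.add("expenses", "Submit travel expenses within 30 days")
INDEX.add("holidays", "Company holidays and vacation policy")

if __name__ == "__main__":
    print(INDEX.search("travel -expenses"))
```

The page that says "travel" three times outranks the one that says it once; swapping in a different scoring function changes only the last few lines of search().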
Information Chaos: How to solve it?
Search Directory Software
- Browsable, searchable index of all content
- Full-text search engine
- Manageable topic hierarchy
Information Chaos: How to solve it?
[Diagram: Architecture of an Enterprise Search Engine. Components: Web Browser; Navigation User Interface; Search Results; Topic Listings; Document Index; Topic Rules; Search Engine Spider; Web-based Content]
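The diagram's Topic Rules and Topic Listings suggest how a search directory builds its browsable hierarchy: rules assign indexed documents to topics. The rule format below (a topic path plus a set of trigger words) is a hypothetical, deliberately simple stand-in; real topic rules can be far richer.

```python
import re

# Hypothetical topic rules: a document joins a topic if it contains any trigger word.
TOPIC_RULES: dict[str, set[str]] = {
    "Company/Human Resources/Benefits": {"benefits", "401k", "dental"},
    "Company/Human Resources/Holidays": {"holiday", "holidays", "vacation"},
    "Technology/Help Desk": {"password", "helpdesk", "printer"},
}


def assign_topics(text: str, rules: dict[str, set[str]] = TOPIC_RULES) -> list[str]:
    """Returns every topic path whose trigger words appear in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(topic for topic, triggers in rules.items() if words & triggers)


def topic_listing(docs: dict[str, str]) -> dict[str, list[str]]:
    """Builds the browsable side of the directory: topic path -> document IDs."""
    listing: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        for topic in assign_topics(text):
            listing.setdefault(topic, []).append(doc_id)
    return {topic: sorted(ids) for topic, ids in sorted(listing.items())}


# Hypothetical indexed documents.
DOCS = {
    "/benefits.htm": "Enroll in dental and 401k benefits",
    "/reset.htm": "How to reset your password",
    "/calendar.htm": "Vacation and holiday calendar; the dental office is closed",
}

if __name__ == "__main__":
    for topic, ids in topic_listing(DOCS).items():
        print(topic, ids)
```

The /calendar.htm example also shows why a "manageable" hierarchy needs human review: a single keyword ("dental") is enough to file it under Benefits.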
Information Chaos: How to solve it?
- Search Directory Software: Choosing the right one
- Administrative factors
- Scalability / Performance
- Control, Flexibility
- Admin UI and Ease of Management
- Good search performance, easy to use
- Real-Time Updates
- Leading edge, not bleeding edge
Indexing Challenges
- But spiders can't do everything
- Some pages include common (and not-so-common)
features that make spidering and indexing a
challenge
Indexing Challenges: Dynamically Generated Pages
- ASP, JSP, CGI, DB-driven pages, etc.
- Personalized or tracked pages
- Solution
- Make sure you use a search engine that can index these types of pages
- Create standard user profiles
- Beware of cookies used for personalization, and of tracking "black holes" (endless URL variations a spider can get lost in)
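Many spiders do not keep cookies, so applications that track visitors often fall back to putting a session or tracking ID in the URL, and every crawl then sees an endless supply of "new" pages. One common defense, sketched here under assumptions, is to canonicalize URLs before the spider queues them. The parameter names in VOLATILE_PARAMS are hypothetical; use the ones your own application generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-visitor parameter names; replace with whatever your
# ASP/JSP/CGI application actually appends for sessions or tracking.
VOLATILE_PARAMS = {"sessionid", "jsessionid", "visitorid", "trackid"}


def canonical(url: str) -> str:
    """Strips per-visitor state so the same page always maps to one URL."""
    parts = urlsplit(url)
    path = parts.path.split(";", 1)[0]  # drop path parameters such as ;jsessionid=...
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VOLATILE_PARAMS
    )
    # The fragment is dropped too: it never selects a different page.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keeps the first occurrence of each canonical URL, in crawl order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = canonical(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    print(dedupe([
        "http://Intranet/news.asp?id=7&sessionid=ABC",
        "http://intranet/news.asp?sessionid=XYZ&id=7",
    ]))
```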
Indexing Challenges: Image Maps
- Server-side image maps do not contain HREF links
- Solution
- Use client-side maps, which contain HREF links, or
- Place links from a server-side image map on a separate HTML page
- Example
- usgs.gov, clemson.edu
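The difference is easy to see from a spider's point of view. With a server-side map, the real destinations are computed on the server from the click coordinates, so the only link in the page is the map handler. This sketch collects the targets a simple spider would follow (<a href> and <area href>) from two versions of the same navigation bar; the file names and the /cgi-bin/imagemap/ handler path are hypothetical.

```python
from html.parser import HTMLParser


class SpiderLinks(HTMLParser):
    """Collects link targets a simple spider would follow: <a href> and <area href>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag in ("a", "area") and href:
            self.links.append(href)


def links_in(html: str) -> list[str]:
    parser = SpiderLinks()
    parser.feed(html)
    parser.close()
    return parser.links


# Server-side map: the only link is the map handler; region targets live on the server.
SERVER_SIDE = '<a href="/cgi-bin/imagemap/nav.map"><img src="nav.gif" ismap></a>'

# Client-side map: every region carries its own HREF.
CLIENT_SIDE = """
<img src="nav.gif" usemap="#nav">
<map name="nav">
  <area shape="rect" coords="0,0,100,40" href="/products.htm" alt="Products">
  <area shape="rect" coords="100,0,200,40" href="/support.htm" alt="Support">
</map>
"""

if __name__ == "__main__":
    print(links_in(SERVER_SIDE))
    print(links_in(CLIENT_SIDE))
```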
Indexing Challenges: JavaScript-Created Pages
- HTML pages created by JavaScript are hard for the spider to find
- Solution
- Include a <NOSCRIPT> section in the code, providing HREF links inside
- Example
- mylifepath.com
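Python's html.parser, like a spider that does not run JavaScript, treats the body of a <script> element as opaque text, so a link written out by document.write() is invisible to it while a link inside <NOSCRIPT> is not. The page fragments below are hypothetical.

```python
from html.parser import HTMLParser


class SpiderLinks(HTMLParser):
    """Collects <a href> targets; <script> bodies arrive as plain text, not tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def links_in(html: str) -> list[str]:
    parser = SpiderLinks()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page whose only link is written by JavaScript.
SCRIPT_ONLY = """
<script type="text/javascript">
  document.write('<a href="/reports/q3.htm">Q3 report</a>');
</script>
"""

# Same page with a <noscript> fallback the spider can read.
WITH_NOSCRIPT = SCRIPT_ONLY + """
<noscript>
  <a href="/reports/q3.htm">Q3 report</a>
</noscript>
"""

if __name__ == "__main__":
    print(links_in(SCRIPT_ONLY), links_in(WITH_NOSCRIPT))
```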
Indexing Challenges: JavaScript Menus & Lists
- Spiders can't follow menus or lists created using JavaScript
- Solution
- Include a <NOSCRIPT> section in the code, duplicating the menu or list items as links
- Example
- zinezone.com
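One way to keep the <NOSCRIPT> copy from drifting out of step with the menu is to generate both from a single list. In this sketch the menu items, the menuItems variable name, and the idea that some menu script reads it are all hypothetical; only the <NOSCRIPT> links matter to the spider.

```python
import json
from html import escape

# Hypothetical menu; the script that would draw it from menuItems is not shown.
MENU = [("Products", "/products/"), ("Support", "/support/"), ("Contact Us", "/contact.htm")]


def menu_markup(items: list[tuple[str, str]]) -> str:
    """Emits menu data for a script plus the same items as plain <NOSCRIPT> links."""
    data = json.dumps([{"label": label, "href": href} for label, href in items])
    # Escape "</" so a label can never close the <script> element early.
    data = data.replace("</", "<\\/")
    links = "\n".join(
        f'  <a href="{escape(href)}">{escape(label)}</a><br>' for label, href in items
    )
    return (
        f'<script type="text/javascript">var menuItems = {data};</script>\n'
        f"<noscript>\n{links}\n</noscript>"
    )


if __name__ == "__main__":
    print(menu_markup(MENU))
```

Adding an item to MENU updates the script data and the spider-visible links in one step.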
Indexing Challenges: Redirects
- Some indexes keep the original URL as the target, not the actual page
- Solution
- Ensure that the actual target URL is listed on a site map or other page
- Make use of the robots META tag (NOINDEX, NOFOLLOW)
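For a spider that honors the robots META tag, the check is a simple parse of the directives. The sketch below assumes the redirect is served as an HTML stub page carrying the tag (the stub shown is hypothetical), and treats NONE as shorthand for NOINDEX, NOFOLLOW.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots" content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def robots_policy(html: str) -> tuple[bool, bool]:
    """Returns (may_index, may_follow) as a spider that honors the tag would."""
    parser = RobotsMeta()
    parser.feed(html)
    parser.close()
    d = parser.directives
    may_index = not ({"noindex", "none"} & d)  # NONE means NOINDEX, NOFOLLOW
    may_follow = not ({"nofollow", "none"} & d)
    return may_index, may_follow


# Hypothetical redirect stub that should not itself appear in the index.
REDIRECT_STUB = (
    '<meta name="robots" content="NOINDEX, NOFOLLOW">'
    '<meta http-equiv="refresh" content="0; URL=/new/home.htm">'
)

if __name__ == "__main__":
    print(robots_policy(REDIRECT_STUB))
```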
Indexing Challenges: META Refresh
- Older browsers and some spiders do not recognize META refresh URLs
- Solution
- Include a normal HREF link to the new page on the refresh page, or
- Ensure that the actual target URL is listed on a site map or other page
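A spider that does understand META refresh has to dig the target URL out of the content attribute. This sketch handles only the common "N; URL=..." form (a value without the URL= prefix is ignored), and the page URLs are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Matches the "URL=..." part of a refresh value such as "5; URL=new.htm".
URL_PART = re.compile(r"url\s*=\s*['\"]?([^'\"\s>]+)", re.IGNORECASE)


class RefreshFinder(HTMLParser):
    """Finds the target of <meta http-equiv="refresh" content="N; URL=...">."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            match = URL_PART.search(attrs.get("content") or "")
            if match:
                self.target = match.group(1)


def refresh_target(page_url: str, html: str) -> Optional[str]:
    """Absolute URL the page refreshes to, or None if it has no usable META refresh."""
    parser = RefreshFinder()
    parser.feed(html)
    parser.close()
    return urljoin(page_url, parser.target) if parser.target else None


# Hypothetical moved page.
OLD_PAGE = '<meta http-equiv="Refresh" content="5; URL=../new/index.htm"><p>We have moved.</p>'

if __name__ == "__main__":
    print(refresh_target("http://intranet/old/index.htm", OLD_PAGE))
```

Spiders without this logic see only the stub, which is why the plain HREF link on the refresh page still matters.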
Indexing Challenges: Frames & Framesets
- Some spiders will not follow links contained within <FRAMESET> tags
- Solution
- Include links within a <NOFRAMES> tag section, or
- Include the links within a text listing or site map
- Examples
- saic.com, publish.com
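This sketch separates what a frame-aware spider can use (the FRAME src attributes) from the fallback links a frame-blind spider would find in a <NOFRAMES> section. The page is hypothetical. Because some html.parser versions may hand NOFRAMES content over as raw text, as HTML5 parsers do, the sketch also re-parses that text for links so it works either way.

```python
from html.parser import HTMLParser


class _Anchors(HTMLParser):
    """Helper: collects <a href> targets from a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


class FramesetLinks(HTMLParser):
    """Separates FRAME sources from the plain links offered in a NOFRAMES section."""

    def __init__(self) -> None:
        super().__init__()
        self.frame_sources: list[str] = []
        self.noframes_links: list[str] = []
        self._in_noframes = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.frame_sources.append(attrs["src"])
        elif tag == "noframes":
            self._in_noframes = True
        elif tag == "a" and self._in_noframes and attrs.get("href"):
            self.noframes_links.append(attrs["href"])

    def handle_data(self, data):
        if self._in_noframes:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "noframes":
            self._in_noframes = False
            # If NOFRAMES content arrived as raw text, its links are still in the buffer.
            inner = _Anchors()
            inner.feed("".join(self._buffer))
            inner.close()
            self.noframes_links.extend(inner.hrefs)
            self._buffer.clear()


# Hypothetical two-frame page with a NOFRAMES fallback.
FRAMESET_PAGE = """
<frameset cols="25%,75%">
  <frame src="toc.htm">
  <frame src="welcome.htm">
  <noframes>
    <body>
      <a href="toc.htm">Table of contents</a>
      <a href="welcome.htm">Welcome</a>
    </body>
  </noframes>
</frameset>
"""

if __name__ == "__main__":
    parser = FramesetLinks()
    parser.feed(FRAMESET_PAGE)
    parser.close()
    print(parser.frame_sources, parser.noframes_links)
```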
Final Tips
- Provide GUIDELINES to your authors
- PICK a good enterprise search engine
- Give the spider good TRAILS to follow
For More Information
To review this presentation and its information sources, go to http://software.infoseek.com/whatsnew.htm (and by the way, you can download a really good search engine and topic manager, FREE!)