1
Blog Search Engines
  • Poonam Bhatti, Mimi Lam, Paul MacDonell, and
    Barrie Olmstead
  • LIBR 557 Advanced Information Retrieval
  • November 21, 2005

2
Brief History of Blogging
  • Blogs are in some ways an outgrowth of the
    alternative press that came about in the 1960s.
  • In the strictest sense, a blog is someone's
    online record of the Web sites he or she visits.
  • In 1999, Brigitte Eaton started the first portal
    devoted to blogs, with about 50 listings.
  • In July 1999, a Toronto programmer named Andrew
    Smales launched the first do-it-yourself blog
    tool called Pitas.com, helping to facilitate an
    "online diary" community. Smales later developed
    a sister site, Diaryland.

3
Blogging History (cont'd)
  • Blogger.com, launched in August 1999 by Evan
    Williams, Paul Bausch, and Meg Hourihan, is a
    tool that enables anyone not only to create and
    maintain a blog, but also to store it on their
    own server with a personalized address rather
    than on a remote host.
  • Blogger gave users a free for-dummies choice of
    templates, a short and easy-to-navigate
    registration process, and Web hosting.
  • In the November 2000 issue of The New Yorker,
    Rebecca Mead christened the blogging phenomenon
    "the CB radio of the Dave Eggers generation."
  • The events of September 11, 2001, had a huge
    impact on blogging activity, as people felt
    compelled to discuss, argue, rant, and mourn
    online. There was also a prevailing
    dissatisfaction with Big Media.
  • It is estimated, as of summer 2005, that there
    are over 10 million blogs in the blogosphere.

4
The Structure and Anatomy of Blogs
  • Blogs are basically Web pages that consist of
    individual posts arranged in reverse
    chronological order.
  • A blogger may read a piece of news or a tidbit of
    interest and post it to his or her blog within
    seconds. Feed aggregators then query the site
    about every hour searching for new feeds.
  • Blogs have a low barrier to access and one may
    post using email, voicemail, Web forms, or a
    downloaded WYSIWYG program.
  • Many hosted blogs have preformatted template
    choices that do not require a detailed knowledge
    of HTML, CSS, or XML.
  • Blogs have home pages with mostly static content
    and a list of recent posts.
  • Posts usually have the following fields: title,
    date/time stamp, body, comments, trackbacks, and
    permalinks.
  • Archives are created contiguously.
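
A minimal sketch of the post structure described above, assuming a simple in-memory record; the class name, field types, and the `front_page` helper are illustrative and not taken from any particular blogging tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    """One blog post with the fields listed on the slide."""
    title: str
    timestamp: datetime
    body: str
    permalink: str
    comments: list[str] = field(default_factory=list)
    trackbacks: list[str] = field(default_factory=list)


def front_page(posts: list[Post], limit: int = 10) -> list[Post]:
    """Return the newest posts first, as a blog home page would list them."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)[:limit]


posts = [
    Post("First entry", datetime(2005, 11, 1, 9, 0), "Hello.", "/2005/11/first-entry"),
    Post("Second entry", datetime(2005, 11, 20, 18, 30), "More.", "/2005/11/second-entry"),
]
for post in front_page(posts):
    print(post.timestamp.isoformat(), post.title)
```

Sorting on the timestamp with `reverse=True` gives the reverse chronological order that defines a blog's layout.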

5
Feeds
  • Feed: file in which a blog lists its latest
    content/posts
  • RSS feed: new information from a website in a
    format that an RSS reader can read
  • RSS reader: program that checks web pages for new
    content in RSS format
  • Feeds list the newest content in a format that
    software can read, showing at once when a blog
    has been updated
  • A feed may contain only the title of the post,
    the title plus the first few lines of a post, or
    the entire post
  • Most blog search engines do not crawl the entire
    web; they crawl RSS feeds
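
As an illustration of how software reads a feed, the sketch below parses a small RSS 2.0 document with Python's standard library. The sample feed, its URLs, and the `read_items` helper are made up for this example; real feeds vary in which elements they include.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: one item carries a description, the other only a title.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Newest post</title>
      <link>http://example.com/2005/11/newest</link>
      <description>First few lines of the post...</description>
    </item>
    <item>
      <title>Older post</title>
      <link>http://example.com/2005/11/older</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list[dict]:
    """Extract title, link, and description from every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], entry["link"])
```

A search engine that indexes only the feed sees just these fields, which is why a title-only feed gives it very little text to work with.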

6
Why are blogs difficult to search?
  • Content changes on a daily, or sometimes hourly,
    basis
  • Blogs are organized with the most recent material
    at the top, while older material is further down
    the page
  • Most blogs don't have descriptive titles
  • Many topics on the same page
  • Difficult to search in traditional format
  • While larger search indexes (such as Google)
    index weblogs, they do not crawl the web
    frequently enough to provide the most up-to-date
    information
  • They miss the immediacy of weblogs
  • Most larger search engines have changed their
    algorithms so that blogs are not the most highly
    ranked sites

7
Blog Search Engines
  • Need for specialized blog search engines
  • Blog search engines crawl RSS feeds often,
    therefore provide the most recent material
  • Blog search engines are more focused and provide
    today's Internet

8
Blog Search Engines
  • Two types of blog search engines
  • Directory or Index Style
  • Weblogs organized into categories (language,
    country, alphabetical order, topic, etc.)
  • Examples: Globe of Blogs, Blogwise, Weblogs.com,
    Blog Universe
  • Free text search engines
  • perform keyword searches
  • Examples: Feedster, Waypath, Technorati, Daypop

9
Weblogs.com
  • ping server that automatically notifies
    subscribers when new content is posted to a
    website or blog
  • receives millions of pings every day from blogs
    that have configured their publishing software to
    notify Weblogs.com the moment content is
    published
  • must be told when weblog has changed
  • doesn't automatically check
  • blogging tool or content management system can be
    programmed to tell Weblogs.com about the change
  • no information on how many blogs it is tracking
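
A hedged sketch of what "telling" a ping server might look like. It assumes the ping is an XML-RPC call named `weblogUpdates.ping` that takes the blog's name and URL; that interface is not described in these slides. The example only builds the request body and sends nothing over the network.

```python
import xmlrpc.client


def build_ping(blog_name: str, blog_url: str) -> str:
    """Build the XML-RPC request body announcing that a blog has changed.

    The method name and parameter order are assumptions for illustration.
    """
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


payload = build_ping("Example Blog", "http://example.com/")
print(payload)
```

A blogging tool would POST a body like this to the ping server each time a post is published, which is how the server learns of changes without checking sites itself.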

10
Weblogs.com
11
Blog Universe
12
Blog Universe - News
13
Blog Universe
  • Browse by topic; no other access points
  • 7682 websites in directory
  • Blogs added by users
  • Person adding the blog decides which category to
    put it in
  • Results change constantly

14
Globe of Blogs
15
Globe of Blogs Browse by Title
16
Globe of Blogs - Results
17
Globe of Blogs
  • 28,641 weblogs registered
  • Can register blogs with Globe of Blogs to have
    them appear in the directory
  • Index of weblogs indexed by author (name and
    birthday), title, topic, and location
  • Ability to search within directory
  • Check links periodically throughout year to
    remove dead links

18
Feedster
  • Free text blog search engine dedicated to
    indexing and finding blogs
  • Mission: allow organizations and individuals to
    harness the rich quantity of information
    available in the RSS universe
  • Largest index of RSS feeds: searchable index of
    over 17 million feeds and hundreds of millions of
    XML documents
  • Feedster's proprietary technology continuously
    crawls the web, fetching updated posts and RSS
    feeds
  • Refreshes its index of over 17 million feeds
    several times per hour; adds millions of new
    documents daily
  • Enables delivery of fresh index of information
    from millions of sources more frequently than
    traditional search engines

19
Feedster Basic Search
20
Feedster Advanced Search
21
Feedster Search Tools
  • Supports Boolean searching
  • Available operators: OR, NOT, NEAR; AND is
    implied
  • Proximity searches
  • Use double quotes to search for a phrase
  • NEAR: specifies how close terms should be; a
    form of NEAR can also specify the order of terms
  • Wildcard/Truncation
  • Stemming: automatically searches for singular and
    plural forms of search terms
  • Wildcards: supports single (?) and multiple (*)
    character wildcards
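
To illustrate single- and multiple-character wildcards, the sketch below uses Python's `fnmatch` module, where `?` matches exactly one character and `*` matches any run of characters. This is Python's pattern syntax, assumed here to behave like the search engine's wildcards; the word list and helper name are made up.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match a pattern containing ? or * wildcards."""
    return [word for word in words if fnmatchcase(word, pattern)]


words = ["blog", "blogs", "blogger", "bog", "weblog"]

single = wildcard_matches("blog?", words)    # exactly one extra character
multiple = wildcard_matches("blog*", words)  # zero or more extra characters
print(single)
print(multiple)
```

The single-character form is useful for spelling variants, while the multiple-character form acts as truncation.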

22
Feedster Search Tools (cont'd)
  • Search Syntax
  • Supports field searching: can limit search to
    title, description, author, feed ID, top level
    domain, site, host, URL, encoding
  • Can limit search by date or time
  • Can search for range of numbers . . .
  • Available functions
  • typo: use if you don't know how to spell a term
    or want to find incorrect spellings of a term
  • soundslike: finds variations of a term (e.g.,
    a soundslike search on chicken will find
    chicken, chickennn, and chukin)
  • literal: use if you want to find exact characters

23
Feedster Special Features
  • Can save a customized search and check back
    hourly or daily to see what the latest postings
    on the topic are
  • Provides cached versions of pages it finds; can
    still look at a page even if it has disappeared

24
Feedster - Performance
  • Works quickly
  • Results vary minute by minute
  • Some of the same blogs on first page of results
    list, but in different order
  • Not always immediately clear how results related
    to search terms

25
Feedster Results Display
  • Results ranked by date (newest items first)
  • Can also choose to rank by relevance
  • When ranking by relevance, can weight terms for
    importance by attaching a value to a term, with
    1 being the normal value
  • Each result looks like this

26
Feedster Search Results
Results sorted by date
Results sorted by relevance
27
Feedster Help and Support
  • Very detailed help and search tips
  • Explains how to use Feedster
  • Available search tools
  • Search tips
  • How to interpret results

28
Feedster What's Missing?
  • Cannot limit results to pages written in certain
    language
  • Feedster in process of adding this feature

29
Waypath
  • A blog discovery engine
  • Includes Blogs on News
  • Utilizes Topic Streams
  • Covers fewer blogs than such engines as
    Technorati or PubSub, but it uses two types of
    RSS feeds for results
  • It is also one of the few search engines that
    indexes the entire blog, and not just the feed.

30
Waypath Basic Search
31
Waypath Basic Search
32
Waypath Advanced Search
33
Waypath Search Tools
  • Supports Boolean operators
  • Supports using parentheses to group clauses to
    form sub-queries
  • Supports single (?) and multiple (*) character
    wildcard searches
  • Supports fuzzy searches and looks for terms that
    are similar in spelling to the query term
  • Supports proximity searching

34
Waypath Special Features
  • Ranks the posts it returns based on matches
    between the query terms and the terms found in
    individual posts
  • Weighting terms
  • By default, unmodified query terms have a weight
    of 1.0.
  • One may specify a new weight by specifying it at
    the end of a term, separated by the symbol
  • The Waypath bookmarklet feature allows a searcher
    to access Waypath related weblog posts from any
    page by clicking on a bookmark link.
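
Term weighting can be sketched as follows, assuming a post's score is the sum of each query term's weight times the number of times the term occurs. This is a simplified stand-in rather than the engine's actual ranking formula, and the weights are passed as a dictionary instead of being parsed from query syntax.

```python
def score(post_text: str, weighted_terms: dict[str, float]) -> float:
    """Sum weight * occurrences for each query term found in the post."""
    words = post_text.lower().split()
    return sum(weight * words.count(term) for term, weight in weighted_terms.items())


posts = ["avian flu spreads", "flu season and flu shots", "library news"]

default_query = {"avian": 1.0, "flu": 1.0}   # unmodified terms weigh 1.0
boosted_query = {"avian": 2.0, "flu": 1.0}   # "avian" counts double

ranked = sorted(posts, key=lambda p: score(p, boosted_query), reverse=True)
for post in ranked:
    print(score(post, boosted_query), post)
```

With the default weights the first two posts tie; boosting "avian" breaks the tie in favour of the post that mentions it.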

35
Waypath Performance
  • Retrieval is slow
  • Reasoning behind the order of the search results
    is unclear

36
Waypath Results Display
  • Topic Stream

37
Waypath Results Display
  • Keyword Search

38
Waypath Help and Support
  • Waypath has help information on:
  • Single term searching
  • Using phrases in quotes
  • Boolean operators
  • Grouping
  • Wildcard operators
  • Weighting terms
  • Fuzzy searches
  • Proximity searches

39
Waypath What's Missing
  • Does not have cached search results for free-text
    searching
  • Cannot search for blogs in languages other than
    English
  • Could be more comprehensive in terms of the
    number of blogs indexed

40
Technorati
  • A real-time search engine
  • Currently tracking 21.3 million blogs and 1.7
    billion links
  • Rankings are based on the number of sources that
    point to a particular blog relative to other blogs
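
A minimal sketch of ranking by the number of sources that point to a blog. It assumes links are available as (source, target) pairs and counts each linking source once; the blog names and the `rank_by_inbound_sources` helper are hypothetical.

```python
from collections import defaultdict


def rank_by_inbound_sources(links: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank blogs by how many distinct other blogs link to them."""
    sources = defaultdict(set)
    for source, target in links:
        if source != target:          # ignore a blog linking to itself
            sources[target].add(source)
    counts = [(blog, len(linkers)) for blog, linkers in sources.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


links = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "b"), ("b", "b")]
ranking = rank_by_inbound_sources(links)
print(ranking)
```

Using a set per target means repeated links from the same source do not inflate a blog's rank.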

41
Technorati Basic Search
  • Search box available on the homepage

42
Technorati Advanced Search
  • On the Search page there is an options arrow that
    expands to reveal advanced features

43
Technorati Advanced Search
  • Keyword Search

44
Technorati Advanced Search
  • Website URL Search

45
Technorati Advanced Search
  • Tag Search

46
Technorati Search Tools
  • Boolean operators are integrated into search
    boxes
  • Exact phrase searching is integrated also
  • URL search
  • Metadata Tags search

47
Technorati Special Features
  • Technorati Blog Finder Beta
  • Blog Finder uses metadata tags to categorize
    posts.
  • Technorati is currently tracking 3 million tags
    in numerous languages.

48
Technorati Special Features
  • Technorati Membership
  • Offers users special features for signing up
  • Membership is free
  • Create a profile page
  • Add a photo to your blog
  • Claim your blog and trick it out with Technorati
    tools
  • Receive a Watchlist where users are kept up to
    date on topics that they select

49
Technorati Performance
  • Retrieval was fairly fast
  • Photos took longer to load but everything loaded
    within a reasonable time
  • Results seemed to be relevant to search terms
  • Did deliver what they promised: the results
    ranked first were usually posted less than an
    hour ago

50
Technorati Results Display
  • Keyword search result

51
Technorati Results Display
  • Website URL search result

52
Technorati Results Display
  • Tag search result

53
Technorati Help and Support
  • Has various help topic pages to assist users:
  • Using Technorati
  • Blogging 101
  • Frequently Asked Questions
  • Publisher Guide
  • Tags
  • Blog Finder
  • There is also a Contact Us link beside the Help
    section that users can resort to with questions.

54
Technorati What's Missing
  • Cannot limit search by time or date
  • Does not support proximity searching
  • Does not support wildcard or fuzzy searches

55
Daypop http://www.daypop.com
  • Well designed, easy to use, one of the most
    respected News and Blog search engines on the web

56
Daypop
  • Created and maintained by Daniel Chan
  • Chan blogged the 2000 US election but could not
    share and retrieve web info
  • Daypop went online in August 2001
  • A year later it searched over 7,500 sites
  • Now over 59,000 news sites, weblogs, and RSS feeds

57
Daypop was down for a while recently
  • Daypop has been described as the front page of
    the Internet
  • A couple of posts to the Daypop weblog when it
    was down, and Dan was on vacation

58
Daypop Search Engine
  • Keywords automatically ANDed
  • Uses + to force inclusion, - to force exclusion
  • Phrase searching: "multiple phrase searching" or
    multiple-phrase-searching (with dashes)
  • Also period, slash (e.g., good for dates),
    backslash, underscore, and ampersand
  • Drop-down menu: search either news or blogs (or
    both at the same time); also RSS news headlines
    or RSS blog posts
  • Advanced page: limit by language, country, time
    period (3 hours minimum, 2 weeks maximum), and
    results per page
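
The keyword rules above can be sketched as a small filter, assuming that plain keywords are ANDed together, a leading `+` forces inclusion, and a leading `-` forces exclusion. The parsing is deliberately simplified (no phrase handling) and the function names are illustrative.

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into required terms and excluded terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            term = token.lstrip("+")
            if term:
                required.append(term)
    return required, excluded


def matches(text: str, query: str) -> bool:
    """True if the text has every required term and no excluded term."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    return all(t in words for t in required) and not any(t in words for t in excluded)


print(matches("bird flu outbreak news", "bird flu"))
print(matches("bird flu outbreak news", "flu -bird"))
print(matches("avian flu report", "+flu -bird"))
```

Because every plain keyword is required, there is no way in this scheme to ask for one term or another in a single query.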

59
Daypop Pages that link to a URL
  • E.g., Noam Chomsky's blog, Turning the Tide
  • Use link: followed by the URL of Chomsky's blog

60
Daypop Human and Automatic
  • Daypop is different from most search engines: it
    uses a human-edited list of sources to index
  • Daypop also crawls using its daypopbot/0.2 spider
  • Meta tag to exclude:
    <META NAME="ROBOTS" CONTENT="NOARCHIVE">
  • Crawl frequency: major news sites every 3 hours,
    lesser news sites every 24 hours
  • Blogs crawled every 12 hours
  • Faster if using weblogs.com ping notification
  • Ranking algorithm: word placement and proximity
    (e.g., a word in title, two search words near
    each other)
  • Does not use the Daypop Score when returning
    results
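
A sketch of the crawl schedule implied by the intervals on this slide, assuming each figure is the time between successive crawls of a source. The source categories, dates, and the `is_due` helper are hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta

# Intervals taken from the slide; the category keys are illustrative.
CRAWL_INTERVALS = {
    "major_news": timedelta(hours=3),
    "lesser_news": timedelta(hours=24),
    "blog": timedelta(hours=12),
}


def is_due(kind: str, last_crawled: datetime, now: datetime) -> bool:
    """True if enough time has passed to crawl this kind of source again."""
    return now - last_crawled >= CRAWL_INTERVALS[kind]


now = datetime(2005, 11, 21, 12, 0)
four_hours_ago = datetime(2005, 11, 21, 8, 0)

print(is_due("major_news", four_hours_ago, now))
print(is_due("blog", four_hours_ago, now))
```

A ping notification would bypass this timer by marking a blog as due immediately.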

61
Daypop Scoring for Authority or Importance
  • Citation: a link (one web page cites another)
  • Daypop Scoring gives more weight to citations
    that come from popular blogs
  • Idea is that this reflects relevance in a more
    meaningful way
  • Citation analysis simply counts total links
    (i.e., how many bloggers link to that blog?)
  • Technorati: authority = links (citations)
  • Daypop: authority measured 2 ways
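
The difference between plain citation counting and weighted citations can be sketched as below. The weighting rule (each citation is worth 1 plus the number of citations the citing blog itself receives) is an assumption made for illustration; the actual Daypop Score formula is not given in these slides.

```python
from collections import Counter


def citation_counts(links: list[tuple[str, str]]) -> Counter:
    """Plain citation analysis: count how many links point at each blog."""
    return Counter(target for _, target in links)


def weighted_scores(links: list[tuple[str, str]]) -> Counter:
    """Weight each citation by the popularity of the blog it comes from."""
    popularity = citation_counts(links)
    scores = Counter()
    for source, target in links:
        scores[target] += 1 + popularity[source]  # assumed weighting rule
    return scores


links = [
    ("popular", "x"),                                    # one link, from a popular blog
    ("a", "popular"), ("b", "popular"), ("c", "popular"),
    ("a", "y"), ("b", "y"),                              # two links, from obscure blogs
]
plain = citation_counts(links)
weighted = weighted_scores(links)
print(plain["x"], plain["y"])
print(weighted["x"], weighted["y"])
```

Under plain counting, blog y outranks blog x; once the popularity of the citing blog is taken into account, the single citation from a widely linked blog puts x ahead.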

62
Daypop Scoring Vs. Citation Ranking
63
Daypop Trend Analysis
  • Top 40: unfiltered list of links to what is
    currently popular (updated as soon as crawled)

64
Other Daypop Trend Analysis
  • Filter results for Top News Posts or Top Blog
    Posts; like Top 40, ranked based on links
  • Word bursts (blogs): measurement of words with
    recent heightened usage (what is being written
    versus what is being linked to)
  • News bursts: same algorithm but for front pages
    of news sites
  • Interesting feature is Top Wishlist, which tracks
    the weblogging community's Amazon wishlists and
    the items on those lists.
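
One simple way to measure "recent heightened usage" is to compare how often a word appears in recent posts with how often it appeared earlier. The sketch below does this with a smoothed frequency ratio; it is an assumed stand-in, since the slides do not describe the actual word burst algorithm, and the sample posts are invented.

```python
from collections import Counter


def word_bursts(recent_posts: list[str], older_posts: list[str],
                min_count: int = 2) -> list[tuple[str, float]]:
    """Rank words by recent usage rate relative to their earlier usage rate."""
    recent = Counter(w for post in recent_posts for w in post.lower().split())
    older = Counter(w for post in older_posts for w in post.lower().split())
    recent_total = sum(recent.values()) or 1
    older_total = sum(older.values()) or 1
    bursts = {}
    for word, count in recent.items():
        if count < min_count:         # skip words too rare to call a burst
            continue
        recent_rate = count / recent_total
        # Add-one smoothing so words unseen earlier do not divide by zero.
        older_rate = (older[word] + 1) / (older_total + 1)
        bursts[word] = recent_rate / older_rate
    return sorted(bursts.items(), key=lambda item: item[1], reverse=True)


older_posts = ["library news today", "search engines and library blogs"]
recent_posts = ["riots in france", "france riots news", "library news"]

bursts = word_bursts(recent_posts, older_posts)
for word, ratio in bursts:
    print(word, round(ratio, 2))
```

Words that were already common earlier, such as "news" here, score lower than words that have only just appeared.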

65
Daypop Top Wishlist
66
Blog only Search Results
  • Similar to Google (keyword in context, cached
    copy, size)
  • Citations link, plus N (news source) and W
    (weblog) labels

67
Daypop Comments and Criticism
  • Limited number of sites indexed is both a plus
    and a minus
  • Approach to Analysis of Web Pages
  • Element of Human Control
  • A few duplicate links
  • W (Weblog) and N (News) labels occasionally
    mistaken
  • Engine can be slow
  • Archives not working properly
  • Occasionally whole site is down
  • No Subject Indexing
  • No related tags as with Technorati on results
    page
  • Lack of Boolean OR (no "bird flu" OR "avian flu"
    search is possible at the same time)

68
Blogs and More
  • End of formal part of presentation
  • Remaining slides designed to initiate discussion

69
UBC, Librarians, and Blogs
  • Some SLAIS students have maintained blogs
  • Emily Yearwood-Lee (temp summer blog)
    http://coffeespoonsafternoons.blogspot.com
  • Heidi Dolamore http://quiddle.blogspot.com/
  • Cheryl Hill http://bc-scrapbook.blogspot.com/
  • Recent SLAIS grad Sunni Nishimura added an RSS,
    Wikis and Blog page to the UBC Library site
  • Many libraries maintain various blogs, and
    conferences are often blogged. The 2005 Internet
    Library Conference blog is a recent example. An
    example of a librarian blog is Sites and
    Soundbytes
  • UBC Librarian Dean Giustini keeps a blog on
    Google Scholar
  • Map of Blogging Librarians
    http://www.frappr.com/blogginglibrarians
  • Any UBC student can start their own blog through
    the UBC Office of Learning Technology

70
Blogs as Controversial Items
  • Little scholarly literature exists on blogs and
    related issues
  • Still less on blog search engines
  • But the number of blogs keeps growing
    http://www.sifry.com/alerts/archives/000343.html
  • Blogs as news items (or the creators of news)
    occasionally make it into mainstream media;
    examples include Dan Rather, Kryptonite Locks,
    and the current issue of TIME Magazine with its
    small piece about blogs and rioters in France
  • So what is the deal with blogs? Egotistical navel
    gazing or real social force? Or something else?
  • Blogs as a means of archival preservation?

71
Opinions, Rants, Insights: A Cross-Section of
What is Out There
  • Libraries and Related Issues
  • Library Web Logs (Laurel A. Clyde). An actual
    study!
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/clyde.pdf
  • Weblogs: Do They Belong in a Library (Penny
    Garrod)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/garrod.htm
  • Weblogs: Their Use and Application in Science and
    Technology Libraries
    http://stlq.info/archives/blogstl.pdf
  • Revenge of the Blog People (Michael Gorman)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/gorman.htm
  • I See Blog People (T. Scott Plutchak)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/plutchak.htm
  • The Passion of the Blog (Irene McDermott)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/mcdermott.htm
  • All Generalizations Are False, Including This One
    (Marydee Ojala)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/ojala.pdf

72
Opinions, Rants, Insights: A Cross-Section of
What is Out There (cont'd)
  • Social Role and Meaning
  • Blogs and the New Politics of Listening (Stephen
    Coleman)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/coleman.pdf
  • Blogs as Protected Space (Michelle Gumbrecht)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/gumbrecht.pdf
  • Miscellaneous
  • An Introduction to Teaching With Weblogs (Trey
    Martindale)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/teaching.pdf
  • Emerging Technologies (Bob Godwin-Jones)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/rssblogswikis.htm
  • Finding a Blog in a Haystack (Stephen Baker)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/haystack.htm
  • Your blog? Who gives a @! (Aaron Weiss)
    http://www.slais.ubc.ca/macdonell/bloglit/libraries_opinions/weiss.htm

75
End of Presentation
  • Questions?
  • Any opinions that the class has on blogs or blog
    search engines?
  • Any searches or engines you'd like to look at?