Issues - PowerPoint PPT Presentation

CS5286 Search Engine Technology and Algorithms / Xiaotie Deng. Lecture 10: Personalization

1
Lecture 10 Personalization
  • Issues
  • Techniques
  • Algorithms
  • Protocols

2
Issues
3
Relevance and Ambiguity
  • Relevance of search results depends on the user
  • Memorizing search
  • Keep track of search history and allow
    repetitions
  • User profiles
  • Automated generation or the user's own setup
  • Ambiguity of languages
  • Content understanding of webpages to improve
    search results

4
User Profile and Privacy
  • User profiles
  • Users do not want to answer many questions.
  • Multiple users of the same PC make auto-profiles
    inconsistent
  • Privacy concerns
  • Users do not want to reveal information

5
Approaches by Eurekster
  • Search filtered by friends
  • SearchMates
  • Friends in different aspects of life.
  • Sharing Searches
  • Sites one's friends have visited
  • Privacy
  • Searches not shared
  • http://searchenginewatch.com/searchday/article.php/3301481

6
Utilization of Social Networks
  • http://www.friendster.com/
  • Powered by http://www.eurekster.com/
  • to personalize and enrich search results.
  • http://searchenginewatch.com/searchday/article.php/3445791

7
Google personalized search
  • Bookmark history
  • Add search labels and notes
  • Yahoo has a My Web feature that saves the text of a page
  • remove/block results/domains
  • increased password security for personalized
    search
  • Google News will be integrated with it later
  • http://searchenginewatch.com/searchday/article.php/3563036

8
Techniques
9
Collecting User Information
  • We use cookies to save the selected hint words
  • a server sends some information to a client to
    store
  • the server can later retrieve its data from that
    client.
  • Servlets send cookies to clients by adding fields
    to HTTP response headers.
  • Clients automatically return cookies by adding
    fields to HTTP request headers.
  • User information can also be collected and stored
    on the server side
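
The header exchange described on this slide can be tried without a servlet container. The following is only a sketch under stated assumptions: it uses the JDK class java.net.HttpCookie as a stand-in for the servlet Cookie class, and the class name CookieRoundTrip and the sample header line are made up for the illustration.

```java
import java.net.HttpCookie;
import java.util.List;

final class CookieRoundTrip {
    // Client side: read the cookies carried by a Set-Cookie response header line.
    static List<HttpCookie> fromResponseHeader(String setCookieLine) {
        return HttpCookie.parse(setCookieLine);
    }

    // Client side: build the Cookie request header that returns them to the server.
    static String toRequestHeader(List<HttpCookie> cookies) {
        StringBuilder sb = new StringBuilder("Cookie: ");
        for (int i = 0; i < cookies.size(); i++) {
            if (i > 0) {
                sb.append("; ");
            }
            HttpCookie c = cookies.get(i);
            sb.append(c.getName()).append('=').append(c.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical header a server might send for the bookstore example.
        List<HttpCookie> stored = fromResponseHeader("Set-Cookie: Buy=304");
        System.out.println(toRequestHeader(stored));
    }
}
```

In a real deployment the browser performs both steps; the sketch only makes the two header directions visible.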

10
Cookie as a tool
  • A servlet uses cookies to have clients hold a
    small amount of state-information associated with
    the user. Servlets can use the information in a
    cookie as the user enters a site (as a
    low-security user sign-on, for example), as the
    user navigates around a site (as a repository of
    user preferences for example), or both.

11
Names and values of Cookies
  • Each HTTP request and response header is named
    and has a single value.
  • For example, a cookie could be
  • a header named BookToBuy
  • with a value 304qty1,
  • indicating to the calling application that the
    user wants to buy one copy of the book with stock
    number 304. (Cookies and their values are
    application-specific.)

12
Multiple cookies
  • They may have the same name.
  • For example, a servlet could send two cookies
    with headers named BookToBuy; one could have the
    value shown previously, 304qty1, while the other
    could have a value 301qty3.
  • These cookies would indicate that the user wants
    to buy one copy of the book with stock number
    304, and three copies of the book with stock
    number 301.
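
Since cookie values are application-specific, the application has to decode them itself. A minimal sketch, assuming the value layout is "<stock number>qty<quantity>" as the two examples suggest; the class BookOrder and its fields are hypothetical names, not part of the servlet API.

```java
final class BookOrder {
    final int stockNumber;
    final int quantity;

    BookOrder(int stockNumber, int quantity) {
        this.stockNumber = stockNumber;
        this.quantity = quantity;
    }

    // Decodes a value such as "304qty1" (assumed layout: stock number, "qty", count).
    static BookOrder parse(String cookieValue) {
        int sep = cookieValue.indexOf("qty");
        if (sep <= 0) {
            throw new IllegalArgumentException("not a BookToBuy value: " + cookieValue);
        }
        int stock = Integer.parseInt(cookieValue.substring(0, sep));
        int count = Integer.parseInt(cookieValue.substring(sep + 3));
        return new BookOrder(stock, count);
    }

    public static void main(String[] args) {
        // The two same-named cookies from the slide.
        for (String value : new String[] {"304qty1", "301qty3"}) {
            BookOrder order = parse(value);
            System.out.println(order.quantity + " x book " + order.stockNumber);
        }
    }
}
```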

13
Multiple cookies
  • A server can provide one or more cookies to a
    client.
  • Client software, such as a web browser, is
    expected to support twenty cookies per host, of
    at least four kilobytes each

14
Cookies are shared within a server
  • Cookies that a client stores for a server are
    returned by the client to that server and only
    that server.
  • A server can contain multiple servlets.
  • Because cookies are returned to a server, two or
    more servlets running within a server share
    cookies.

15
Create a Cookie
  • The constructor for the javax.servlet.http.Cookie
    class creates a cookie with an initial name and
    value. You can change the value of the cookie
    later with its setValue method.
  • The name of the cookie must be an HTTP/1.1 token.
    Tokens are strings that contain none of the
    special characters listed in RFC 2068.
    (Alphanumeric strings qualify as tokens.) In
    addition, names that start with the dollar-sign
    character ("$") are reserved by RFC 2109.
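
The naming rule can be checked before a cookie is constructed. The sketch below tests a candidate name against the token grammar of RFC 2068 (no control characters, no separators) and the "$" reservation of RFC 2109; the class CookieNames and its method names are hypothetical helpers, not part of the servlet API.

```java
final class CookieNames {
    // The "tspecials" of RFC 2068, plus space and horizontal tab.
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?={} \t";

    // True if s is a non-empty HTTP/1.1 token: printable US-ASCII without tspecials.
    static boolean isToken(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 31 || c >= 127 || TSPECIALS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // True if s is a token and does not start with the reserved "$".
    static boolean isUsableCookieName(String s) {
        return isToken(s) && !s.startsWith("$");
    }

    public static void main(String[] args) {
        System.out.println(isUsableCookieName("BookToBuy"));
        System.out.println(isUsableCookieName("Book To Buy"));
        System.out.println(isUsableCookieName("$Version"));
    }
}
```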

16
Value of a cookie
  • The value of the cookie can be any string, though
    null values are not guaranteed to work the same
    way on all browsers. In addition, if you are
    sending a cookie that complies with Netscape's
    original cookie specification, do not use
    whitespace or any of these characters:
  • [ ] ( ) = , " / ? @ : ;

17
Create cookie before a writer
  • If your servlet returns a response to the user
    with a Writer, create the cookie before accessing
    the Writer. (Because cookies are sent to the
    client as a header, and headers must be written
    before accessing the Writer.)
  • If the CatalogServlet used cookies to keep track
    of a client's book order, the servlet could
    create cookies as follows

18
Create cookie before a writer
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws ServletException, IOException
    {
        // Check for pending adds to the shopping cart
        String bookId = request.getParameter("Buy");

        // If the user wants to add a book, remember it by adding a cookie
        if (bookId != null) {
            Cookie getBook = new Cookie("Buy", bookId);
        }

        // Set the content-type header before accessing the Writer
        response.setContentType("text/html");

        // Now get the writer and write the data of the response
        PrintWriter out = response.getWriter();
        out.println("<html>" + "<head><title> Book Catalog </title></head>" + ...);
    }

19
Sending the Cookie
  • Cookies are sent as headers of the response to
    the client; they are added with the addCookie
    method of the HttpServletResponse class.
  • If you are using a Writer to return text data to
    the client, you must call the addCookie method
    before calling the HttpServletResponse's
    getWriter method.
  • Continuing the example of the CatalogServlet, the
    following is code for sending the cookie

20
An example of storing a cookie
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws ServletException, IOException
    {
        ...
        // If the user wants to add a book, remember it by adding a cookie
        if (values != null) {
            bookId = values[0];
            Cookie getBook = new Cookie("Buy", bookId);
            getBook.setComment("User has indicated a desire " +
                               "to buy this book from the bookstore.");
            response.addCookie(getBook);
        }
        ...
    }

21
Retrieving Cookies
  • Clients return cookies as fields added to HTTP
    request headers. To retrieve any cookie, you must
    retrieve all the cookies using the getCookies
    method of the HttpServletRequest class.
  • The getCookies method returns an array of Cookie
    objects, which you can search to find the cookie
    or cookies that you want. (Remember that multiple
    cookies can have the same name. To get the name
    of a cookie, use its getName method.)
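
Because several cookies may share a name, a lookup has to collect every match rather than stop at the first. The sketch below shows the same search on the raw value of a Cookie request header, so it runs without the servlet API; the class CookieLookup and the sample header are assumptions for illustration. In a servlet the loop would run over the array returned by getCookies and compare getName instead.

```java
import java.util.ArrayList;
import java.util.List;

final class CookieLookup {
    // Returns the values of all cookies called `name` in a header value
    // such as "Buy=304; Lang=en; Buy=301", in the order they appear.
    static List<String> valuesFor(String cookieHeaderValue, String name) {
        List<String> values = new ArrayList<>();
        for (String pair : cookieHeaderValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that are not name=value pairs
            }
            if (pair.substring(0, eq).trim().equals(name)) {
                values.add(pair.substring(eq + 1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesFor("Buy=304; Lang=en; Buy=301", "Buy"));
    }
}
```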

22
An example of retrieving and deleting a cookie
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws ServletException, IOException
    {
        /* Handle any pending deletes from the shopping cart */
        String bookId = request.getParameter("Remove");
        if (bookId != null) {
            // Find the cookie that pertains to the book to remove
            Cookie[] cookies = request.getCookies(); // find thisCookie
            // Delete the book's cookie by setting its maximum age to zero
            thisCookie.setMaxAge(0);
            // Send the expired cookie back so that the client discards it
            response.addCookie(thisCookie);
        }
        // Also set the content-type header before accessing the Writer
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        // Print out the response
        out.println("<html> <head>" +
                    "<title>Your Shopping Cart</title>" + ...);
    }

23
Getting the Value of a Cookie
  • To find the value of a cookie, use its getValue
    method.

24
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws ServletException, IOException
    {
        /* Handle any pending deletes from the shopping cart */
        String bookId = request.getParameter("Remove");
        if (bookId != null) {
            // Find the cookie that pertains to that book
            Cookie[] cookies = request.getCookies();
            for (int i = 0; i < cookies.length; i++) {
                Cookie thisCookie = cookies[i];
                if (thisCookie.getName().equals("Buy") &&
                    thisCookie.getValue().equals(bookId)) {
                    thisCookie.setMaxAge(0);
                    // Send the expired cookie back so that the client discards it
                    response.addCookie(thisCookie);
                }
            }
        }
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html> <head>" +
                    "<title>Your Shopping Cart</title>" + ...);
    }
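
The effect of a zero maximum age can be observed with the JDK's java.net.HttpCookie, used here only as a stand-in for the servlet Cookie class. In this sketch the class CookieExpiry and the header format it prints are illustrative assumptions; a servlet container writes the real Set-Cookie header itself when addCookie is called.

```java
import java.net.HttpCookie;

final class CookieExpiry {
    // Builds a copy of the cookie whose maximum age is zero, i.e. one
    // that a client should discard as soon as it receives it.
    static HttpCookie expired(String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setMaxAge(0);
        return cookie;
    }

    // Roughly the header line a server sends to request the deletion.
    static String deletionHeader(HttpCookie cookie) {
        return "Set-Cookie: " + cookie.getName() + "=" + cookie.getValue()
                + "; Max-Age=" + cookie.getMaxAge();
    }

    public static void main(String[] args) {
        HttpCookie fresh = new HttpCookie("Buy", "304");
        HttpCookie gone = expired("Buy", "304");
        System.out.println(fresh.hasExpired() + " " + gone.hasExpired());
        System.out.println(deletionHeader(gone));
    }
}
```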

25
Algorithms
26
Threads of Ideas
  • History reflects user interests
  • Too much information to utilize
  • Summary of the history can be used to help
  • Positive examples → positive keywords
  • Negative examples → negative keywords
  • Use of positive/negative keywords helps with
    focused search.

27
Summarizing multiple documents
  • Computation of common nodes
  • concept-match(c,G1) ∧ concept-match(c,G2)
  • concept-match(c,G1) is true if there is a c1 in
    G1 such that
  • word(c) = word(c1) (after stemming), or
  • synonym(c, c1) holds
  • Summary
  • sentences covering the shared terms
  • Possible extension: assign significance to the
    sentences according to the graph structure and
    extract sentences of high significance in both
    files
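
The common-node test can be sketched in a few lines. Everything concrete below is an assumption made for illustration: each graph is reduced to its set of concept words, the stemmer only lower-cases and strips a trailing "s", and the synonym relation is a small hand-written table. A real system would use a proper stemmer and thesaurus.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CommonNodes {
    // Toy stemmer (assumption): lower-case and drop one trailing "s".
    static String stem(String word) {
        String w = word.toLowerCase();
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // concept-match(c, G): some c1 in G has the same stem as c or is a synonym of c.
    static boolean conceptMatch(String c, Set<String> graph, Map<String, Set<String>> synonyms) {
        String sc = stem(c);
        for (String c1 : graph) {
            String s1 = stem(c1);
            if (sc.equals(s1)
                    || synonyms.getOrDefault(sc, Set.of()).contains(s1)
                    || synonyms.getOrDefault(s1, Set.of()).contains(sc)) {
                return true;
            }
        }
        return false;
    }

    // Concepts of g1 that also match g2; a concept of g1 trivially matches g1 itself.
    static Set<String> common(Set<String> g1, Set<String> g2, Map<String, Set<String>> synonyms) {
        Set<String> shared = new TreeSet<>();
        for (String c : g1) {
            if (conceptMatch(c, g2, synonyms)) {
                shared.add(stem(c));
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Set<String> g1 = Set.of("engines", "search", "car");
        Set<String> g2 = Set.of("engine", "automobile", "ranking");
        System.out.println(common(g1, g2, Map.of("car", Set.of("automobile"))));
    }
}
```

The summary step would then keep the sentences of each document that mention the shared concepts.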

28
Applying Summary to Focused Information Retrieval
  • Requirement: different users need different
    information even for the same query
  • Focused: for a specific user, find specific
    (focused) information, whether the search
    engine ranks it low or high.
  • Focused queries are related in some sense.
  • Avoid putting an extra burden on users

29
The Distinguisher approach
  • We classify documents into two classes
  • Positive: the user is interested in them
  • Negative: the user is not interested in them.
  • We summarize by extracting information that
    is in the positives but not in the negatives.
  • The solution is called a distinguisher.

30
User's actions
  • Input queries
  • Click the search button
  • Browse the results and pick some URLs for further
    reading if they look interesting
  • No more interaction with the system
  • Does not want to mark YES/NO on the results

31
A simple solution
  • Find some hint words (distinguisher)
  • Distinguisher can be extracted from the search
    result without explicit help from user
  • Re-order the web pages according to the
    distinguishers

32
Distinguisher
  • Hint words (distinguishers) are either positive,
    indicating the user's preference, or
    negative, indicating irrelevant topics
  • Find the distinguisher that separates yes and no
    instances among the hyperlinks whose descriptions
    have been examined by the user.
  • Assume that the user clicks on a URL because some
    words in its description interest the user
33
On-Line Selection
  • Do not require users to train
    our learning algorithms
  • Provide a mechanism for users to get actively
    involved in the learning algorithms
  • Implemented as a Java applet and JavaScript

34
Mathematical Model
  • Suppose the user clicks l pages (positive pages)
    among all l+m returned links.
  • We first choose n words among all the words that
    appear in the l+m pages such that the frequency
    with which these n words appear in the l pages is
    much higher than in the other pages
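
One way to make "much higher frequency" concrete is to rank words by the fraction of clicked pages containing them minus the fraction of the other pages containing them, and keep the top n. This scoring rule is an assumption chosen for the sketch, not necessarily the lecture's exact criterion; the class HintWordSelector and the sample snippets are likewise made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class HintWordSelector {
    // Lower-cased alphanumeric words of a page or snippet.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Fraction of the given pages that contain the word.
    static double fraction(String word, List<String> pages) {
        if (pages.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String page : pages) {
            if (words(page).contains(word)) {
                hits++;
            }
        }
        return (double) hits / pages.size();
    }

    // The n words most over-represented in `clicked` relative to `others`.
    // Ties keep alphabetical order because the sort is stable.
    static List<String> topHints(List<String> clicked, List<String> others, int n) {
        Set<String> vocabulary = new TreeSet<>();
        for (String page : clicked) vocabulary.addAll(words(page));
        for (String page : others) vocabulary.addAll(words(page));
        List<String> ranked = new ArrayList<>(vocabulary);
        ranked.sort(Comparator.comparingDouble(
                (String w) -> fraction(w, others) - fraction(w, clicked)));
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> clicked = List.of("computer science department", "computer science courses");
        List<String> others = List.of("electronic engineering department", "engineering research");
        System.out.println(topHints(clicked, others, 2));
    }
}
```

Calling topHints with the two lists swapped yields candidate negative hint words in the same way.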

35
Mathematical Model
  • Choose the hint words
  • Positive pages: at least ?? of the positive
    hint words and at most ? of the negative hint
    words appear.
  • Other pages: at most ? of the positive hint
    words and at least ? of the negative hint words
    appear.
  • ? > ?, ? > ? as preassigned numbers

36
Example
  • We use paragraphs in CS web pages as positive
    examples and paragraphs in EE web pages as
    negative examples.
  • We want to distinguish the former from the
    latter.
  • Use ?=2, ?=0, ?=2, ?=1
  • Positive: Computer, Science
  • both appear in the CS web page, none in the EE web page
  • Negative: Electronic, Engineering
  • both appear in the EE web page, one in the CS web page

37
Example
  • The Department of Computer Science was
    established in 1984 and has since evolved from a
    primarily teaching-oriented department in its
    Polytechnic days into one which excels in both
    teaching and research within the Faculty of
    Science and Engineering of the now City
    University.
  • In addition to offering traditional courses such
    as foundations of computer science, computer
    architecture and software engineering, our
    curriculum also exposes our students to the
    latest advances in distributed databases,
    parallel computing, computer graphics, internet
    programming, multimedia systems and high speed
    networking.

38
Example
  • Aspiration
  • The Department of Electronic Engineering aspires
    to become a leading department of its kind in the
    Asia-Pacific region in the first decade of the
    new millennium.
  • Mission
  • To educate professional engineers of the highest
    caliber in areas of electronic and information
    engineering
  • To conduct curiosity-driven basic research,
    application-inspired frontier research and
    product-motivated applied research in
    strategically selected niche areas
  • To transfer cutting-edge technologies to both
    manufacturing and service industries, and provide
    professional consultancy for the community
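
The CS/EE example can be checked mechanically against excerpts of the two paragraphs. In the sketch below the parameter names minPos, maxNeg, maxPos and minNeg are invented, and the assignment of the values 2, 0, 2 and 1 to them is inferred from the stated word counts (both positive words and one negative word in the CS text; no positive word and both negative words in the EE text), not read off the slide.

```java
import java.util.HashSet;
import java.util.Set;

final class DistinguisherCheck {
    // Lower-cased alphanumeric words of a text.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Number of distinct hint words that occur in the page.
    static int hits(Set<String> hints, String page) {
        Set<String> present = words(page);
        int count = 0;
        for (String hint : hints) {
            if (present.contains(hint)) {
                count++;
            }
        }
        return count;
    }

    // Positive page: at least minPos positive hints and at most maxNeg negative hints.
    static boolean fitsPositive(String page, Set<String> pos, Set<String> neg, int minPos, int maxNeg) {
        return hits(pos, page) >= minPos && hits(neg, page) <= maxNeg;
    }

    // Other page: at most maxPos positive hints and at least minNeg negative hints.
    static boolean fitsOther(String page, Set<String> pos, Set<String> neg, int maxPos, int minNeg) {
        return hits(pos, page) <= maxPos && hits(neg, page) >= minNeg;
    }

    public static void main(String[] args) {
        Set<String> pos = Set.of("computer", "science");
        Set<String> neg = Set.of("electronic", "engineering");
        String cs = "The Department of Computer Science was established in 1984 "
                + "within the Faculty of Science and Engineering";
        String ee = "The Department of Electronic Engineering aspires to become "
                + "a leading department of its kind";
        System.out.println(fitsPositive(cs, pos, neg, 2, 1) + " " + fitsOther(ee, pos, neg, 0, 2));
    }
}
```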

39
Mathematical Model
40
Mathematical Model
41
Implementation
  • We use a meta-search engine that collects data from
    commercially available search engines.
  • The initial ranking of a hyperlink is a linear
    function of its ranks in the search engines
    that provided data.
  • Use the snippets returned by the search engines
    for distinguisher collection as the users can
    only see the snippets before they can choose to
    browse the original pages.
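
The initial ranking step might look like the following sketch. The slides say only that the score is a linear function of the per-engine ranks, so the weights, the use of rank 0 to mean "not returned by this engine", and the penalty rank substituted in that case are all assumptions; MetaRanker is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class MetaRanker {
    // Weighted sum of ranks; lower is better. A rank of 0 means the engine
    // did not return the link and is replaced by missingRank.
    static double score(int[] ranks, double[] weights, int missingRank) {
        double total = 0.0;
        for (int i = 0; i < ranks.length; i++) {
            int rank = ranks[i] > 0 ? ranks[i] : missingRank;
            total += weights[i] * rank;
        }
        return total;
    }

    // URLs ordered by ascending score; ties fall back to alphabetical order.
    static List<String> order(Map<String, int[]> ranksByUrl, double[] weights, int missingRank) {
        List<String> urls = new ArrayList<>(new TreeSet<>(ranksByUrl.keySet()));
        urls.sort(Comparator.comparingDouble(
                (String url) -> score(ranksByUrl.get(url), weights, missingRank)));
        return urls;
    }

    public static void main(String[] args) {
        // Ranks from two hypothetical engines, weighted equally.
        Map<String, int[]> ranks = Map.of(
                "a", new int[] {1, 3},
                "b", new int[] {2, 1},
                "c", new int[] {0, 2});
        System.out.println(order(ranks, new double[] {0.5, 0.5}, 11));
    }
}
```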

42
Architecture
43
Distinguishing element collection
  • Use JavaScript to catch which URLs have been
    clicked by the user
  • When the result page is reloaded or changed, do the
    collection job
  • Save the distinguishing elements in the user's
    cookie file, which avoids heavy usage on the server
    side

44
Distinguishing element collection
  • Java applet for the user to modify his own cookie
    file
  • When the user clicks the search button again, re-sort
    the results and show the user their new
    order.
  • Give more weight to the distinguishing
    elements than to the original order returned by
    the search engines, to bring the specific user
    more interesting results.
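
A re-sorting step in this spirit is sketched below. The scoring is an assumption: each snippet gets its original position minus n times its net hint count (positive hits minus negative hits), where n is the number of results, so a single net hint outweighs any difference in original position and the original order only breaks ties. HintReranker and the sample snippets are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HintReranker {
    // Lower-cased alphanumeric words of a snippet.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Number of hint words present in the given word set.
    static int hits(Set<String> hints, Set<String> present) {
        int count = 0;
        for (String hint : hints) {
            if (present.contains(hint)) {
                count++;
            }
        }
        return count;
    }

    // Returns the original indices of the snippets in their new order.
    static List<Integer> rerank(List<String> snippets, Set<String> pos, Set<String> neg) {
        int n = snippets.size();
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Set<String> present = words(snippets.get(i));
            int net = hits(pos, present) - hits(neg, present);
            score[i] = i - (double) n * net; // lower score is shown first
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> score[i]));
        return order;
    }

    public static void main(String[] args) {
        List<String> snippets = List.of(
                "Electronic Engineering department news",
                "Computer Science staff list",
                "City University campus map",
                "Computer graphics research in Computer Science");
        System.out.println(rerank(snippets,
                Set.of("computer", "science"), Set.of("electronic", "engineering")));
    }
}
```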

45
Experiment
  • Compare the original results with those of our
    meta-search engine
  • Interesting result that shows related information
    on staff and postgraduate students in the City
    University of Hong Kong

46
Experiment
47
Conclusion
  • It is a very useful tool for focused information
    retrieval when all queries are related in some
    sense.
  • Needs improvement in removing noise when applied
    as a general tool.

48
Protocols
49
Deep Search
  • Could a protocol help the user organize the
    information in the database to point to the
    target webpage, possibly with a vague
    description?
  • Directorial?
  • Interactive?
  • http://www.mach9design.com/deep/deep1.html

50
Shared Search
  • Passive
  • Utilize friends' search information
  • Active
  • Could users actively help each other in search?

51
Applications
  • Project Partner Search
  • http://www.idealist-extend.net/activesearch.php
  • Expert search engine?
  • Encyclopedia: Wikipedia (Yahoo version)
  • How to make it active?