Lecture 10: Personalization
- Issues
- Techniques
- Algorithms
- Protocols
Issues
Relevance and Ambiguity
- Relevance of search results depends on the user
- Memorizing searches
- Keep track of search history and allow repeated searches
- User profiles
- Generated automatically or set up by the user
- Ambiguity of language
- Understanding the content of web pages to improve search results
User Profiles and Privacy
- User profiles
- Users do not want to answer many questions.
- Multiple users of the same PC make automatically generated profiles inconsistent.
- Privacy concerns
- Users do not want to reveal personal information.
Approaches by Eurekster
- Search filtered by friends
- SearchMates
- Friends in different aspects of life
- Sharing searches
- Sites one's friends have visited
- Privacy
- The searches themselves are not shared
- http://searchenginewatch.com/searchday/article.php/3301481
Utilization of Social Networks
- http://www.friendster.com/
- Powered by http://www.eurekster.com/ to personalize and enrich search results
- http://searchenginewatch.com/searchday/article.php/3445791
Google Personalized Search
- Bookmark history
- Add search labels and notes
- Yahoo has a My Web feature that saves the text of a page
- Remove or block results and domains
- Increased password security for personalized search
- Google News will be integrated with it later
- http://searchenginewatch.com/searchday/article.php/3563036
Techniques
Collecting User Information
- We use cookies to save the selected hint words.
- A server sends some information to a client to store.
- The server can later retrieve its data from that client.
- Servlets send cookies to clients by adding fields to HTTP response headers.
- Clients automatically return cookies by adding fields to HTTP request headers.
- User information can also be collected and stored on the server side.
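The cookie round trip described above can be sketched with the JDK's java.net.HttpCookie class. It stands in for the servlet Cookie class so the example runs without a servlet container. The cookie name hints and its |-separated value are hypothetical. The sketch parses a Set-Cookie response header the way a client would, then builds the Cookie request header that the client sends back.

```java
import java.net.HttpCookie;
import java.util.List;

// Sketch: java.net.HttpCookie stands in for javax.servlet.http.Cookie.
class CookieRoundTrip {
    // Client side: parse a Set-Cookie response header into cookie objects.
    static List<HttpCookie> receive(String setCookieHeader) {
        return HttpCookie.parse(setCookieHeader);
    }

    // Client side: build the Cookie request header that returns the stored
    // cookies to the server as name=value pairs separated by "; ".
    static String requestHeader(List<HttpCookie> jar) {
        StringBuilder sb = new StringBuilder("Cookie: ");
        for (int i = 0; i < jar.size(); i++) {
            if (i > 0) sb.append("; ");
            HttpCookie c = jar.get(i);
            sb.append(c.getName()).append('=').append(c.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cookie carrying the selected hint words.
        List<HttpCookie> jar = receive("Set-Cookie: hints=computer|science");
        System.out.println(requestHeader(jar)); // Cookie: hints=computer|science
    }
}
```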
Cookies as a Tool
- A servlet uses cookies to have clients hold a small amount of state information associated with the user. Servlets can use the information in a cookie as the user enters a site (as a low-security user sign-on, for example), as the user navigates around a site (as a repository of user preferences, for example), or both.
Names and Values of Cookies
- Each HTTP request and response header is named and has a single value.
- For example, a cookie could have the header name BookToBuy and the value 304qty1, indicating to the calling application that the user wants to buy one copy of the book with stock number 304. (Cookies and their values are application-specific.)
Multiple Cookies
- Several cookies may have the same name.
- For example, a servlet could send two cookies with headers named BookToBuy: one could have the value shown previously, 304qty1, while the other could have the value 301qty3.
- These cookies would indicate that the user wants to buy one copy of the book with stock number 304 and three copies of the book with stock number 301.
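Here is a minimal sketch of how an application might merge several same-named BookToBuy cookies into one order. It assumes the value format is literally <stock number>qty<quantity>, as in the examples above. Cookie values are application-specific, so this format is an assumption. The sketch again uses java.net.HttpCookie in place of javax.servlet.http.Cookie, and the class name is hypothetical.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BookOrder {
    // Assumed value format: "<stock number>qty<quantity>", e.g. "304qty1".
    static Map<Integer, Integer> orderFrom(List<HttpCookie> cookies) {
        Map<Integer, Integer> order = new LinkedHashMap<>();
        for (HttpCookie c : cookies) {
            if (!c.getName().equals("BookToBuy")) continue; // ignore unrelated cookies
            String[] parts = c.getValue().split("qty");
            if (parts.length != 2) continue;                 // skip malformed values
            int stock = Integer.parseInt(parts[0]);
            int qty = Integer.parseInt(parts[1]);
            order.merge(stock, qty, Integer::sum);           // same book twice: add quantities
        }
        return order;
    }

    public static void main(String[] args) {
        List<HttpCookie> cookies = List.of(
            new HttpCookie("BookToBuy", "304qty1"),
            new HttpCookie("BookToBuy", "301qty3"));
        System.out.println(orderFrom(cookies)); // {304=1, 301=3}
    }
}
```

Because both cookies share one name, the sketch loops over all of them rather than looking a single cookie up by name.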
Multiple Cookies
- A server can provide one or more cookies to a client.
- Client software, such as a web browser, is expected to support twenty cookies per host, of at least four kilobytes each.
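To illustrate these minimums, the helper below checks whether the cookies a server plans to set for one host stay within what every compliant client must support. Measuring a cookie as the UTF-8 bytes of name=value is a simplification, since attributes such as Path also count. The class and method names are hypothetical.

```java
import java.net.HttpCookie;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CookieBudget {
    static final int MIN_COOKIES_PER_HOST = 20;   // clients support at least 20 per host
    static final int MIN_BYTES_PER_COOKIE = 4096; // ...of at least four kilobytes each

    // Simplified size: bytes of "name=value" (attributes are ignored here).
    static int approxSize(HttpCookie c) {
        return (c.getName() + "=" + c.getValue()).getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the cookies planned for ONE host fit the guaranteed minimums.
    static boolean fitsGuaranteedLimits(List<HttpCookie> cookiesForHost) {
        if (cookiesForHost.size() > MIN_COOKIES_PER_HOST) return false;
        for (HttpCookie c : cookiesForHost) {
            if (approxSize(c) > MIN_BYTES_PER_COOKIE) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<HttpCookie> plan = List.of(new HttpCookie("BookToBuy", "304qty1"));
        System.out.println(fitsGuaranteedLimits(plan)); // true
    }
}
```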
Cookies Are Shared within a Server
- Cookies that a client stores for a server are returned by the client to that server and only that server.
- A server can contain multiple servlets.
- Because cookies are returned to a server, two or more servlets running within that server share cookies.
Create a Cookie
- The constructor for the javax.servlet.http.Cookie class creates a cookie with an initial name and value. You can change the value of the cookie later with its setValue method.
- The name of the cookie must be an HTTP/1.1 token. Tokens are strings that contain none of the special characters listed in RFC 2068. (Alphanumeric strings qualify as tokens.) In addition, names that start with the dollar-sign character ("$") are reserved by RFC 2109.
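The naming rule can be checked mechanically. This sketch encodes the RFC 2068 token grammar: no control characters and none of the tspecials ( ) < > @ , ; : \ " / [ ] ? = { } space tab. It also rejects names that start with $. The class and method names are hypothetical and are not part of the servlet API.

```java
class CookieNames {
    // RFC 2068 "tspecials": characters that may not appear in a token.
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?={} \t";

    // A token is one or more US-ASCII characters that are neither control
    // characters (0-31 and 127) nor tspecials; non-ASCII is rejected too.
    static boolean isToken(String s) {
        if (s == null || s.isEmpty()) return false;
        for (char ch : s.toCharArray()) {
            if (ch < 32 || ch >= 127 || TSPECIALS.indexOf(ch) >= 0) return false;
        }
        return true;
    }

    // Cookie names must be tokens and must not start with '$' (reserved by RFC 2109).
    static boolean isValidCookieName(String name) {
        return isToken(name) && !name.startsWith("$");
    }

    public static void main(String[] args) {
        System.out.println(isValidCookieName("BookToBuy")); // true
        System.out.println(isValidCookieName("$Version"));  // false
    }
}
```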
Value of a Cookie
- The value of the cookie can be any string, though null values are not guaranteed to work the same way on all browsers. In addition, if you are sending a cookie that complies with Netscape's original cookie specification, do not use whitespace or any of these characters: [ ] ( ) = , " / ? @ : ;
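The value rule can be checked the same way. The character list below follows the javadoc for javax.servlet.http.Cookie.setValue for Version 0 (Netscape-style) cookies. That list covers whitespace, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons and semicolons. The helper treats null as unsafe, following the note on null values, and its name is hypothetical.

```java
class CookieValues {
    // Characters to avoid in Version 0 (Netscape-style) cookie values,
    // in addition to whitespace.
    private static final String AVOID = "[]()=,\"/?@:;";

    static boolean isSafeNetscapeValue(String value) {
        if (value == null) return false; // null does not behave the same on all browsers
        for (char ch : value.toCharArray()) {
            if (Character.isWhitespace(ch) || AVOID.indexOf(ch) >= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafeNetscapeValue("304qty1"));   // true
        System.out.println(isSafeNetscapeValue("304 qty=1")); // false
    }
}
```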
Create a Cookie Before Getting the Writer
- If your servlet returns a response to the user with a Writer, create the cookie before accessing the Writer. (Cookies are sent to the client as headers, and headers must be written before the Writer is accessed.)
- If the CatalogServlet used cookies to keep track of a client's book order, the servlet could create cookies as follows:
Create a Cookie Before Getting the Writer
    public void doGet (HttpServletRequest request,
                       HttpServletResponse response)
        throws ServletException, IOException
    {
        // Check for pending adds to the shopping cart
        String bookId = request.getParameter("Buy");
        // If the user wants to add a book, remember it by adding a cookie
        if (bookId != null) {
            Cookie getBook = new Cookie("Buy", bookId);
        }
        // Set the content-type header before accessing the Writer
        response.setContentType("text/html");
        // Now get the Writer and write the data of the response
        PrintWriter out = response.getWriter();
        out.println("<html>" +
                    "<head><title> Book Catalog </title></head>" + ...);
    }
Sending the Cookie
- Cookies are sent as headers of the response to the client; they are added with the addCookie method of the HttpServletResponse class.
- If you are using a Writer to return text data to the client, you must call the addCookie method before calling the HttpServletResponse's getWriter method.
- Continuing the example of the CatalogServlet, the following is code for sending the cookie:
An Example of Storing a Cookie
    public void doGet (HttpServletRequest request,
                       HttpServletResponse response)
        throws ServletException, IOException
    {
        ...
        // If the user wants to add a book, remember it by adding a cookie
        if (values != null) {
            bookId = values[0];
            Cookie getBook = new Cookie("Buy", bookId);
            getBook.setComment("User has indicated a desire " +
                               "to buy this book from the bookstore.");
            response.addCookie(getBook);
        }
        ...
    }
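A toy model, not the servlet API, shows why the call order matters. Headers precede the body on the wire, so once body output has started, a new Set-Cookie header has nowhere to go. The hypothetical ToyResponse class enforces the rule strictly by throwing an exception. Real containers typically just ignore headers added after the response is committed, which makes this bug easy to miss. The status line is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the servlet API: headers go out before the body, so cookies
// must be added before body output starts.
class ToyResponse {
    private final List<String> headers = new ArrayList<>();
    private final StringBuilder body = new StringBuilder();
    private boolean bodyStarted = false; // set once the "writer" is handed out

    void addCookie(String name, String value) {
        if (bodyStarted) {
            // Strict version of the rule; real containers usually ignore
            // headers added after the response has been committed.
            throw new IllegalStateException("cookie added after getWriter: " + name);
        }
        headers.add("Set-Cookie: " + name + "=" + value);
    }

    StringBuilder getWriter() {
        bodyStarted = true;
        return body;
    }

    // Wire layout (status line omitted): header lines, an empty line, the body.
    String serialize() {
        return String.join("\r\n", headers) + "\r\n\r\n" + body;
    }

    public static void main(String[] args) {
        ToyResponse r = new ToyResponse();
        r.addCookie("Buy", "304");          // correct order: cookie first
        r.getWriter().append("<html>...");  // then the body
        System.out.print(r.serialize());
    }
}
```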
Retrieving Cookies
- Clients return cookies as fields added to HTTP request headers. To retrieve any cookie, you must retrieve all the cookies using the getCookies method of the HttpServletRequest class.
- The getCookies method returns an array of Cookie objects, which you can search to find the cookie or cookies that you want. (Remember that multiple cookies can have the same name. To get the name of a cookie, use its getName method.)
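Here is a minimal sketch of searching the returned array by name, with java.net.HttpCookie standing in for the servlet Cookie class. Several cookies can share a name, so it collects every match. It also treats a null array as empty, because HttpServletRequest.getCookies() returns null when the request carries no cookies. The helper name is hypothetical.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

class FindCookies {
    // Values of every cookie with the given name (names may repeat).
    // A null array is treated as "no cookies".
    static List<String> valuesNamed(HttpCookie[] cookies, String name) {
        List<String> values = new ArrayList<>();
        if (cookies == null) return values;
        for (HttpCookie c : cookies) {
            if (c.getName().equals(name)) values.add(c.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        HttpCookie[] cookies = {
            new HttpCookie("Buy", "304"), new HttpCookie("Session", "abc"), new HttpCookie("Buy", "301")
        };
        System.out.println(valuesNamed(cookies, "Buy")); // [304, 301]
    }
}
```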
An Example of Retrieving and Deleting a Cookie
    public void doGet (HttpServletRequest request,
                       HttpServletResponse response)
        throws ServletException, IOException
    {
        /* Handle any pending deletes from the shopping cart */
        String bookId = request.getParameter("Remove");
        if (bookId != null) {
            // Find the cookie that pertains to the book to remove
            Cookie[] cookies = request.getCookies();
            // ... find thisCookie among the cookies
            // Delete the book's cookie by setting its maximum age to zero
            // and sending the modified cookie back to the client
            thisCookie.setMaxAge(0);
            response.addCookie(thisCookie);
        }
        // Also set the content-type header before accessing the Writer
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        // Print out the response
        out.println("<html> <head>" +
                    "<title>Your Shopping Cart</title>" + ...);
    }
Getting the Value of a Cookie
- To find the value of a cookie, use its getValue
method.
    public void doGet (HttpServletRequest request,
                       HttpServletResponse response)
        throws ServletException, IOException
    {
        /* Handle any pending deletes from the shopping cart */
        String bookId = request.getParameter("Remove");
        if (bookId != null) {
            // Find the cookie that pertains to that book
            Cookie[] cookies = request.getCookies();
            if (cookies != null) {  // getCookies() returns null if there are none
                for (int i = 0; i < cookies.length; i++) {
                    Cookie thisCookie = cookies[i];
                    if (thisCookie.getName().equals("Buy") &&
                        thisCookie.getValue().equals(bookId)) {
                        // Delete the cookie: zero max age, then send it back
                        thisCookie.setMaxAge(0);
                        response.addCookie(thisCookie);
                    }
                }
            }
        }
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html> <head>" +
                    "<title>Your Shopping Cart</title>" + ...);
    }
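The deletion logic can be isolated and tested outside a container. In this sketch, java.net.HttpCookie stands in for the servlet class and the helper name is hypothetical. It marks each matching Buy cookie with a maximum age of zero and returns it. An expired cookie only takes effect on the client once it is sent back in the response, which in a servlet means calling response.addCookie on it.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

class RemoveBook {
    // Expire the "Buy" cookie(s) for bookId and return the cookies that must
    // be sent back to the client so that it actually discards them.
    static List<HttpCookie> expireBook(HttpCookie[] cookies, String bookId) {
        List<HttpCookie> toSend = new ArrayList<>();
        if (cookies == null || bookId == null) return toSend;
        for (HttpCookie c : cookies) {
            if (c.getName().equals("Buy") && c.getValue().equals(bookId)) {
                c.setMaxAge(0);  // max age zero means "discard now"
                toSend.add(c);   // servlet equivalent: response.addCookie(c)
            }
        }
        return toSend;
    }

    public static void main(String[] args) {
        HttpCookie[] cookies = { new HttpCookie("Buy", "304"), new HttpCookie("Buy", "301") };
        for (HttpCookie c : expireBook(cookies, "301")) {
            System.out.println(c.getName() + "=" + c.getValue() + " expired=" + c.hasExpired());
        }
    }
}
```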
Algorithms
Threads of Ideas
- History reflects user interests
- Too much information to use directly
- A summary of the history can help
- Positive examples → positive keywords
- Negative examples → negative keywords
- Using positive and negative keywords helps with focused search.
Summarizing Multiple Documents
- Computation of common nodes
- A concept $c$ is a common node if $\text{concept-match}(c, G_1) \land \text{concept-match}(c, G_2)$
- $\text{concept-match}(c, G_1)$ is true if there is a node $c_1$ in $G_1$ such that
- $\text{word}(c) = \text{word}(c_1)$ (after stemming), or
- $\text{synonym}(c, c_1)$ holds
- Summary
- Sentences covering the shared terms
- Possible extension: assign significance to the sentences according to the graph structure and extract the sentences of high significance in both documents
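Here is a rough sketch of the common-node test and the summary step. It simplifies each document graph G1, G2 to a set of lowercase node labels and ignores edges. The suffix-stripping stem function and the SYNONYMS table are hypothetical placeholders for a real stemmer and thesaurus.

```java
import java.util.*;

class SharedSummary {
    // Hypothetical synonym table standing in for a real thesaurus.
    static final Map<String, String> SYNONYMS = Map.of("car", "automobile", "automobile", "car");

    // Crude suffix stripper standing in for a real stemmer (an assumption).
    static String stem(String w) {
        w = w.toLowerCase();
        for (String suf : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suf.length() + 2 && w.endsWith(suf)) {
                return w.substring(0, w.length() - suf.length());
            }
        }
        return w;
    }

    // Labels are assumed to be lowercase.
    static boolean synonym(String a, String b) {
        return b.equals(SYNONYMS.get(a)) || a.equals(SYNONYMS.get(b));
    }

    // concept-match(c, G): some node c1 of G has the same stem as c, or is a synonym of c.
    static boolean conceptMatch(String c, Set<String> g) {
        for (String c1 : g) {
            if (stem(c).equals(stem(c1)) || synonym(c, c1)) return true;
        }
        return false;
    }

    // Common nodes: candidates from either graph that match in both G1 and G2.
    static Set<String> commonNodes(Set<String> g1, Set<String> g2) {
        Set<String> candidates = new HashSet<>(g1);
        candidates.addAll(g2);
        Set<String> common = new TreeSet<>();
        for (String c : candidates) {
            if (conceptMatch(c, g1) && conceptMatch(c, g2)) common.add(stem(c));
        }
        return common;
    }

    // Summary: the sentences that mention at least one shared concept.
    static List<String> summarize(List<String> sentences, Set<String> shared) {
        List<String> summary = new ArrayList<>();
        for (String s : sentences) {
            for (String t : s.split("[^A-Za-z]+")) {
                if (!t.isEmpty() && shared.contains(stem(t))) {
                    summary.add(s);
                    break;
                }
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        Set<String> shared = commonNodes(Set.of("computers", "car", "network"),
                                         Set.of("computer", "automobile", "database"));
        System.out.println(shared); // [automobile, car, computer]
    }
}
```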
Applying Summary to Focused Information Retrieval
- Requirement: different users need different information, even for the same query
- Focused: for a specific user, find specific (focused) information, whether the search engine ranks it low or high
- Focused queries are related in some sense
- Avoid putting an extra burden on users
The Distinguisher Approach
- We classify documents into two classes:
- Positive: the user is interested in them
- Negative: the user is not interested in them
- We summarize them by extracting information that appears in the positives but not in the negatives.
- The solution is called a distinguisher.
User Actions
- Input queries
- Click the search button
- Browse the results and pick some URLs for further reading if they look interesting
- No further interaction with the system
- Users do not want to mark results YES/NO
A Simple Solution
- Find some hint words (distinguishers)
- Distinguishers can be extracted from the search results without explicit help from the user
- Re-order the web pages according to the distinguishers
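One simple way to re-order results by distinguishers is sketched below under a few assumptions. Each snippet scores the number of positive hint words present minus the number of negative hint words present, and hint words are lowercase. Ties keep the search engine's original order because Java's List.sort is stable. The scoring rule and the names are illustrative and not necessarily the system's own.

```java
import java.util.*;

class HintRerank {
    // Number of distinct hint words that occur in a snippet (case-insensitive).
    static int hits(String snippet, Set<String> hints) {
        Set<String> words = new HashSet<>(Arrays.asList(snippet.toLowerCase().split("[^a-z]+")));
        int n = 0;
        for (String h : hints) if (words.contains(h)) n++;
        return n;
    }

    // Higher (positive hits - negative hits) first; List.sort is stable, so
    // equal scores keep their original search-engine order.
    static List<String> rerank(List<String> snippets, Set<String> pos, Set<String> neg) {
        List<String> out = new ArrayList<>(snippets);
        out.sort(Comparator.comparingInt((String s) -> hits(s, neg) - hits(s, pos)));
        return out;
    }

    public static void main(String[] args) {
        List<String> results = List.of("EE dept: electronic engineering",
                                       "CS dept: computer science", "campus map");
        System.out.println(rerank(results, Set.of("computer", "science"),
                                  Set.of("electronic", "engineering")));
    }
}
```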
Distinguisher
- Hint words (distinguishers) are either positive, indicating the user's preferences, or negative, indicating irrelevant topics
- Find the distinguishers that separate the yes and no instances among the hyperlinks whose descriptions the user has examined
- Assumption: the user clicks on a URL because some words in its description are of interest to the user
On-Line Selection
- Do not require users to train our learning algorithms
- Provide a mechanism for users to get actively involved in the learning algorithms
- Implemented as a Java applet and JavaScript
Mathematical Model
- Suppose the user clicks $l$ pages (the positive pages) among all $l + m$ returned links.
- We first choose $n$ words among all the words appearing in the $l + m$ pages such that the frequency with which these $n$ words appear in the $l$ positive pages is much higher than in the other $m$ pages
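The selection step can be sketched as follows. The model only requires that the chosen words be much more frequent in the l clicked pages than in the other m pages. This sketch makes several assumptions of its own: "frequency" means the fraction of pages that contain the word, add-one smoothing keeps the ratio finite, and the n words with the highest ratio are kept, with ties broken alphabetically. The class and method names are hypothetical.

```java
import java.util.*;

class HintWordSelector {
    // Distinct lowercase words of one page or snippet.
    static Set<String> words(String text) {
        Set<String> ws = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) if (!t.isEmpty()) ws.add(t);
        return ws;
    }

    // Number of pages that contain each word.
    static Map<String, Integer> pageCounts(List<String> pages) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : pages) for (String w : words(p)) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Top n words by (smoothed share of clicked pages) / (smoothed share of other pages).
    static List<String> select(List<String> clicked, List<String> others, int n) {
        int l = clicked.size(), m = others.size();
        Map<String, Integer> pos = pageCounts(clicked), neg = pageCounts(others);
        Set<String> vocab = new HashSet<>(pos.keySet());
        vocab.addAll(neg.keySet());
        Map<String, Double> score = new HashMap<>();
        for (String w : vocab) {
            double p = (pos.getOrDefault(w, 0) + 1.0) / (l + 2);
            double q = (neg.getOrDefault(w, 0) + 1.0) / (m + 2);
            score.put(w, p / q);
        }
        List<String> ranked = new ArrayList<>(vocab);
        ranked.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a)); // higher ratio first
            return c != 0 ? c : a.compareTo(b);                 // deterministic tie-break
        });
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        List<String> clicked = List.of("computer science", "computer graphics");
        List<String> others = List.of("electronic engineering", "computer engineering");
        System.out.println(select(clicked, others, 3)); // [graphics, science, computer]
    }
}
```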
Mathematical Model
- Choose the hint words so that:
- Positive pages: at least $\alpha$ of the positive hint words and at most $\delta$ of the negative hint words appear.
- Other pages: at most $\beta$ of the positive hint words and at least $\gamma$ of the negative hint words appear.
- $\alpha > \beta$ and $\gamma > \delta$ are preassigned numbers
Example
- We use paragraphs in CS web pages as positive examples and paragraphs in EE web pages as negative examples.
- We want to distinguish the former from the latter.
- Use $\alpha = 2$, $\beta = 0$, $\gamma = 2$, $\delta = 1$
- Positive hint words: Computer, Science
- Both appear in the CS web page; neither appears in the EE web page
- Negative hint words: Electronic, Engineering
- Both appear in the EE web page; one (Engineering) appears in the CS web page
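The hint-word conditions and the CS/EE example can be checked with a small classifier. The parameter roles are a reconstruction and are spelled out here. alpha is the minimum number of positive hint words on a positive page, and delta is the maximum number of negative hint words there. beta is the maximum number of positive hint words on other pages, and gamma is the minimum number of negative hint words there. Pages that meet neither condition are left undecided. The names are hypothetical, and the test text is condensed from the example paragraphs.

```java
import java.util.*;

class HintWordClassifier {
    enum Label { POSITIVE, NEGATIVE, UNDECIDED }

    // Distinct lowercase words of a page.
    static Set<String> words(String text) {
        Set<String> ws = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) if (!t.isEmpty()) ws.add(t);
        return ws;
    }

    // How many of the hint words occur in the page.
    static int count(Set<String> hints, Set<String> pageWords) {
        int n = 0;
        for (String h : hints) if (pageWords.contains(h)) n++;
        return n;
    }

    // Assumed roles (reconstructed):
    //   positive page: >= alpha positive hints and <= delta negative hints
    //   other page:    <= beta positive hints and >= gamma negative hints
    // alpha > beta and gamma > delta, so no page satisfies both conditions.
    static Label classify(String page, Set<String> pos, Set<String> neg,
                          int alpha, int beta, int gamma, int delta) {
        Set<String> ws = words(page);
        int p = count(pos, ws), q = count(neg, ws);
        if (p >= alpha && q <= delta) return Label.POSITIVE;
        if (p <= beta && q >= gamma) return Label.NEGATIVE;
        return Label.UNDECIDED;
    }

    public static void main(String[] args) {
        Set<String> pos = Set.of("computer", "science");
        Set<String> neg = Set.of("electronic", "engineering");
        String cs = "The Department of Computer Science was established in 1984 and excels "
                  + "within the Faculty of Science and Engineering.";
        System.out.println(classify(cs, pos, neg, 2, 0, 2, 1)); // POSITIVE
    }
}
```

With the example's settings, the CS text contains two positive hint words and one negative hint word, so it is labelled positive. The EE text contains no positive hint words and two negative ones, so it is labelled negative.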
Example
- The Department of Computer Science was
established in 1984 and has since evolved from a
primarily teaching-oriented department in its
Polytechnic days into one which excels in both
teaching and research within the Faculty of
Science and Engineering of the now City
University.
- In addition to offering traditional courses such
as foundations of computer science, computer
architecture and software engineering, our
curriculum also exposes our students to the
latest advances in distributed databases,
parallel computing, computer graphics, internet
programming, multimedia systems and high speed
networking.
Example
- Aspiration
- The Department of Electronic Engineering aspires to become a leading department of its kind in the Asia-Pacific region in the first decade of the new millennium.
- Mission
- To educate professional engineers of the highest caliber in areas of electronic and information engineering
- To conduct curiosity-driven basic research, application-inspired frontier research and product-motivated applied research in strategically selected niche areas
- To transfer cutting-edge technologies to both manufacturing and service industries, and provide professional consultancy for the community
Mathematical Model
Implementation
- We use a meta-search engine that collects data from commercially available search engines.
- The initial ranking of a hyperlink is a linear function of its ranks in the search engines that provided the data.
- We use the snippets returned by the search engines to collect distinguishers, since users can see only the snippets before they choose to browse the original pages.
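Here is a sketch of one possible linear combination of engine ranks. Each URL's score is a weighted sum of its rank in each engine's list, and lower scores are ranked first. The weights are assumptions, and so is the rule that a URL missing from an engine's list gets that list's length plus one as its rank. The slides do not give the actual coefficients, and the names are hypothetical.

```java
import java.util.*;

class MetaRank {
    // score(u) = sum over engines e of weights[e] * rank_e(u), with rank 1 = best.
    // A URL absent from an engine's list gets rank (list length + 1), an assumption.
    static List<String> merge(List<List<String>> engines, double[] weights) {
        Map<String, Double> score = new LinkedHashMap<>();
        for (List<String> list : engines)
            for (String url : list) score.putIfAbsent(url, 0.0);
        List<String> urls = new ArrayList<>(score.keySet());
        for (int e = 0; e < engines.size(); e++) {
            List<String> list = engines.get(e);
            for (String url : urls) {
                int idx = list.indexOf(url);
                int rank = idx >= 0 ? idx + 1 : list.size() + 1;
                score.put(url, score.get(url) + weights[e] * rank);
            }
        }
        urls.sort(Comparator.comparingDouble(u -> score.get(u))); // lower combined rank first
        return urls;
    }

    public static void main(String[] args) {
        List<List<String>> engines = List.of(List.of("u1", "u2", "u3"), List.of("u2", "u1"));
        System.out.println(merge(engines, new double[] {0.6, 0.4})); // [u1, u2, u3]
    }
}
```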
Architecture
Distinguishing Element Collection
- Use JavaScript to capture which URLs have been clicked by the user
- When the user reloads or changes the result page, perform the collection
- Save the distinguishing elements in the user's cookie file to avoid heavy resource usage on the server side
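Storing the distinguishing elements on the client side needs a cookie-safe encoding. This hypothetical format writes +word for positive hint words and -word for negative ones, joined by |. That avoids whitespace and the characters excluded from Netscape-style cookie values, as long as the words are plain lowercase letters. A real implementation would also keep the value within the four-kilobyte per-cookie minimum that clients are guaranteed to support.

```java
import java.util.*;

class HintCookie {
    // Hypothetical format: "+word" for positive, "-word" for negative hints,
    // joined by '|'. Words are assumed to be lowercase a-z only.
    static String encode(List<String> pos, List<String> neg) {
        List<String> parts = new ArrayList<>();
        for (String w : pos) parts.add("+" + w);
        for (String w : neg) parts.add("-" + w);
        return String.join("|", parts);
    }

    // Inverse of encode: '+' maps to the positive list, '-' to the negative one.
    static Map<Character, List<String>> decode(String value) {
        Map<Character, List<String>> out = new HashMap<>();
        out.put('+', new ArrayList<>());
        out.put('-', new ArrayList<>());
        if (value == null || value.isEmpty()) return out;
        for (String part : value.split("\\|")) {
            if (part.length() < 2) continue;              // skip malformed entries
            List<String> bucket = out.get(part.charAt(0));
            if (bucket != null) bucket.add(part.substring(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String v = encode(List.of("computer", "science"), List.of("electronic"));
        System.out.println(v);          // +computer|+science|-electronic
        System.out.println(decode(v).get('-')); // [electronic]
    }
}
```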
Distinguishing Element Collection
- A Java applet lets users modify their own cookie files
- When the user clicks the search button again, re-sort the results and show the user the new order
- Give the distinguishing elements more weight than the original order returned by the search engines, bringing the specific user more interesting results
Experiment
- Compare the original results with those of our meta-search engine
- Interesting results showing related information on staff and postgraduate students at the City University of Hong Kong
Experiment
Conclusion
- It is a very useful tool for focused information retrieval when all queries are related in some sense.
- Noise removal needs improvement before it can be applied as a general tool.
Protocols
Deep Search
- Could a protocol help users organize the information in the database so that it points to the target web page, possibly starting from a vague description?
- Directorial?
- Interactive?
- http://www.mach9design.com/deep/deep1.html
Shared Search
- Passive
- Utilize friends' search information
- Active
- Could users actively help each other in search?
Applications
- Project partner search
- http://www.idealist-extend.net/activesearch.php
- Expert search engine?
- Encyclopedia: Wikipedia (Yahoo version)
- How to make it active?