Title: Harvesting
1. Harvesting
- Joy Veronneau
- JV11@Cornell.edu
- Campus Developers Meeting
- July 14, 2004
2. What is Harvesting?
- Repeated searches of the electronic directory, used to collect email addresses in batches
- An example from our logs:
3. "(|(sn=pne)(cornelledumiddlename=pne)(cn=pne)(givenName=pne)(cornelledudeptname1=pne)(uid=pne)(edupersonnickname=pne))"
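The logged query looks like an LDAP search filter that ORs one search string across several name and ID attributes. As a minimal sketch, assuming the search page builds its filter this way, the Python below assembles such a filter and escapes the characters RFC 4515 treats as special in the user's input. Only the attribute names come from the log; `SEARCH_ATTRIBUTES`, `escape_filter_value`, and `build_or_filter` are hypothetical names for illustration.

```python
# Attribute names taken from the logged filter; everything else is hypothetical.
SEARCH_ATTRIBUTES = (
    "sn",
    "cornelledumiddlename",
    "cn",
    "givenName",
    "cornelledudeptname1",
    "uid",
    "edupersonnickname",
)


def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 treats as special in filter values."""
    # Backslash goes first so the escapes added below are not escaped again.
    for char, escaped in (("\\", r"\5c"), ("*", r"\2a"), ("(", r"\28"),
                          (")", r"\29"), ("\0", r"\00")):
        value = value.replace(char, escaped)
    return value


def build_or_filter(term: str, attributes=SEARCH_ATTRIBUTES) -> str:
    """Return "(|(attr1=term)(attr2=term)...)": match term in any attribute."""
    safe = escape_filter_value(term)
    clauses = "".join(f"({attr}={safe})" for attr in attributes)
    return f"(|{clauses})"


print(build_or_filter("pne"))
```

Escaping `*` means a user cannot inject extra wildcards into the filter; whether the real search page allowed or added wildcards cannot be told from the log.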
4. There are at least five ways to harvest Cornell's directory:
- From the directory search web page
- By email
- From the command line
- From a browser
- Using finger
- Other?
5. From the electronic directory search web page
- This is the most common way to harvest the directory.
  - Recent attacks cycled through common first names (John, Mary, Tom, William).
  - Another attack used just two-letter wildcard searches, such as "lu" and "nh".
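A rough upper bound shows why short wildcard sweeps are attractive. The sketch below assumes the attacker tries every two-letter string over the letters a to z and that each search returns the full 2000-entry limit mentioned in this talk; real yields would be lower, because results overlap and not every search hits the limit. The function names are hypothetical, and the 100-entry limit is only a comparison value.

```python
from itertools import product
from string import ascii_lowercase


def two_letter_prefixes(alphabet: str = ascii_lowercase) -> list[str]:
    """All two-letter search strings: "aa", "ab", ..., "zz"."""
    return ["".join(pair) for pair in product(alphabet, repeat=2)]


def sweep_upper_bound(per_search_limit: int, alphabet: str = ascii_lowercase) -> int:
    """Most entries one sweep could return if every search hit the limit.

    Duplicates across searches are not removed, so this is only an upper bound.
    """
    return len(two_letter_prefixes(alphabet)) * per_search_limit


print(len(two_letter_prefixes()))  # 676
print(sweep_upper_bound(2000))     # 1352000
print(sweep_upper_bound(100))      # 67600
```

The bound scales linearly with the per-search limit, so the number of searches needed to cover the same directory grows as the limit shrinks.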
6. By email
- Nickname and fuzzy name matching
- Send mail to "somebody@cornell.edu"
- Get back up to 2000 email addresses for names containing the string "somebody"
8. From the command line
- Using ldapsearch with the directory server as the host:
  ldapsearch -h directory.cornell.edu -b "ou=People, o=Cornell University, c=us" -x "(uid=ab)"
9. From a browser
- Using an LDAP URL such as
  ldap://directory.cornell.edu/o=cornell%20university,c=us??sub?cn=parker
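An LDAP URL packs a whole directory search into one string: host, base DN, requested attributes, scope, and filter, separated by `?` (RFC 4516). The Python sketch below splits the example URL, written in its standard form as ldap://directory.cornell.edu/o=cornell%20university,c=us??sub?cn=parker, into those fields using only the standard library. An empty attribute field means "return all attributes", and `sub` searches the whole subtree under the base DN. `LdapUrl` and `parse_ldap_url` are hypothetical names, and the sketch ignores URL extensions.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import unquote, urlsplit


@dataclass
class LdapUrl:
    host: str
    base_dn: str
    attributes: List[str]
    scope: str
    search_filter: str


def parse_ldap_url(url: str) -> LdapUrl:
    """Split ldap://host/dn?attributes?scope?filter into its fields (RFC 4516).

    Extensions (a fourth '?'-separated field) are ignored in this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "ldap":
        raise ValueError(f"not an ldap URL: {url!r}")
    # urlsplit puts everything after the first '?' into .query; the remaining
    # fields are '?'-separated, so pad any that are missing.
    attrs, scope, flt = (parts.query.split("?") + ["", "", ""])[:3]
    return LdapUrl(
        host=parts.hostname or "",
        base_dn=unquote(parts.path.lstrip("/")),
        attributes=[unquote(a) for a in attrs.split(",") if a],
        scope=scope or "base",                            # RFC 4516 default scope
        search_filter=unquote(flt) or "(objectClass=*)",  # RFC 4516 default filter
    )


print(parse_ldap_url(
    "ldap://directory.cornell.edu/o=cornell%20university,c=us??sub?cn=parker"))
```

RFC 4515 filters are normally written in parentheses, as in "(cn=parker)"; the sketch passes the filter through unchanged, so it works with either form.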
10. Using finger
- finger somebody@cornell.edu returns up to 2000 entries, including email addresses
11. Maybe others?
- Any finger- or LDAP-enabled application, such as Eudora
12. How does harvesting affect us?
- Slows down our machinery
- We get annoying spam
- Users are asking us to do something about it
13. CPU loads on directory server
[Chart: CPU load under normal conditions vs. during harvesting]
14. What do we currently do to prevent harvesting?
- Right now, not much
- By design, the directory is open to everyone in the world
- Each search returns at most 2000 entries (that's high)
- No warning on the web page about proper use of the information (but are warnings even enforceable?)
16. What do other universities do to control harvesting?
17-18. Yale
19-21. University of Florida
22-25. Georgetown University
26-27. Stanford University
28. Summary of Options
- Reduce the search limit. Easy to do, and it caps the results of every harvesting method, since all of them search the same directory.
- Put a warning on the search results page. Easy to do, but may have no benefit.
- Display email addresses only to authenticated users, and use a clickable mail UI. This would take some development work.
- Allow users to choose whether or not their email address will be displayed. This would take a lot of development work as well as user education.
- Display email addresses as graphical images for non-authenticated users. Requires further investigation.
[Illustration: txt2@cornell.edu shown as a searchable text field vs. as an unsearchable .jpeg image]
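The first and third options could be applied together wherever search results are returned. The sketch below is hypothetical: `Entry`, `public_view`, and the `cn`/`mail` fields are invented for illustration and do not describe Cornell's directory software, and `per_search_limit` stands in for whatever lower limit might be chosen. Anonymous users get names without email addresses; authenticated users get the full entries, still capped.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical directory entry with a display name and an email address."""
    cn: str
    mail: Optional[str]


def public_view(entries: List[Entry], per_search_limit: int,
                authenticated: bool) -> List[Entry]:
    """Cap the result count, then hide email addresses from anonymous users."""
    capped = entries[:per_search_limit]  # option 1: a lower search limit
    if authenticated:
        return capped
    # Option 3: anonymous users see names only; copies leave the input untouched.
    return [replace(entry, mail=None) for entry in capped]


results = [
    Entry("Ann Example", "ae1@example.edu"),
    Entry("Bob Example", "be2@example.edu"),
    Entry("Cy Example", "ce3@example.edu"),
]
print(public_view(results, per_search_limit=2, authenticated=False))
# [Entry(cn='Ann Example', mail=None), Entry(cn='Bob Example', mail=None)]
```

Applying the cap on the directory server rather than only in the web page is what would make it cover the email, finger, and LDAP routes as well.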
29. Things to think about
- Even if we put restrictions on our search page, other departments could still publish some of the information we restrict.
- What if an on-campus department has been harvesting information for legitimate purposes?
30. Discussion?