Title: Privacy-Preserving Databases and Data Mining
1 Privacy-Preserving Databases and Data Mining
- Yücel SAYGIN
- ysaygin_at_sabanciuniv.edu
- http://people.sabanciuniv.edu/ysaygin/
2 Outline
- Privacy: an informal discussion
- Overview of data mining
- Overview of privacy preserving databases and data mining
- Privacy preserving data mining
- Privacy protection against data mining
- Privacy preserving databases
- Future research directions
3 Privacy: What, Why, and How
- Privacy: giving people the right to be left alone
- It is one of the fundamental rights of people in western civilizations
- Privacy of data: giving data owners the right to say what can be done with their data
4 Is data privacy something new?
- Privacy has been one of the fundamental rights of people
- Maybe termed differently, but it has been studied in the past
- Statistical databases, statistical disclosure control
- The inference problem
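The inference problem mentioned above can be illustrated with a small sketch. It assumes a hypothetical statistical database that only answers aggregate SUM queries; the table, the names, and the functions `sum_salary` and `infer_salary` are made up for illustration and do not come from any real system.

```python
# Hypothetical toy table; names and salaries are invented for illustration.
RECORDS = [
    {"name": "Ann", "dept": "R&D", "salary": 5200},
    {"name": "Bob", "dept": "R&D", "salary": 4800},
    {"name": "Cem", "dept": "R&D", "salary": 6100},
    {"name": "Dee", "dept": "Sales", "salary": 4500},
]


def sum_salary(predicate):
    """Aggregate-only interface: returns a total, never an individual row."""
    return sum(r["salary"] for r in RECORDS if predicate(r))


def infer_salary(target_name, dept):
    """Differencing attack: two permitted SUM queries isolate one person."""
    with_target = sum_salary(lambda r: r["dept"] == dept)
    without_target = sum_salary(
        lambda r: r["dept"] == dept and r["name"] != target_name
    )
    # The difference of the two aggregates is exactly the target's salary.
    return with_target - without_target


if __name__ == "__main__":
    print(infer_salary("Cem", "R&D"))
```

Neither query returns an individual record, yet together they reveal one. This is the kind of leak that statistical disclosure control tries to prevent.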
5 Why is privacy a really big issue these days?
- Technology is now deeply integrated with our personal lives
- New technology: networking, the Web
- New devices: mobile phones, RFID tags, computers, digital cameras
- This means that data about us, and about what we are doing, can be collected easily and at a fraction of what it cost 10 years ago.
- Navigation patterns on the Web
- Location information (wireless phones, RFID tags)
- Transactions (e-commerce, POS)
- Your emails (now scanned by Gmail to display ads; this was a big discussion at the CFP conference at Berkeley this year)
6 Why is privacy a really big issue these days?
CAPPS II (Computer Assisted Passenger
Prescreening System) collects flight reservation
information as well as commercial information
about passengers. This data, in turn, can be
utilized by government security agencies.
Although CAPPS represents US national data
collection efforts, it also has an effect on
other countries.
7 Why is privacy a really big issue these days?
The following sign at the KLM ticket desk in
Amsterdam International Airport demonstrates the
point: "Please note that KLM Royal Dutch
Airlines and other airlines are required by new
security laws in the US and several other
countries to give security, customs and
immigration authorities access to passenger data.
Accordingly any information we hold about you and
your travel arrangements may be disclosed to the
concerning authorities of these countries in your
itinerary."
8 Why is privacy a really big issue these days?
Some of the largest airline companies in the US,
including American, United and Northwest, turned
over millions of passenger records to the FBI.
Schwartz J., Micheline M. (2004). Airlines
Gave F.B.I. Millions of Records on Travelers
After 9/11. NY Times, May 1.
9 Why is privacy a really big issue these days?
The Total Information Awareness (TIA) project in the US,
which aimed to build a centralized database
storing the credit card transactions, emails,
web site visits, and flight details of Americans, was
not funded by Congress due to privacy
concerns.
10 Why is privacy a really big issue these days?
- Data about us is being collected and stored somewhere
- We need to have the right to control
- what data is collected about us,
- how long it should be stored,
- who is going to see it
- and how it is going to be used
11 But we have had all this security research going on for decades!
- Security (database, network, etc.) is necessary but not sufficient to ensure full privacy.
- Once someone has access to the data, what can be done with it (e.g. giving your email to a third party, giving away your profile, shopping behavior, etc.) needs to be regulated.
12 Some of the past research in the context of
security is useful for data privacy
- Disclosure control in statistical databases
- The inference problem and proposed solutions
- Encryption techniques
- Secure multi-party computation
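To make the last item more concrete, here is a minimal sketch of one classic secure multi-party computation task: computing a sum without any party revealing its own input. It uses additive secret sharing. The modulus, the function names `make_shares` and `secure_sum`, and the single-process simulation of the parties are all assumptions made for illustration. A real protocol would also need secure channels and a threat model.

```python
import secrets

# Assumed public modulus; inputs must be non-negative and the true total
# must be smaller than this value for the result to be correct.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n random shares that sum to value modulo modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate all parties in one process to show the arithmetic only."""
    n = len(private_values)
    # Row i holds the shares party i creates; share j is sent to party j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds the shares it received (column j) and publishes the result.
    partial_sums = [
        sum(row[j] for row in all_shares) % modulus for j in range(n)
    ]
    # The published partial sums reveal the total but no single input.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    print(secure_sum([10, 20, 30]))
```

Each individual share is uniformly random, so a party that sees only some of the shares learns nothing about another party's input beyond what the final total implies.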
13 So what do Data Mining and Databases have to do with Privacy?
- They deal with data that is mostly about people. Therefore we need to integrate privacy into database systems and data mining tools.
- Data mining is seen as a magic tool that can find secret information in piles of data, so there is some hesitation among the public about data mining
- This is partially true
- But these are just tools designed by human beings, which need good training data and experts to interpret the results.
14 Data Mining and Privacy Issues Gained Momentum in the US
- The Pentagon has released a study that recommends that the government pursue specific technologies as potential safeguards against the misuse of data-mining systems similar to those now being considered by the government to track civilian activities electronically in the United States and abroad.
- "Perhaps the strongest protection against abuse of information systems is Strong Audit mechanisms we need to watch the watchers"
- Markoff J. (2002). Study Seeks Technology Safeguards for Privacy. NY Times, 19 December.
- This shows us that even the most aggressive data collectors in the US are aware that data mining tools could be misused and that we need a mechanism to protect the confidentiality and privacy of people.
15 Privacy Issues Gained Momentum among Researchers
- More research funding
- More projects
- More sessions on privacy in database and data mining conferences
- Search Google for "privacy data mining" and you will get pages of results. It was not like that 3-4 years ago.
- Centers for privacy research (IBM Almaden, Stanford, Purdue Univ.)
16 Overview of Data Mining
- Data mining is a combination of statistics
- Data mining models
- Patterns (associations, sequences, ...)
- Clusters
- Classification
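As a small illustration of the "associations" kind of pattern listed above, the sketch below computes the two standard measures of an association rule, support and confidence. The market-basket transactions and the function names are hypothetical and chosen only for this example.

```python
# Hypothetical market-basket data, invented for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also
    contain the consequent (the rule antecedent -> consequent)."""
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)


if __name__ == "__main__":
    print(support({"bread", "milk"}, TRANSACTIONS))
    print(confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

A rule such as bread -> milk is usually reported only when both measures exceed user-chosen thresholds.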
17 Privacy preserving data mining
- Privacy preserving classification model construction
- Privacy preserving data clustering
- Privacy preserving association rule mining
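One commonly described idea behind privacy preserving model construction is to add random noise to each sensitive value before it is released. Individual values are then masked, while aggregate properties can still be estimated. The sketch below is a simplification under stated assumptions: the noise is uniform, the ages are synthetic, and the helper names `perturb` and `mean` are hypothetical. It only shows that the mean is approximately preserved. It does not reconstruct the full distribution and is not presented as the method of any particular paper.

```python
import random


def perturb(values, spread, rng):
    """Return each value plus independent uniform noise in [-spread, spread]."""
    return [v + rng.uniform(-spread, spread) for v in values]


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


# Synthetic "true" ages; a seeded generator keeps the run repeatable.
rng = random.Random(0)
ages = [rng.randint(18, 80) for _ in range(20000)]

# What the data collector would see: every age is heavily distorted.
released = perturb(ages, 50, rng)

if __name__ == "__main__":
    print("true mean:     ", round(mean(ages), 2))
    print("released mean: ", round(mean(released), 2))
    print("first record:  ", ages[0], "->", round(released[0], 1))
```

Because the noise has zero mean, it largely cancels out over many records, which is why aggregate statistics survive even though single records do not.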
18 Privacy preserving classification
- Reference: Rakesh Agrawal and Ramakrishnan Srikant. Privacy-Preserving Data Mining. SIGMOD, 2000, Dallas, TX.
-