Privacy in Data Mining - PowerPoint PPT Presentation

Provided by: krishna

1
Privacy in Data Mining
  • Presented by
  • Kalyan K Beemanapalli, Graduate Student,
    Department of CS
  • Vamshi Kodithava, Graduate Student,
    Department of CS

2
Outline
  • Definitions, Introduction, Importance and
    Relevance to the Class
  • Key Issues and Key Results
  • Techniques Developed
  • Open Issues/ Research Directions
  • References
  • Conclusion

3
Definitions
  • Data Mining: Data mining, or knowledge discovery
    in databases (KDD), is the nontrivial process
    of identifying valid, novel, potentially useful,
    and ultimately understandable patterns in data.
  • Privacy: An individual's desire and ability to
    keep certain information about themselves hidden
    from others. What is corporate privacy?
  • Relationship: The primary goal of data mining is
    to extract hidden relationships and patterns
    among data items, which is the source of its
    privacy concerns.

4
Importance
  • Example from health informatics: privacy
    breaches caused by data mining
  • The database inference problem: inferring
    sensitive information from data a user is
    permitted to access
  • Data mining makes the inference problem more
    dangerous, because it automates the discovery of
    hidden relationships that enable such inferences
  • Hence, data mining algorithms are being revisited
    from the standpoint of privacy, security and civil
    liberties
  • Relevance to the class: Before applying data
    mining techniques, it is important to know their
    positive and negative implications
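The inference problem can be illustrated with a classic differencing attack on a statistical database. The sketch below assumes an interface that answers only SUM queries. The records, names and salaries are hypothetical, and so are the helper names `sum_salary` and `infer_salary`. Neither query singles out an individual, yet the difference between the two answers reveals one person's salary.

```python
# Hypothetical statistical database: (name, department, salary).
# Salary is the sensitive attribute; only aggregate queries are answered.
RECORDS = [
    ("alice", "cs", 91000),
    ("bob", "cs", 85000),
    ("carol", "cs", 78000),
    ("dave", "math", 72000),
]


def sum_salary(predicate):
    """Aggregate-only interface: total salary of the records matching predicate."""
    return sum(salary for name, dept, salary in RECORDS if predicate(name, dept))


def infer_salary(target, dept):
    """Differencing attack: two permitted SUM queries isolate one record."""
    everyone = sum_salary(lambda n, d: d == dept)
    everyone_but_target = sum_salary(lambda n, d: d == dept and n != target)
    return everyone - everyone_but_target


print(infer_salary("carol", "cs"))  # 78000
```

Restricting users to aggregate queries alone therefore does not prevent inference.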

5
Outline
  • Definitions, Introduction, Importance and
    Relevance to the Class
  • Key Issues and Key Results
  • Techniques Developed
  • Open Issues/ Research Directions
  • References
  • Conclusion

6
Privacy Preserving Data Mining
  • Approaches to preserving privacy can be
    classified by:
      – Data distribution
      – Data modification
      – Data mining algorithm
      – Data or rule hiding
      – Privacy preservation
  • Concerns: performance, scalability, data utility,
    level of uncertainty, and resistance to different
    data mining techniques
  • There is a very thin line between exploiting the
    power of data mining techniques and preserving
    privacy.

7
Heuristic-Based Techniques
  • Association Rule Confusion
  • Classification Rule Confusion
  • Privacy Preserving Clustering
  • Cryptography-Based Techniques
  • Reconstruction-Based Techniques
  • These techniques have been combined with various
    data modification and data distribution
    approaches, yielding numerous algorithms
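Association rule confusion can be illustrated with a rough sketch. The idea is to hide a sensitive itemset by deleting one of its items from supporting transactions until the itemset's support falls below the mining threshold. A miner using that threshold then no longer reports the itemset or the rules derived from it. This greedy distortion heuristic is an illustration, not a specific published algorithm. The baskets, the 0.5 threshold, the choice of which item to delete, and the helper names are all assumptions.

```python
from typing import FrozenSet, List, Set


def support(transactions: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def hide_itemset(transactions: List[Set[str]], sensitive: FrozenSet[str],
                 min_support: float) -> List[Set[str]]:
    """Distortion heuristic: delete one item of `sensitive` from supporting
    transactions until the itemset's support falls below `min_support`."""
    sanitized = [set(t) for t in transactions]  # work on a copy of the input
    victim = min(sensitive)  # assumption: always delete the alphabetically first item
    for t in sanitized:
        if support(sanitized, sensitive) < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
    return sanitized


baskets = [{"bread", "milk", "beer"}, {"bread", "milk"},
           {"bread", "milk", "eggs"}, {"beer", "eggs"}]
sensitive = frozenset({"bread", "milk"})
sanitized = hide_itemset(baskets, sensitive, min_support=0.5)
print(support(baskets, sensitive), support(sanitized, sensitive))  # 0.75 0.25
```

Hiding the itemset also suppresses the rules bread → milk and milk → bread. However, it lowers the support of non-sensitive patterns such as {bread} as well, which is the data-utility cost listed among the concerns on slide 6.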

8
Reconstruction-Based Techniques
  • Based on the work of Rakesh Agrawal and
    Ramakrishnan Srikant at IBM Almaden Research Center
  • Step 1: Create a randomized data sample by
    perturbing individual data records
  • Step 2: Reconstruct distributions, not the values
    in individual records
  • Step 3: Use the reconstructed distributions to
    build a decision-tree classifier
  • For reconstruction they used a Bayesian approach,
    and they proposed three algorithms for building
    decision trees that rely on the reconstructed
    distributions
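Steps 1 and 2 can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' code:
  • Noise is uniform on [-noise_width, noise_width], one of the noise models Agrawal and Srikant considered.
  • The domain is split into equal-width bins, each represented by its midpoint.
  • The function names `randomize`, `noise_density` and `reconstruct` are hypothetical.

The reconstruction repeatedly applies Bayes' rule to estimate how likely each bin is to contain a record's original value, then averages these estimates over all records. Step 3, building the decision tree, is omitted.

```python
import random


def randomize(values, noise_width, rng):
    """Step 1: perturb each record independently; only x + y is released,
    where y is drawn uniformly from [-noise_width, noise_width]."""
    return [x + rng.uniform(-noise_width, noise_width) for x in values]


def noise_density(y, noise_width):
    """Density of the publicly known uniform noise distribution."""
    return 1.0 / (2.0 * noise_width) if abs(y) <= noise_width else 0.0


def reconstruct(released, midpoints, noise_width, iterations=100):
    """Step 2: estimate the probability of each bin of the ORIGINAL
    distribution; individual original values are never recovered."""
    k = len(midpoints)
    fx = [1.0 / k] * k  # start from a uniform estimate
    for _ in range(iterations):
        mass = [0.0] * k
        for w in released:
            # Bayes' rule: P(bin a | w) is proportional to f_Y(w - a) * f_X(a)
            post = [noise_density(w - a, noise_width) * p
                    for a, p in zip(midpoints, fx)]
            total = sum(post)
            if total > 0:  # skip values that no bin midpoint can explain
                for b in range(k):
                    mass[b] += post[b] / total
        norm = sum(mass)
        if norm == 0:
            break  # nothing was explained; keep the previous estimate
        fx = [m / norm for m in mass]
    return fx


rng = random.Random(0)
ages = [rng.uniform(20, 30) for _ in range(200)] + [rng.uniform(60, 70) for _ in range(200)]
released = randomize(ages, noise_width=20, rng=rng)
midpoints = [5 + 10 * i for i in range(10)]  # bins [0,10), [10,20), ..., [90,100)
estimate = reconstruct(released, midpoints, noise_width=20)
print([round(p, 2) for p in estimate])
```

To see how much of the two-peaked shape of `ages` the reconstruction recovers, compare `estimate` with a histogram of `released`. The midpoint approximation and the fixed iteration count are simplifications.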

9
Accuracy Analysis

10
Outline
  • Definitions, Introduction, Importance and
    Relevance to the Class
  • Key Issues and Key Results
  • Techniques Developed
  • Open Issues/ Research Directions
  • References
  • Conclusion

11
Open Issues/ Research Directions
  • Data warehousing and the inference problem
  • Preparing perturbed databases for a combination
    of algorithms
  • Social effects: working with social scientists to
    preserve privacy across cultures
  • Formulating legal rules and developing data
    mining algorithms that comply with them
  • Privacy inference controller
  • Quantifying privacy?
  • Could something analogous to the ACID properties
    be formulated for perturbed databases?
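One existing partial answer to the question of quantifying privacy is the interval-based metric from Agrawal and Srikant (Privacy Preserving Data Mining, 2000, in the references). If the original value can be estimated with c% confidence to lie in an interval [x1, x2], then the width x2 − x1 is the amount of privacy at that confidence level. The sketch below computes this width for two noise models:
  • uniform noise on [-a, a], where the width is c × 2a;
  • zero-mean Gaussian noise with standard deviation σ, where the width is that of the central interval, 2zσ.

The function names are hypothetical.

```python
from statistics import NormalDist


def uniform_privacy(half_width, confidence):
    """Noise uniform on [-half_width, half_width]: the narrowest interval holding
    `confidence` of the noise has width confidence * 2 * half_width."""
    return confidence * 2 * half_width


def gaussian_privacy(sigma, confidence):
    """Zero-mean Gaussian noise: the central interval holding `confidence`
    of the noise spans z standard deviations on each side of zero."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma


print(round(uniform_privacy(20, 0.95), 2))    # 38.0
print(round(gaussian_privacy(1.0, 0.95), 2))  # 3.92
```

Both widths grow linearly with the noise magnitude, so more privacy is bought directly with noisier released data.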

12
References
  • Security and Privacy Implications of Data Mining,
    1996, Chris Clifton and Don Marks
  • Defining Privacy for Data Mining, Chris Clifton,
    Murat Kantarcioglu and Jaideep Vaidya, Purdue
    University
  • Data Mining, National Security, Privacy and Civil
    Liberties, Bhavani Thuraisingham, The National
    Science Foundation
  • A Framework for Privacy Preserving Classification
    in Data Mining, 2004, Md. Zahidul Islam and
    Ljiljana Brankovic
  • Privacy Preserving Mining of Association Rules,
    2002, Alexandre Evfimievski, Ramakrishnan
    Srikant, Rakesh Agrawal and Johannes Gehrke, IBM
    Almaden Research Center

13
References Continued
  • Privacy Preserving Data Mining, 2000, Rakesh
    Agrawal and Ramakrishnan Srikant, IBM Almaden
    Research Center
  • Privacy Preserving Data Mining, Advances in
    Cryptology, 2000, Y. Lindell and Benny Pinkas
  • Detecting Privacy and Ethical Sensitivity in Data
    Mining Results, 2004, Peter Fule and John Roddick
  • Limiting Privacy Breaches in Privacy Preserving
    Data Mining, 2003, Alexandre Evfimievski,
    Johannes Gehrke and Ramakrishnan Srikant
  • State-of-the-art in Privacy Preserving Data
    Mining, Vassilios S. Verykios, Elisa Bertino, Igor
    Nai Fovino, Provenza, Yucel Saygin and Yannis
    Theodoridis

14
Conclusion
  • Presented an overview of, and brief insight into,
    new research on privacy in data mining
  • Statistical databases were the first area to
    address privacy issues arising when data is
    analyzed
  • Techniques and approaches used by researchers to
    preserve privacy while mining data
  • Open issues that remain to be addressed, and
    potential research directions
  • Privacy, secrecy, ethical sensitivity, national
    security, civil liberties

15
Conclusion (Continued)
  • There is much more to say about privacy, both in
    general and specifically in relation to data
    mining.
  • Data mining can be used to its full potential
    only if researchers address the problem of
    privacy and develop techniques in this direction.

16
Queries
Thank You.