Privacy-preserving data mining (1)

Provided by: keke9

1
Privacy-preserving data mining (1)
2
Outline

A brief introduction to learning algorithms
Classification algorithms
Clustering algorithms
Addressing privacy issues in learning
Single dataset publishing
Distributed multiple datasets
How data is partitioned

3
A quick review

Machine learning algorithms
Supervised learning (classification)
Training data have class labels
Find the boundary between classes
Unsupervised learning (clustering)
Training data have no labels
Similarity measure is the key
Grouping records based on the similarity measure

4
A quick review

Good tutorials
http://www.cs.utexas.edu/~mooney/cs391L/
Top 10 data mining algorithms
www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf
We will review the basic ideas of some algorithms

5
C4.5 decision tree (classification)

Based on the ID3 algorithm
Convert the decision tree to a rule set
Each path from the root to a leaf → a rule
Prune the rules
Cross-validation

Split the data into N folds
In each round
training
validating
testing
Tests the generalization power
Used for choosing the best parameters
Final result: the average of the N testing results
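
The cross-validation procedure can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the helper names (`k_fold_indices`, `cross_validate`) and the toy majority-label learner are hypothetical, and the separate validation split used for parameter tuning is omitted for brevity.

```python
from statistics import mean

def k_fold_indices(n_records, n_folds):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    indices = list(range(n_records))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]  # every n_folds-th record
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx

def cross_validate(records, labels, n_folds, train_fn, predict_fn):
    """Return the average test accuracy over n_folds rounds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(records), n_folds):
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, records[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)

# Hypothetical toy learner: always predict the majority training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

data = [[v] for v in range(10)]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
avg_accuracy = cross_validate(data, labels, 5, train_majority, predict_majority)
```

Each record is tested exactly once, so the averaged score uses all of the data while never testing a model on records it was trained on.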
6
Naïve Bayes (classification)
Two classes 0/1, feature vector x = (x1, x2, ..., xn)
Apply Bayes' rule
Assume independent features
Easy to count f(xi | class label) with the training data
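
A small sketch of the counting idea, assuming binary features and two classes. The function names are hypothetical, and the add-one smoothing is an assumption added here to avoid zero probabilities; it does not appear in the slides.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, class_label)][value] -> count
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts

def predict_naive_bayes(model, row):
    """Pick the class maximizing P(class) * prod_i f(x_i | class)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Add-one smoothing for binary features (assumption, see lead-in).
            score *= (feature_counts[(i, label)][value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, (1, 1))
```

Training is a single counting pass over the data, which is why the slide calls the estimates easy to obtain.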
7
K nearest neighbor (classification)
instance-based learning
Classifying the point
Decision area Dz
More general kernel methods
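
As an illustration of instance-based learning, the sketch below classifies a query point by majority vote among its k nearest training points. The function name and the toy data are hypothetical; Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
label_near_origin = knn_classify(points, labels, (0.5, 0.5))
label_far = knn_classify(points, labels, (5.5, 5.5))
```

No model is built in advance: the training data itself is the model, and all work happens at query time.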
8
Linear classifier (classification)
w^T x + b = 0
w^T x + b > 0
w^T x + b < 0
f(x) = sign(w^T x + b)

Examples
Perceptron
Linear discriminant analysis(LDA)
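
A minimal perceptron sketch shows how a linear classifier f(x) = sign(w^T x + b) can be learned. The learning rate, epoch count, and toy data are assumptions chosen for illustration; the update rule is the standard perceptron rule, which converges only when the classes are linearly separable.

```python
def sign(value):
    return 1 if value > 0 else -1

def predict(weights, bias, x):
    """f(x) = sign(w^T x + b)."""
    return sign(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(points, labels, epochs=20, rate=1.0):
    """Nudge (w, b) toward every misclassified example."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if predict(weights, bias, x) != y:
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias

points = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(points, labels)
```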

9
There are infinitely many linear separators.
Which one is optimal?
10
Support Vector Machine (classification)

Distance from example xi to the separator is r = yi (w^T xi + b) / ||w||
Examples closest to the hyperplane are support vectors.
Margin ρ of the separator is the distance between support vectors.

Maximizing the margin ρ

Extended to handle
Nonlinear
Noisy margin
Large datasets
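
To make the margin concrete, the sketch below computes the signed distance of each labeled example to a given separator and the width of the gap between the closest examples of the two classes. It only evaluates candidate separators and does not solve the SVM optimization problem; the function names and data are hypothetical.

```python
import math

def distance_to_separator(w, b, x, y):
    """Signed distance y * (w^T x + b) / ||w|| of a labeled example."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Gap between the closest positive and the closest negative example."""
    positives = [distance_to_separator(w, b, x, y)
                 for x, y in zip(points, labels) if y == 1]
    negatives = [distance_to_separator(w, b, x, y)
                 for x, y in zip(points, labels) if y == -1]
    return min(positives) + min(negatives)

points = [(1.0, 0.0), (3.0, 1.0), (-1.0, 0.0), (-2.0, 2.0)]
labels = [1, 1, -1, -1]
margin_vertical = margin((1.0, 0.0), 0.0, points, labels)
margin_diagonal = margin((1.0, 1.0), 0.0, points, labels)
```

Neither separator misclassifies a point, but the vertical one leaves a wider gap, which is the criterion an SVM uses to choose among the infinitely many candidates.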

11
Boosting (classification)

Classifier ensembles
Average prediction of a set of classifiers
trained on the same set of data
Intuition
The output of a classifier has a certain amount of variance
Averaging can reduce the variance → improve the accuracy
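
The variance intuition can be checked with a small simulation. This sketch assumes each classifier's output is the true value plus independent Gaussian noise, which is an idealization: classifiers trained on the same data are usually correlated, so the real reduction is smaller.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

TRUE_VALUE = 1.0
N_TRIALS = 2000
N_CLASSIFIERS = 10

def noisy_output():
    """Stand-in for one classifier's real-valued output: truth plus noise."""
    return TRUE_VALUE + random.gauss(0.0, 1.0)

# Variance of a single classifier's output.
single = [noisy_output() for _ in range(N_TRIALS)]
# Variance of the average of N_CLASSIFIERS outputs.
averaged = [mean(noisy_output() for _ in range(N_CLASSIFIERS))
            for _ in range(N_TRIALS)]

single_variance = pvariance(single)
ensemble_variance = pvariance(averaged)
```

Under the independence assumption, the variance of the average shrinks roughly in proportion to the number of classifiers.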

12
AdaBoost

Freund Y, Schapire RE (1997) A decision-theoretic
generalization of on-line learning and an
application to boosting. J Comput Syst Sci

13

Gradient boosting
J. Friedman, Stochastic gradient boosting,
http://citeseer.ist.psu.edu/old/126259.html

14
Challenges in Clustering

Definition of similarity measures
Point-wise
Euclidean
Cosine (document similarity)
Correlation
Set-wise
Min/max distance between two sets
Entropy based (categorical data)

15
Challenges in Clustering

Hierarchical
1. Merging most similar pairs each step
2. Until reaching desired number of clusters
Partitioning (k-means)
1. Set initial centroids
2. Partition the data
3. Adjust the centroids
4. Iterate on 2 and 3 until converging
Other classification of algorithms
Agglomerative (bottom-up) methods
Divisive (partitional, top-down)
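
The four k-means steps map onto a short loop. This is a minimal sketch assuming Euclidean distance and caller-supplied initial centroids; the function name and `max_iter` parameter are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """k-means; `centroids` holds the initial centroids (step 1)."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Step 2: partition the data by nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        # Step 4: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
          (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (12.0, 12.0)])
```

Each iteration touches every point once per centroid, so the cost per iteration grows linearly with the number of records.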

16
Challenges in Clustering

Efficiency of the algorithm on large datasets
Linear-cost algorithms: k-means
However, the costs of many algorithms are quadratic
Perform three-phase processing
Sampling
Clustering
Labeling
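
The three phases can be sketched as follows, using a simple agglomerative method (quadratic in the number of points) on the sample only. The centroid-distance merge rule, the function names, and the toy sizes are assumptions for illustration.

```python
import math
import random

def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda ab: math.dist(centroid(clusters[ab[0]]),
                                     centroid(clusters[ab[1]])),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters

def three_phase(points, n_clusters, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)           # phase 1: sampling
    sample_clusters = agglomerative(sample, n_clusters)  # phase 2: clustering
    centroids = [centroid(c) for c in sample_clusters]
    # Phase 3: label every record with its nearest sample centroid.
    return [min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

points = ([(float(x), 0.0) for x in range(20)]
          + [(float(x + 100), 0.0) for x in range(20)])
labels = three_phase(points, n_clusters=2, sample_size=25)
```

Only the sample pays the quadratic cost; labeling the full dataset is a single linear pass.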

17
Challenges in Clustering

Irregularly shaped clusters and noise

18
Clustering algorithms

Typical ones
K-means
Expectation-Maximization (EM)
Many clustering algorithms address different challenges
Good survey
A. K. Jain et al., Data Clustering: A Review, ACM Computing Surveys, 1999

19
PPDM issues

How data is distributed
Single party releases data
Multiple parties collaboratively mining data
Pooling data
Cryptographic protocols
How data is partitioned
Horizontally
Vertically
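
The two partitioning schemes are easiest to see on a toy table. In this sketch the table, column names, and party counts are hypothetical; sharing the `id` column between vertical parties is an assumption so that records can be matched up.

```python
# Hypothetical toy table: one record per row, columns are attributes.
columns = ["id", "age", "income", "diagnosis"]
table = [
    (1, 34, 52000, "A"),
    (2, 51, 61000, "B"),
    (3, 29, 48000, "A"),
    (4, 45, 75000, "C"),
]

def horizontal_partition(rows, n_parties):
    """Each party holds a subset of the records, with all attributes."""
    return [rows[p::n_parties] for p in range(n_parties)]

def vertical_partition(rows, column_groups):
    """Each party holds a subset of the attributes, for all records."""
    return [[tuple(row[c] for c in group) for row in rows]
            for group in column_groups]

horizontal = horizontal_partition(table, 2)
# Party 1 holds (id, age, income); party 2 holds (id, diagnosis).
vertical = vertical_partition(table, [(0, 1, 2), (0, 3)])
```

In the horizontal case the parties share a schema but hold different individuals; in the vertical case they hold different attributes of the same individuals.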

20
Single party

Data perturbation
Rakesh00, for decision trees
Chen05, for many classifiers and clustering algorithms
Anonymization
Top-down/bottom-up decision tree

21
Multiple parties
[Figure: users connected through a network exchange perturbed data; panels: service-based computing, peer-to-peer computing]

Perturbation and anonymization
Papers 89,92,94,185,

Cryptographic approaches
Papers 95-99,104,107,108

22
How data is partitioned

Horizontally partitioned
All additive (and some multiplicative) perturbation methods
Protocols
K-means, SVM, naïve Bayes, Bayesian network
Vertically partitioned
All additive perturbation methods
Protocols
K-means, Bayesian network

23
Challenges and opportunities

Many modeling methods have no privacy-preserving version
Cost: protocol-based approaches
Limitation of column-based additive perturbation
Complexity
The advantage of geometric data perturbation
Covers many different modeling methods
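
One plausible reason geometric perturbation covers many modeling methods is that a rotation leaves pairwise Euclidean distances unchanged, so distance-based learners behave the same on the perturbed data. The sketch below assumes the perturbation is a plain 2-D rotation; the slides do not define the method, so this is an illustration only.

```python
import math

def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]
perturbed = rotate(points, math.radians(50))

original_distances = [math.dist(p, q) for p in points for q in points]
perturbed_distances = [math.dist(p, q) for p in perturbed for q in perturbed]
```

The coordinates change, but every pairwise distance matches up to floating-point error, so methods such as k-nearest neighbor or k-means would produce the same groupings.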
