Title: Inference Problem and Privacy-Preserving Data Mining
2. Readings and Assignments
- Required
  - Pfleeger, Chapter 6.5
- Interesting reading
  - I. Moskowitz, M. H. Kang: Covert Channels - Here to Stay?
    http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
  - Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems
    http://www.acsac.org/secshelf/book001/book001.html, essay 24
3. Indirect Information Flow Channels
- Covert channels 
- Inference channels 
4. Communication Channels
- Overt channel: designed into a system and documented in the user's manual
- Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.
5. Covert Channel
- Timing channel: based on system times
- Storage channel: communication that is not time-related
- Each kind can be turned into the other
6. Inference Channels
- Non-sensitive information + meta-data → sensitive information
7. Inference Channels
- Statistical Database Inferences 
- General Purpose Database Inferences
8. Statistical Databases
- Goal: provide aggregate information about groups of individuals
  - E.g., average grade point of students
- Security risk: specific information about a particular individual
  - E.g., grade point of student John Smith
- Meta-data
  - Working knowledge about the attributes
  - Supplementary knowledge (not stored in database)
9. Types of Statistics
- Macro-statistics: collections of related statistics presented in 2-dimensional tables
- Micro-statistics: individual data records used for statistics after identifying information is removed

Macro-statistics example:
Sex \ Year   1997   1998   Sum
Female          4      1     5
Male            6     13    19
Sum            10     14    24

Micro-statistics example:
Sex   Course     GPA   Year
F     CSCE 590   3.5   2000
M     CSCE 590   3.0   2000
F     CSCE 790   4.0   2001
10. Statistical Compromise
- Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith's GPA is 3.8)
- Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith's GPA is between 3.5 and 4.0)
11. Methods of Attacks and Protection
- Small/Large Query Set Attack
  - C: characteristic formula that identifies groups of individuals
  - If C identifies a single individual I, e.g., count(C) = 1
    - Find out existence of property
      - If count(C and D) = 1, I has property D
      - If count(C and D) = 0, I does not have D
    - OR
    - Find value of property
      - sum(C, D) gives the value of D for I
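The small query set attack above can be sketched in a few lines. This is a minimal illustration, not a real statistical database: the records, the field names, and the helpers `count` and `total` are all hypothetical stand-ins for the slide's count(C) and sum(C, D).

```python
# Hypothetical toy database; every record and field name is made up.
RECORDS = [
    {"sex": "F", "course": "CSCE 590", "year": 2000, "gpa": 3.5, "honors": True},
    {"sex": "M", "course": "CSCE 590", "year": 2000, "gpa": 3.0, "honors": False},
    {"sex": "F", "course": "CSCE 790", "year": 2001, "gpa": 4.0, "honors": True},
    {"sex": "M", "course": "CSCE 790", "year": 2001, "gpa": 3.8, "honors": True},
    {"sex": "M", "course": "CSCE 590", "year": 2001, "gpa": 2.9, "honors": False},
]


def count(formula):
    """count(C): number of records satisfying the characteristic formula."""
    return sum(1 for r in RECORDS if formula(r))


def total(formula, field):
    """sum(C, field): sum of one attribute over the records satisfying C."""
    return sum(r[field] for r in RECORDS if formula(r))


def C(r):
    # Uses only attributes the attacker already knows about individual I.
    return r["sex"] == "M" and r["course"] == "CSCE 790" and r["year"] == 2001


def D(r):
    # The property the attacker wants to learn about I.
    return r["honors"]


assert count(C) == 1  # C isolates exactly one individual

has_honors = count(lambda r: C(r) and D(r)) == 1  # existence of property D
gpa_of_I = total(C, "gpa")                        # value of an attribute of I
print(has_honors, gpa_of_I)
```

Every answer here is an aggregate, yet because the query set has size one, each aggregate is a fact about a single person.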
12. Small/Large Query Set Attack cont.
- Protection from small/large query set attack: query-set-size control
  - A query q(C) is permitted only if
    N - n ≥ |C| ≥ n, where n ≥ 0 is a parameter of the database and N is the number of records in the database
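Query-set-size control can be sketched as a guard in front of the count query. The function name `guarded_count` and the sample data are assumptions made for illustration; the check itself is the condition n ≤ |C| ≤ N - n from the slide.

```python
def guarded_count(records, formula, n):
    """Answer count(C) only if n <= |C| <= N - n; otherwise refuse."""
    N = len(records)
    size = sum(1 for r in records if formula(r))
    if not (n <= size <= N - n):
        raise PermissionError("query set size outside [n, N - n]")
    return size


ages = [19, 20, 20, 21, 22, 22, 23, 25, 30, 41]  # made-up records, N = 10

print(guarded_count(ages, lambda a: a <= 22, n=2))  # |C| = 6, permitted

# |C| = 1 (too small) and |C| = 9 (too large) are both refused.
for formula in (lambda a: a == 41, lambda a: a != 41):
    try:
        guarded_count(ages, formula, n=2)
    except PermissionError as err:
        print("refused:", err)
```

The upper bound matters because a very large query set is the complement of a very small one: subtracting its answer from the answer for the whole database gives the small set's statistic.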
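13. Tracker attack
- q(C) is disallowed
- C = C1 and C2; T = C1 and not C2 (T is the tracker)
- q(C) = q(C1) - q(T)
[Figure: Venn diagram of C1 and C2; C is their intersection, the tracker T is the part of C1 outside C2]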
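A minimal sketch of the tracker idea, assuming the tracker is T = C1 and not C2 (the usual individual tracker) and that q is a count query protected by query-set-size control. The records, the parameter `N_PARAM`, and the choice of C1 and C2 are made up for illustration.

```python
# Made-up records: (name, sex, department). N = 10, size parameter n = 2.
RECORDS = [
    ("Eve", "F", "CS"), ("Kathy", "F", "EE"), ("Ann", "F", "EE"),
    ("Sue", "F", "ME"), ("John", "M", "CS"), ("Paul", "M", "CS"),
    ("Max", "M", "EE"), ("Fred", "M", "EE"), ("Mitch", "M", "ME"),
    ("Bob", "M", "ME"),
]
N_PARAM = 2


def q(formula):
    """Count query guarded by query-set-size control."""
    size = sum(1 for r in RECORDS if formula(r))
    if not (N_PARAM <= size <= len(RECORDS) - N_PARAM):
        raise PermissionError("query set size out of bounds")
    return size


def C1(r):
    return r[1] == "F"


def C2(r):
    return r[2] == "CS"


def C(r):
    return C1(r) and C2(r)      # matches one person, so q(C) is disallowed


def T(r):
    return C1(r) and not C2(r)  # the tracker


try:
    q(C)
    direct_allowed = True
except PermissionError:
    direct_allowed = False

inferred = q(C1) - q(T)  # both queries pass the size check
print(direct_allowed, inferred)
```

Both q(C1) and q(T) have acceptable query-set sizes, so the control never sees a forbidden query, yet their difference is exactly the forbidden answer.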
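14. Tracker attack
- q(C and D) is disallowed
- C = C1 and C2; T = C1 and not C2 (T is the tracker)
- q(C and D) = q(T or (C and D)) - q(T)
[Figure: Venn diagram of C1, C2, and D; C is the intersection of C1 and C2, the tracker T is the part of C1 outside C2, and "C and D" is the part of C inside D]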
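The same trick extends to learning whether the isolated individual has a property D. The sketch below assumes T = C1 and not C2, so that T and (C and D) never overlap and the counts simply add. The data and the `on_leave` attribute are hypothetical.

```python
# Made-up records: (name, sex, department, on_leave). N = 10, n = 2.
RECORDS = [
    ("Eve", "F", "CS", True), ("Kathy", "F", "EE", False),
    ("Ann", "F", "EE", True), ("Sue", "F", "ME", False),
    ("John", "M", "CS", False), ("Paul", "M", "CS", True),
    ("Max", "M", "EE", False), ("Fred", "M", "EE", False),
    ("Mitch", "M", "ME", True), ("Bob", "M", "ME", False),
]
N_PARAM = 2


def q(formula):
    """Count query guarded by query-set-size control."""
    size = sum(1 for r in RECORDS if formula(r))
    if not (N_PARAM <= size <= len(RECORDS) - N_PARAM):
        raise PermissionError("query set size out of bounds")
    return size


def C(r):
    return r[1] == "F" and r[2] == "CS"      # C1 and C2: one individual


def T(r):
    return r[1] == "F" and r[2] != "CS"      # tracker: C1 and not C2


def D(r):
    return r[3]                              # property of interest


try:
    q(lambda r: C(r) and D(r))
    direct_allowed = True
except PermissionError:
    direct_allowed = False

# T and (C and D) are disjoint, so the padded count is q(T) + q(C and D).
inferred = q(lambda r: T(r) or (C(r) and D(r))) - q(T)
print(direct_allowed, inferred)
```

A result of 1 means the individual has property D and 0 means they do not, which is the existence test from the small query set attack recovered through permitted queries only.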
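15. Query overlap attack
- q(John) = q(C1) - q(C2)
[Figure: overlapping query sets C1 and C2 over the individuals Kathy, Paul, John, Eve, Max, Fred, Mitch; the two sets differ only in John]
- Protection: query-overlap control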
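A small sketch of the overlap attack and of one possible overlap control. The salary figures, the exact membership of C1 and C2, and the `OverlapControl` class with its `max_overlap` threshold are assumptions for illustration; the slide only gives the names and the formula q(John) = q(C1) - q(C2).

```python
# Made-up salaries in thousands; names are taken from the slide.
SALARY = {"Kathy": 52, "Paul": 48, "John": 61, "Eve": 55,
          "Max": 47, "Fred": 50, "Mitch": 58}


def q_sum(names):
    """Unprotected sum query over a query set given as a set of names."""
    return sum(SALARY[n] for n in names)


C1 = {"Kathy", "Paul", "John", "Eve"}  # assumed membership
C2 = {"Kathy", "Paul", "Eve"}          # C1 without John

john = q_sum(C1) - q_sum(C2)           # q(John) = q(C1) - q(C2)
print(john)


class OverlapControl:
    """Refuse a query that shares too many records with an earlier one."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.history = []

    def q_sum(self, names):
        names = frozenset(names)
        if any(len(names & old) > self.max_overlap for old in self.history):
            raise PermissionError("too much overlap with an earlier query")
        self.history.append(names)
        return sum(SALARY[n] for n in names)


guard = OverlapControl(max_overlap=1)
guard.q_sum(C1)                        # first query is answered
try:
    guard.q_sum(C2)                    # shares 3 records with C1
except PermissionError as err:
    print("refused:", err)
```

The control has to remember every answered query set, which is the practical cost of this protection.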
16. Insertion/Deletion Attack
- Observing changes over time
  - q1 = q(C)
  - insert(i)
  - q2 = q(C)
  - q(i) = q2 - q1
- Protection: insertions/deletions performed as pairs
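The insertion attack amounts to differencing two answers taken before and after an update. The sketch below uses made-up salary values, and it reads "performed as pairs" as making two inserts visible together, which is one possible interpretation of the slide.

```python
salaries = {"Kathy": 52, "Paul": 48, "Eve": 55}  # made-up values, in thousands


def q():
    """q(C): sum of salaries over the query set (here: every record)."""
    return sum(salaries.values())


q1 = q()
salaries["John"] = 61   # insert(i)
q2 = q()
leaked = q2 - q1        # q(i) = q2 - q1 reveals John's salary

# Protection sketch: two inserts become visible together, so differencing
# only reveals the total for the pair, not either individual value.
salaries["Max"] = 47
salaries["Fred"] = 50
q3 = q()
pair_total = q3 - q2
print(leaked, pair_total)
```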
17. Statistical Inference Theory
- Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)
18. Inferences in General-Purpose Databases
- Queries based on sensitive data
- Inferences via database constraints
- Inferences via updates
19. Queries based on sensitive data
- Sensitive information is used in the selection condition but not returned to the user.
- Example: Salary secret, Name public
  - π_Name σ_{Salary = 25,000}
- Protection: apply the query to database views at different security levels
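The example query can be sketched as follows. The table contents and the helper `names_with_salary` are hypothetical; the protection shown (evaluating the query on the view at the user's level, where secret values are masked) is one simple reading of the slide's suggestion.

```python
# Made-up table: Name is public, Salary is secret.
EMPLOYEES = [
    {"name": "Black", "salary": 25000},
    {"name": "Red", "salary": 42000},
    {"name": "Green", "salary": 25000},
]


def names_with_salary(rows, amount):
    """Project Name over the rows selected by Salary = amount."""
    return [r["name"] for r in rows if r["salary"] == amount]


# Unprotected: the answer holds only public names, yet reveals their salary.
leaked = names_with_salary(EMPLOYEES, 25000)

# Protection sketch: evaluate the query on the public view, where secret
# values are replaced by None and therefore can never match the condition.
public_view = [{"name": r["name"], "salary": None} for r in EMPLOYEES]
safe = names_with_salary(public_view, 25000)
print(leaked, safe)
```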
20. Database Constraints
- Integrity constraints 
- Database dependencies 
- Key integrity 
21. Integrity Constraints
- C = A + B
  - A = public, C = public, and B = secret
  - B can be calculated from A and C, i.e., secret information can be calculated from public data
22. Database Dependencies
- Metadata 
- Functional dependencies 
- Multi-valued dependencies 
- Join dependencies 
- etc. 
23. Functional Dependency
- FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
- Example: FD Rank → Salary
  - Secret information: Name and Salary together
  - Query 1: Name and Rank
  - Query 2: Rank and Salary
  - Combine the answers to queries 1 and 2 to reveal Name and Salary together
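Combining the two answers is just a join on Rank. The sketch below uses made-up names, ranks, and salaries, and relies on the dependency Rank → Salary to treat the second answer as a lookup table.

```python
# Made-up answers to the two individually permitted queries.
query1 = [("Black", "Assistant"), ("Red", "Associate"), ("Green", "Assistant")]
query2 = [("Assistant", 38000), ("Associate", 42000)]

# Because Rank -> Salary holds, each rank maps to exactly one salary.
salary_of_rank = dict(query2)

# Joining on Rank reconstructs the secret (Name, Salary) association.
inferred = {name: salary_of_rank[rank] for name, rank in query1}
print(inferred)
```

Neither answer contains Name and Salary together, so a control that inspects each query in isolation would not notice the disclosure.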
24. Key integrity
- Every tuple in the relation has a unique key
- Users at different levels see different versions of the database
- Users might attempt to update data that is not visible to them
25. Example
(P = public, S = secret; the label follows each value)

Secret View
Name (key)   Salary       Address
Black  P     38,000  P    Columbia  S
Red    S     42,000  S    Irmo      S

Public View
Name (key)   Salary       Address
Black  P     38,000  P    Null      P
26. Updates
Public User
Name (key)   Salary       Address
Black  P     38,000  P    Null      P
- Update Black's address to Orlando
- Add new tuple (Red, 22,000, Manassas)
- If
  - Refuse update: covert channel
  - Allow update
    - Overwrite high data: may be incorrect
    - Create new tuple: which data is correct? (polyinstantiation); violates key constraints
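Polyinstantiation for the public user's insert can be sketched with a table keyed by (name, level). This is a simplification and not how any particular multilevel DBMS works: labels are attached to whole tuples here rather than to each value as on the slide, and the function names are hypothetical.

```python
# Stored key is (name, level); the key users think of is name alone.
# Tuple-level labels are an assumption made to keep the sketch short.
table = {
    ("Black", "P"): (38000, None),
    ("Red", "S"): (42000, "Irmo"),
}


def view(clearance):
    """Rows visible at a clearance: P sees P rows, S sees P and S rows."""
    visible = {"P": {"P"}, "S": {"P", "S"}}[clearance]
    return [(name, level) + row
            for (name, level), row in table.items() if level in visible]


def public_insert(name, salary, address):
    """Accept the low insert as a separate P tuple instead of refusing it."""
    table[(name, "P")] = (salary, address)


# The public user cannot see ("Red", "S"), so the insert looks legitimate.
public_insert("Red", 22000, "Manassas")

names_seen_by_public = [row[0] for row in view("P")]
names_seen_by_secret = [row[0] for row in view("S")]
print(names_seen_by_public, names_seen_by_secret)
```

The public view still has one tuple per name, but the secret view now holds two tuples named Red, so Name alone is no longer a key and the secret user cannot tell which salary is correct.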
27. Updates
Secret User
Name (key)   Salary       Address
Black  P     38,000  P    Columbia  S
Red    S     42,000  S    Irmo      S
- Update Black's salary to 45,000
- If
  - Refuse update: denial of service
  - Allow update
    - Overwrite low data: covert channel
    - Create new tuple: which data is correct? (polyinstantiation); violates key constraints
28. Inference Problem
- No general technique is available to solve the 
 problem
- Need assurance of protection 
- Hard to incorporate outside knowledge
29. The Inference Problem
- General Purpose Database
  - Non-confidential data + Metadata → Undesired Inferences
- Web Enabled Data
  - Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences
30. Correlated Inference
- Concept hierarchy (root: Object)
  - waterSource ⊆ Object
  - basin ⊆ waterSource
  - place ⊆ Object
  - district ⊆ place
  - address ⊆ place
  - base ⊆ Object
  - fort ⊆ base
[Figure: a Base and a Place, both labeled Public, each connected to a Water source]
31. Inference Control
[Figure: Access Control in front of Organizational Data (Confidential, Public, Misinfo); the Attacker's paths are marked X]
32. Inference Control
[Figure: Organizational Data (Confidential, Public, Misinfo)]
- Access and inference control policy
- Logic-based inference detection
- Exact and partial disclosure
- Data and metadata protection
- Heterogeneous data manipulation
- Metadata discovery
33. Data Mining and Privacy
- Statistical inference
- K-anonymity
- Correlation
- General inference
- Pattern → metadata
- Biased learning
34. Next Class