Methods of Secure Computation and Data Integration

Provided by: une74

Learn more at: https://unece.org

Transcript and Presenter's Notes

1
Methods of Secure Computation and Data Integration

2
General setting

3
Pooling situations

Horizontally partitioned: agencies have different records but the same variables.
Purely vertically partitioned: agencies have the same records but different variables.
Partially overlapping, vertically partitioned: agencies have different records and different variables, with some records and variables in common.
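As a toy illustration (hypothetical data, not from the presentation), the three situations correspond to different ways of slicing one full data matrix, with rows as records and columns as variables:

```python
import numpy as np

# Hypothetical full data matrix: 6 records (rows) x 4 variables (columns).
full = np.arange(24).reshape(6, 4)

# Horizontally partitioned: different records, same variables.
horizontal = {"A": full[:3, :], "B": full[3:, :]}

# Purely vertically partitioned: same records, different variables.
vertical = {"A": full[:, :2], "B": full[:, 2:]}

# Partially overlapping: A holds records 0-3 and variables 0-1,
# B holds records 2-5 and variables 1-3, so records 2-3 and
# variable 1 are common to both agencies.
overlapping = {"A": full[:4, :2], "B": full[2:, 1:]}
```

Stacking the horizontal pieces row-wise, or the vertical pieces column-wise, recovers the full matrix; the overlapping case does not split cleanly in either direction.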

4
Horizontal partitioning: Karr, Lin, Sanil, Reiter (JCGS, 2005)

Secure data integration
-- shares data but protects sources
-- allows any analysis to be done
Secure summation
-- shares sums without sharing data
-- allows regressions, association rules, classifications, clustering
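Secure summation is enough for regression under horizontal partitioning because the least-squares sufficient statistics are sums over records: X'X is the sum of the agencies' local X_k'X_k, and X'y is the sum of their local X_k'y_k. A minimal sketch with simulated data for three hypothetical agencies; the sums are computed in the clear here, standing in for a secure summation protocol, which in practice would also need the real-valued entries encoded as integers:

```python
import numpy as np

rng = np.random.default_rng(1)
true_beta = np.array([1.0, -2.0, 0.5])

# Hypothetical agencies: same variables (intercept + 2 predictors), different records.
agencies = []
for n_k in (40, 25, 35):
    X_k = np.column_stack([np.ones(n_k), rng.normal(size=(n_k, 2))])
    y_k = X_k @ true_beta + rng.normal(scale=0.1, size=n_k)
    agencies.append((X_k, y_k))

# Each agency computes only its local sufficient statistics.
local_xtx = [X_k.T @ X_k for X_k, _ in agencies]
local_xty = [X_k.T @ y_k for X_k, y_k in agencies]

# Element-wise sums: the quantities a secure summation protocol would deliver.
xtx = sum(local_xtx)
xty = sum(local_xty)
beta_hat = np.linalg.solve(xtx, xty)

# For comparison only: the same fit on the pooled raw records.
X_all = np.vstack([X_k for X_k, _ in agencies])
y_all = np.concatenate([y_k for _, y_k in agencies])
beta_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
```

The two coefficient vectors agree up to floating-point error, even though no agency ever sees another agency's records.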

5
Secure summation
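The slide's diagram did not survive in this transcript. A minimal sketch of a standard ring-based secure summation protocol, assuming semi-honest agencies, nonnegative integer values, and a modulus larger than the true total: agency 1 adds a random mask to its value, each agency in turn adds its own value modulo the modulus and passes the result on, and agency 1 finally removes the mask.

```python
import secrets


def secure_sum(local_values, modulus):
    """Ring-based secure summation over nonnegative integers.

    Assumes semi-honest agencies and a true total below `modulus`.
    """
    mask = secrets.randbelow(modulus)  # known only to agency 1
    message = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        # Each later agency sees only a uniformly distributed running total,
        # adds its own value, and passes the result to the next agency.
        message = (message + value) % modulus
    # The total returns to agency 1, which removes its mask.
    return (message - mask) % modulus


print(secure_sum([12, 7, 30], modulus=1000))  # prints 49
```

In this simple ring, two agencies on either side of a third can recover its value by differencing the messages they see if they collude, so the sketch is only as safe as the semi-honest, non-colluding assumption.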

6
Purely vertical partitioning

Secure dot/matrix product
-- shares dot/matrix products without sharing data
-- allows regressions, association rules, classification, clustering
-- assumes semi-honest agencies
Synthetic data approaches
-- share synthetic copies of data across agencies
-- allow any analysis when the distributions used to generate the data are accurate
-- generate a public-use data file

7
Secure dot/matrix products: Karr, Lin, Reiter, Sanil (NISS tech. report)
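The details of the report are not reproduced in this transcript. The sketch below shows one projection-based way to compute X_A'X_B for vertically partitioned data; it is not necessarily the exact protocol of Karr, Lin, Reiter, and Sanil. It assumes semi-honest agencies, records already aligned row by row, X_A of full column rank, and a hypothetical tuning parameter g. Agency A sends a random matrix Z whose g columns are orthonormal and orthogonal to X_A; Agency B returns W = (I - ZZ')X_B; since Z'X_A = 0, X_A'W = X_A'X_B.

```python
import numpy as np


def agency_a_make_z(x_a, g, rng):
    """Agency A: g random orthonormal columns, all orthogonal to x_a."""
    n, p = x_a.shape
    q_full, _ = np.linalg.qr(x_a, mode="complete")
    complement = q_full[:, p:]  # orthonormal basis of the complement of col(x_a)
    # Random rotation inside the complement so Z is not a canonical basis.
    rotation, _ = np.linalg.qr(rng.standard_normal((n - p, g)))
    return complement @ rotation  # n x g; z.T @ x_a is numerically zero


def agency_b_respond(z, x_b):
    """Agency B: remove the components of x_b that lie in col(z)."""
    return x_b - z @ (z.T @ x_b)


def agency_a_finish(x_a, w):
    """Agency A: x_a.T @ w equals x_a.T @ x_b because z.T @ x_a = 0."""
    return x_a.T @ w


rng = np.random.default_rng(2)
x_a = rng.normal(size=(50, 3))  # Agency A's variables
x_b = rng.normal(size=(50, 2))  # Agency B's variables, same 50 records
z = agency_a_make_z(x_a, g=20, rng=rng)
w = agency_b_respond(z, x_b)
product = agency_a_finish(x_a, w)
```

Agency A sees X_B only through W, which determines X_B up to its components in the column space of Z, so a larger g hides more of B's data while telling B more about where X_A cannot lie.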

9
Synthetic data approach: Kohnen (PhD thesis, 2005)

Assume X is not sensitive.
Pass real X to Agency B.
Agency B simulates multiple copies of Y from f(Y | X), estimated using the data set from Agency A.
Pass the copies to Agency A.
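A minimal sketch of Agency B's step, assuming (hypothetically) that f(Y | X) is modeled as a normal linear regression, that the X values from Agency A are aligned record by record with B's Y, and that each copy is drawn after first drawing the model parameters from their posterior under the usual noninformative prior:

```python
import numpy as np


def synthesize_y(x, y, m, rng):
    """Draw m synthetic copies of y from a normal linear model for f(Y | X)."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    k = design.shape[1]
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    resid = y - design @ beta_hat
    s2 = resid @ resid / (n - k)
    copies = []
    for _ in range(m):
        # Posterior draws under p(beta, sigma^2) proportional to 1 / sigma^2.
        sigma2 = (n - k) * s2 / rng.chisquare(n - k)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # Synthetic Y for every record, given its real X.
        copies.append(design @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n))
    return np.array(copies)


rng = np.random.default_rng(3)
x = rng.normal(size=200)                     # real X, passed from Agency A
y = 1.0 + 2.0 * x + rng.normal(size=200)     # Agency B's Y (simulated here)
y_copies = synthesize_y(x, y, m=5, rng=rng)  # passed back to Agency A
```

Drawing the parameters afresh for each copy is what lets the between-copy variability reflect uncertainty about f(Y | X), which the combining rules for multiple synthetic copies rely on.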

10
Synthetic data approach: Kohnen (PhD thesis, 2005)

Agency A uses partially synthetic data methods (Reiter, Surv. Meth., 2003) for inferences about Y | X.
Agency A can release fully synthetic data to the public.
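For reference, a sketch of the combining rules for partially synthetic data as we understand them from Reiter (2003): with m synthetic copies, point estimates q_i, and variance estimates u_i, the combined point estimate is the mean of the q_i; its estimated variance is T_p = ubar_m + b_m / m, where ubar_m is the mean of the u_i and b_m is the sample variance of the q_i; and inference uses a t distribution with (m - 1)(1 + ubar_m / (b_m / m))^2 degrees of freedom.

```python
import statistics


def combine_partially_synthetic(estimates, variances):
    """Combine point and variance estimates from m partially synthetic copies."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # combined point estimate
    b_m = statistics.variance(estimates)  # between-copy variance (divisor m - 1)
    u_bar = statistics.fmean(variances)   # average within-copy variance
    t_p = u_bar + b_m / m                 # estimated variance of q_bar
    # Degrees of freedom for the t reference distribution.
    nu = (m - 1) * (1 + u_bar / (b_m / m)) ** 2 if b_m > 0 else float("inf")
    return q_bar, t_p, nu
```

For example, combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]) gives a point estimate of 2.0, a variance of about 0.833 (0.5 + 1/3), and about 12.5 degrees of freedom.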

11
Synthetic data approaches: Kohnen (PhD thesis, 2005)

Agency A simulates disguiser values of X that look like the genuine values, ideally from a distribution close to f(X | Y). Pass the real X and the disguisers to Agency B.
Agency B simulates multiple copies of Y for each X it receives, from f(Y | X) estimated using the data sets from Agency A. Pass the copies to Agency A.

12
Synthetic data approaches: Kohnen (PhD thesis, 2005)

Agency A discards the disguisers and uses partially synthetic data methods (Reiter, Surv. Meth., 2003) to obtain inferences using the real X.
Agency A can release fully synthetic data to the public.
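A minimal sketch of Agency A's bookkeeping across the two disguiser slides, assuming X is a single numeric variable: A mixes the real X with disguisers, shuffles them with a secret permutation, and after B returns synthetic copies of Y, undoes the shuffle and keeps only the real records. The disguiser generator (a normal distribution fitted to the real X) and Agency B's model are hypothetical stand-ins; how to generate good disguisers is left open in the presentation.

```python
import numpy as np


def add_disguisers(x_real, n_fake, rng):
    """Agency A: mix real X with disguisers and shuffle; the permutation stays private."""
    # HYPOTHETICAL disguiser generator: a normal fitted to the real X.
    fake = rng.normal(x_real.mean(), x_real.std(ddof=1), size=n_fake)
    combined = np.concatenate([x_real, fake])  # real values first, then disguisers
    perm = rng.permutation(len(combined))
    return combined[perm], perm  # shuffled values go to B; perm stays with A


def agency_b_stand_in(x_sent, m, rng):
    """HYPOTHETICAL stand-in for Agency B drawing m copies of Y given each X."""
    return np.array([2.0 + 0.5 * x_sent + rng.normal(size=len(x_sent)) for _ in range(m)])


def discard_disguisers(x_sent, y_copies, perm, n_real):
    """Agency A: undo the shuffle, then keep only the rows for real records."""
    inverse = np.argsort(perm)  # x_sent[inverse] restores the pre-shuffle order
    return x_sent[inverse][:n_real], y_copies[:, inverse][:, :n_real]


rng = np.random.default_rng(4)
x_real = rng.normal(10.0, 2.0, size=30)
x_sent, perm = add_disguisers(x_real, n_fake=70, rng=rng)
y_copies = agency_b_stand_in(x_sent, m=5, rng=rng)
x_back, y_synth = discard_disguisers(x_sent, y_copies, perm, n_real=len(x_real))
```

Agency B only ever sees the shuffled values, so how well the disguisers protect the real X depends entirely on how hard they are to tell apart from genuine values.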

13
Partially overlapping, vertical partitioning

Secure EM algorithm
-- uses secure dot products
-- continuous data: estimates the covariance matrix for multivariate normal data
-- categorical data: estimates the parameters of log-linear models
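To show what the secure EM algorithm has to compute in the continuous case, here is a sketch of the ordinary, non-secure EM algorithm for a multivariate normal with missing values. Under partially overlapping partitioning, records held by only one agency are missing the other agency's variables. In a secure version, the sums that combine the two agencies' variables would be obtained with secure dot products; in this sketch everything is computed in the clear on simulated data.

```python
import numpy as np


def em_mvn(data, n_iter=200, tol=1e-8):
    """EM estimates of the mean and covariance of a multivariate normal; NaN = missing."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    observed = ~np.isnan(data)
    mu = np.nanmean(data, axis=0)              # available-case means
    sigma = np.diag(np.nanvar(data, axis=0))   # diagonal starting covariance
    for _ in range(n_iter):
        t1 = np.zeros(p)
        t2 = np.zeros((p, p))
        for i in range(n):
            o = observed[i]
            m = ~o
            x_hat = data[i].copy()
            c = np.zeros((p, p))
            if m.any():
                if o.any():
                    # E-step: conditional mean and covariance of missing given observed.
                    s_oo = sigma[np.ix_(o, o)]
                    s_mo = sigma[np.ix_(m, o)]
                    coef = np.linalg.solve(s_oo, s_mo.T).T  # S_mo S_oo^{-1}
                    x_hat[m] = mu[m] + coef @ (data[i, o] - mu[o])
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ s_mo.T
                else:
                    x_hat[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            t1 += x_hat
            t2 += np.outer(x_hat, x_hat) + c
        # M-step: update mean and covariance from expected sufficient statistics.
        new_mu = t1 / n
        new_sigma = t2 / n - np.outer(new_mu, new_mu)
        done = np.allclose(new_mu, mu, atol=tol) and np.allclose(new_sigma, sigma, atol=tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


rng = np.random.default_rng(5)
cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
full = rng.multivariate_normal([0.0, 1.0, 2.0], cov, size=300)
data = full.copy()
data[:100, 2] = np.nan  # records held only by Agency A (variables 0 and 1)
data[200:, 0] = np.nan  # records held only by Agency B (variables 1 and 2)
mu, sigma = em_mvn(data)
```

With no missing values the algorithm returns the sample mean and the maximum-likelihood (divisor n) covariance after one step, which is a quick sanity check.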

14
Limitations of methods: defining a research agenda

Secure computation methods
- How to specify models without viewing the data?
- What if sophisticated models are needed?
- How to do posterior simulation?
Synthetic data methods
- How to generate good disguisers?
All methods
- How to incorporate matching errors and differences in data quality and definitions?
- How to account for disclosure risks from models that fit too well?