Methods of Secure Computation and Data Integration

Learn more at: https://unece.org

1
Methods of Secure Computation and Data Integration
  • Jerome Reiter, Duke University
  • Alan Karr, NISS
  • Xiaodong Lin, University of Cincinnati
  • Ashish Sanil, Bristol Myers Squibb

2
General setting
  • Multiple agencies seek to improve analyses by
    pooling their data.
  • Do not want to reveal individual data values
    unknown to other agencies.
  • Want accurate results from pooling procedures.

3
Pooling situations
  • Horizontally partitioned: Agencies have different
    records but the same variables.
  • Purely vertically partitioned: Agencies have the same
    records but different variables.
  • Partially overlapping, vertically partitioned:
    Agencies have different records and
    different variables, with some common records and
    variables.

4
Horizontal partitioning: Karr, Lin, Sanil, Reiter (JCGS, 2005)
  • Secure data integration
    -- shares data but protects sources.
    -- allows any analysis to be done.
  • Secure summation
    -- shares sums without sharing data.
    -- allows regressions, association rules,
       classification, clustering.
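
To illustrate why sharing sums is enough for a regression, here is a minimal sketch for a one-predictor least-squares fit. Each agency computes a handful of sums on its own records, and only the totals across agencies are needed. The function names and the toy data are made up for illustration; in practice each total would be obtained with secure summation (slide 5) rather than the plain addition used here.

```python
def local_sums(xs, ys):
    """Sums one agency computes on its own records (raw rows are never shared)."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(all_sums):
    """Least-squares slope and intercept computed from agency-level sums only."""
    # Plain addition stands in for secure summation in this sketch.
    tot = {key: sum(s[key] for s in all_sums) for key in all_sums[0]}
    n = tot["n"]
    slope = (n * tot["sxy"] - tot["sx"] * tot["sy"]) / (n * tot["sxx"] - tot["sx"] ** 2)
    intercept = (tot["sy"] - slope * tot["sx"]) / n
    return slope, intercept


# Made-up toy data: y = 2x + 1 exactly, split across three agencies.
agency_data = [
    ([1, 2], [3, 5]),
    ([3, 4, 5], [7, 9, 11]),
    ([6], [13]),
]
slope, intercept = pooled_fit([local_sums(xs, ys) for xs, ys in agency_data])
print(slope, intercept)  # 2.0 1.0
```

The same idea extends to several predictors, where the shared quantities are the entries of X'X and X'y.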

5
Secure summation
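  • Obtain the sum of the agencies' values x without sharing
    individual values.
  • Agency A adds a random number R, known only to itself,
    to its x and passes (x + R) to the 2nd agency.
  • Agency B adds its x to this value and passes the sum
    to Agency C.
  • Process continues until all agencies have added
    their x.
  • Agency A subtracts R from the final sum to obtain the total.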
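
The round described above can be simulated in a few lines. This is a minimal sketch, not the authors' implementation: the function name, the use of arithmetic modulo a large number, and the restriction to non-negative integers are assumptions added so that the masked running sum reveals nothing about its size.

```python
import random


def secure_sum(values, modulus=10**9, rng=None):
    """Simulate one secure-summation round.

    values[0] belongs to Agency A, which starts and ends the round.
    Assumption of this sketch: values are non-negative integers and
    the true total is smaller than `modulus`.
    """
    rng = rng or random.Random()
    r = rng.randrange(modulus)            # random mask R, known only to Agency A
    running = (values[0] + r) % modulus   # A passes x + R to the 2nd agency
    messages = [running]                  # what each agency passes along
    for x in values[1:]:
        running = (running + x) % modulus  # next agency adds its own x
        messages.append(running)
    total = (running - r) % modulus       # A removes the mask at the end
    return total, messages


total, messages = secure_sum([12, 30, 7], rng=random.Random(0))
print(total)  # 49
```

Each agency sees only a masked running sum, so no single message reveals another agency's value. The sketch assumes agencies follow the protocol and do not collude.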

6
Purely vertical partitioning
  • Secure dot/matrix product
    -- shares dot/matrix products without sharing data.
    -- allows regressions, association rules, classification,
       clustering.
    -- assumes semi-honest agencies.
  • Synthetic data approaches
    -- share synthetic copies of data across agencies.
    -- allow any analysis when the distributions used to
       generate the data are accurate.
    -- generate a public use data file.

7
Secure dot/matrix products: Karr, Lin, Reiter, Sanil (NISS tech. report)
  • Compute X^T Y (X held by Agency A, Y by Agency B)
    without revealing individual values.
  • Agency A passes Z, where Z_i^T X_j = 0
    for all i,j, to Agency B.
  • Agency B sends W = (I - Z Z^T) Y to Agency A.
  • Agency A computes X^T W = X^T Y.
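
A minimal sketch of one way these three steps could work. It assumes Agency A sends a matrix Z whose orthonormal columns are all orthogonal to the columns of X, Agency B returns W = (I - Z Z^T) Y, and Agency A computes X^T W, which then equals X^T Y. These specifics, the function names, and the toy data are assumptions of the sketch, not details confirmed by the slide. Matrices are stored as lists of columns, and the columns of X are assumed to be linearly independent.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(v, basis):
    """Remove from v its components along `basis`, then scale to unit length."""
    for _ in range(2):  # second pass improves numerical stability
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
    norm = dot(v, v) ** 0.5
    return [vi / norm for vi in v]


def make_z(x_cols, g, rng):
    """Agency A: g orthonormal vectors, each orthogonal to every column of X."""
    n = len(x_cols[0])
    basis = []
    for col in x_cols:  # orthonormal basis for the column space of X
        basis.append(orthonormalize(list(col), basis))
    z = []
    for _ in range(g):  # requires g <= n - (number of columns of X)
        v = [rng.gauss(0, 1) for _ in range(n)]
        z.append(orthonormalize(v, basis + z))
    return z


def mask_y(y_cols, z):
    """Agency B: W = (I - Z Z^T) Y, computed column by column."""
    w = []
    for y in y_cols:
        masked = list(y)
        for zk in z:
            c = dot(zk, y)
            masked = [m - c * zi for m, zi in zip(masked, zk)]
        w.append(masked)
    return w


# Made-up toy data: 5 records, two variables at A, one variable at B.
x_cols = [[1, 2, 3, 4, 5], [1, 0, 1, 0, 1]]
y_cols = [[2, 1, 0, 1, 2]]

z = make_z(x_cols, g=2, rng=random.Random(0))   # A -> B
w_cols = mask_y(y_cols, z)                      # B -> A
xty = [[dot(x, w) for w in w_cols] for x in x_cols]  # A computes X^T W
```

Because every column of Z is orthogonal to every column of X, the term X^T Z Z^T Y vanishes, so `xty` matches the true cross-products (18 and 4 here, up to rounding) even though Agency A only ever sees W.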

8
Purely vertical partitioning
  • Secure dot/matrix product
    -- shares dot/matrix products without sharing data.
    -- allows regressions, association rules, classification,
       clustering.
    -- assumes semi-honest agencies.
  • Synthetic data approaches
    -- share synthetic copies of data across agencies.
    -- allow any analysis when the distributions used to
       generate the data are accurate.
    -- generate a public use data file.

9
Synthetic data approach: Kohnen (PhD thesis, 2005)
  • Assume X is not sensitive.
  • Agency A passes real X to Agency B.
  • Agency B simulates multiple copies of Y for X from
    f(Y|X), estimated using the dataset from Agency A,
    and passes the copies to Agency A.

10
Synthetic data approach: Kohnen (PhD thesis, 2005)
  • Agency A uses partially synthetic data methods
    (Reiter, Surv. Meth., 2003) for inferences based
    on Y|X.
  • Agency A can release fully synthetic data to
    public.
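
A minimal sketch of Agency B's step in this approach, assuming f(Y|X) is a simple linear regression with normal errors. That model, the function names, and the toy data are all assumptions for illustration. The sketch also fixes the regression parameters at their estimates, whereas a full synthetic-data procedure would typically draw them from their posterior distribution before simulating.

```python
import random


def fit_line(xs, ys):
    """Estimate intercept, slope, and residual s.d. of y = a + b*x + error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return intercept, slope, sigma


def synthesize(xs, ys, m, rng):
    """Agency B: simulate m synthetic copies of Y, one value per record in X."""
    intercept, slope, sigma = fit_line(xs, ys)
    return [
        [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
        for _ in range(m)
    ]


# Made-up toy data: X received from Agency A, Y held by Agency B.
x_from_a = [1, 2, 3, 4, 5, 6]
y_at_b = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

copies = synthesize(x_from_a, y_at_b, m=3, rng=random.Random(0))
# `copies` is what B would pass back to A in place of the real Y.
```

Agency A would then analyze each copy alongside its real X and combine the results across copies using the partially synthetic data methods cited on the slide.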

11
Synthetic data approaches: Kohnen (PhD thesis, 2005)
  1. Agency A simulates disguiser values of X that look like the
    genuine values of X, ideally from a distribution
    close to f(X|Y), and passes the real X and the disguisers
    to Agency B.
  2. Agency B simulates multiple copies of Y for each X,
    from f(Y|X) estimated using the datasets from Agency
    A, and passes the copies to Agency A.

12
Synthetic data approaches: Kohnen (PhD thesis, 2005)
  • Agency A discards disguisers and uses partially
    synthetic data methods (Reiter, Surv. Meth.,
    2003) to obtain inferences using the real X.
  • Agency A can release fully synthetic data to
    public.

13
Partially overlapping, vertical partitioning
  • Secure EM algorithm
    -- uses secure dot products.
    -- continuous data: estimate the covariance matrix
       for multivariate normal data.
    -- categorical data: estimate the parameters
       of log-linear models.

14
Limitations of methods: Defining a research agenda
  • Secure computation methods
    - How to specify models without viewing data?
    - What if sophisticated models are needed?
    - How to do posterior simulation?
  • Synthetic data methods
    - How to generate good disguisers?
  • All methods
    - How to incorporate matching errors and
      differences in data quality and definitions?
    - How to account for disclosure risks from
      models that fit too well?