Privacy Preserving Database Application Testing

1
Privacy Preserving Database Application Testing
  • Xintao Wu, Yongge Wang, Yuliang Zheng,
  • UNC Charlotte

2
Overview
  • Milestones
      • Initial investigation: May 2002 to Dec 2002
      • Official start: Sept 2003, supported by NSF grant
        CCR-0310974 ($200K, Sept 2003 to August 2005)
      • Prototype system completed in April 2005, developed
        in C with Oracle; 22K lines of source code
      • Demonstrated at several banks, May 2005
  • Personnel
      • Faculty: Xintao Wu, Yongge Wang, Yuliang Zheng
      • Current graduate students: Songtao Guo, Ying Wu,
        Chintan Sanghvi, Guodong Jiao
      • Previous graduate students: Jing Jin, Amol Kedar
      • Several senior undergraduate students
  • More info
      • http://www.cs.uncc.edu/xwu/privacy
      • xwu@uncc.edu

3
Motivation
  • To generate synthetic data for DB application
    testing, especially performance testing.
  • Many applications involve large-scale databases
    containing sensitive information.
  • Complete testing is essential for database
    applications to function correctly and to provide
    acceptable performance.

4
Our Approach
  • To generate synthetic databases based on a priori
    knowledge about the current production databases
  • The required a priori knowledge (schema, data
    integrity rules, and basic statistical information)
    is generally available from ER diagrams, DDL, and
    the data dictionary
  • More detailed statistical information can be
    extracted if the original data, or samples from the
    production database, are available
  • The synthetic data can be generated in realistic
    volumes or in any desired volume
  • Better controllability, observability, and
    privacy
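
As a rough illustration of generating data from a priori knowledge alone, the Python sketch below produces rows for a hypothetical `account` table. The column names, domains, frequencies, and range bounds are invented assumptions standing in for what the DDL, data dictionary, and basic statistics would supply; this is not the project's generator.

```python
import random
from dataclasses import dataclass


@dataclass
class CategoricalSpec:
    """Domain and relative frequencies (assumed to come from the data dictionary)."""
    values: list
    weights: list


@dataclass
class NumericSpec:
    """CHECK-style range from the DDL plus basic statistics (mean, std)."""
    low: float
    high: float
    mean: float
    std: float


def sample_numeric(spec, rng):
    # Rejection sampling: draw from a normal distribution with the recorded
    # mean/std and keep only values that satisfy the range constraint.
    while True:
        x = rng.gauss(spec.mean, spec.std)
        if spec.low <= x <= spec.high:
            return x


def generate_rows(schema, n_rows, seed=0):
    """Generate n_rows synthetic rows; n_rows can be any volume."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for column, spec in schema.items():
            if isinstance(spec, CategoricalSpec):
                row[column] = rng.choices(spec.values, weights=spec.weights)[0]
            else:
                row[column] = sample_numeric(spec, rng)
        rows.append(row)
    return rows


# Hypothetical metadata for an illustrative "account" table.
SCHEMA = {
    "branch": CategoricalSpec(["NC", "SC", "VA"], [0.6, 0.3, 0.1]),
    "balance": NumericSpec(low=0.0, high=1_000_000.0, mean=5_000.0, std=3_000.0),
}

rows = generate_rows(SCHEMA, n_rows=1000)
```

Because the generator reads only metadata, the same specification can produce a realistic volume or an arbitrarily large one just by changing `n_rows`, and every generated value respects the declared domain and range.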

5
Three Characteristics of Synthetic Data
  • Valid
      • The synthetic data must satisfy the same
        constraints and business rules as the live data
      • Necessary for functional testing
  • Privacy preserving
      • No disclosure of any confidential information
        that needs to be protected
  • Resembling real data
      • The synthetic data must have statistical
        distributions and patterns similar to those of
        the live data
      • Necessary for performance testing, because the
        statistical nature of the data determines query
        performance

We will show that if the data distributions are not
similar, the execution time of the same workload
may differ dramatically.
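
A simple way to see why resemblance matters for performance: the fraction of rows a predicate matches (its selectivity) drives how much work a query does, and cost-based optimizers choose plans from such estimates. The sketch below compares a hypothetical skewed "real" column with a naive uniformly generated substitute, using total variation distance as one common (here assumed) similarity measure. The data are made up.

```python
from collections import Counter


def frequencies(column):
    """Relative frequency of each value in a column."""
    counts = Counter(column)
    total = len(column)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def selectivity(column, value):
    """Fraction of rows matched by the predicate column = value."""
    return sum(1 for v in column if v == value) / len(column)


# Hypothetical data: the real column is skewed, the naive synthetic one is uniform.
real = ["NC"] * 70 + ["SC"] * 20 + ["VA"] * 10
uniform = ["NC", "SC", "VA", "GA"] * 25

tv = total_variation(frequencies(real), frequencies(uniform))
print(f"TV distance: {tv:.2f}")
print(f"selectivity of branch = 'NC': real {selectivity(real, 'NC'):.2f}, "
      f"synthetic {selectivity(uniform, 'NC'):.2f}")
```

Here the same predicate matches 70% of the real rows but only 25% of the synthetic ones, so a test run on the uniform data could exercise a different plan and report very different timings.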

6
Architecture
[Figure: system architecture diagram. Labeled components: ER, DDL, Data, Catalog, Schema Domain Filter, Schema, Domain, General Location Model, Data Generator, Synthetic database, Disclosure Assessment, Performance Assessment]
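
The diagram names a General Location Model. In its standard form, this model treats the cross-classification of the categorical attributes as a multinomial distribution over cells and, within each cell, treats the numeric attributes as multivariate normal with a cell-specific mean vector and a covariance matrix shared by all cells. The Python sketch below samples from such a model; the cells, attributes, and parameter values are hypothetical, and it illustrates only the model, not the project's Data Generator.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_glm(cell_probs, cell_means, covariance, n, seed=0):
    """Draw n (cell, numeric_vector) pairs from a general location model.

    cell_probs: {cell: probability}, the multinomial part
    cell_means: {cell: mean vector}, cell-specific means
    covariance: covariance matrix shared by all cells
    """
    rng = random.Random(seed)
    cells = list(cell_probs)
    weights = [cell_probs[c] for c in cells]
    lower = cholesky(covariance)
    dim = len(covariance)
    samples = []
    for _ in range(n):
        cell = rng.choices(cells, weights=weights)[0]
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # x = mu_cell + L z has covariance L L^T = covariance.
        x = [cell_means[cell][i] + sum(lower[i][k] * z[k] for k in range(i + 1))
             for i in range(dim)]
        samples.append((cell, x))
    return samples


# Hypothetical parameters: cells are (gender, region) pairs; numeric
# attributes are (age, income in $1000s).
probs = {("F", "east"): 0.3, ("F", "west"): 0.2, ("M", "east"): 0.3, ("M", "west"): 0.2}
means = {("F", "east"): [40.0, 55.0], ("F", "west"): [35.0, 60.0],
         ("M", "east"): [42.0, 58.0], ("M", "west"): [38.0, 62.0]}
cov = [[25.0, 10.0], [10.0, 100.0]]

data = sample_glm(probs, means, cov, n=20000)
```

Sharing one covariance matrix across cells keeps the number of parameters small, while the cell-specific means preserve how the numeric attributes shift between categorical groups.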

7
Building a Project

8
Data Dictionary Information

9
Statistical Information Extraction: Basic

10
Statistical Information Extraction: Advanced
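
When original data or a sample is available, the parameters of a general location model (the model named in the architecture) can be estimated directly from it. The sketch below computes cell proportions, cell-specific mean vectors, and the pooled within-cell covariance (maximum-likelihood form) from a list of (cell, numeric vector) records. The record layout and data are hypothetical assumptions; the slides do not say how the tool's basic and advanced extraction steps actually work.

```python
from collections import defaultdict


def estimate_glm(records):
    """Estimate general-location-model parameters from (cell, vector) records.

    Returns cell proportions, cell-specific mean vectors, and the pooled
    within-cell covariance matrix (maximum-likelihood form, divided by n).
    """
    groups = defaultdict(list)
    for cell, x in records:
        groups[cell].append(x)
    n = len(records)
    dim = len(records[0][1])

    probs = {cell: len(xs) / n for cell, xs in groups.items()}
    means = {cell: [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
             for cell, xs in groups.items()}

    # Pool deviations from each cell's own mean into one covariance matrix.
    cov = [[0.0] * dim for _ in range(dim)]
    for cell, xs in groups.items():
        mu = means[cell]
        for x in xs:
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += (x[i] - mu[i]) * (x[j] - mu[j]) / n
    return probs, means, cov


# Tiny hypothetical sample: cell = region, vector = (age, income in $1000s).
sample = [("east", [30.0, 50.0]), ("east", [50.0, 70.0]),
          ("west", [40.0, 60.0]), ("west", [40.0, 64.0])]
probs, means, cov = estimate_glm(sample)
```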

11
Generating Meta Data File

12
Generating Confidential File

13
Disclosure Analysis - Categorical
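
One generic way to assess categorical disclosure risk, assumed here for illustration and not necessarily the analysis this tool performs, is to look for combinations of categorical values (cells) that occur fewer than k times in the original data: releasing the count or statistics of such a small cell can single out individuals. The sketch flags cells below a hypothetical threshold `k`; the records are made up.

```python
from collections import Counter


def risky_cells(rows, attributes, k=3):
    """Return cells (value combinations over `attributes`) with fewer than k rows.

    Small cells are a common disclosure concern: a released count or statistic
    for a cell with one or two members can reveal those individuals.
    """
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return {cell: c for cell, c in counts.items() if c < k}


# Hypothetical customer records.
customers = [
    {"zip": "Z1", "gender": "F", "age_band": "30-39"},
    {"zip": "Z1", "gender": "F", "age_band": "30-39"},
    {"zip": "Z1", "gender": "F", "age_band": "30-39"},
    {"zip": "Z1", "gender": "M", "age_band": "30-39"},
    {"zip": "Z2", "gender": "F", "age_band": "60-69"},
]

print(risky_cells(customers, ["zip", "gender", "age_band"]))
```

Flagged cells would typically be merged with neighboring cells or suppressed before their statistics are used to drive generation.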

14
Numerical Disclosure: Basic, Batch Mode

15
Numerical Disclosure: Basic, Single Mode
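
For numeric confidential attributes, a generic (assumed) check is whether released per-cell statistics let someone pin down an individual's value. The sketch below treats a cell as risky if it has fewer than `k` members, or if the interval mean ± 2·stdev is narrower than a relative tolerance of the mean. The thresholds, the salary attribute, and the data are hypothetical, and this is not claimed to be the analysis behind these screens.

```python
import statistics


def numeric_disclosure_risk(values, k=3, rel_tolerance=0.10):
    """Flag a cell whose confidential numeric values are too easy to estimate.

    Risky if the cell has fewer than k members, or if the interval
    mean +/- 2 * stdev is narrower than rel_tolerance * |mean|.
    """
    if len(values) < k:
        return True
    mean = statistics.mean(values)
    half_width = 2 * statistics.stdev(values)
    return 2 * half_width < rel_tolerance * abs(mean)


def assess_all(cells, **kwargs):
    """Apply the single-cell check to every cell; return the risky cell keys."""
    return [cell for cell, values in cells.items()
            if numeric_disclosure_risk(values, **kwargs)]


# Hypothetical salaries (in dollars) grouped by job-title cell.
salaries = {
    "clerk": [31000, 45000, 38000, 52000],
    "vp": [200000, 201000, 199500],
    "cto": [350000],
}
print(assess_all(salaries))
```

The "vp" cell is flagged because its values are tightly concentrated, and "cto" because a one-member cell reveals its value outright.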

16
Creating Final Categorical File

17
Creating Final Rule File (GLM Format)

18
Generating Data