Provided by: PaulD109
Title: Statistical database security


1
Statistical database security
  • Special purpose: used only for statistical
    computations.
  • General purpose: used with normal queries (and
    updates) as well as statistical ones.
  • Main problem: achieving a compromise between
    the privacy needs of individuals and the right of
    organizations to know and process information,
    while preventing statistical inference.

2
Statistical database security
  • Issues
  • Characteristics of the SDB to be protected:
    Is the database on-line (i.e. queries executed
    immediately) or off-line (queries executed
    later)? Is the SDB static (no updates) or
    dynamic?
  • Additional knowledge of users: the more a user
    already knows, the easier it is for that user to
    perform inference.
  • Types of attacks: the developer needs to know
    which inference attacks potential snoopers will
    use.

3
Inference protection techniques
  • Conceptual techniques: definition of populations,
    partitioning.
  • Restriction-based techniques: restrict the type
    of queries that may be asked or the kind of
    result that may be obtained.
  • Perturbation-based techniques: distort the data
    so that the statistical results remain
    (approximately) correct while any inferred
    individual data are likely to be incorrect.

4
Inference protection techniques
  • Conceptual techniques
  • The lattice model: a lattice can be built for
    combinations of conditions on attributes. The
    n-respondent k%-dominance criterion says a
    statistic is sensitive if n or fewer records
    represent more than k% of the population total.
  • Conceptual partitioning: populations are defined
    at a semantic level (e.g. male employees in a
    department).
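
The dominance criterion can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `is_sensitive` and the sample salaries are assumptions, and the values are assumed to be non-negative. The statistic is treated as sensitive when the n largest contributions make up more than k% of the total.

```python
def is_sensitive(values, n, k_percent):
    """Return True if the n largest values exceed k_percent of the total."""
    total = sum(values)
    if total == 0:
        return False  # nothing can dominate an empty total (assumption of this sketch)
    top_n = sorted(values, reverse=True)[:n]
    # Compare with integer arithmetic to avoid floating-point division.
    return sum(top_n) * 100 > k_percent * total


# Hypothetical salaries contributing to one cell of a statistical table.
cell = [900, 40, 30, 30]
print(is_sensitive(cell, n=1, k_percent=80))      # one record holds 90% of the sum
print(is_sensitive([25, 25, 25, 25], n=1, k_percent=80))
```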

5
Inference protection techniques
  • Restriction-based techniques
  • Query-set size control: a statistical query q(C) is
    permitted only if its query set X(C) satisfies
  • $k \le |X(C)| \le N - k$
  • (N is the number of SDB records and $k \ge 0$ is a
    fixed parameter.)
  • This prevents simple attacks based on very small
    or very large query sets.
  • It does not prevent more sophisticated attacks
    using trackers, general trackers, double trackers
    and union trackers.
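
A minimal sketch of the size check, assuming the query-set size has already been computed; the function name `size_control_allows` and the numbers are illustrative. The upper bound matters because a query set of size N - 1 leaves a single record in its complement, whose value follows by subtraction from a statistic over the whole database.

```python
def size_control_allows(query_set_size, n_records, k):
    """Permit a statistic only if k <= |X(C)| <= N - k."""
    return k <= query_set_size <= n_records - k


# Hypothetical SDB with N = 100 records and threshold k = 5.
N, K = 100, 5
for size in (1, 5, 50, 95, 99):
    print(size, size_control_allows(size, N, K))
```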

6
Inference protection techniques
  • Restriction-based techniques
  • Expanded query-set size control: given a query
    q(A=a and B=b and ... and C=c) there are $2^m$
    implied query sets, where m is the number of
    parts in the query:
  • q(<X_a> A=a and <X_b> B=b and ... and <X_c> C=c)
    where each X_i is either empty or "not".
  • The query q is only allowed if all $2^m$ implied
    query sets fall in the allowable range
    $[k, N - k]$.
  • This technique becomes very expensive for large
    values of m.
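
The enumeration of implied query sets can be sketched as below. Representing records as dictionaries and conditions as (attribute, value) pairs is an assumption made for illustration. The loop over all combinations of negated and non-negated conditions is what makes the cost grow exponentially with m.

```python
from itertools import product


def implied_query_set_sizes(records, conditions):
    """Yield the size of each of the 2**m implied query sets.

    conditions is a list of (attribute, value) pairs; each pair is used
    either as written or negated.
    """
    for negations in product((False, True), repeat=len(conditions)):
        yield sum(
            all((rec[attr] == val) != negated
                for (attr, val), negated in zip(conditions, negations))
            for rec in records
        )


def expanded_size_control_allows(records, conditions, k):
    """Allow the query only if every implied query set has size in [k, N - k]."""
    n = len(records)
    return all(k <= size <= n - k
               for size in implied_query_set_sizes(records, conditions))


# Hypothetical SDB: two records for each combination of sex and department.
records = [{"sex": s, "dept": d} for s in "MF" for d in "AB" for _ in range(2)]
query = [("sex", "M"), ("dept", "A")]
print(list(implied_query_set_sizes(records, query)))
print(expanded_size_control_allows(records, query, k=2))
print(expanded_size_control_allows(records, query, k=3))
```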

7
Inference protection techniques
  • Restriction-based techniques
  • Query-set overlap control: check the overlap
    between the query sets of successive queries,
    i.e. the number of records they have in common.
    Query q(C) is permitted only if
  • $|X(C) \cap X(D)| \le \alpha$, $\alpha > 0$
  • for every earlier released query q(D). Thus,
    the query set of q(C) shares no more than
    $\alpha$ records with the query set of any
    earlier released query.
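
A minimal sketch of overlap control, assuming query sets are available as sets of record identifiers. The class name `OverlapController` and the parameter `max_overlap` (the maximum number of records two query sets may share) are illustrative. Every answered query set has to be stored and compared, which hints at the bookkeeping cost of this technique.

```python
class OverlapController:
    """Refuse a query whose query set overlaps too much with an earlier one."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.released = []  # query sets of all earlier answered queries

    def request(self, query_set):
        query_set = frozenset(query_set)
        for earlier in self.released:
            if len(query_set & earlier) > self.max_overlap:
                return False  # too many common records: refuse
        self.released.append(query_set)
        return True


controller = OverlapController(max_overlap=1)
print(controller.request({1, 2, 3}))  # first query, nothing to compare against
print(controller.request({3, 4, 5}))  # shares one record with the first query
print(controller.request({1, 2, 6}))  # shares two records with the first query
```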

8
Inference protection techniques
  • Restriction-based techniques
  • Audit-based controls: while query-set overlap
    control may not be very effective at preventing
    inference, it is possible to detect attempts at
    such inference by observing audit trails of
    successive queries (by the same user or by a
    group).
  • Techniques based on number of attributes: the
    DBA determines that statistical queries involving
    more than d attributes are not permissible.

9
Inference protection techniques
  • Restriction-based techniques
  • Partitioning: the population is divided into
    small disjoint subgroups (a subgroup of size 1
    is not allowed). Queries are only allowed on
    such groups, thus forbidding arbitrary sets.
  • Cell suppression: as with partitioning, but all
    cells which satisfy the n-respondent
    k%-dominance rule are considered sensitive and
    cannot be examined.
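
Partitioning can be sketched as follows, assuming records are dictionaries and the grouping key is chosen by the DBA. The helper names `build_partition` and `group_sum` are hypothetical. Queries address whole groups by identifier, so a user cannot assemble an arbitrary query set.

```python
from collections import defaultdict


def build_partition(records, key):
    """Group records by key and reject any group containing a single record."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    singletons = [g for g, members in groups.items() if len(members) == 1]
    if singletons:
        raise ValueError(f"groups of size 1 are not allowed: {singletons}")
    return dict(groups)


def group_sum(partition, group_ids, attribute):
    """Answer a SUM query over whole groups only."""
    return sum(rec[attribute] for g in group_ids for rec in partition[g])


# Hypothetical employee records partitioned by department.
employees = [
    {"dept": "A", "salary": 10}, {"dept": "A", "salary": 20},
    {"dept": "B", "salary": 5}, {"dept": "B", "salary": 7},
]
partition = build_partition(employees, key=lambda rec: rec["dept"])
print(group_sum(partition, ["A"], "salary"))
print(group_sum(partition, ["A", "B"], "salary"))
```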

10
Inference protection techniques
  • Perturbation-based techniques
  • Record-based perturbation: the records in the
    database are distorted before applying the
    statistics.
  • Result-based perturbation: the correct result is
    distorted before releasing it.
  • The difference between the true value and the
    released value of a statistic is called bias.
  • Perturbed statistics must be consistent, i.e.
    free of paradoxes. Whatever the bias, the results
    should be possible.

11
Inference protection techniques
  • Perturbation-based techniques
  • Data swapping: attribute values are exchanged
    between the records of the original SDB in such
    a way that the resulting modified SDB has no
    records in common with the original SDB.
  • Random-sample queries: the actual query set is
    replaced by a randomly sampled query set. This
    only works if the query sets are large enough,
    otherwise attacks based on small-size query sets
    become possible.
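
A minimal sketch of a random-sample query. One design choice here is an assumption of the sketch rather than something the slides specify: a record's inclusion is derived from a hash of a secret, the query text and the record identifier, so repeating the same query returns the same sample and averaging repeated answers gains nothing. The names `in_sample`, `sampled_mean` and the `secret` value are hypothetical.

```python
import hashlib


def in_sample(record_id, query_text, keep_fraction, secret="demo-secret"):
    """Decide deterministically whether a record stays in the sampled query set."""
    data = f"{secret}|{query_text}|{record_id}".encode()
    digest = hashlib.sha256(data).digest()
    # Treat the first 8 bytes as an integer in [0, 2**64) and keep the
    # record when it falls below the requested fraction of that range.
    return int.from_bytes(digest[:8], "big") < keep_fraction * 2**64


def sampled_mean(query_set, query_text, attribute, keep_fraction):
    """Mean of attribute over the sampled query set, or None if it is empty."""
    sample = [rec[attribute] for rec in query_set
              if in_sample(rec["id"], query_text, keep_fraction)]
    return sum(sample) / len(sample) if sample else None


# Hypothetical query set of ten records.
query_set = [{"id": i, "salary": 10 * i} for i in range(1, 11)]
print(sampled_mean(query_set, "dept = A", "salary", keep_fraction=0.8))
```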

12
Inference protection techniques
  • Perturbation-based techniques
  • Fixed perturbation: the values of the attributes
    used in the computation of statistics are
    modified in a fixed way (one that does not vary
    from query to query). This eliminates the risk
    of a user improving an estimate by repeating a
    query.
  • Query-based perturbation: the perturbation is
    different for different queries.
  • Rounding: the result of a statistical query is
    rounded before being released. Rounding can be
    systematic, random or controlled.
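
Two of the rounding variants can be sketched as below, assuming non-negative results and a positive rounding base; the function names are illustrative. Systematic rounding always picks the nearest multiple of the base, while random rounding picks the upper multiple with probability proportional to the remainder, so the expected released value equals the true value. Controlled rounding, which additionally keeps rounded values consistent with rounded totals, is not shown.

```python
import math
import random


def systematic_round(value, base):
    """Round to the nearest multiple of base (ties round up)."""
    return base * math.floor(value / base + 0.5)


def random_round(value, base, rng=random):
    """Round down or up to a multiple of base, choosing at random."""
    lower = (value // base) * base
    remainder = value - lower
    if remainder == 0:
        return lower  # already a multiple of base
    # Round up with probability remainder / base.
    return lower + base if rng.random() < remainder / base else lower


print(systematic_round(12, 5))
print(systematic_round(13, 5))
print(random_round(12, 5))  # either 10 or 15
```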