Title: Authentic Publication: The TRUTHSAYER Project
1Authentic Publication: The TRUTHSAYER Project
- Chip Martel
- Premkumar Devanbu
- Michael Gertz
- April Kwong
- Glen Nuckolls
- Stuart Stubblebine
- Department of Computer Science,
- University of California, Davis
- http://truthsayer.cs.ucdavis.edu
2Databases Play a Vital Role
- Commerce: credit card data, find goods
- Financial: investment sites
- Health: treatments, doctors/credentials, drugs
- Many more
3Answering queries
4Goals
- Correct and complete answers (with assurance)
- Efficient Protocols
5Example Queries
- Is Credit card number 5543 Valid?
- List all Hong Kong to San Francisco flights.
- Find digital cameras with 3-5 megapixels and cost in a given range
- List all bars within one mile of HKU
7What is a Correct Answer?
- We assume a trusted Data Owner with the official copy of the database, which defines the correct answer
- Problems with a single Data Owner:
- 1) May not want/be able to answer queries
- 2) Hard to keep online DB secure
- 3) Scalability
8Solution Third-Party Servers
- Third-party sites (Publishers) get information from the Data Owner and answer queries
- Example: travel sites (Expedia, Travelocity, Orbitz) answer using government airline data (FAA)
9Server Replication
[Figure: the FAA's data is replicated to Travelocity, Expedia, and Orbitz; caption: "Can I trust this server?"]
10Trust Issues
- Sites have left out cheaper flights from non-preferred airlines (deliberate)
- Sites may be corrupted: outside hacker or insider
- Errors
11Authentic Publication: The TRUTHSAYER Project
- Initially for relational databases (DBSec 2000; Journal of Computer Security)
- General model for a variety of data (Algorithmica, 2004)
[Figure labels: Owner, Publisher, Answer, Verification Object]
12Talk Outline
- Introduction
- Background: Merkle Trees
- Range Queries (Multi-attribute Queries)
- A General Model for Authenticated Data Structures
- Conclusion
13Authentic Publication
- A trusted Owner digests the data set and signs the digest.
- Untrusted Publishers receive the data and the signature.
- Clients submit queries to untrusted Publishers.
- Publishers return Answers (A) and Verification Objects (VO).
- Clients use the VO to prove the answer is correct and complete.
- The protocol is correct and secure.
14Verifying answers
- The protocol provides:
- Correctness: returns exactly the elements matching the query.
- Completeness: returns all elements matching the query.
- Security: cheating is infeasible.
- Efficiency: overhead is low.
- Recall: no signatures!!
15Merkle hashing a data set.
- Leaves: data in some lexical order.
- One-way hash function h: h1 = h(d1)
- Bottom-up hashing, starting with the data
- Root hash value: the digest of the data set.
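The bottom-up hashing above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: SHA-256 as h, the length-prefixed encoding, and the rule that an unpaired node is carried up unchanged are all assumptions made for the example.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice (the slides only name h)."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))  # length prefix keeps part boundaries unambiguous
        m.update(p)
    return m.digest()

def merkle_root(items) -> bytes:
    """Sort the data, hash each leaf, then hash pairs bottom-up to a single root."""
    level = [h(str(d).encode()) for d in sorted(items)]
    if not level:
        raise ValueError("data set must be non-empty")
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is carried up unchanged (assumption)
        level = nxt
    return level[0]

# The root is the digest of the whole data set; the Owner would sign this value.
digest = merkle_root([8, 3, 6, 1, 10, 5])
```

Because the leaves are sorted first, the digest depends only on the contents of the data set, and changing any single item changes the root.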
16Merkle Trees
- Classic use: prove that data value d is in the data set
- Solves: Is credit card number 5543 valid?
- But can also verify all items in a range, e.g. camcorders from 400 to 900
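The classic membership proof can be sketched as follows, under stated assumptions: SHA-256 stands in for h, the card numbers are made up, and the helper names `build_levels`, `prove`, and `verify` are hypothetical. The proof is the list of sibling hashes on the path from the leaf to the root; the verifier rehashes upward and compares with the trusted digest.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def build_levels(items):
    """All levels of the Merkle tree over the sorted data: leaves first, root last."""
    level = [h(str(d).encode()) for d in sorted(items)]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        if sib < len(level):          # an unpaired node has no sibling at this level
            path.append((sib < index, level[sib]))
        index //= 2
    return path

def verify(value, path, root):
    """Recompute the root from the claimed value and the sibling hashes."""
    cur = h(str(value).encode())
    for sibling_is_left, sibling in path:
        cur = h(sibling, cur) if sibling_is_left else h(cur, sibling)
    return cur == root

cards = [1200, 4410, 5543, 7777, 9981]        # hypothetical card numbers
levels = build_levels(cards)
root = levels[-1][0]                           # digest published by the Owner
proof = prove(levels, sorted(cards).index(5543))
```

Here `verify(5543, proof, root)` succeeds, while the same proof offered for any other value fails unless a hash collision is found. The proof length grows with the tree height, not with the size of the data set.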
17Verifying a Range
- To show that q = (5,6,8) is the answer to the range query beginning at 4: use lower bound 3, upper bound 10, and the starred hash values to compute and verify the root hash.
18Verifying a Range
- Query: 4
- Answer: 5,6,8 (in practice, key and data)
- Verification Object: ( (h(1),3), (5,6) ) ( (8,10), )
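A sketch of how a client might check such a range VO. Several details are assumptions for illustration: SHA-256 as h, a query upper endpoint of 9, two extra leaves 12 and 15 to fill out an eight-leaf tree, and a VO encoded as a pruned tree in which an `int` is a revealed leaf and `bytes` is the digest of a pruned subtree.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def digest_and_leaves(node):
    """Return (digest, leaves in left-to-right order); a pruned subtree shows as None."""
    if isinstance(node, bytes):       # pruned subtree: only its digest was sent
        return node, [None]
    if isinstance(node, int):         # revealed leaf value
        return h(str(node).encode()), [node]
    left, right = node
    dl, ll = digest_and_leaves(left)
    dr, lr = digest_and_leaves(right)
    return h(dl, dr), ll + lr

def verify_range(vo, root, lo, hi):
    """Return the verified answer to lo < x < hi, or None if the VO is rejected."""
    d, leaves = digest_and_leaves(vo)
    if d != root:
        return None                   # recomputed root must match the Owner's digest
    shown = [i for i, v in enumerate(leaves) if v is not None]
    if len(shown) < 2 or shown[-1] - shown[0] != len(shown) - 1:
        return None                   # revealed leaves must be adjacent: nothing hidden between
    run = leaves[shown[0]:shown[-1] + 1]
    if not (run[0] <= lo and run[-1] >= hi):
        return None                   # boundary values must fall outside the query range
    answer = run[1:-1]
    return answer if all(lo < v < hi for v in answer) else None

full = (((1, 3), (5, 6)), ((8, 10), (12, 15)))      # Owner's sorted tree (12, 15 assumed)
root, _ = digest_and_leaves(full)
pruned_right, _ = digest_and_leaves((12, 15))
leaf1, _ = digest_and_leaves(1)
vo = (((leaf1, 3), (5, 6)), ((8, 10), pruned_right))
leaf6, _ = digest_and_leaves(6)
cheat = (((leaf1, 3), (5, leaf6)), ((8, 10), pruned_right))   # Publisher hides item 6
```

With these values, `verify_range(vo, root, 4, 9)` accepts the answer 5, 6, 8, while `cheat` still hashes to the correct root but is rejected because a hidden leaf sits between revealed ones. The boundary values 3 and 10 are what establish completeness.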
19Authentic Publication
[Figure labels: Hash Digest, Merkle Tree]
20Security Property
- If the Answer and VO are correct, user accepts
21Security Property
- User accepts an invalid answer only if a specific collision in h is found (provable)
- h(x,y) = z in a correct VO (x, y, z are the hash values of tree nodes),
- A forged VO uses different x, y with h(x,y) = z
22Good Features
- Proofs are short (size proportional to tree height and answer size).
- Use hashes, a fast cryptographic operation
- Proofs are as easy to compute as finding the answer
- No secret keys: the hash function and digests are all public (no insider attack once the data set is digested).
23Extensions
- Want to handle more complex queries
- Find digital cameras with 3-5 megapixels and cost in a given range
- List all bars within one mile of HKU
24Multi-Attribute Queries
- Model as a 2-D range query
- Find points (x,y) with a < x < b and c < y < d
[Figure: query rectangle with corners (a,c), (b,c), (a,d), (b,d); axes Cost and Pixels]
25 2-Dimensional range tree
- Leaves are 2D points, i.e. 2 attributes (cost, pixels), sorted by x-value in the X-tree
- A Y-tree for each internal node
26Searching a 2D-range Tree
- Find (x,y) with 4
- All points in the associated Y-trees match the x-range
27Searching a 2D-range Tree
- Find pairs (x,y) with 4
- In X-tree subtrees rooted at 5 and 13
- Search in Associated Y-trees
28Searching a 2D-range Tree
- Find (x,y) with 4
- Answer: (12,5) and (23,8), and values in node 5's Y-tree
29Digesting a 2D-range Tree
- Digest each Y-tree as a Merkle tree
- Each internal node in the X-tree gets the hash of three values: its two children and the associated Y-tree value
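The three-value hash at each internal X-tree node can be sketched as below. This is an illustrative construction under assumptions not stated in the slides: SHA-256 as h, a median split of the x-sorted points, a point encoded as the text "x,y", and Y-trees sorted by (y, x).

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def enc(point) -> bytes:
    """Hypothetical byte encoding of a 2D point."""
    return f"{point[0]},{point[1]}".encode()

def merkle(leaf_hashes):
    """Plain Merkle root over a list of leaf hashes (unpaired nodes carried up)."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def digest_x(points):
    """points are sorted by x. Leaf: h(point). Internal: h(D(left), D(right), Y-tree digest)."""
    if len(points) == 1:
        return h(enc(points[0]))
    mid = len(points) // 2
    by_y = sorted(points, key=lambda p: (p[1], p[0]))   # the node's associated Y-tree
    y_digest = merkle([h(enc(p)) for p in by_y])
    return h(digest_x(points[:mid]), digest_x(points[mid:]), y_digest)

def digest_2d(points):
    """Single digest for the whole 2D range tree."""
    if not points:
        raise ValueError("need at least one point")
    return digest_x(sorted(points))

cameras = [(12, 5), (23, 8), (4, 3), (30, 4)]   # hypothetical (cost, pixels) pairs
root = digest_2d(cameras)
```

Folding the Y-tree digest into the X-tree node means one root value commits to every Y-tree as well, so a range proof inside any Y-tree can be chained up to the single signed digest.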
30Range Trees
- Let k be the number of answers (out of n)
- Search: O(k + log^2 n) time, O(n log n) space
- Improves to O(k + log n) time with extra pointers (can still get a hash digest)
- VO (proof) size is also O(k + log n)
- Extends to d dimensions (d-attribute query): search time O(k + log^(d-1) n), VO size the same.
31Authenticated Data Structures
- Problem: may want to use a variety of efficient data structures
- B-trees (reduce disk access)
- Suffix arrays (string queries)
- Geometric data structures (items within one mile)
- Many more
32Authenticated Data Structures
- Solution: a general method to digest a data structure (produce a single summary hash value).
- Efficient: proof size and construction time proportional to search time.
- Secure: similar security property, broken only with a specific collision in h
33Search DAGs
- Our general setting is any data structure modeled by:
- A labeled Directed Acyclic Graph (DAG)
- A search process that visits DAG nodes and determines which neighboring nodes to visit next (based on labels of visited nodes)
- This models a wide range of structures
34A Search DAG
- Search starts at the unique source node s of in-degree zero
- Digesting starts from the sinks (here u, v): hash the associated values
[Figure: search DAG with nodes s, a, b, c, u, v]
35A Search DAG
- D(u): digest of u
- Node u has data du
- D(u) = h(du)
- D(v) = h(dv)
36A Search DAG
- Other digests use data and successors
- D(c) = h(dc, D(v))
- D(b) = h(db, D(v), D(c))
- D(s) is the DAG digest
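The digest rules for the search DAG can be sketched with a memoized recursion from the sinks upward. The edges s to a, b, c; b to v, c; and c to v follow the formulas on the slides; the edge a to u, the placeholder node data, and SHA-256 as h are assumptions for the example.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

# Successor lists of the example DAG (the edge a -> u is assumed).
SUCC = {"s": ["a", "b", "c"], "a": ["u"], "b": ["v", "c"], "c": ["v"], "u": [], "v": []}
DATA = {n: f"data-{n}".encode() for n in SUCC}   # placeholder node data

def dag_digests(succ, data):
    """D(n) = h(d_n, D(successor 1), D(successor 2), ...); sinks hash only their data."""
    memo = {}
    def D(n):
        if n not in memo:             # each node is digested once, even if shared
            memo[n] = h(data[n], *(D(m) for m in succ[n]))
        return memo[n]
    for n in succ:
        D(n)
    return memo

digests = dag_digests(SUCC, DATA)
dag_digest = digests["s"]             # D(s): the value the Owner signs
```

Memoization matters because a node such as v is reachable along several paths; its digest is computed once and reused, so digesting takes time linear in the size of the DAG.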
37Verification for Search DAG
- Traditional Merkle tree verification is bottom-up (hash path values to root)
- We use top-down verification to simulate a correct search
- Owner provides search procedure P and root digest D(s)
38Authentic Publication
[Figure labels: "D(s), P" and "DAG, P"]
39Verification Object for DAG
- VO: information so the User can reproduce the search (and thus verify answers)
- Lines of the VO match steps of P
- Each line: data of a node and its successor hashes
- ds, D(v1), D(v2) (successors of s)
- dv1, D(u1), D(u2) (successors of v1)
40An Example Search
- Starts at s, then visits b, then v
- VO:
- ds, D(a), D(b), D(c) (line 1)
- Check D(s) = h(ds, D(a), D(b), D(c))
- So we know data ds is OK.
41An Example Search
- Starts at s, processes ds and decides b is next
- VO:
- ds, D(a), D(b), D(c) (line 1)
- db, D(v), D(c) (line 2)
- If D(b) = h(db, D(v), D(c))
- (using D(b) from line 1)
- then data db is correct
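The top-down check in this example can be sketched as a loop over VO lines. This is a simplified illustration: it assumes a search that follows a single path (s, then b, then v), models the procedure P as a hypothetical lookup from node data to the index of the successor to visit next, and uses SHA-256 as h.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def verify_search(vo, root_digest, P):
    """Replay the search top-down; return the verified node data, or None on failure."""
    expected = root_digest                # digest the next VO line must hash to
    visited = []
    for data, succ_digests in vo:
        if h(data, *succ_digests) != expected:
            return None                   # line does not match the digest already trusted
        visited.append(data)
        nxt = P(data)                     # P picks the next successor from verified data
        if nxt is None:
            return visited                # P says the search is finished
        if nxt >= len(succ_digests):
            return None
        expected = succ_digests[nxt]
    return None                           # VO ended before the search did

# Digests of the example DAG, built bottom-up (the edge a -> u is assumed).
Dv, Du = h(b"dv"), h(b"du")
Dc = h(b"dc", Dv)
Db = h(b"db", Dv, Dc)
Da = h(b"da", Du)
Ds = h(b"ds", Da, Db, Dc)

route = {b"ds": 1, b"db": 0, b"dv": None}   # toy P: at s go to b, at b go to v, stop at v
P = route.get
vo = [(b"ds", [Da, Db, Dc]), (b"db", [Dv, Dc]), (b"dv", [])]
forged = [(b"ds", [Da, Db, Dc]), (b"dX", [Dv, Dc]), (b"dv", [])]
```

Each line is trusted only because it hashes to a digest taken from an already verified line, starting from D(s). The `forged` VO fails at line 2, since altered data for b cannot reproduce D(b) without a collision in h.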
42Verified Search
- The verified computation proceeds until all nodes in the actual search are visited (the VO has one line for each node visited).
- The correct answer is now returned by search procedure P.
43Verified Search
- The verified computation takes time proportional to the original search (visits the same nodes).
- The security proof shows that a User accepts the wrong answer only if a specific collision in hash function h is used (e.g. for D(b) = h(db, D(v), D(c)))
44Updates
- Typically digests are updated with work similar to the data structure's update time (e.g. length of the search paths to updated items)
- If updates are frequent, the overall scheme doesn't work well (can use time-stamped digests)
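For a plain Merkle tree, the update cost can be illustrated by rehashing only the path from the changed leaf to the root. This is a sketch under assumptions: SHA-256 as h, hypothetical helpers `build_levels` and `update_leaf`, and a replacement value that keeps the leaves in sorted order.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def leaf(value) -> bytes:
    return h(str(value).encode())

def build_levels(sorted_items):
    """All levels of the Merkle tree over already-sorted data: leaves first, root last."""
    level = [leaf(v) for v in sorted_items]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def update_leaf(levels, index, new_value):
    """Replace one leaf and rehash only its ancestors; returns the new root digest."""
    levels[0][index] = leaf(new_value)
    for depth in range(len(levels) - 1):
        level = levels[depth]
        left = (index // 2) * 2           # left child of this node's parent
        if left + 1 < len(level):
            parent = h(level[left], level[left + 1])
        else:
            parent = level[left]          # unpaired node is carried up unchanged
        index //= 2
        levels[depth + 1][index] = parent
    return levels[-1][0]

levels = build_levels([10, 20, 30, 40, 50])
old_root = levels[-1][0]
new_root = update_leaf(levels, 2, 35)     # 30 becomes 35; sorted order is preserved
```

Only one hash per tree level is recomputed, which matches the search-path cost mentioned above. The Owner still has to sign and redistribute the new root, which is why frequent updates are expensive for the overall scheme.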
45Generalizations
- Allowing multiple Owners: often want to query data collected from several owners. Can be done, but now need to trust the owners and the data collector.
- Privacy: VOs may reveal information about the data set. There are methods to conceal extra data.
46Generalizations
- I/O-efficient digests/VOs: can use a multi-way tree to store multiple values in one disk block (still logically a binary tree for VO purposes, but stored more efficiently).
- The top-down search DAG approach may be improved for specific data structures (e.g. 2D range trees)
47Generalizations
- Collections of structured data: XML documents (can answer path queries)
- Relational operations (joins, selection, projection)
- Fancier crypto operations (to reduce VO size)
48References
- P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine. Authentic Third Party Data Publication, 14th IFIP 11.3 Working Conf. in DB Security (DBSec 2000)
- Original Authentic Publication paper
- A General Model for Authenticated Data Structures, Algorithmica, 2004
- Many data structures and the Search DAG (above group and G. Nuckolls)
49References
- Certifying Data from Multiple Sources, Proceedings of the 17th Database Security Conference, 2003
- Shows how to use multiple Owners
- Flexible Authentication of XML Documents, Journal of Computer Security, 2004
50Survey Chapters
- Li, Hadjieleftheriou, Kollios, Reyzin: Authenticated Index Structures for Outsourced Databases (overview of area and efficiency issues)
- R. Sion: Towards Secure Data Outsourcing
- Both in Michael Gertz and Sushil Jajodia (eds.), "Handbook of Database Security: Applications and Trends", Springer, 2007, to appear.
51
- Anagnostopoulos, M. Goodrich, R. Tamassia:
- Persistent Authenticated Dictionaries and Their Applications (allows queries of prior DB versions)
- Authenticated Data Structures for Graph and Geometric Searching (fancy geometric data structures)
52Pointer for more information
http://truthsayer.cs.ucdavis.edu
53Conclusion
- A single signed digest can authenticate answers to many queries
- Secure against hackers and insiders
- Can handle a wide range of data structures
- Efficient protocols: fast query processing and small VOs
54Future Work
- Better Update Mechanisms
- Integration of Database optimization methods
- Actual implementation (partly done by others),
and evaluation