Title: Group Outbrief: Data Replication
1. Group Outbrief: Data Replication
- Ken Birman, John Connolly, Dan Geer, Barbara Liskov, Peng Liu, Mike Reiter (+ others)
2. Need for a layered process
- NSF runs a mix of long- and short-term research projects
- But industry expects concretely applicable solutions (ideally, in COTS products)
- This suggests the need for a multi-stage pipeline:
  NSF long-term program → NSF near-term program → industry research partners demonstrate value → actual use, COTS uptake
3. A problem with multiple dimensions
- Big/small institutions
  - The big ones already run many data centers and have huge in-house capabilities
  - Small ones may outsource, limiting their options
- Focus on cost-cutting and productivity
  - Technology is highly appealing if it also cuts costs
  - But some CIP issues transcend cost
4. Problems and non-problems
- Our challenge?
  - Offer replication technologies that both enhance CIP and offer other measurable cost benefits
  - Also consider new capabilities that improve user productivity and competitiveness
- Not every research product will reach industry overnight; the pipeline can be slow!
5. An old problem with new facets
- The massive scale of financial data centers poses challenges but also opportunities
- The desire for hot standbys has many ramifications
- The degree of consistency needed must be studied; perhaps there is a spectrum of requirements
- Need to repair and recover as quickly as possible after a major disruption
- These are systems of systems, with huge numbers of components running concurrently
6. The nature of data is changing
- Data rates are growing rapidly
- A range of consistency requirements
  - Big transactions have stringent requirements
  - For other purposes, weaker needs apply
- Might use AI techniques to recognize anomalies, and perhaps even to repair them in some cases
- Archival issues pose a range of concerns
  - Data decay has become a worry
  - Need to store data for years, yet also be sure of its deletion later
7. Scalability poses questions of its own
- The massive scale of financial data centers poses challenges but also opportunities
- High data rates stress solutions; high latency relative to throughput is also a new issue
- Diversity of transactions
  - Some transactions are of critical value, others less so; the criteria for a good solution will vary (extreme reliability vs. throughput)
  - Some transactions aren't even directly represented as such (for example, publication of the 10-year bond rate is a transaction, but not of a conventional sort; it is also an update to application state)
- Even to say "the data" is risky: we need a spectrum of options for a spectrum of uses
8. Organization of research agenda items
- Traverse the stack bottom-up:
  - Data transport
  - Core replication technologies
  - Systems-level issues
  - Software engineering tools
  - Higher-level systems-of-systems issues
9. Research in data transport
- Explore ways of connecting data centers to extremely high-bandwidth networks
  - E.g., LambdaRail: 32 x 10 Gb (later 40 Gb) Ethernets
  - An example of a high-value testbed
  - In fact, many testbeds are needed (but that's another topic)
- But latencies are (relatively) high
  - TCP won't work in such settings (see the sketch after this list)
  - A solution should benefit many communities
- Research topic: develop new protocols that can handle these torrents of data
  - May need to encrypt or compress data on the fly
  - End-to-end correctness criteria need to be revisited
  - Could BFT be dropped to a low level in some way?
  - Must imagine clusters at both ends of any link
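
A back-of-the-envelope illustration (mine, not from the outbrief) of why stock TCP struggles on such links: the bandwidth-delay product of even one 10 Gb/s lambda at WAN latency dwarfs classic TCP windows, so a single flow cannot keep the pipe full. The rate and RTT values below are illustrative assumptions.

```python
# Bandwidth-delay product (BDP) for one 10 Gb/s lambda at WAN latency.
link_rate_bps = 10e9   # one 10 Gb/s Ethernet lambda (assumed)
rtt_s = 0.070          # ~70 ms coast-to-coast round trip (assumed)

bdp_bytes = link_rate_bps * rtt_s / 8
print(f"Bytes in flight needed to fill the pipe: {bdp_bytes / 1e6:.1f} MB")

# Without window scaling, classic TCP caps the window at 64 KB; even with
# scaling, a single loss halves the window and AIMD needs many RTTs to
# recover, so loss-sensitive flows badly underuse the link.
classic_window_bytes = 64 * 1024
print(f"Classic 64 KB window utilization: {classic_window_bytes / bdp_bytes:.2%}")
```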
10. Scalability of replication
- Scalability of replication technologies
  - Classical deterministic schemes with huge numbers of groups, overlapping groups, huge groups, or groups with some members far away
  - Scalable BFT? Other hardened schemes?
  - Probabilistic techniques: advantages and limitations; programming with probabilistically consistent substrates (sketched below)
- Opportunity to use innovative techniques to detect and heal inconsistency in massively replicated systems, or even across systems
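
As a minimal illustration of the probabilistic style (my sketch, not part of the outbrief): in push gossip, every replica holding an update forwards it to a few random peers each round, so an update reaches all n replicas in O(log n) rounds with high probability, which is the scalability appeal of these substrates.

```python
import random

def gossip_rounds(n: int, fanout: int = 2, seed: int = 1) -> int:
    """Count push-gossip rounds until all n replicas hold an update."""
    rng = random.Random(seed)
    informed = {0}                       # replica 0 originates the update
    rounds = 0
    while len(informed) < n:
        rounds += 1
        pushes = set()
        for _ in informed:               # each informed replica pushes...
            pushes.update(rng.randrange(n) for _ in range(fanout))
        informed |= pushes               # ...to `fanout` random peers
    return rounds

for n in (100, 10_000, 100_000):
    print(f"{n:>7} replicas -> {gossip_rounds(n)} rounds")
```

Round counts grow roughly logarithmically in n, in contrast to deterministic schemes whose per-update cost typically grows with group size.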
11. Replication with barriers
- Institutional barriers have proliferated
  - E.g., trading vs. investment banking, research vs. private-client investment
- These create logical barriers visible at the replication layer, and offer a model of certain kinds of insider threats
- The scenario creates new research challenges: replication that is aware of barriers
- Need a representation of business rules that can be used to infer constraints (a toy sketch follows)
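
A toy sketch of what such a rule representation might look like; the unit names and the `flow_allowed` helper are illustrative assumptions, not anything proposed in the outbrief.

```python
# Represent "wall"-style barriers as unordered pairs of business units, and
# check whether a proposed replication flow would cross one of them.
BARRIERS = {
    frozenset({"trading", "investment_banking"}),
    frozenset({"research", "private_client"}),
}

def flow_allowed(source_unit: str, dest_unit: str) -> bool:
    """An update may replicate across units only if no barrier separates them."""
    return frozenset({source_unit, dest_unit}) not in BARRIERS

print(flow_allowed("trading", "settlement"))        # True: no barrier applies
print(flow_allowed("research", "private_client"))   # False: barrier blocks it
```

A real representation would also have to support composition and inference (e.g., transitive leakage through intermediaries), which is precisely the research challenge.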
12. Theory of replication
- Even what we know needs to be re-examined!
  - Can groups be oblivious to one another, or must some form of system-wide consensus be employed to obtain consistency?
  - Are there inherently non-scalable properties, or just non-scalable implementations?
  - When should a large-scale system use probabilistic techniques and tools vs. deterministic ones? (A worked instance follows.)
  - What issues arise in systems of systems?
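
One concrete instance of the probabilistic-vs.-deterministic question (my illustration, not from the outbrief): deterministic majority-quorum schemes get consistency from guaranteed quorum intersection, but the quorum, and hence the per-operation work, grows linearly with group size.

```python
# Majority quorums: two quorums of size floor(n/2) + 1 drawn from n nodes
# always overlap (in at least 2q - n nodes), which yields deterministic
# consistency at a per-operation cost that grows linearly with n.
def majority_quorum(n: int) -> int:
    return n // 2 + 1

for n in (3, 5, 101, 10_001):
    q = majority_quorum(n)
    print(f"n={n:>6}: quorum size={q:>5}, guaranteed overlap={2 * q - n}")
```

Gossip-style substrates avoid this linear per-operation cost but offer only probabilistic, eventually consistent guarantees; characterizing when each regime is appropriate is the open question.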
13. Transactions and BFT
- Database transactions are widely used with replicated data
  - Can we adapt transactions to backup data centers?
  - Move to asynchronous transaction semantics?
- We also know about Byzantine fault tolerance in servers (standard sizing recapped below). But what about Byzantine clients?
  - Protection against seemingly legitimate clients who seek to misuse an API to disrupt a system
  - Explore extension of the Byzantine model to cover the full spectrum of issues
  - Doing this would have the added benefit of protection against many kinds of accidents
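
For context, the standard server-side BFT arithmetic (textbook material for PBFT-style protocols, not specific to this outbrief): tolerating f Byzantine replicas requires n = 3f + 1 servers with quorums of 2f + 1, so any two quorums share at least one correct replica. Byzantine clients fall outside this model, which is the gap the slide points at.

```python
# Standard BFT sizing: with n = 3f + 1 replicas and quorums of 2f + 1,
# any two quorums intersect in 2q - n = f + 1 replicas, at least one of
# which must be correct since at most f are faulty.
def bft_sizes(f: int) -> tuple[int, int]:
    n = 3 * f + 1
    quorum = 2 * f + 1
    return n, quorum

for f in (1, 2, 5):
    n, q = bft_sizes(f)
    print(f"f={f}: replicas={n}, quorum={q}, quorum overlap={2 * q - n} (= f + 1)")
```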
14. Software Engineering Options
- Could a compiler automate the creation and management of checkpoints, for transmission to a remote backup site? (A hand-written approximation is sketched below.)
- How can data replication be integrated into programming environments, like .NET's CLR?
  - Type-checking issues, optimization challenges, correct presentation of replication technology
- Must seamlessly equip developers with tools to build better, less complex systems
  - For example, aspect-oriented programming is yielding productivity benefits
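
A hypothetical sketch of the kind of mechanization the slide asks about: a decorator that snapshots an object's state after each mutating call, roughly what a compiler or CLR-style runtime could weave in automatically. `ship_to_backup`, `Ledger`, and the decorator itself are all illustrative inventions.

```python
import copy
import pickle

def ship_to_backup(blob: bytes) -> None:
    print(f"shipped checkpoint: {len(blob)} bytes")   # stand-in for real transport

def checkpointed(cls):
    """Wrap each public method so object state is snapshotted after it runs."""
    for name, fn in list(vars(cls).items()):
        if callable(fn) and not name.startswith("_"):
            def wrapper(self, *args, _fn=fn, **kwargs):
                result = _fn(self, *args, **kwargs)
                ship_to_backup(pickle.dumps(copy.deepcopy(self.__dict__)))
                return result
            setattr(cls, name, wrapper)
    return cls

@checkpointed
class Ledger:
    def __init__(self):
        self.entries = []
    def post(self, amount: int) -> None:
        self.entries.append(amount)

Ledger().post(100)   # the mutation triggers an automatic checkpoint
```

A compiler could do far better than this runtime wrapper, e.g., by checkpointing only incrementally changed state, which is where the type-checking and optimization questions above come in.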
15. Virtualization of execution env.
- Use virtualization to enhance functional replication opportunities (and security)
- A virtualized system can more easily be reconstituted after a major disruption
  - Data replication is the key to making this work
- Also offers potential for sandboxing, containing many kinds of security breaches
- "Trading room in a box" offers intuition into the goal here: open the box, turn the key
16. Higher-level systems issues
- Major institutions run systems that talk to counterparts in other systems
- Need attention to rule/policy representation, composition, and reconciliation
- Concerns about consistency of this type of data
- The conjunction of firewalls and barriers with the need to communicate
- Aggressive replication brings new risks: vulnerability associated with rapid change
17. Monitoring, management, control
- Develop new methods for managing, monitoring, and controlling systems
- The goal would be to automate what is now manual
  - But simultaneously to facilitate automated regeneration of lost capabilities in the event of an outage or attack
- Industry gains a lower cost of ownership
- CIP gains a lever for automating robustness
18. Replication and outsourcing
- Smaller institutions outsource their data warehousing and processing needs
  - Hence they would benefit from anything that a large institution requires
- The converse problem: concentration risk due to outsourced functionality, shared infrastructure, and service providers who manage to corner some critical role
19. Summary?
- Industry knows a lot about replication and uses replication extensively
- Yet despite this knowledge, there is a great deal that we don't know
- Solving such problems offers us a chance to reduce the cost of ownership for data centers
  - App development, Q/A, production, upgrades
- This, in turn, offers leverage for CIP concerns