Title: Choosing the correct data movement technology
1EM416 Choosing the Correct Data Movement
Technology
Chris Kleisath Director of Engineering iAnywhere
Solutions kleisath_at_ianywhere.com
2Overview
- When choosing a data movement technology you need
to consider - The business requirements for the distributed
database. - The technological limitations of your
environment. - The development and administrative resources
available.
3Enterprise Data Movement
Wired and Wireless Communication Links
4Distributed Systems
- Using any data movement technology means we have
a distributed database. - This almost always implies some form of
distributed application.
5What is a Distributed System?
- C.J. Dates working definition
- A distributed database system consists of a
collection of sites, connected together via some
sort of network, in which - Each site is a database system in its own right
- Sites have agreed to work together (if
necessary), so that a user at any site can access
data anywhere in the network exactly as if the
data were all stored at the users own site.
6Distributed SystemsPractical Factors
- Not all systems require that all data be
available to all sites. - Not all systems require that all data be
consistent between all sites all of the time. - The degree to which your system must meet the
ideal definition is the single biggest factor in
choosing your data movement technology.
7Issues when distributing data
- Local autonomy
- Data partitioning (fragmentation)
- Consistency
- Transaction control
- Accessibility (connection)
- Topology
8Local autonomy
- Each site should operate independently of the
other sites. - No site should depend on another site for its
successful functioning. - A centralized database provides the lowest level
of local autonomy. - Decentralized systems provide the highest level
of local autonomy.
9Data Partitioning
- Also known as fragmentation.
- Only the data needed by a site is present at the
site. - The database at a site is a complete subset of
the data. - Some data will need to be duplicated between
sites.
10Data PartitioningUpdate Anywhere
- Primary keys must be unique across the entire
distributed system. - If multiple sites insert into the same table.
- Requires a conflict detection and resolution
mechanism. - If multiple sites are able to change the same row.
11Tight vs. Loose Consistency
- Which version of the data is being used?
Waterloo
Paris
12Consistency
- Tight consistency requires all data to be in a
consistent state. - Loose consistency permits data to be
out-of-date. - Latency is the measure of how long it takes the
data to become consistent. - In some cases it is never consistent since there
are always changes that have not been moved.
13Transaction Control
- Your chosen technology must pass the ACID test
- Atomicity, Consistency, Isolation, Durability
- Only committed data should move.
- Committed data must move.
- Failure to successfully move committed data must
be detectable. - Changes must be applied in the same order on all
databases.
14Accessibility
- What kind of network do you have between the
sites? - High-speed LAN/WAN
- Low-speed Dial-up (RAS)
- Wireless
- Indirect (email, ftp)
- Internet (HTTP)
- Sneaker-net
15Topology
- What kind of relationship exists between the
sites. - Peer-to-peer
- Each site can transfer data to any other site.
- No centralized master copy can exist.
- Conflict resolution is extremely difficult
- There is no place to detect and resolve the
conflict
16Topology
- Hierarchical
- Each site passes data up and down the hierarchy.
- A central master copy (consolidated database)
exists. - Data must pass through the consolidated to move
to another site. - Conflict detection and resolution is implemented
on the consolidated.
17Other Issues
- Number of sites
- Some technologies are better suited to mass
deployment. - Vendors
- Are the databases at each site the same product?
- Is the technology commercially available and
supported?
18FactorsSummary
- Each of the following factors will influence your
choice of data movement technologies - Local autonomy
- Data partitioning (fragmentation)
- Consistency
- Transaction control
- Accessibility (connection)
- Topology
19Types of Data Movement
- All technologies can be categorized as one of
- Online
- Synchronization
- Replication
20Online
- Changes are made simultaneously on all
databases.
Please wait while your account is being updated
Withdraw 100
Waterloo
Paris
21Online
- In its simplest form the application updates
all of the databases directly. - Underlying technology is normally Two Phase
Commits - Sybase products
- EAServer
- Not really data movement but is appropriate in
some systems.
22Online CharacteristicsLocal Autonomy
- Very low level of local autonomy.
- If one site is down the entire system is down.
X
Sorry the System is Unavailable
Withdraw 100
Waterloo
Paris
23Online CharacteristicsData Partitioning
- Data can be partitioned as required.
- If the data is partitioned the application must
update the row(s) everywhere. - Since transactions are applied at all databases
simultaneously no primary key or conflict issues
arise.
24Online CharacteristicsConsistency
- Use when tight consistency is an absolute
requirement. - Transactions will succeed or fail on all
databases.
25Online CharacteristicsTransaction Control
- A Distributed Transaction Server (DTS) should be
used. - Ensures the transaction is applied on all sites
or not at all. - Very expensive to code yourself.
- Both ASA and ASE provide support for a DTS
26Online CharacteristicsAccessibility
- Requires a reliable network connection between
sites. - Transactions will fail if one database is
unavailable. - Application speed will be affected by network
speed.
27Online Characteristics
- Typically a peer-to-peer topology.
- Since all databases are updated at once no master
copy is required.
28Online CharacteristicsOther Issues
- Simple to understand
- Looks just like a centralized database.
- Very few sites can be supported
- Consider the cost of updating many databases at
once. - Vendors
- Heterogeneous environments are easily supported.
29Synchronization
- Current state of the data is moved between
databases. - Can be a complete refresh or only the rows that
have changed. - Sybase products
- MobiLink
30Synchronization
1234
10
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
31Synchronization CharacteristicsLocal Autonomy
- High local autonomy
- Site database must have all of the data required
for the application to run.
32Synchronization CharacteristicsData Partitioning
- Data is usually partitioned.
- Each site has common data and site specific data.
- Update anywhere requires
- Unique primary keys.
- Conflict detection and resolution mechanism.
33Synchronization CharacteristicsConsistency
- Low to high consistency.
- Data is only consistent immediately after
synchronization. - Frequency of synchronization affects level of
consistency but in all cases there is some
latency.
34Synchronization CharacteristicsTransaction
Control
- Transaction boundaries are not maintained.
- Some operation sequences can not be synchronized.
(i.e. insert then delete of a row with the same
primary key value) - Most synchronization technologies batch the
operations. - e.g. all deletes, then inserts, then updates
35Synchronization CharacteristicsAccessibility
- Requires a stable network connection during the
synchronization process. - Connection speed affects the amount of data that
can be reasonably synchronized.
36Synchronization CharacteristicsTopology
- Both peer-to-peer and hierarchical topologies are
possible. - Peer-to-peer is difficult if update anywhere is
permitted. - Which copy of the data is correct?
- Who resolves an update conflict?
37Synchronization CharacteristicsOther Issues
- Heterogeneous environments can be supported.
- Be aware of compatibility issues. E.g. Oracle
allows 1 varchar column/table, ASA has no limit. - Because each site synchronizes independently many
sites can be supported.
38MobiLink
ASA, ASE, Microsoft, Oracle, IBM
Serial
HTTP, TCPIP
HotSync, Wireless
ASA, PalmOS, CE, Pagers, Phones
39MobiLink Characteristics
- Complete local autonomy.
- Complete control over data partitioning on the
consolidated through the use of scripts. - Uses the consolidated databases scripting
language or Java. - No partitioning allowed on the remote.
40MobiLink Characteristics
- Session based.
- Only changed records are synchronized.
- Connection only required while synchronizing.
- Bi-directional by default.
- Medium to high latency.
- Low to medium data volume.
41MobiLink Characteristics
- Hierarchical topology.
- Consolidated can be any ODBC-based database
- Sybase, Microsoft, Oracle, IBM
- ASA and/or UltraLite remotes.
- Optimized for thousands of remotes.
- Scalable based on consolidated databases
capabilities.
42MobiLink Synchronization Components
Consolidated Database Server
Consolidated Data Store
MobiLink Client(ASA or UltraLite)
Remote Database Server (ASA or UltraLite)
MobiLinkServer
Remote Data Store
43MobiLink Synchronization Server
- Provides interface between consolidated database
and remote server. - Works with ODBC-based host databases.
- Responsible for ensuring the synchronization
process completes. - Supports multiple simultaneous synchronizations.
44MobiLink Consolidated Synchronization Logic
- SQL statements executed against the consolidated
database. - Written in language of consolidated database or
Java. - Guides the synchronization server.
- Controls the flow of data in both directions.
- Handles conflicts.
45MobiLink Remote Synchronization Logic
- ASA and UltraLite keep track of changes to the
data. - A synchronization component is provided to
- Scan for changes to create the upload stream
- Receive the download stream and apply the
changes to the remote
46Replication
- Transactions (changes) are moved between the
databases. - Uses store and forward mechanism.
- Site(s) must have a common starting point.
- Sybase Products
- SQL Remote
- Replication Server
47Replication
1234
10
UPDATE Product SET qty_oh 9WHERE sku_key 1234
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
48Replication CharacteristicsLocal Autonomy
- High local autonomy.
- Database must have all of the data required for
the application to run.
49Replication CharacteristicsData Partitioning
- Data is usually partitioned.
- Each site has common data and site specific data.
- Update anywhere requires
- Unique primary keys.
- Conflict detection and resolution mechanism.
50Replication CharacteristicsConsistency
- Low to high consistency is possible.
- Speed of store and forward messaging system
determines how consistent the database is. - Some latency is always present.
51Replication CharacteristicsTransaction Control
- Mechanism must exist to guarantee transactions
are - Sent and applied in the correct order.
- No transactions are skipped
52Replication CharacteristicsAccessibility
- Whether a direct connection is required or not is
dependant on the latency requirements. - Not required in high latency implementations.
53Replication CharacteristicsTopology
- Both peer-to-peer and hierarchical topologies can
be used. - Conflict resolution normally requires a
hierarchical model.
54Replication CharacteristicsOther Issues
- Only transactions are moved therefore
- It is possible to support many sites.
- Throughput is usually independent of database
size.
55Replication Server
Replicate Sites
Primary Sites
Replication Agent
Replication Server
- DirectCONNECT
- (Native drivers)
- Adaptive Server/Enterprise
- Adaptive Server/Anywhere
- Oracle
- Informix
- OS/390 DB2
- Replication Toolkit for MVS
- DirectCONNECT/
- Anywhere (ODBC)
56Replication Server Characteristics
- Transactions are sent to Replication Server which
stores and forwards them to the interested sites. - Assumes there is normally a high speed
connection. - Near real time (low latency).
- High data volumes.
- Moderate number of sites.
- Heterogeneous databases supported.
- Uni-directional by default.
57Replication Server
Replicate Sites
Primary Sites
Replication Agent
Replication Server
58Replication Server ComponentsPrimary Site
- Origin of the data being moved.
- Multiple vendors RDBMS supported.
- Keeps a record of all transactions. Normally
this is in the transaction log but it depends on
the RDBMS.
59Replication Server ComponentsReplication Agent
- Scans the primary sites record of transactions.
- Passes the committed transactions, in the order
they were applied, to Replication Server.
60Replication Server ComponentsReplication Server
- Receives transactions from the Replication
Agents. - Stores the transactions until they are
successfully applied on all replicate sites. - Maintains a connection to all replicate sites.
- Automatically recovers when a connection is
dropped and restored - Determines which site(s) require the transaction
and applies them in the correct order.
61Replication Server ComponentsReplication Server
- Prevents circular transactions.
- Provides user programmable function strings to
allow manipulation of the transaction. - Data conversions (e.g. date formats)
- Conversion of SQL in heterogeneous environments.
- Detects SQL errors.
62Replication Server ComponentsReplicate Site
- Applies SQL sent by Replication Server.
- A replicate site can also be defined as a primary
site if bi-directional replication is required.
63SQL Remote
ASE
ASA
OR
ASA
ASA
64SQL Remote Characteristics
- Complete local autonomy.
- Partitioning based on
- Column values
- Subqueries
- Where clauses
- Message based (no connection)
- MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File
- Very loose consistency.
65SQL Remote Characteristics
- Built in guaranteed message delivery.
- Hierarchical
- Consolidated is either ASA or ASE
- Remotes are ASA
- Homogeneous.
- Many (thousands) of remotes.
- Low to medium data volumes.
66SQL Remote Components
Consolidated DatabaseServer
Message Agent
Consolidated Data Store
Message System
Remote DatabaseServer
Message Agent
Remote Data Store
67SQL Remote ComponentsConsolidated Database
- Contains a copy of all data that is replicating.
- Performs conflict detection and resolution.
- Transactions are recorded in the transaction log.
- Maintains additional data in the transaction log
about what transactions are eligible to replicate
and how they are partitioned.
68SQL Remote ComponentsMessage Agent
- Scans the transaction log for committed
transactions that are eligible to replicate. - Builds messages for the sites that have
subscribed to the transactions. - Interfaces with the message system.
- Guarantees that transactions are
- Sent in the correct order.
- Applied in the correct order and only applied
once. - No transactions are skipped.
69SQL Remote ComponentsMessage Agent
- Receives transactions from the message system.
- Applies the transactions.
- Prevents circular transactions.
- Detects update conflicts.
- Detects SQL errors.
70SQL Remote ComponentsMessage System
- Provides the store and forward technology.
- Support for
- MAPI
- SMTP
- VIM
- File
- FTP
71SQL Remote ComponentsRemote Database
- Contains data the site is subscribed to.
- Transactions are recorded in the transaction log.
- Maintains additional data in the transaction log
about what transactions are eligible to replicate
and how they are partitioned.
72Which Technology should I Choose?
- Depends on the business requirements and
technological infrastructure available. - Consistency and latency are the biggest factors.
73Use EAServer When
- Absolute consistency is required (Zero latency).
- Transactions must fail when one of the site
databases is unavailable. - There are very few sites.
74Use MobiLink When
- Latency is permitted.
- Local autonomy is required.
- A reliable connection exists.
- You have low to medium data volumes.
- There are heterogeneous databases.
- You do not require transaction boundaries to be
maintained. - You have a hierarchical topology.
- You have many remotes.
- You must know when your changes have been
synchronized.
75Use Replication Server When
- Near real-time consistency is required.
- You have high data volumes.
- Local autonomy is required.
- There are heterogeneous databases.
- You require transaction boundaries to be
maintained. - You have a peer-to-peer topology.
- You can implement a hierarchical topology
- A small number of sites.
76Use SQL Remote when
- Latency is not a factor.
- No direct connection exists (or is not permitted)
or the connection is unreliable,. - Local autonomy is required.
- You have low to medium data volumes.
- Homogeneous (ASA ASE) databases.
- Hierarchical topology.
- You have many remotes.
- You require transaction boundaries to be
maintained.
77Or Combine them
- All four products are compatible.
- Use the strengths of each to solve your business
problems. - Replication Server or EAServer between main
geographical databases. - MobiLink or SQL Remote for mass deployed devices.
78Summary
- Sybase has many different methods of maintaining
data in distributed databases. - Your business requirements dictate which method
is best. - All the technologies can be used together on the
same database.
79EM 416 Choosing the Correct Data Movement
Technology
Chris Kleisath Director of Engineering iAnywhere
Solutions kleisath_at_ianywhere.com