Title: SAS915 Choosing the Correct Data Movement Technology
1SAS915 Choosing the Correct Data Movement
Technology
- Reg Domaratzki
- International and Sustaining Engineering
- iAnywhere Solutions
- rdomarat_at_ianywhere.com
2Overview
- When choosing a data movement technology you need
to consider - The business requirements for the distributed
database. - The technological limitations of your
environment. - The development and administrative resources
available.
3Enterprise Data Movement
Wired and Wireless Communication Links
4Distributed Systems
- Using any data movement technology means we have
a distributed database. - This almost always implies some form of
distributed application.
5What is a Distributed System?
- C.J. Dates working definition
- A distributed database system consists of a
collection of sites, connected together via some
sort of network, in which - Each site is a database system in its own right
- Sites have agreed to work together (if
necessary), so that a user at any site can access
data anywhere in the network exactly as if the
data were all stored at the users own site.
6Distributed Systems Practical Factors
- Not all systems require that all data be
available to all sites. - Not all systems require that all data be
consistent between all sites all of the time. - The degree to which your system must meet the
ideal definition is the single biggest factor in
choosing your data movement technology.
7Issues when distributing data
- Local autonomy
- Data partitioning (fragmentation)
- Consistency
- Transaction control
- Accessibility (connection)
- Topology
8Local autonomy
- Each site should operate independently of the
other sites. - No site should depend on another site for its
successful functioning. - A centralized database provides the lowest level
of local autonomy. - Decentralized systems provide the highest level
of local autonomy.
9Data Partitioning
- Also known as fragmentation.
- Only the data needed by a site is present at the
site. - The database at a site is a complete subset of
the data. - Some data will need to be duplicated between
sites.
10Data Partitioning Update Anywhere
- Primary keys must be unique across the entire
distributed system. - If multiple sites insert into the same table.
- Requires a conflict detection and resolution
mechanism. - If multiple sites are able to change the same row.
11Tight vs. Loose Consistency
- Which version of the data is being used?
Waterloo
Paris
12Consistency
- Tight consistency requires all data to be in a
consistent state. - Loose consistency permits data to be
out-of-date. - Latency is the measure of how long it takes the
data to become consistent. - In some cases it is never consistent since there
are always changes that have not been moved.
13Transaction Control
- Your chosen technology must pass the ACID test
- Atomicity, Consistency, Isolation, Durability
- Only committed data should move.
- Committed data must move.
- Failure to successfully move committed data must
be detectable. - Changes must be applied in the same order on all
databases.
14Accessibility
- What kind of network do you have between the
sites? - High-speed LAN/WAN
- Low-speed Dial-up (RAS)
- Wireless
- Indirect (email, ftp)
- Internet (HTTP)
- Sneaker-net
15Topology
- What kind of relationship exists between the
sites? - Peer-to-peer
- Each site can transfer data to any other site.
- No centralized master copy can exist.
- Conflict resolution is extremely difficult
- There is no place to detect and resolve the
conflict
16Topology
- Hierarchical
- Each site passes data up and down the hierarchy.
- A central master copy (consolidated database)
exists. - Data must pass through the consolidated to move
to another site. - Conflict detection and resolution is implemented
on the consolidated.
17Other Issues
- Number of sites
- Some technologies are better suited to mass
deployment. - Vendors
- Are the databases at each site the same product?
- Is your data movement technology developed by one
of the database vendors you are also using?
18Factors Summary
- Each of the following factors will influence your
choice of data movement technologies - Local autonomy
- Data partitioning (fragmentation)
- Consistency
- Transaction control
- Accessibility (connection)
- Topology
19Types of Data Movement
- All technologies can be categorized as one of
- Online
- Synchronization
- Replication
20Online
- Changes are made simultaneously on all
databases.
Please wait while your account is being updated
Withdraw 100
Waterloo
Paris
21Online
- In its simplest form the application updates all
of the databases directly. - Underlying technology is normally Two Phase
Commits - Sybase products
- EAServer
- Not really data movement but is appropriate in
some systems.
22Online Characteristics
- Local Autonomy
- Very low level of local autonomy.
- If one site is down the entire system is down.
X
Sorry the System is Unavailable
Withdraw 100
Waterloo
Paris
23Online Characteristics
- Data Partitioning
- Data can be partitioned as required.
- If the data is partitioned the application must
update the row(s) everywhere. - Since transactions are applied at all databases
simultaneously no primary key or conflict issues
arise. - Consistency
- Provides the highest level of consistency.
- Transactions will succeed or fail on all
databases.
24Online Characteristics
- Transaction Control
- A Distributed Transaction Server (DTS) should be
used. - Ensures the transaction is applied on all sites
or not at all. - Very expensive to code yourself.
- Both ASA and ASE provide support for a DTS
- Accessibility
- Requires a reliable network connection between
sites. - Transactions will fail if one database is
unavailable. - Application speed will be affected by network
speed.
25Online Characteristics
- Topology
- Typically a peer-to-peer topology.
- Since all databases are updated at once no master
copy is required. - Other Characteristics
- Simple to understand
- Looks just like a centralized database.
- Very few sites can be supported
- Consider the cost of updating many databases at
once. - Vendors
- Heterogeneous environments are easily supported.
26Synchronization
- Current state of the data is moved between
databases. - Can be a complete refresh or only the rows that
have changed. - Sybase products
- MobiLink
27Synchronization
1234
10
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
28Synchronization Characteristics
- Local Autonomy
- High local autonomy
- Site database must have all of the data required
for the application to run. - Data Partitioning
- Data is usually partitioned.
- Each site has common data and site specific data.
- Update anywhere requires
- Unique primary keys.
- Conflict detection and resolution mechanism.
29Synchronization Characteristics
- Consistency
- Low to high consistency.
- Data is only consistent immediately after
synchronization. - Frequency of synchronization affects level of
consistency but in all cases there is some
latency. - Transaction Control
- Transaction boundaries are not maintained.
- Most synchronization technologies batch the
operations. - e.g. all deletes, then inserts, then updates
30Synchronization Characteristics
- Accessibility
- Requires a stable network connection during the
synchronization process. - Connection speed affects the amount of data that
can be reasonably synchronized. - Topology
- Both peer-to-peer and hierarchical topologies are
possible. - Peer-to-peer is difficult if update anywhere is
permitted. - Which copy of the data is correct?
- Who resolves an update conflict?
31Synchronization Characteristics
- Other Characteristics
- Heterogeneous environments can be supported.
- Be aware of compatibility issues. E.g. Oracle
allows 1 varchar column/table, ASA has no limit. - Because each site synchronizes independently many
sites can be supported.
32MobiLink
ASA, ASE, Microsoft, Oracle, IBM
Serial
HTTP, TCPIP
HotSync, Wireless
ASA, PalmOS, CE, Pagers, Phones
33MobiLink Characteristics
- High local autonomy.
- Complete control over data partitioning on the
consolidated through the use of scripts. - Uses the consolidated databases scripting
language, Java or .NET - No partitioning allowed on the remote.
34MobiLink Characteristics
- Session based.
- Only changed records are synchronized.
- Connection only required while synchronizing.
- Bi-directional by default.
- Medium to high latency.
- Low to medium data volume.
35MobiLink Characteristics
- Hierarchical topology.
- Consolidated can be any RDBMS with a suitable
ODBC Driver - We currently support Sybase, Microsoft, Oracle,
and IBM RDBMS - Not all ODBC Drivers are created equal
- For the most recent information on supported ODBC
Drivers see http//my.sybase.com/detail?id1011880
- ASA and/or UltraLite remotes.
- Optimized for thousands of remotes.
- Scalable based on consolidated databases
capabilities.
36MobiLink Synchronization Components
Consolidated Database Server
Consolidated Data Store
MobiLink Client(ASA or UltraLite)
Remote Database Server (ASA or UltraLite)
MobiLinkServer
Remote Data Store
37MobiLink Synchronization Server
- Provides interface between consolidated database
and remote server. - Works with ODBC-based host databases.
- Responsible for ensuring that the two main phases
of synchronization (upload and download) are
atomic. - Supports multiple simultaneous synchronizations.
38MobiLink Consolidated Synchronization Logic
- SQL statements executed against the consolidated
database. - You write the synchronization scripts, so you
have complete control over the flow of data in
your system - Written in language of consolidated database,
Java or .NET - Guides the synchronization server.
- Controls the flow of data in both directions.
- Handles conflicts, using the scripts youve
written.
39MobiLink Remote Synchronization Logic
- ASA and UltraLite keep track of changes to the
data. - A synchronization component is provided to
- Scan for changes to create the upload stream
- Receive the download stream and apply the
changes to the remote
40Replication
- Transactions (changes) are moved between the
databases. - Uses store and forward mechanism.
- Site(s) must have a common starting point.
- Sybase Products
- SQL Remote
- Replication Server
41Replication
1234
10
UPDATE Product SET qty_oh 9WHERE sku_key 1234
UPDATE Product SET qty_oh 8WHERE sku_key 1234
1234
10
42Replication Characteristics
- Local Autonomy
- High local autonomy.
- Database must have all of the data required for
the application to run. - Data Partitioning
- Data is usually partitioned.
- Each site has common data and site specific data.
- Update anywhere requires
- Unique primary keys.
- Conflict detection and resolution mechanism.
43Replication Characteristics
- Consistency
- Low to high consistency is possible.
- Speed of store and forward messaging system
determines how consistent the database is. - Some latency is always present.
- Transaction Control
- Mechanism must exist to guarantee transactions
are - Sent and applied in the correct order.
- No transactions are skipped
44Replication Characteristics
- Accessibility
- Whether a direct connection is required or not is
dependant on the latency requirements. - Not required in high latency implementations.
- Topology
- Both peer-to-peer and hierarchical topologies can
be used. - Conflict resolution normally requires a
hierarchical model.
45Replication Characteristics
- Other Characteristics
- Only transactions are moved therefore
- It is possible to support many sites.
- Throughput is usually independent of database
size.
46Replication Server
Replicate Sites
Primary Sites
Replication Agent
Replication Server
- DirectCONNECT
- (Native drivers)
- Adaptive Server/Enterprise
- Adaptive Server/Anywhere
- Oracle
- Informix
- OS/390 DB2
- Replication Toolkit for MVS
- DirectCONNECT/
- Anywhere (ODBC)
47Replication Server Characteristics
- Transactions are sent to Replication Server which
stores and forwards them to the interested sites. - Assumes there is normally a high speed
connection. - Near real time (low latency).
- High data volumes.
- Moderate number of sites.
- Heterogeneous databases supported.
- Uni-directional by default.
48Replication Server Components
- Primary Site
- Origin of the data being moved.
- Multiple vendors RDBMS supported.
- Keeps a record of all transactions. Normally
this is in the transaction log but it depends on
the RDBMS. - Replication Agent
- Scans the primary sites record of transactions.
- Passes the committed transactions, in the order
they were applied, to Replication Server.
49Replication Server Components
- Replication Server
- Receives transactions from the Replication
Agents. - Stores the transactions until they are
successfully applied on all replicate sites. - Maintains a connection to all replicate sites.
- Automatically recovers when a connection is
dropped and restored - Determines which site(s) require the transaction
and applies them in the correct order.
50Replication Server Components
- Replication Server
- Prevents circular transactions.
- Provides user programmable function strings to
allow manipulation of the transaction. - Data conversions (e.g. date formats)
- Conversion of SQL in heterogeneous environments.
- Detects SQL errors.
51Replication Server Components
- Replicate Site
- Applies SQL sent by Replication Server.
- A replicate site can also be defined as a primary
site if bi-directional replication is required.
52SQL Remote
ASE
ASA
OR
ASA
ASA
53SQL Remote Characteristics
- Complete local autonomy.
- Partitioning based on
- Column values
- Subqueries
- Where clauses
- Message based (no connection)
- MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File
- Very loose consistency.
54SQL Remote Characteristics
- Built in guaranteed message delivery.
- Hierarchical
- Consolidated is either ASA or ASE
- Remotes are ASA
- Homogeneous.
- Many (thousands) of remotes.
- Low to medium data volumes.
55SQL Remote Components
Consolidated DatabaseServer
Message Agent
Consolidated Data Store
Message System
Remote DatabaseServer
Message Agent
Remote Data Store
56SQL Remote Components
- Consolidated Database
- Contains a copy of all data that is replicating.
- Performs conflict detection and resolution.
- Transactions are recorded in the transaction log.
- Maintains additional data in the transaction log
about what transactions are eligible to replicate
and how they are partitioned.
57SQL Remote Components
- Message Agent
- Scans the transaction log for committed
transactions that are eligible to replicate. - Builds messages for the sites that have
subscribed to the transactions. - Interfaces with the message system.
- Guarantees that transactions are
- Sent in the correct order.
- Applied in the correct order and only applied
once. - No transactions are skipped.
58SQL Remote Components
- Message Agent
- Receives transactions from the message system.
- Applies the transactions.
- Prevents circular transactions.
- Detects update conflicts.
- Detects SQL errors.
59SQL Remote Components
- Message System
- Provides the store and forward technology.
- Support for
- MAPI
- SMTP
- VIM
- FILE
- FTP
60SQL Remote Components
- Remote Database
- Contains data the site is subscribed to.
- Transactions are recorded in the transaction log.
- Maintains additional data in the transaction log
about what transactions are eligible to replicate
and how they are partitioned.
61Which Technology should I Choose?
- Depends on the business requirements and
technological infrastructure available. - Consistency and latency are the biggest factors.
62Use EAServer When
- Absolute consistency is required (Zero latency).
- Transactions must fail when one of the site
databases is unavailable. - There are very few sites.
63Use MobiLink When
- Latency is permitted.
- Local autonomy is required.
- A reliable connection exists.
- You have low to medium data volumes.
- There are heterogeneous databases.
- You do not require transaction boundaries to be
maintained. - You have a hierarchical topology.
- You have many remotes.
64Use Replication Server When
- Near real-time consistency is required.
- You have high data volumes.
- Local autonomy is required.
- There are heterogeneous databases.
- You require transaction boundaries to be
maintained. - You have a peer-to-peer topology.
- You can implement a hierarchical topology
- A small number of sites.
65Use SQL Remote when
- Latency is not a factor.
- No direct connection exists (or is not permitted)
or the connection is unreliable,. - Local autonomy is required.
- You have low to medium data volumes.
- Homogeneous (ASA ASE) databases.
- Hierarchical topology.
- You have many remotes.
- You require transaction boundaries to be
maintained.
66Or Combine them
- All four products are compatible.
- Use the strengths of each to solve your business
problems. - Replication Server or EAServer between main
geographical databases. - MobiLink or SQL Remote for mass deployed devices.
67Summary
- Sybase has many different methods of maintaining
data in distributed databases. - Your business requirements dictate which method
is best. - All the technologies can be used together on the
same database.
68iAnywhere Solutions at TechWave2003
Activities for iAnywhere Solutions
- Keynote Extending Information Anywhere
- Wednesday, August 6 800 a.m - 930 a.m.
- Part Two Extending Information Anywhere
- Learn how iAnywhere Solutions, the m-business
subsidiary of Sybase, is addressing the mobile
and wireless landscape and previews iAnywhere
Solutions future technology plans including its
popular, leading mobile portal, AvantGo. - Ask the iAnywhere Experts on the Technology
Boardwalk - Drop in during exhibit hall hours and have all
your questions answered by our technical experts! - Appointments outside of exhibit hall hours are
also available to speak one-on-one with our
Senior Engineers. Ask questions or get your
yearly technical review ask us for details - m-Business Pavilion
- Visit the m-Business Pavilion in the exhibit hall
to see how companies like Intermec have built
m-Business solutions using iAnywhere Solutions
technology
69iAnywhere Solutions at TechWave2003
Activities for iAnywhere Solutions
- Wi-Fi Hotspots brought to you by Intel
iAnywhere Solutions - You can enjoy wireless internet access via a
Wi-Fi hotspot provided by Intel and iAnywhere
Solutions. Using either a laptop or PDA that is
Wi-Fi 802.11b wirelessly-enabled, visitors can
access personal email, the internet ,and
"TechWave To Go", a My AvantGo channel providing
up-to-date information about TechWave classes,
events and more. - Developer Community
- A one-stop source for technical information!
- Access to newsgroups,new betas and code samples
- Monthly technical newsletters
- Technical whitepapers,tips and online product
documentation - Current webcast,class,conference and seminar
listings - Excellent resources for commonly asked questions
- All available express bug fixes and patches
- Network with thousands of industry experts
- http//www.ianywhere.com/developer/
70Sybase TechWave 2003
SAS915 Choosing the Correct Data Movement
Technology
Reg Domaratzki International and Sustaining
Engineering iAnywhere Solutions rdomarat_at_ianywhere
.com