Title: Mobile Data Management
1Mobile Data Management
- Sanjay Kumar Madria
- Department of Computer Science
- University of Missouri-Rolla
- Rolla, MO 65401
2Wireless Technologies
- Wireless local area networks (WaveLan, Aironet
Possible Transmission error - Cellular wireless Low bandwidth
- Packet radio (Metricom) -Low Bandwidth
- Satellites (Inmarsat, Iridium) Long Latency
3 Mobility Constraints
- CPU
- Power
- Bandwidth
- Delay tolerance
- Physical size
- Constraints on peripherals and GUIs (modality of
interaction) - Locations change dynamically
4Why Mobile Data Mgmt?
- Wireless Connectivity and use of PDAs ,
- handheld computing devices on the rise
- Workforces will carry extracts of corporate
- databases with them
- Need central database repositories to serve
- these work groups and keep them fairly
- upto-date and consistent
5Applications
- Sales Force Automation - especially in
- pharmaceutical industry, consumer goods,
- parts
- Financial Consulting and Planning
- Insurance and Claim Processing - Auto,
- General, and Life Insurance
- Real Estate/Property Management,
- Maintenance and Building Contracting
6Data Processing Scenario
- One server or many servers (corporate data,
- inventory, HR, orders/billing)
- Shared Data
- Some Local Data per client , mostly subset of
- global data
- Need for accurate, up-to-date information
- Limitations
- Short connect time per session
- Infrequent connections
- Clients may remain dormant for extended periods
of time - Clients not reachable from servers at all times
7What is Mobility?
- A device that moves
- Between different geographical locations
- Between different networks
- A person who moves
- Between different geographical locations
- Between different networks
- Between different communication devices
- Between different applications
8 Device mobility
- Plug in laptop at home/work on Ethernet
- Occasional long breaks in network access
- Wired network access only (connected gt
well-connected) - Network address changes
- Only one type of network interface
- May want access to information when no network is
available hoard information locally - Cell phone with access to cellular network
- Continuous connectivity
- Phone remains the same (high-level network
address) - Network performance may vary from place to place
9 Device mobility.
- Can we achieve best of both worlds?
- Continuous connectivity of wireless access
- Performance of better networks when available
- Laptop moves between Ethernet, WaveLAN and
Metricom networks - Wired and wireless network access
- Potentially continuous connectivity, but may be
breaks in service - Network address changes
- Radically different network performance on
different networks
10 People mobility
- Phone available at home or at work
- Multiple phone numbers to reach me
- Breaks in my reachability when Im not in
- Cell phone
- Only one number to reach me
- Continuously reachable
- Sometimes poor quality and expensive connectivity
- Cell phone, networked PDA, etc.
- Multiple numbers/addresses for best quality
connection - Continuous reachability
- Best choice of address may depend on senders
device or message content
11 Mobility means changes
- How does it affect the following?
- Hardware
- Lighter
- More robust
- Lower power
- Wireless communication
- Cant tune for stationary access
- Network protocols
- Name changes
- Delay changes
- Error rate changes
12 Changes...
- Fidelity
- High fidelity may not be possible
- Data consistency
- Strong consistency no longer possible
- Location/transparency awareness
- Transparency not always desirable
- Names/addresses
- Names of endpoints may change
- Security
- Lighter-weight algorithms
- Endpoint authentication harder
- Devices more vulnerable
13 Changes...
- Performance
- Network, CPU all constrained
- Delay and delay variability
- Operating systems
- New resources to track and manage energy
- Applications
- Name changes
- Changes in connectivity
- Changes in quality of resources
- People
- Introduces new complexities, failures, devices
14 Example changes
- Addresses
- Phone numbers, IP addresses
- Network performance
- Bandwidth, delay, bit error rates, cost,
connectivity - Network interfaces
- PPP, eth0, strip
- Between applications
- Different interfaces over phone laptop
- Within applications
- Loss of bandwidth triggers change from color to
BW - Available resources
- Files, printers, displays, power, even routing
15- Most RDBMS vendors support the ISDB
- scenario - but no design and optimization
- aids
- Specialized Environments for ISDB apps
- Sybase Remote Server
- Synchrologic iMOBILE
- Microsoft SQL server - mobile app support
- Oracle Lite
- Xtnd-Connect-Server (Extended
Technologies) - Scoutware (Riverbed Technologies)
16Personal Communication System (PCS)
17Personal Communication System (PCS)
18Personal Communication System (PCS)
- Mobile cells
- The entire coverage area is a group of a number
of cells. The size of cell depends upon the
power of the base stations.
19Personal Communication System (PCS)
20Personal Communication System (PCS)
- Problems with cellular structure
- How to maintain continuous communication between
two parties in the presence of mobility? - Solution Handoff
- How to maintain continuous communication between
two parties in the presence of mobility? - Solution Roaming
- How to locate of a mobile unit in the entire
coverage area? - Solution Location management
21Personal Communication System (PCS)
A process, which allows users to remain in touch,
even while breaking the connection with one BS
and establishing connection with another BS.
22Personal Communication System (PCS)
- Handoff
- To keep the conversation going, the Handoff
procedure should be completed while the MS (the
bus) is in the overlap region.
23Personal Communication System (PCS)
- Handoff types with reference to the network
- Intra-system handoff or Inter-BS handoff
- The new and the old BSs are connected to the
same MSC.
24Personal Communication System (PCS)
- Intra-system handoff or Inter-BS handoff
- Steps
- The MU (MS) momentarily suspends conversation and
initiates the handoff procedure by signaling on
an idle (currently free) channel in the new BS.
Then it resumes the conversation on the old BS.
25Personal Communication System (PCS)
- Intra-system handoff or Inter-BS handoff
- Upon receipt of the signal, the MSC transfers the
encryption information to the selected idle
channel of the new BS and sets up the new
conversation path to the MS through that channel.
The switch bridges the new path with the old
path and informs the MS to transfer from the old
channel to the new channel.
26Personal Communication System (PCS)
- Intra-system handoff or Inter-BS handoff
- After the MS has been transferred to the new BS,
it signals the network and resumes conversation
using the new channel.
27Personal Communication System (PCS)
- Intra-system handoff or Inter-BS handoff
- Upon the receipt of the handoff completion
signal, the network removes the bridge from the
path and releases resources associated with the
old channel.
28Personal Communication System (PCS)
- Handoff types with reference to the network
- Intersystem handoff or Inter-MSC handoff
- The new and the old BSs are connected to
different MSCs.
29Personal Communication System (PCS)
Administrative constraints
- Billing.
- Subscription agreement.
- Call transfer charges.
- User profile and database sharing.
- Any other policy constraints.
30Personal Communication System (PCS)
Technical constraints
- Bandwidth mismatch. For example, European 900MHz
band may not be available in other parts of the
world. - Service providers must be able to communicate
with each other. Needs some standard. - Mobile station constraints.
31Personal Communication System (PCS)
- Two basic operations in roaming management are
- Registration (Location update) The process of
informing the presence or arrival of a MU to a
cell. - Location tracking the process of locating the
desired MU.
32Personal Communication System (PCS)
Two-Tier Scheme
HLR Home Location Register A HLR stores user
profile and the geographical location. VLR
Visitor Location Register A VLR stores user
profile and the current location who is a visitor
to a different cell that its home cell.
33Personal Communication System (PCS)
Two-Tier Scheme steps. MU1 moves to cell 2.
34Personal Communication System (PCS)
- Steps
- MU1 moves to cell 2. The MSC of cell 2 launches
a registration query to its VLR 2. - VLR2 sends a registration message containing MUs
identity (MIN), which can be translated to HLR
address. - After registration, HLR sends an acknowledgment
back to VLR2. - HLR sends a deregistration message to VLR1 (of
cell 1) to delete the record of MU1 (obsolete).
VLR1 acknowledges the cancellation.
35Personal Communication System (PCS)
- Steps
- VLR of cell 2 is searched for MU1s profile.
- If it is not found, then HLR is searched.
- Once the location of MU1 is found, then the
information is sent to the base station of cell
1. - Cell 1 establishes the communication.
36Personal Communication System (PCS)
Two-Tier Scheme steps location search
37Personal Communication System (PCS)
Two-Tier Scheme steps location update
38Mobile Database Systems (MDS)
- A Reference Architecture (Client-Server model)
39Data Processing Issues
- Processing at the Server
- Processing at the Client
- Update Propagation and Installation
- Consistency Management
- Less Serious
- Concurrent Transactions
- Client Data Recovery
40(No Transcript)
41Database Issues in Mobile Computing
- Query and Transaction Processing
- Replication Management
- Location Management
- Limitations
- Data Distribution, Mobility Management and
Scalability - Role of wireless medium in info distribution
- Dealing with short battery life
- Dealing with prolonged disconnection
- Periods
- Bandwidth Management
42Mobility Management andScalability
- Location management
- Changing topologies
- Handoffs
- Resource finding
- Replication
- Resource sharing
43Bandwidth Management
- Clients assumed to have weak and/or
- unreliable communication capabilities
- Broadcast--scalable but high latency
- On-demand--less scalable and requires
- more powerful client, but better response
- Client caching allows bandwidth
- conservation
44Energy Management
- Battery life expected to increase by only
- 20 in the next 10 years
- Reduce the number of messages sent
- Doze modes
- Power aware system software
- Power aware microprocessors
- Indexing wireless data to reduce tuning time
45Large Impact
- Distributed data management
- Querying wireless data
- Handling/representing fast-changing data
- Scale
- Tariff-driven query optimization
- Security
- User interfaces
46Query Processing
- New Issues
- Energy Efficient Query Processing
- Location Dependent Query Processing
- Old Issues - New Context
- Cost Model
47Location Management
- New Issues
- Tracking Mobile Users
- Old Issues - New Context
- Managing Update Intensive Location Information
- Providing Replication to Reduce Latency for
Location Queries - Consistent Maintenance of Location Information
48Transaction Processing
- New Issues
- Recovery of Mobile Transactions
- Lock Management in Mobile Transaction
- Old Issues - New Context
- Extended Transaction Models
- Partitioning Objects while Maintaining
Correctness
49Dissemination-based Data Delivery Using Broadcast
Disks
50Broadcast Disk
- Proposes a mechanism called Broadcast Disks to
provide database access to mobile clients. - Server continuously and repeatedly broadcasts
data to a mobile client as it goes by. - Multiple disks of different sizes are
superimposed on the broadcast medium. - Exploits the client storage resources for caching
data.
51Server Broadcast Programs
- Data server must construct a broadcast program
to meet the needs of the client population. - Server would take the union of required items and
broadcast the resulting set cyclically. - Single additional layer in a clients memory
hierarchy - flat broadcast. - In a flat broadcast the expected wait for an item
on the broadcast is the same for all items.
52Server Broadcast Programs
53Server Broadcast Programs
- Broadcast Disks are an alternative to flat
broadcasts. - Broadcast is structured as multiple disks of
varying sizes, each spinning at different rates.
54Server Broadcast Programs
Flat Broadcast
Skewed Broadcast
Multi-disk Broadcast
55Server Broadcast Programs
- For uniform access probabilities a flat disk has
the best expected performance. - For increasingly skewed access probabilities,
non-flat disk programs perform better. - Multi-disk programs perform better than the
skewed programs.
56Server Broadcast Programs
- Generating a multi-disk broadcast
- Number of disks (num_disks) determine the number
of different frequencies with which pages will be
broadcast. - For each disk, the number of pages and the
relative frequency of broadcast (rel_freq(i)) are
specified.
57 Server Broadcast Programs
58 Client Cache Management
- Improving the broadcast for one probability
access distribution will hurt the performance of
other clients with different access
distributions. - Therefore the client machines need to cache pages
obtained from the broadcast.
59 Client Cache Management
- With traditional caching clients cache the data
most likely to be accessed in the future. - With Broadcast Disks, traditional caching may
lead to poor performance if the servers
broadcast is poorly matched to the clients access
distribution.
60 Client Cache Management
- In the Broadcast Disk system, clients cache the
pages for which the local probability of access
is higher than the frequency of broadcast. - This leads to the need for cost-based page
replacement.
61 Client Cache Management
- One cost-based page replacement strategy replaces
the page that has the lowest ratio between its
probability of access (P) and its frequency of
broadcast (X) - PIX - PIX requires the following
- 1 Perfect knowledge of access probabilities.
- 2 Comparison of PIX values for all cache resident
pages at cache replacement time.
62 Client Cache Management
- Another page replacement strategy adds the
frequency of broadcast to an LRU style policy.
This policy is known as LIX. - LIX maintains a separate list of cache-resident
pages for each logical disk - Each list is ordered based on an approximation of
the access probability (L) for each page. - A LIX value is computed by dividing L by X, the
frequency of broadcast. The page with the lowest
LIX value is replaced.
63Prefetching
- An alternative approach to obtaining pages from
the broadcast. - Goal is to improve the response time of clients
that access data from the broadcast. - Methods of Prefetching
- Tag Team Caching
- Prefetching Heuristic
64Prefetching
- Tag Team Caching - Pages continually replace each
other in the cache. - For example two pages x and y, being broadcast,
the client caches x as it arrives on the
broadcast. Client drops x and caches y when y
arrives on the broadcast.
65Prefetching
- Simple Prefetching Heuristic
- Performs a calculation for each page that arrives
on the broadcast based on the probability of
access for the page (P) and the amount of time
that will elapse before the page will come around
again (T). - If the PT value of the page being broadcast is
higher than the page in cache with the lowest PT
value, then the page in cache is replaced.
66 Read/Write Case
- With dynamic broadcast there are three different
changes that have to be handled. - 1 Changes to the value of the objects being
broadcast. - 2 Reorganization of the broadcast.
- 3 Changes to the contents of the broadcast.
67Conclusion
- Broadcast Disks project investigates the use of
data broadcast and client storage resources to
provide improved performance, scalability and
availability in networked applications with
asymmetric capabilities.
68(No Transcript)
69- Mobility
- System conguration is no longer static the
center of activity, - the topology, the system load, and locality,
change - dynamically
- need to search to locate objects
- various forms of heterogeneity
70Wireless Communications
- offer less bandwidth
- more expensive
- less reliable
- Consequently, connectivity is weak and often
intermittent
71Portable Devices
- light and small to be easily carried around
- Such considerations, in conjunction with a given
cost and - level of technology ) mobile elements with less
resources - (e.g., memory, screen size and disk capacity)
- reliance on battery
- can be more easily accidentally damaged, stolen,
or lost, thus, - less secure and reliable
72- Mobile units are still characterized as
- unreliable and prone to hard failures, i.e.,
theft, loss or - accidental damage,
- resource-poor relative to static hosts.
- Examples InfoPad 16 and ParcTab 28 projects
73- Adaptability
- A mobile system is presented with resources of
varying number and quality - Connectivity conditions vary from total
disconnections to full connectivity - Available resources are not static either, for
instance a - docked" mobile computer may have access to a
larger display or memory. - the location of mobile elements changes and so
does the network conguration and the center of
computational activity
74- Example during disconnection, a mobile host may
work - autonomously, while during periods of strong
connectivity, depend - heavily on the xed network sparing its scarce
local resources
75- Disconnections disconnected operation -
autonomous - operation of a mobile host during disconnection.
- Weak connectivity Operation should be tuned for
communication environments characterized by low
bandwidth, high latency, and expensive prices. - Mobility Basic support such as as establishing
new communication links as well as advanced
support such as migrating executing processes and
database transactions in progress. - Failure recovery Since mobile elements are
prone to hard failures, methods for failure
handling and recovery are important.
76Data Dissemination by Broadcast
- Pull-based data delivery or on demand data
delivery A client - explicitly requests data items from the server.
- Push-based data delivery The server repetitively
broadcasts - data to a client population without a specic
request. Clients - monitor the broadcast and retrieve the data items
they need as - they arrive.
77- Applications
- Dissemination-based information feeds such as
stock quotes and sport tickets, electronic
newsletters, mailing lists, traffic and weather
information systems, cable TV on the Internet - Commercial Products for example
- the AirMedia's Live Internet broadcast network
6 Hughes Network Systems' DirectPC 26 - Teletext and Videotex systems 11, 28
78- The Datacycle project 16 at Bellcore a
database circulates on a high bandwidth network
(140 Mbps). Users query the database by ltering
information via special massively parallel
transceivers. - The Boston Community Information System (BCIS)
18 - broadcast news and information over an FM channel
to clients - with personal computers equipped with radio
receivers
79Hybrid Delivery
- Push vs Pull
- Push suitable when information is transmitted to
a large number of clients with overlapping
interests the server saves several messages - the server is prevented from being overwhelmed
by client requests. - Push is scalable performance does not depend on
the number of clients Pull cannot scale beyond
the capacity of the server - or the network.
- In push, access is only sequential Thus, access
latency degrades with the volume of data In pull,
clients play a more active role
80- Hybrid Delivery
- clients are provided with an uplink channel,
called backchannel, - to send messages to the server.
- Sharing the channel
- if the same channel is used for both broadcast
delivery and for - the transmission of the replies to on demand
requests
81- Use of the backchannel
- - to provide feedback and prole information to
the server - - to directly request data
- Which pages? to avoid overwhelming the server
- Page i not in cache and the number of items
scheduled to appear - before i on the broadcast is greater than a
threshold parameter
82Selective Broadcast
- Broadcast an appropriately selected subset of
items and provide - the rest on demand
- In 25, the broadcast is used as an air-cache
for storing - frequently requested data. The broadcast content
continuously - adjusts to match the hot-spot of the database.
The hot-spot is - calculated by observing broadcast misses
indicated by explicit - requests for data not on the broadcast.
- In 19 the database is partitioned into a
\publication group" - that is broadcast and an \on demand" group. The
criterion for - partitioning is to minimize the backchannel
requests while - constraining the response time below a predened
upper limit.
83On Demand Broadcast
- the server chooses the next item to broadcast on
every broadcast - tick based on the requests for data it has
received - Various strategies 28 broadcast the pages in
the order they are - requested (FCFS), or the page with the maximum
number of - pending requests.
- A parameterized algorithm for large-scale data
broadcast based - only on the current queue of pending requests 7
84Organization of Broadcast Data
- Access time average time elapsed from the moment
a client - expresses its interest to an item to the receipt
of the item on the - broadcast channel
- Tuning time the amount of time spent listening
to the - broadcast channel
- Organize the broadcast to minimize access and
tuning time
85Efficient Concurrency Control for Broadcast
Environments
- Jayavel Shanmugasundaram
- Arvind Nithrakashyap
- Rajendran Sivasankaran
- Krithi Ramamritham
86Outline
- Broadcast environments
- Inapplicability of existing techniques
- Suitable correctness criterion
- Mechanisms
- Performance Results
- Conclusion
87Why Broadcast Data?
- Millions of clients that need to see current and
consistent data - Server handling all client requests
- gt not scalable
- More scalable solution
- Periodically broadcast all data items
- Clients read items off broadcast
- Datacycle Herman, Broadcast Disks Acharya
88Example eAuctions
- Numerous potential clients
- Only a small fraction contact
- server to offer bids
- Need access to current and consistent data
89Broadcast Environment Characteristics
- Large number of clients
- Mobile clients with scarce power resource
- gt Low client to server bandwidth
- Plentiful server to client bandwidth
- gt Asymmetric communication medium
90Mutually Consistent Reads
R(x)
R(y)
R(z)
TrBegin
TrEnd
time (broadcast cycles)
Are x, y, and z mutually consistent?
91Outline
- Broadcast environments
- Inapplicability of existing techniques
- Suitable correctness criterion
- Mechanisms
- Performance Results
- Conclusion
92Why Not Traditional CC Techniques?
- Approach 1 Dynamic conflict resolution
- Excessive communication
- e.g., locking
- acquiring read locks by client transactions
- server swamped with lock requests
- client uses precious uplink bandwidth
- Approach 2 Avoid potential serializability
conflicts
93Schedules
time
94Serialization Orders
T2 T4
T4T1T2
95Serialization Orders
T2 T4
T4T1T2
Even if ClientB does not exist, ClientA will have
to abort transaction T1
96Serializability?
- Serializability - a global property
- All read only transactions
- Required to see same serial order of update
transactions, even if executing at different
clients - Required to be serializable w.r.t. all update
transactions, even if updates do not affect
values read - Inappropriate for broadcast environments
97Outline
- Broadcast environments
- Inapplicability of existing techniques
- Suitable correctness criterion
- Mechanisms
- Performance Results
- Conclusion
98Broadcast Data Requirements
- Mutual consistency
- server maintains mutually consistent data
- clients read mutually consistent data
- Currency
- clients see data that is current
99 A Sufficient Criterion
- All update transactions are serializable.
- Each read-only transaction is serializable with
respect to the - update transactions it (directly or
indirectly) reads from. -
C4
W4(y)
Server
W2(x)
C2
T2T4
ClientA
R1(y)
R1(x)
T4T1
ClientB
R3(x)
R3(y)
T2T3
100 A Sufficient Criterion
- All update transactions are serializable.
- Each read-only transaction is serializable with
respect to the - update transactions it (directly or
indirectly) reads from. -
external consistency Weihl 87 update
consistency Bober and Carey 92
101Implications
- Decoupled correctness criterion
- Clients need not communicate with server or other
clients - Weaker correctness criterion
- Reduces unnecessary aborts
102Outline
- Broadcast environments
- Inapplicability of existing techniques
- Suitable correctness criterion
- Mechanisms
- Performance Results
- Conclusion
103The Algorithm F-Matrix
- Server functionality
- Client functionality
- Nature of Control Information
- broadcast by the server with the data
- helps clients determine consistency of reads
- Client read-only validation protocol
104Server Functionality
- Ensures conflict serializability of update
transactions - Broadcasts during each cycle
- Committed values of data items at start of cycle
- Control matrix
- Incrementally maintains control matrix as updates
occur
105Client Functionality
106Control Matrix Intuition
107Control Matrix
Objects n objects all
initialized at cycle 0 C n x n control
matrix C(x,y) max( cycle in which T
commits ), where T affects the latest
committed value of y
and also writes to x
108Precond. for Consistent Reads
T previously read x from broadcast cycle b
RT set of (x ,b) pairs C is the matrix at
the beginning of current cycle
read y iff read-condition(y) holds
forall (x,b) in RT, C(x,y) lt b i.e.,
no transaction that affected y wrote x
after t read x
109Smaller Control Matrix
- Partition objects into groups
- Control matrix n x numgroups
- SC(x,s) max y in s C(x, y)
- Updating an object in s update to any
object in s - Fewer entries to transmit compared to C
group2
group 1
read-condition(y) forall (x , b) in RT
SC(i , s) lt b
T is currently reading y T had read x No tr
that affected any object in y s group
changed x after T read it
110Group Size
- Increasing size of group gt more
unnecessary conflicts - Reducing size of group gt increased
control information overhead. - n groups gt F-Matrix
- one group gt Datacycle
- achieves serializability
- Read-condition for Datacycle
111R-Matrix
- To achieve Mutual Consistency
Read condition objects previously read have
not been updated by other transactions
or the object being read has not been
updated since the beginning of the transaction
112Outline
- Broadcast environments
- Inapplicability of existing techniques
- Suitable correctness criterion
- Mechanisms
- Performance Results
- Conclusion
113Effect of Client Tr. Length
F-Matrix -- has best perf. -- scales very well
114Summary of Results
- F-Matrix gt R-Matrix gt Datacycle
- Weaker abort condition leads to better response
times - F-Matrix is highly scalable with respect to
- Client/Server transaction length
- Server transaction rate
- Number of Objects/Size of Objects
- R-Matrix better only at very small object sizes
- In many cases F-Matrix is very close to
F-Matrix-ideal
115Conclusion
- Need for mutual consistency currency
- Efficient mechanism - F-matrix
- R-matrix is a low overhead alternative
- F-matrix delivers!
- In Paper Caching to exploit weak currency
requirements
116Related Work
- Broadcast Environments
- Datacycle Herman
- supports serializability
- Broadcast Disks Acharya
- consistency not considered
- ProMotion Chrysanthis
- assumes caches
117Examples
- Business data, e.g., Vitria, Tibco
- Election coverage data
- Stock related data
- Traffic information
- Electronic auctions