Title: Towards High-Availability for IP Telephony using Virtual Machines
1. Towards High-Availability for IP Telephony using Virtual Machines
- Devdutt Patnaik, Ashish Bijlani and Vishal K Singh
2. Outline
- Virtualization
- High Availability (HA) in Virtualized Platforms
- XEN and REMUS (HA solution for XEN)
- Remus applied to IP Telephony (IPT) applications
- Scalability and Reliability of IPT applications using Virtualization
- Experimental Results
- Conclusion
3. Virtualization and its Benefits
- Abstraction layer (hypervisor) between the physical hardware and the OS
- A single physical machine can host multiple virtual machines, each running a different OS and application stack
- VMMs
  - Xen, VMware, Microsoft Hyper-V
- Benefits
  - Server consolidation
  - Green computing
  - Cost savings: space and power
  - High availability
  - Reliability solutions, ease of upgrades with near-zero downtime
4. Virtualized Hosting for IP Telephony
- Virtualized hosting for IP Telephony is already available
  - Avaya, Cisco, Asterisk, etc.
- IP Telephony in the cloud
  - Scalability: ability to elastically add or remove servers while supporting high availability for all servers
  - Reliability: protection against hardware and software failures
- HA features in virtualization platforms
  - Memory state checkpointing
5. Virtualization and High Availability
- Seamless fail-over; efficient and transparent migration of a VM to another physical machine
- Live migration with very small downtime
- Minimal or no impact on client nodes
- Asynchronous checkpointing
  - Continuously syncs state between the primary and secondary hosts
- We use
  - Remus: a high-availability solution for Xen
6. Remus on Xen
- Remus is a high-availability solution available on the Xen VMM
- Remus uses continuous checkpointing and keeps a consistent client view of network state
- The secondary machine hosts a paused replica of the primary VM
- Uses a heartbeat mechanism
  - Failure to receive the periodic heartbeat on the secondary un-pauses the backup VM
  - The heartbeat time-out is configurable
Fig. 1 (image source: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf)
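The heartbeat rule above can be illustrated with a small sketch. This is not Remus code: the class name HeartbeatMonitor, its methods, and the injectable clock are all hypothetical, and the sketch only models the decision "no heartbeat within the configured time-out, so un-pause the backup".

```python
import time


class HeartbeatMonitor:
    """Hypothetical model of the secondary host's heartbeat watchdog."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # configurable heartbeat time-out
        self.clock = clock              # injectable so the logic can be tested
        self.last_beat = clock()
        self.backup_running = False     # the replica starts in the paused state

    def on_heartbeat(self):
        # Record the arrival time of a heartbeat from the primary.
        self.last_beat = self.clock()

    def poll(self):
        # Un-pause the backup once the primary has been silent for too long.
        silent_for = self.clock() - self.last_beat
        if not self.backup_running and silent_for > self.timeout_s:
            self.backup_running = True
        return self.backup_running
```

A shorter time-out detects failures sooner but raises the risk of a false fail-over when heartbeats are merely delayed.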
7. Remus on Xen (contd.)
- Remus modes of operation
  - Net mode: highly reliable
  - No-Net mode: better performance, with negligible packet loss in case of failure
- Tunable for reliability vs. performance
Disk writes and network writes
- Net mode: buffers outgoing network packets until execution state is synced with the backup VM (on the secondary host)
  - Reliability at the cost of performance
Fig. 2 (image source: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf)
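To make the Net mode behaviour concrete, here is a minimal sketch of output buffering, assuming a simplified model in which packets are held per checkpoint epoch and released only after the backup acknowledges that checkpoint. The class NetModeBuffer and its method names are hypothetical and do not correspond to the Remus implementation.

```python
from collections import deque


class NetModeBuffer:
    """Hypothetical model of Net mode: hold output until the backup has the state."""

    def __init__(self):
        self.current = deque()    # packets produced since the last checkpoint
        self.in_flight = deque()  # packets covered by a checkpoint not yet acked
        self.released = []        # packets actually visible to clients

    def send(self, packet):
        # The guest "sends" a packet; it is buffered, not transmitted.
        self.current.append(packet)

    def take_checkpoint(self):
        # Seal the epoch: its packets now wait for the backup's acknowledgement.
        self.in_flight.extend(self.current)
        self.current.clear()

    def backup_acked(self):
        # The backup holds the matching state, so the output can be released.
        while self.in_flight:
            self.released.append(self.in_flight.popleft())

    def primary_failed(self):
        # Unreleased output is discarded; clients never saw it.
        dropped = len(self.current) + len(self.in_flight)
        self.current.clear()
        self.in_flight.clear()
        return dropped
```

Because clients only ever see output that the backup can reproduce, a fail-over looks consistent from outside. The added buffering delay is the performance cost mentioned above.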
8. Remus Applied to IP Telephony: Scale with Reliability
- Our work, using HA in Xen, extends the architecture for fail-over and load sharing for IP Telephony proposed by Kundan Singh et al.
- Challenges
  - Overheads of virtualization on IP Telephony performance
  - A co-hosted (co-located) media server causes interference because of its heavy I/O workload
9. Reliability and Scalability using Virtual Machines
- Scalability using a load balancer (LB)
  - The LB can elastically add more VMs as demand grows
- Reliability using Remus in Xen
Stateless load balancer
Reliability architecture using virtual machines
- For every primary virtual machine there is a backup VM in a paused state
- Since the backup VM is paused, other running VMs can be placed on the same physical machine
- Provides an N-to-M elastic/backup model (M backups for N primaries)
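The two ideas on this slide can be sketched as follows. This is an illustration under stated assumptions, not the architecture's actual code. The helper names pick_primary and place_backups are hypothetical. Hashing a request key is one common way to keep a load balancer stateless, and round-robin is only one possible way to spread N paused replicas over M backup hosts.

```python
import hashlib


def pick_primary(key, primaries):
    """Map a request key (for example a SIP user URI) to one primary VM.

    No per-call table is kept: the same key always hashes to the same VM.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return primaries[int.from_bytes(digest[:8], "big") % len(primaries)]


def place_backups(primaries, backup_hosts):
    """Assign the paused replica of each primary to a backup host, round-robin."""
    return {
        vm: backup_hosts[i % len(backup_hosts)]
        for i, vm in enumerate(primaries)
    }
```

Modulo hashing remaps many keys whenever a VM is added or removed. A deployment that grows elastically might prefer consistent hashing, which is outside the scope of this sketch.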
10. Reliability and Scalability using Virtual Machines (contd.)
- Reliability
  - Provided by Xen Remus
  - Failure of the primary starts execution of the secondary, with IP address takeover
  - Clients continue to execute unaffected
- Signaling and media server
  - Co-located on the same VM
    - Allows better utilization
    - No overhead of inter-VM communication
  - Placed on different VMs
    - Elastic scaling of media and signaling VMs
11. Studying Performance Implications
- Experimental setup
  - Primary/backup servers
    - Intel Core 2 Quad processors, 2.5 GHz, 8 GB RAM, 4 MB L2 cache
  - Hypervisor: Xen 3.2.1 with Remus
    - Default credit scheduler configuration
  - Guest OS: paravirtualized Linux 2.6.18
- IP Telephony workload
  - Modeled our workload using SIPstone
  - Measured success of registrations during failover
  - Used UDP and TCP as transports for registrations
  - Used OpenSIPS as the SIP server
  - RTPProxy as the media server
  - SIPp for generating signaling and media traffic
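For readers unfamiliar with the registration workload, the sketch below builds the kind of SIP REGISTER request that a load generator sends. It is illustrative only and is not the SIPp scenario used in the experiments. The function build_register and all addresses in the usage note are hypothetical, and the header set follows the general layout of RFC 3261.

```python
import uuid


def build_register(user, domain, local_ip, local_port, transport="UDP",
                   cseq=1, call_id=None, branch_id=None, tag=None, expires=3600):
    """Return a minimal SIP REGISTER request as a string (hypothetical helper)."""
    call_id = call_id or uuid.uuid4().hex
    branch_id = branch_id or uuid.uuid4().hex
    tag = tag or uuid.uuid4().hex[:8]
    aor = f"sip:{user}@{domain}"  # address-of-record being registered
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        # The branch parameter must start with the magic cookie z9hG4bK.
        f"Via: SIP/2.0/{transport} {local_ip}:{local_port};branch=z9hG4bK{branch_id}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={tag}",
        f"To: <{aor}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For example, build_register("alice", "example.com", "192.0.2.10", 5060) yields a request that could be sent over UDP. With TCP the same text is written either to one shared connection or to a new connection per request.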
12. Analysis and Results: Signaling
- Guest VM and Domain 0 both have high CPU utilization with tcp_n (a new TCP connection for each REGISTER)
- UDP and tcp_1 (one TCP connection for all REGISTERs) have similar overhead
CPU utilization (in guest VM and dom0): udp means UDP transport, tcp_1 means the same connection for all calls, tcp_n means a new connection for each call
Registration overhead with Remus Net mode
13. Analysis and Results: Signaling
- CPU overhead increases proportionately with signaling load
- Dom0 has significant overhead due to checkpointing
- Net mode gives good results for signaling
  - A failure was induced at 1400 registrations/sec
  - 100% completion of all registrations via failover to the backup
14. Analysis and Results: Media
- Media loads with Net mode give poor results
- Media with No-Net mode gives good performance, even with 400 streams, with 2% losses
  - This can be further reduced by tweaking scheduler parameters
- 100% fail-over of all calls in progress during media experiments
Net mode: 100, 200, 400, 600 and 800 streams
No-Net mode: 100, 200, 400, 600 and 800 streams
15. Conclusion
- Using No-Net mode for media streams gives a balance between performance (loss and delay) and reliability (failover), while still migrating 100% of all calls in progress (using TCP), which is a significant result
- Net mode for signaling is a good configuration, with 100% registration completion across failover
- No-Net mode for the media server deployment provides a significant improvement in performance: loss and delay are reduced significantly
- While the No-Net configuration performs better for media, it may not provide call completion guarantees for signaling during fail-over
- Migration of user registration and call setup operations was 100% successful
16. Contributions
- Extended the load-sharing and failover architecture using virtualization
- Proposed use of the high-availability feature in virtualized platforms to achieve reliability in IP Telephony
- Proposed a placement scheme for signaling and media applications for scale (elasticity) and efficiency (utilization)
- Systematic evaluation of the overheads involved in using virtualization for IP Telephony applications
- Demonstrated that high availability using virtual machines can be deployed for medium-scale IP Telephony infrastructure
17. Future Work
- More detailed analysis of overheads
  - Overhead due to checkpointing in the virtualization platform
  - Overhead due to I/O in Domain 0
- Propose solutions to improve performance
  - Improve I/O handling in the Xen VMM
- Propose a better VM placement algorithm for IP Telephony applications
  - Utilizing fine-grained overhead measurements for resource allocation
  - Considering I/O (media) vs. memory (signaling state replication) optimizations
  - Elasticity with co-location of media and signaling servers on the same VM
18. Questions