Title: Enhance Features and Performance of Content Switches
1. Enhance Features and Performance of Content Switches
- Chandra Prakash
- Department of Computer Science, University of Colorado at Colorado Springs
2. Outline of the Talk
- Introduction
- Existing Content Switch related products/techniques
- Basic Architecture of a Content Switch (CS)
- TCP Delayed Binding and proposed improvements
- Performance results of various schemes for improving TCP Delayed Binding
- Handling multiple requests in an HTTP Keep-Alive connection
- Enhancements to the CS
  - Handling multiple packets of an HTTP request
  - Handling different data encoding formats
  - Improved XML rule matching
- High Availability of the Linux CS Cluster
- Conclusion
- Future Work
3. Content Switch-based Cluster
[Figure: a client (CIP) reaches the virtual server/content switch (VIP) over the Internet; the content switch forwards requests over a WAN/LAN to real servers 1 to 3 (RIP1, RIP2, RIP3).]
CIP: client IP address. VIP: virtual IP address. RIP: real server IP address.
4. Existing Content Switch Related Products
- Alteon's A180e/A184 series products support URL-based server load balancing.
- F5's Big-IP product supports load balancing and content switching.
- Foundry Networks' ServerIron product supports URL-, cookie-, and SSL session ID-based switching.
- Intel's XML accelerator products can distribute Web load based on XML tag values.
5. Existing Content Switch Related Techniques
- MAC address translation (MAT)
- MAC multicast
- Half network address translation (HNAT), also known as NAT in the Linux Virtual Server project (http://www.linuxvirtualserver.org)
- Full network address translation (FNAT)
- IP tunneling
6. MAC Address Translation
[Figure: the virtual server (VIP 128.198.192.182) forwards client traffic across the LAN to real servers 1 to 3 (Ethernet IPs 128.198.192.1, 128.198.192.2, 128.198.192.3), each of which also configures the VIP 128.198.192.182 on its loopback interface; server-to-client traffic returns to the client directly.]
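The MAT figure implies that only link-layer addresses change: the IP destination stays the VIP, which each real server accepts because it holds the VIP on its loopback. A minimal sketch under that reading follows; the `Frame` type and the MAC strings are hypothetical placeholders, not part of the original system.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    # Only the fields relevant to MAT; MAC values used with this are placeholders.
    dst_mac: str
    src_mac: str
    dst_ip: str
    src_ip: str


def mat_forward(frame: Frame, real_server_mac: str, switch_mac: str) -> Frame:
    """Forward a client frame to a real server by rewriting only the Ethernet
    addresses. The destination IP is left as the VIP; the real server accepts
    it because the VIP is also configured on its loopback interface."""
    return replace(frame, dst_mac=real_server_mac, src_mac=switch_mac)
```

Because the IP header is untouched, the real server can answer the client directly with the VIP as its source address.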
7. MAC Multicast
[Figure: client traffic passes through a switch to real servers 1 to 3 (IP1, IP2, IP3), which together form a MAC multicast group; server-to-client traffic is shown as a separate path back to the client.]
8. Half NAT
[Figure: the virtual server (VIP 128.198.192.182) forwards client traffic across the LAN to real servers 1 to 3 (Ethernet IPs 128.198.192.1, 128.198.192.2, 128.198.192.3); each real server uses 128.198.192.182 as its default gateway, so server-to-client traffic flows back through the virtual server.]
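In half NAT only the server side of each packet is rewritten: the destination on the way in and the source on the way out, while the client address is never changed. A minimal sketch, assuming the VIP from the figure and a hypothetical `IPPacket` type holding just the two addresses:

```python
from dataclasses import dataclass, replace

VIP = "128.198.192.182"  # virtual server address shown in the figure


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str


def nat_inbound(pkt: IPPacket, real_server_ip: str) -> IPPacket:
    """Client to server: rewrite only the destination (VIP to chosen RIP)."""
    return replace(pkt, dst=real_server_ip)


def nat_outbound(pkt: IPPacket) -> IPPacket:
    """Server to client: rewrite only the source (RIP to VIP). The client
    address is kept in both directions, which is what makes this 'half' NAT."""
    return replace(pkt, src=VIP)
```

The outbound rewrite only works because replies reach the virtual server, which is why the real servers point their default gateway at 128.198.192.182.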
9. Full NAT
[Figure: the virtual server (VIP) relays client traffic to real servers 1 to 3 (IP1, IP2, IP3) and relays server-to-client traffic back to the client.]
10. IP Tunneling
[Figure: the virtual server (VIP) encapsulates each client packet inside a packet destined for the chosen real server (RIP); the real server (IP1, IP2, or IP3) decapsulates it. Server-to-client traffic is shown as a separate path back to the client.]
11. Basic Operations of Content Switching
[Figure: content switching (CS) pipeline.]
- The CS rule editor produces the CS rules.
- Incoming packets pass through packet classification, header content extraction, the CS rule matching algorithm, and packet routing (load balancing), and are then forwarded to the servers.
- Packet routing consults a load balancing repository holding network path information and server load status.
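The pipeline stages above can be sketched as plain functions chained together. This is an illustration under stated assumptions, not the LCS kernel code: the `Rule` format (a predicate plus a server pool), the first-match-wins rule order, and the least-loaded routing policy are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes


@dataclass
class Rule:
    # Hypothetical rule format: predicate over header fields + target server pool.
    name: str
    predicate: Callable[[Dict[str, str]], bool]
    pool: List[str]


HTTP_METHODS = (b"GET", b"POST", b"PUT", b"HEAD", b"DELETE")


def classify(pkt: Packet) -> bool:
    """Packet classification: does this packet start with an HTTP request line?"""
    return pkt.payload.split(b" ", 1)[0] in HTTP_METHODS


def extract_headers(pkt: Packet) -> Dict[str, str]:
    """Header content extraction: request line plus header fields (lowercased names)."""
    head = pkt.payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    fields = {"method": parts[0], "url": parts[1] if len(parts) > 1 else ""}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def match_rules(fields: Dict[str, str], rules: List[Rule]) -> Optional[Rule]:
    """CS rule matching: the first matching rule wins (assumed policy)."""
    for rule in rules:
        if rule.predicate(fields):
            return rule
    return None


def pick_server(pool: List[str], load: Dict[str, int]) -> str:
    """Packet routing: least-loaded server in the pool (assumed policy)."""
    return min(pool, key=lambda server: load.get(server, 0))


def route(pkt: Packet, rules: List[Rule], load: Dict[str, int],
          default_pool: List[str]) -> str:
    if not classify(pkt):
        return pick_server(default_pool, load)
    rule = match_rules(extract_headers(pkt), rules)
    return pick_server(rule.pool if rule else default_pool, load)
```

For example, a rule `Rule("cgi", lambda f: f["url"].startswith("/cgi-bin/"), ["rs2", "rs3"])` with loads `{"rs2": 5, "rs3": 1}` sends a POST to `/cgi-bin/...` to `rs3`, while other requests fall back to the default pool.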
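12. TCP Delayed Binding (Basic Scheme)
[Figure: message sequence among client, content switch, and server.]
- Client to content switch: SYN(CSEQ)
- Content switch to client: SYN(DSEQ)/ACK(CSEQ+1)
- Client to content switch: ACK(DSEQ+1), then DATA(CSEQ+1) carrying the HTTP request
- Content switch to server, after choosing the server from the request content: SYN(CSEQ)
- Server to content switch: SYN(SSEQ)/ACK(CSEQ+1)
- Content switch to server: ACK(SSEQ+1), then DATA(CSEQ+1)/ACK(SSEQ+1)
- Server to content switch: DATA(SSEQ+1)/ACK(CSEQ+lenR+1)
- Content switch to client: DATA(DSEQ+1)/ACK(CSEQ+lenR+1)
- Client to content switch: ACK(DSEQ+lenD+1), forwarded to the server as ACK(SSEQ+lenD+1)
lenR = size of the HTTP request; lenD = size of the returned document.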
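Because the content switch replays the client's SYN(CSEQ) to the server, client sequence numbers pass through unchanged; only server-side numbers need translating, by the constant offset between SSEQ and DSEQ. A minimal sketch of that arithmetic, assuming standard 32-bit TCP wraparound (the function names are hypothetical):

```python
MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


def server_to_client_seq(seq: int, sseq: int, dseq: int) -> int:
    """Map a sequence number in the server's space (ISN SSEQ) into the
    content switch's space (ISN DSEQ), which is what the client expects."""
    return (seq - sseq + dseq) % MOD


def client_to_server_ack(ack: int, dseq: int, sseq: int) -> int:
    """Map a client ACK (relative to DSEQ) back into the server's space (SSEQ)."""
    return (ack - dseq + sseq) % MOD
```

With this, DATA(SSEQ+1) leaves the switch as DATA(DSEQ+1), and the client's ACK(DSEQ+lenD+1) reaches the server as ACK(SSEQ+lenD+1), matching the sequence above.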
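13. Pre-Allocate Scheme if Guess is Correct
[Figure: message sequence among client, content switch, and pre-allocated server.] The content switch forwards the client's SYN(CSEQ) to the pre-allocated server, relays the server's SYN(SSEQ)/ACK(CSEQ+1) back to the client, and passes the client's DATA(CSEQ+1) through unchanged.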
- Guess the routing decision from the client IP address, port, and history.
- Advantages:
  - Faster than TCP delayed binding.
  - Allows a possible direct route between client and server.
  - Reduces session processing overhead: there is no need to convert server sequence numbers.
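The slide does not say how IP, port, and history are combined into a guess. One plausible sketch, purely as an assumption: reuse the server a client was last routed to, otherwise hash the client IP and destination port onto the server list. `PreAllocator`, `guess`, and `record` are hypothetical names.

```python
import hashlib
from typing import Dict, List


class PreAllocator:
    """Hypothetical guesser: prefer history, else hash (client IP, port)."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.history: Dict[str, str] = {}  # client IP -> last correct server

    def guess(self, client_ip: str, dst_port: int) -> str:
        if client_ip in self.history:
            return self.history[client_ip]
        key = f"{client_ip}:{dst_port}".encode()
        digest = hashlib.sha1(key).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def record(self, client_ip: str, server: str) -> None:
        """Call once the request content reveals the correct server."""
        self.history[client_ip] = server
```

Hashing keeps the guess stable for a given client and service, and the history table lets a corrected guess stick for later connections.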
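14. Pre-allocate Scheme if Guess is Wrong
[Figure: message sequence among client, content switch, pre-allocated server, and right server.]
- The handshake and the first request proceed as in the correct-guess case: SYN(CSEQ), SYN(SSEQ)/ACK(CSEQ+1), and DATA(CSEQ+1)/ACK(SSEQ+1) pass through the content switch.
- The pre-allocated server replies with HTTP 404, and the content switch sends it a RST.
- The content switch opens a connection to the right server: SYN(CSEQ), answered by SYN(RSEQ)/ACK(CSEQ+1), then ACK(RSEQ+1).
- Sequence conversion is now needed for the right server: its DATA(RSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(SSEQ+1)/ACK(CSEQ+lenR+1), and the client's ACK(SSEQ+lenD+1) reaches the right server as ACK(RSEQ+lenD+1).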
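The recovery path needs two pieces: recognizing the wrong guess from the pre-allocated server's HTTP 404 status line, and converting between the right server's space (RSEQ) and the space the client already agreed to (SSEQ). A minimal sketch with hypothetical function names, assuming 32-bit TCP wraparound:

```python
MOD = 2 ** 32


def guess_was_wrong(first_response: bytes) -> bool:
    """Treat an HTTP 404 status line from the pre-allocated server as a wrong guess."""
    parts = first_response.split(b"\r\n", 1)[0].split(b" ", 2)
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1] == b"404"


def right_to_client_seq(seq: int, rseq: int, sseq: int) -> int:
    """Right server numbers its bytes from RSEQ; the client expects SSEQ."""
    return (seq - rseq + sseq) % MOD


def client_to_right_ack(ack: int, sseq: int, rseq: int) -> int:
    """Client ACKs are relative to SSEQ; the right server expects RSEQ."""
    return (ack - sseq + rseq) % MOD
```

This is the same constant-offset translation as basic delayed binding, just paid only when the guess fails.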
15. Filter Process Scheme
[Figure: message sequence among client, content switch, and server, with a filter process running on the server. Messages shown: SYN(CSEQ) (step 1); DATA(CSEQ+1)/ACK(DSEQ+1) (step 3); Migrate(Data, CSEQ, DSEQ) (step 5a) and SYN(CSEQ) (step 5b); SYN(SSEQ)/ACK(CSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1); ACK(DSEQ+lenD+1) and ACK(SSEQ+lenD+1).]
16. Performance Metrics
- Processing time vs. document size for GET requests
- Processing time vs. document size for POST requests
- Results for each individual scheme, obtained with Webbench by varying the delay and the number of threads sending requests
- Plot of maximum sustainable requests/sec vs. number of rules
- Plot of maximum throughput in bytes/sec vs. number of rules
17. Benchmark Configuration
- fladnag.uccs.edu: content switch, 128.198.192.184, Linux 2.2.16-3
- vinci.uccs.edu: real server 1, 128.198.192.193, Linux 2.2.16-3
- gandalf.uccs.edu: real server 2, 128.198.192.194, Linux 2.2.16-3
- dilbert.uccs.edu: client, 128.198.192.195, Windows NT 4.0
- For the plots of processing time vs. document size, a Perl script sent GET and POST requests with varying request and response sizes.
- For response time and throughput measurements, Webbench was used.
18. Processing Time vs. Response Size for GET Request
19. Processing Time vs. Request Size for POST Request
20. Comparison of Overall Webbench Requests/Second Metric
21. Comparison of Overall Webbench Throughput (Bytes/Second) Metric
22. Requests/Sec vs. Number of CS Rules
23. Throughput in Bytes/Sec vs. Number of CS Rules
24. Handling Multiple Requests in a Keep-Alive Connection
- Determine when a new request arrives:
  - verify that the previous request has been completely received, and
  - check that the TCP payload size is > 0.
- Key assumption: the client sends only one outstanding request at a time, i.e., requests are not pipelined.
- Reuse connections:
  - once a connection to a real server is established, store its control information in a hash table keyed by the real server address.
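The new-request test above can be sketched as a small state update per client segment. This assumes non-pipelined requests, as the slide does; `ConnState`, `remaining`, and the optional `request_len` (the full request size, known once the header has been parsed) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnState:
    remaining: int = 0  # bytes of the current request still expected


def on_client_segment(state: ConnState, payload_len: int,
                      request_len: Optional[int] = None) -> bool:
    """Return True if this client segment starts a new request.

    A segment starts a new request when the previous request has been fully
    received (remaining == 0) and the segment carries data (payload > 0).
    """
    is_new = state.remaining == 0 and payload_len > 0
    if is_new:
        total = request_len if request_len is not None else payload_len
        state.remaining = max(total - payload_len, 0)
    else:
        state.remaining = max(state.remaining - payload_len, 0)
    return is_new
```

Pure ACKs (payload 0) never start a request, and continuation segments only drain `remaining`; with pipelining, a second request could arrive before `remaining` reaches zero, which is why the assumption matters.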
25. Keep-Alive Connection Hash Table
- For each real server, the hash table entry stores the following parameters:
  - rs_addr
  - cli_str_seq
  - cli_str_ack_seq
  - rs_last_next_seq
  - rs_last_ack_seq
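A minimal sketch of the table, keyed by `rs_addr` as the slide describes. The field comments give one reading of each parameter inferred from how the translation formulas use them; the `KeepAliveTable` class and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KeepAliveEntry:
    rs_addr: str           # real server address (the hash key)
    cli_str_seq: int       # client sequence number at the start of the current request
    cli_str_ack_seq: int   # client acknowledgment number at the start of the current request
    rs_last_next_seq: int  # next sequence number the real server will send
    rs_last_ack_seq: int   # last acknowledgment number sent by the real server


class KeepAliveTable:
    """Stores connection control information once a connection is established."""

    def __init__(self) -> None:
        self._entries: Dict[str, KeepAliveEntry] = {}

    def store(self, entry: KeepAliveEntry) -> None:
        self._entries[entry.rs_addr] = entry

    def lookup(self, rs_addr: str) -> Optional[KeepAliveEntry]:
        """Return the entry for an already-open connection, or None."""
        return self._entries.get(rs_addr)
```

A successful `lookup` is what allows the switch to reuse an existing server connection instead of opening a new one.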
26. Client to Real Server Sequence Translation in a Keep-Alive Connection
- Sequence number of a packet sent by the client and forwarded to the current real server:
  rs_last_ack_seq + (cli_cur_seq - cli_str_seq)
  - where cli_cur_seq is the sequence number of the currently forwarded client packet.
- Acknowledgment number of a packet sent by the client and forwarded to the current real server:
  rs_last_next_seq + (cli_cur_ack_seq - cli_str_ack_seq)
  - where cli_cur_ack_seq is the acknowledgment number of the currently forwarded client packet.
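The two client-to-server formulas translate directly into code. The sketch below adds modulo-2^32 wraparound, which the slide does not mention but TCP sequence arithmetic requires; the function names are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence space


def client_to_rs_seq(cli_cur_seq: int, cli_str_seq: int, rs_last_ack_seq: int) -> int:
    """rs_last_ack_seq + (cli_cur_seq - cli_str_seq), with 32-bit wraparound."""
    return (rs_last_ack_seq + (cli_cur_seq - cli_str_seq)) % MOD


def client_to_rs_ack(cli_cur_ack_seq: int, cli_str_ack_seq: int,
                     rs_last_next_seq: int) -> int:
    """rs_last_next_seq + (cli_cur_ack_seq - cli_str_ack_seq), with 32-bit wraparound."""
    return (rs_last_next_seq + (cli_cur_ack_seq - cli_str_ack_seq)) % MOD
```

For instance, if the current request began at client sequence 1000 and the server last acknowledged 7000, a client segment at 1100 is forwarded with sequence 7100.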
27. Real Server to Client Sequence Translation in a Keep-Alive Connection
- Sequence number of a packet sent by the real server and forwarded to the client:
  cli_str_ack_seq + (rs_cur_seq - rs_last_next_seq)
  - where rs_cur_seq is the sequence number of the currently forwarded real server packet.
- Acknowledgment number of a packet sent by the real server and forwarded to the client:
  cli_str_seq + (rs_cur_ack_seq - rs_last_ack_seq)
  - where rs_cur_ack_seq is the acknowledgment number of the currently forwarded real server packet.
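The reverse direction applies the inverse offsets. Again a sketch with hypothetical names, adding 32-bit wraparound:

```python
MOD = 2 ** 32  # TCP sequence space


def rs_to_client_seq(rs_cur_seq: int, rs_last_next_seq: int, cli_str_ack_seq: int) -> int:
    """cli_str_ack_seq + (rs_cur_seq - rs_last_next_seq), with 32-bit wraparound."""
    return (cli_str_ack_seq + (rs_cur_seq - rs_last_next_seq)) % MOD


def rs_to_client_ack(rs_cur_ack_seq: int, rs_last_ack_seq: int, cli_str_seq: int) -> int:
    """cli_str_seq + (rs_cur_ack_seq - rs_last_ack_seq), with 32-bit wraparound."""
    return (cli_str_seq + (rs_cur_ack_seq - rs_last_ack_seq)) % MOD
```

Each function undoes the corresponding client-to-server mapping: an acknowledgment of 7100 from a server whose last ACK was 7000 maps back to client sequence 1100 when the request started at 1000.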
28. Handling Multiple Packets of an HTTP Request
- A request may span multiple TCP segments, which requires queuing incoming packets.
- Determining when a request has been completely received requires parsing the HTTP header content, e.g., the Content-Length header in requests such as PUT and POST.
- The content switch must stay in sync with both client and server TCP.
- HTTP request fragmentation example, where TCP segment n contains:
  - POST /cgi-bin/cs622/purchase.pl HTTP/1.0\r\n
  - Referer: http://archie.uccs.edu/acsd/lcs/xmldemo.html\r\n
  - Connection: Keep-Alive\r\n
  - Content-type: application/x-www-form-urlencoded\r\n
  - Content-length: 7
- and TCP segment n+1 contains:
  - 53\r\n
  - data (753 bytes)
- Here even the Content-length value (753) is split across the two segments.
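The queuing described above can be sketched as a buffer that releases a request only once the header terminator and the full Content-Length body are present. `RequestAssembler` is a hypothetical name, and the sketch assumes the header block ends with a blank line (`\r\n\r\n`), as HTTP requires.

```python
from typing import Optional


class RequestAssembler:
    """Queue TCP payloads until one complete HTTP request is buffered."""

    def __init__(self) -> None:
        self.buf = b""

    def feed(self, segment: bytes) -> Optional[bytes]:
        self.buf += segment
        head_end = self.buf.find(b"\r\n\r\n")
        if head_end < 0:
            return None  # header incomplete (e.g. Content-length split across segments)
        body_start = head_end + 4
        length = 0
        for line in self.buf[:head_end].split(b"\r\n")[1:]:  # skip the request line
            name, sep, value = line.partition(b":")
            if sep and name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(self.buf) - body_start < length:
            return None  # body incomplete
        request = self.buf[:body_start + length]
        self.buf = self.buf[body_start + length:]  # keep bytes of any later request
        return request
```

Fed the two segments from the example, the first call returns `None` because the header has not ended, and the second returns the full POST with its 753-byte body.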
29. Handling Different Data Encodings in XML Documents
- Typically there are two encoding techniques:
  - text/xml
    - consists of plain ASCII text with no special encoding
  - x-www-form-urlencoded
    - consists of text in which special characters are encoded as %XX, where XX is the hexadecimal value of the special character
    - for example, the newline and left angle bracket (<) characters are encoded as %0A and %3C, respectively.
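Before XML rules can be matched, a form-urlencoded body has to be decoded back to plain text. A minimal sketch using Python's standard `urllib.parse.unquote_plus` (which also turns `+` into a space, as form encoding requires); the `decode_body` helper and its content-type check are assumptions for illustration.

```python
from urllib.parse import unquote_plus


def decode_body(body: str, content_type: str) -> str:
    """Return the body as plain text suitable for XML rule matching."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        return unquote_plus(body)  # %3C -> '<', %0A -> newline, '+' -> space
    return body  # text/xml: already plain text
```

For example, `%3CcustomerID%3E111222333%3C%2FcustomerID%3E%0A` decodes to `<customerID>111222333</customerID>` followed by a newline.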
30. Improved Rule Specification
<purchase>
  <customerID>111222333</customerID>
  <item>
    <productID>309121544</productID>
    <unitPrice>5000</unitPrice>
  </item>
  <item>
    <productID>309121538</productID>
    <unitPrice>200</unitPrice>
  </item>
</purchase>
- Many tags with the same name make rule specification ambiguous, e.g., the item tag in the XML sample document above.
- A rule specification that qualifies each tag with its occurrence index (purchase 1, item 2, unitPrice 1) together with the condition > 200 selects the unitPrice tag of the second item.
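Indexed rule matching can be sketched by walking (tag, occurrence) pairs from the root with the standard `xml.etree.ElementTree` module. The list-of-pairs path format and the `select`/`rule_gt` names are hypothetical stand-ins for the LCS rule syntax, which is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def select(root: ET.Element, path: List[Tuple[str, int]]) -> Optional[ET.Element]:
    """Follow 1-based (tag, occurrence) pairs, starting with the root element."""
    if not path:
        return None
    (tag, idx), rest = path[0], path[1:]
    if root.tag != tag or idx != 1:
        return None
    node = root
    for tag, idx in rest:
        matches = node.findall(tag)  # direct children with this tag, in order
        if not 1 <= idx <= len(matches):
            return None
        node = matches[idx - 1]
    return node


def rule_gt(doc: str, path: List[Tuple[str, int]], threshold: float) -> bool:
    """Evaluate '<indexed tag path> > threshold' against an XML document."""
    node = select(ET.fromstring(doc), path)
    return node is not None and node.text is not None and float(node.text) > threshold
```

Against the sample purchase, the path `[("purchase", 1), ("item", 2), ("unitPrice", 1)]` picks the value 200 only, so the two item tags no longer make the rule ambiguous.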
31. High-Availability of Linux Content Switch (HA-LCS)
- Addresses issues related to:
  - fault tolerance of the virtual server
  - fault tolerance of the real servers, or of the services on them
  - high availability of data files (e.g., HTML docs)
- The setup is based on the existing high-availability configuration of LVS, with the following key software components:
  - Heartbeat (fault tolerance of the virtual server)
  - Mon (fault tolerance of services on the real servers)
  - Coda (high availability of data files)
32. HA-LCS Architecture
[Figure: a user reaches the virtual server cluster, a primary and a backup node running heartbeat and mon, over the LAN; real servers 1 to 3 share the Coda file system.]
33. HA-LCS Configuration
- fladnag.uccs.edu: content switch (primary), 128.198.192.184, Linux 2.2.16-3
- walden.uccs.edu: content switch (secondary), 128.198.192.203, Linux 2.2.16-3
- vinci.uccs.edu: real server 1 (Coda client), 128.198.192.193, Linux 2.4.2-2
- gandalf.uccs.edu: real server 2 (Coda client), 128.198.192.194, Linux 2.4.2-2
- wait.uccs.edu: Coda server, 128.198.192.202, Linux 2.4.2-2
34. Unique Constraints Imposed in HA-LCS as Compared to HA-LVS
- In LCS, switching rules based on application content are hard-wired in a kernel rule module. Changing a switching rule requires the following steps:
  - modify the rule module code to reflect the changed rule
  - compile the modified rule module
  - remove the old rule module
  - insert the new rule module
- In LVS, switching rules are based on a simple load balancing policy and can be changed via built-in commands.
35. HA-LCS Configuration Notes
- Heartbeat
  - Set up the ha.cf and haresources files.
  - Note: the IP address specified in the haresources file should not be configured by the OS.
- Mon
  - Set up mon.cf and the real server failure/startup handler script wk_up.ksh.
- Coda
  - Set up the Coda server using vice-setup, and configure the clients.
  - When creating a Coda volume on the server, the INSTALL.linux setup file says to create the volume as: createvol_rep codaroot E0000100 /vicepa, where codaroot is the root volume.
  - On the other hand, the online Coda help says the root volume should be set as coda.root.
  - While creating the Coda admin, do not specify user ID 1, even though the instructions say the Coda admin user ID can be 1.
36. Performance of Different File Systems with HA-LCS
37. Conclusion
- Implemented three schemes to study improvements to TCP delayed binding. The pre-allocate scheme gave the best results, followed by the basic and filter schemes.
- Implemented a scheme to handle multiple requests arriving in a non-pipelined fashion on a given connection.
- Proposed ways to handle requests sent in a pipelined fashion in a content switch.
- Addressed issues related to content switch processing:
  - handling multiple packets in a request
  - improving rule matching
  - handling different data encoding formats
- Implemented a highly available Linux Content Switch (LCS) system.
- Identified key issues related to fault tolerance in LCS and implemented the solutions.
38. Future Work
- Improve the reliability of LCS by moving content switch processing from the IP layer to the transport layer.
- Enhance load balancing policies by considering network path status and server load.
- Improve performance by reusing connections from a connection pool to avoid setup overhead.
- Utilize recent protocols, e.g., ASAP/ENRP, for managing fault-tolerant clusters.