Reliability and Security in the CoDeeN Content Distribution Network

Title: Reliability and Security in the CoDeeN Content Distribution Network


1
Reliability and Security in the CoDeeN Content
Distribution Network
  • Limin Wang, KyoungSoo Park, Ruoming Pang, Vivek
    Pai, Larry Peterson
  • Princeton University

2
What Is CoDeeN?
  • Academic Content Distribution Network
  • Forward/reverse proxies, redirector
  • 100 proxy servers on PlanetLab
  • Continuous service, decentralized control
  • Deployed for getting real traffic

3
Goals of CoDeeN
  • Provide open content distribution
  • Improve web performance and reliability
  • Platform for testing new innovations
  • Particularly in live environments
  • Keep CoDeeN running 24/7
  • Security
  • Reliability

4
How Does CoDeeN Work?
[Diagram labels: origin, CoDeeN Proxy]
Each CoDeeN proxy is a forward proxy, reverse
proxy, and redirector

5
By The Numbers
  • Running 24/7 since June 2003
  • Over 870,000 unique IPs as clients
  • Over 500 million requests serviced
  • Valid rates up to 400K reqs/hour
  • Roughly 3-4 million reqs/day aggregate
  • Highest-traffic project on PlanetLab
  • not including PlanetLab Dec 2003 upgrade

6
Types of Security Problems
  • Spammers
  • Bandwidth hogs
  • High request rates
  • Content thieves
  • Worrisome anonymity
  • Commonality: using CoDeeN to do things they would
    not do directly

7
The Root of All Trouble
[Diagram: (Malicious) Client -> http/tcp -> CoDeeN Proxy -> http/tcp -> origin]

8
Spammers
  • SMTP (port 25) tunnels via CONNECT
  • Relay via open mail server
  • Range: 100s to 100,000 per day, per node
  • IRC channels (port 6667) via CONNECT
  • Captive audience, high port
  • POST forms (formmail scripts)
  • Exploit website scripts
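
One way to picture a defense against the CONNECT tunnels listed above is a port check on the tunnel target. This is a minimal sketch, not CoDeeN's actual policy: the allowed-port set and the name `connect_allowed` are assumptions made for illustration.

```python
# Hypothetical policy: only permit CONNECT tunnels to the HTTPS port.
ALLOWED_CONNECT_PORTS = {443}  # assumption, not taken from the slides


def connect_allowed(target: str) -> bool:
    """Return True if a CONNECT target of the form 'host:port' is permitted."""
    host, sep, port = target.rpartition(":")
    if not sep or not host or not (port.isascii() and port.isdigit()):
        return False  # malformed target, reject
    return int(port) in ALLOWED_CONNECT_PORTS
```

Under this assumed policy, targets on port 25 (SMTP) and port 6667 (IRC) are refused while port 443 is accepted.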

9
Bandwidth Hogs
  • Webcam trackers
  • Mass downloads of paid cam sites
  • Cross-Pacific traffic
  • Simultaneous large file downloads
  • Steganographers
  • Large files small images
  • All uniform sizes

10
High Request Rates
  • Google crawlers
  • Dictionary crawls (baffles Googlians)
  • Click counters
  • Defeat ad-supported game
  • Password crackers
  • Attacking random Yahoo! accounts

11
Content Theft
  • Licensed content theft
  • Journals and databases are expensive
  • Intra-domain access
  • Protected pages within the hosting site

12
Worrisome Anonymity
  • Request spreaders
  • Use CoDeeN as a DDoS platform!
  • TCP over HTTP
  • Non-HTTP Port 80
  • Access logging insufficient
  • Vulnerability testing
  • Low rate, triggers IDS

13
Approaches to Security
  • Desired: allow only safe accesses
  • No research in partially open proxies
  • Our approach
  • Rate limiting
  • Privilege separation

14
Rate Limiting
Minute
Hour
Day
  • 3 scales capture burstiness
  • Exceptions
  • Login attempts
  • Vulnerability tests
  • Repetition, request spreading
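
The three time scales above can be sketched as a per-client limiter that checks a minute, an hour, and a day window on every request. This is an illustrative sketch only: the numeric limits and the class name `MultiScaleLimiter` are assumptions, and the slides do not say how CoDeeN counts requests internally.

```python
from collections import defaultdict, deque

# Hypothetical per-client limits: (window in seconds, max requests in window).
LIMITS = [(60, 30), (3600, 600), (86400, 5000)]


class MultiScaleLimiter:
    def __init__(self, limits=LIMITS):
        self.limits = sorted(limits)
        self.history = defaultdict(deque)  # client -> accepted request timestamps

    def allow(self, client: str, now: float) -> bool:
        """Accept the request only if every window is still under its cap."""
        q = self.history[client]
        longest = self.limits[-1][0]
        while q and now - q[0] >= longest:
            q.popleft()  # drop timestamps older than the longest window
        for window, cap in self.limits:
            recent = sum(1 for t in q if now - t < window)
            if recent >= cap:
                return False  # rejected requests are not recorded
        q.append(now)
        return True
```

A short burst trips the minute window first, while a slow but sustained client is eventually caught by the hour or day window.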

15
Privilege Separation
Site A Proxy
Site A Server
16
Other Techniques
  • Limiting methods: GET, (HEAD)
  • Local users not restricted
  • Modifying request stream
  • Most promising future direction
  • Sanity checking on requests
  • Browsers, machines very different
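
The method restriction with an exemption for local users could look roughly like the following. It is a sketch under stated assumptions: the site prefix in `LOCAL_NETWORKS` is a documentation address range chosen for illustration, and `method_allowed` is a hypothetical helper name.

```python
import ipaddress

SAFE_METHODS = {"GET", "HEAD"}
# Hypothetical prefix standing in for the proxy's own site.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def method_allowed(method: str, client_ip: str) -> bool:
    """Local users are unrestricted; everyone else is limited to GET and HEAD."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in LOCAL_NETWORKS):
        return True
    return method.upper() in SAFE_METHODS
```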

17
Reasons for rejecting requests
18
Reliability in Context
  • Real information hard to get
  • Bearing on future p2p services
  • Non-dedicated nodes
  • Resource competition
  • Reliability more than just churn
  • Decentralized
  • No NOC, no human monitoring

19
Approaches to Reliability
  • Retry/failover
  • Penalty in latency
  • Multiple simultaneous requests
  • Wasting resources
  • Idempotency is not guaranteed
  • /dir/prog/query vs. /dir/prog?query
  • Active monitoring/avoidance
  • Failure duration, monitoring frequency
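
The retry/failover option can be sketched as trying peers one after another, which makes the latency penalty visible: every failed attempt is paid for before the next one starts. The function name and the injected `fetch` callable are assumptions for illustration; a caller would also have to decide whether a request is safe to repeat, since idempotency is not guaranteed.

```python
def fetch_with_failover(url, proxies, fetch):
    """Try each proxy in order; return (proxy, response) from the first success.

    `fetch(proxy, url)` is a caller-supplied callable that raises OSError
    (for example a connection error or timeout) when the proxy fails.
    """
    errors = []
    for proxy in proxies:
        try:
            return proxy, fetch(proxy, url)
        except OSError as exc:
            errors.append((proxy, exc))  # remember the failure, move on
    raise ConnectionError(f"all {len(proxies)} proxies failed for {url}: {errors}")
```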

20
Active Monitoring
  • Local Monitoring
  • Resource availability of this node
  • File descriptors/sockets, system CPU time, DNS
    lookup performance, uptime, load average, free
    disk space
  • Peer Monitoring
  • UDP heartbeat: local monitoring data
  • HTTP/TCP: wget fetch
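
As a rough sketch of the two monitoring layers, the code below gathers two of the local measurements named above and packs them into a heartbeat payload. The JSON message layout and the function names are assumptions, not CoDeeN's wire format, and `os.getloadavg` is only available on Unix-like systems.

```python
import json
import os
import shutil
import time


def local_status(path="/"):
    """Collect a small subset of the local health data (load and free disk)."""
    load1, _, _ = os.getloadavg()  # 1-minute load average, Unix only
    disk = shutil.disk_usage(path)
    return {"time": time.time(), "load1": load1, "free_disk_bytes": disk.free}


def encode_heartbeat(node_id, status):
    """Serialize a heartbeat; the JSON layout here is hypothetical."""
    return json.dumps({"node": node_id, "status": status}).encode("utf-8")


def decode_heartbeat(payload):
    """Return (node_id, status) from a payload built by encode_heartbeat."""
    msg = json.loads(payload.decode("utf-8"))
    return msg["node"], msg["status"]
```

The encoded bytes are small enough to send to a peer in a single UDP datagram, for example with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(...)`.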

21
Monitoring Implications
  • Missed heartbeats
  • Bad link, node down
  • Slow acknowledgements
  • Overloaded node
  • Connect failures
  • Resource exhaustion
  • Selective port filtering
  • Application/OS bugs
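
The symptom-to-cause reasoning above can be written down as a lookup table. The grouping of causes under each symptom is one reading of the flattened bullet list and should be treated as an assumption, as should the key names.

```python
# Grouping is an assumed reading of the slide's bullets, not a confirmed mapping.
LIKELY_CAUSES = {
    "missed_heartbeats": ["bad link", "node down"],
    "slow_acknowledgements": ["overloaded node"],
    "connect_failures": [
        "resource exhaustion",
        "selective port filtering",
        "application/OS bugs",
    ],
}


def likely_causes(symptoms):
    """Return candidate causes for the observed symptoms, without duplicates."""
    causes = []
    for symptom in symptoms:
        for cause in LIKELY_CAUSES.get(symptom, []):
            if cause not in causes:
                causes.append(cause)
    return causes
```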

22
Node Avoidance Counts/Causes
23
Node Stability
24
DNS Problems
  • DNS Lookup Failure
  • Cacheable DNS name lookup > 5 secs
  • Local failures
  • Overloading, cron jobs, misconfigurations
  • Generally 10 DNS showing problems
  • Critical in CoDeeN's operation
  • No response from reverse proxy
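
The failure definition above (a lookup that errors out or takes longer than 5 seconds) can be checked with a timed resolver call. This is a minimal sketch: `timed_lookup` and its injectable `resolver` and `clock` parameters are assumptions added so the check can be exercised without a network.

```python
import socket
import time

SLOW_LOOKUP_SECS = 5.0  # threshold taken from the slide


def timed_lookup(name, resolver=socket.gethostbyname, clock=time.monotonic):
    """Resolve `name` and flag the lookup as a failure if it errors or is slow."""
    start = clock()
    try:
        resolver(name)
        resolved = True
    except OSError:  # socket.gaierror is a subclass of OSError
        resolved = False
    elapsed = clock() - start
    return {
        "name": name,
        "resolved": resolved,
        "elapsed": elapsed,
        "failure": (not resolved) or elapsed > SLOW_LOOKUP_SECS,
    }
```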

25
Solutions to DNS Problems
  • Avoiding faulty nodes
  • Map objects in a page to same proxy
  • Reduce DNS lookups
  • Persistent connection
  • CoDNS
  • Middleware to provide reliable DNS service
  • Effectively removes DNS problems

26
Daily Request Volume
[Chart; axis scale: x 10^6]

27
Daily Client Population Count
28
Lessons and Directions
  • Few substitutes for reality
  • Non-dedicated hardware really interesting
  • Failure modes not present in NS-2
  • Current measures pretty effective
  • Very slow arms race
  • Breathing time for better solutions

29
Future Work
  • Robot detection
  • Abusers are usually robots
  • Machine learning, high dim clustering
  • CoDeploy
  • Efficient large-file distribution service
  • Dynamic split/reassembly of GBs via HTTP
  • CoDNS
  • Faster, more reliable/predictable DNS service
  • Fully operational, used by CoDeeN

30
More Info
  • http://codeen.cs.princeton.edu
  • Thanks
  • Intel, HP, iMimic, PlanetLab Central

31
Effectiveness of Monitoring
  • Instability patterns of nodes important
  • (In)stability duration
  • Stability measure
  • No status change over two intervals
  • Stable time pretty dynamic!
  • Monitoring is effective
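
The stability measure can be sketched over a sequence of per-interval status samples. The exact definition is not spelled out on the slide, so this code assumes a node counts as stable at an interval when its status is unchanged across the two preceding transitions; `stable_flags` and the `.`/`X` status characters are illustrative choices.

```python
def stable_flags(statuses, window=2):
    """Mark each interval as stable if the status did not change over the
    last `window` transitions (an assumed reading of the slide's measure)."""
    flags = []
    for i in range(len(statuses)):
        if i < window:
            flags.append(False)  # not enough history yet
        else:
            recent = statuses[i - window:i + 1]
            flags.append(len(set(recent)) == 1)
    return flags
```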

32
Monitors Other Venues
  • Routinely trigger open proxy alerts
  • Educating sysadmins, others
  • Really good honeypots
  • 6000 SMTP flows/minute at CMU
  • Spammers do 1M HTTP ops/day
  • Early problem detection
  • Failing PlanetLab nodes
  • Compromised university machines

33
Security Concerns
  • Use a popular protocol
  • HTTP
  • Emulate a popular tool/interface
  • Web proxy servers
  • Allow open access
  • With HTTP's lack of accountability
  • Be more attractive than competition
  • Uptime, bandwidth, anonymity

34
Attempted SMTP Tunnels/Day
35
UDP Heartbeat Lowdown
Liveness ..X.. ..X.. ..... .X.XX ..... ...X. .....
MissAcks 10w00 00001 00000 0w066 00010 000v0 00020
LateAcks 00000 00000 00000 00000 00000 00000 00000
NoFdAcks 00000 00000 00000 00000 00000 00000 00000
VersProb 00000 00000 00000 00000 00000 00000 00000
MaxLoads 41022 11111 11141 20344 11514 14204 11111
SysMxCPU 81011 11111 11151 10656 11615 15564 11111
WgetProx 00w00 00100 00010 0w110 00000 000s0 00010
WgetTarg 11w11 10301 01021 1w220 00111 101t0 11121
X00001113

36
Challenges in Deployment
  • Security
  • HTTP: popular protocol
  • Open to public access
  • Provides a level of indirection for abusers
  • Reliability
  • Non-dedicated resources
  • CoDeeN depends on reverse proxy
  • Unreliability leads to service interruption