New Technologies in the JANET Web Cache Service Martin Hamilton - PowerPoint PPT Presentation

1
New Technologies in the JANET Web Cache Service
Martin Hamilton, George Neisser
http://wwwcache.ja.net/
support@wwwcache.ja.net
2
What is the JANET Web Cache Service ?
  • National caching service for the UK education and
    research community.
  • Funded by JISC.
  • Awarded by competitive tender to Loughborough
    University Computing Services and Manchester
    Computing.
  • Largest "site" on JANET.
  • 155 Megabits/second aggregate traffic.
  • 70-80 million transactions/day.
  • 700-800 Gigabytes transferred/day.
  • Being used by some 170 institutions.

3
What is Web Caching ?
  • Caches keep copies of popular Internet content.
  • First site to fetch a URL causes it to be cached.
  • Subsequent visits get the cached copy.
  • Exceptions for things like secure (SSL) content,
    cookies, and dynamic content (CGI).
  • Web caching seen as essential by most ISPs and
    large Internet sites.
  • Caches can also be used for content filtering -
    e.g. legal requirement for FE sites.
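The caching behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration, not how Squid is implemented: `TinyCache`, `is_cacheable`, and the injected `fetch` callable are hypothetical names, header names are matched exactly as given, and real caches also honour expiry and validation headers that are left out here.

```python
from urllib.parse import urlsplit


def is_cacheable(url, request_headers, response_headers):
    """Rough test mirroring the exceptions listed on the slide."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return False  # secure (SSL) content
    if "Cookie" in request_headers or "Set-Cookie" in response_headers:
        return False  # per-user content
    if "/cgi-bin/" in parts.path or parts.query:
        return False  # dynamic content (CGI)
    return True


class TinyCache:
    """First fetch of a URL stores it; later requests get the stored copy."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable: url -> (response_headers, body)
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url, request_headers=None):
        request_headers = request_headers or {}
        if url in self.store and "Cookie" not in request_headers:
            self.hits += 1
            return self.store[url]
        self.misses += 1
        response_headers, body = self.fetch(url)
        if is_cacheable(url, request_headers, response_headers):
            self.store[url] = body
        return body
```

Passing the origin fetch in as a callable keeps the sketch testable without any network access.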

4
Service configuration
  • Cache machines (34 of these) typically have a
    Pentium II or III processor, 512 Megabytes of
    memory, dual 100 Megabit/second Ethernet, and 6
    or 12 Ultra2 SCSI disks for cached objects.
  • Small number (currently 3) of load balancers to
    distribute requests between caches.
  • Caches and load balancers all running Linux and
    the Squid Web Cache server.
  • Some 1.5TB of pooled cache disk.

5
New technologies covered today...
  • Automation of service monitoring and
    availability.
  • Automating operations, so that a small number of
    people can run a huge service.
  • "Glue" needed to link monitoring and management
    tools with email/paging/WAP.
  • Incident and change logging/reporting.
  • Management of machines at remote sites.
  • Identify useful info for other service operators.

6
Problems encountered
  • A server goes down, e.g. crashes or locks up.
  • The service (e.g. Squid cache) goes down, but the
    server is still up.
  • The machine or service is slow/overloaded.
  • Time taken for machines to recover after a crash
    - Unix fsck process.
  • Knowing who changed what, and when.
  • Capturing long-term stats for profiling.

7
Problem: Machine goes down
  • Spotting the problem - can get away with using
    ping for this. Many other tools available to
    automate this basic testing.
  • Fixing may require local action (e.g. push the
    reset button), but most Unix systems support
    serial console access. Linux also has serial
    access to the LILO boot loader.
  • Serial console useful for remotely managed kit,
    and also remote (off-site) access to local kit in
    an emergency.
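A ping-based check of the kind mentioned above might look like the following sketch. It assumes the Linux iputils `ping` command (`-c` for count, `-W` for the reply timeout in seconds). The function names and the rule of three consecutive failures are illustrative choices, not taken from the service's own scripts.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo request gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def hosts_down(hosts, failures, probe=ping_once, threshold=3):
    """Update consecutive-failure counts; return hosts at or past threshold."""
    down = []
    for host in hosts:
        if probe(host):
            failures[host] = 0  # any reply resets the count
        else:
            failures[host] = failures.get(host, 0) + 1
            if failures[host] >= threshold:
                down.append(host)
    return down
```

Calling `hosts_down` from cron every minute with a persistent `failures` dictionary avoids raising an alarm on a single lost packet.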

8
Solution: Linux Virtual Server
9
Linux Virtual Server explained
  • Layer 4 switch in software. High service
    availability through redundancy.
  • Load balances traffic across multiple "real
    servers" using a virtual IP address and
    per-server weightings.
  • Real server death only affects current users -
    traffic routes around dead servers.
  • Now fully deployed on the JANET caches.
  • Useful for other services too, e.g. Websites.
    Note that e-mail and DNS have automatic fallback
    already.
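The idea of per-server weightings with traffic routed around dead servers can be illustrated with a simplified weighted round-robin in Python. This is a sketch of the concept only. LVS does its scheduling inside the Linux kernel, and the class and server names below are hypothetical.

```python
import itertools


class WeightedBalancer:
    """Hand out real servers in proportion to their weights, skipping dead ones."""

    def __init__(self, weights):
        self.weights = dict(weights)  # real server -> integer weight
        self.alive = set(self.weights)
        self._rebuild()

    def _rebuild(self):
        # One slot per unit of weight, for live servers only.
        slots = [
            server
            for server, weight in sorted(self.weights.items())
            if server in self.alive
            for _ in range(weight)
        ]
        self._cycle = itertools.cycle(slots) if slots else None

    def mark(self, server, up):
        """Record a health-check result and rebuild the rotation."""
        if up:
            self.alive.add(server)
        else:
            self.alive.discard(server)
        self._rebuild()

    def pick(self):
        if self._cycle is None:
            raise RuntimeError("no real servers available")
        return next(self._cycle)
```

With weights `{"cache1": 2, "cache2": 1}`, `cache1` receives two requests for every one sent to `cache2`. After `mark("cache1", False)`, all new requests go to `cache2`.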

10
Problem: Service goes down
  • e.g. Squid dies when disks fill up.
  • Older Squids used to lose track of disk
    consumption and fill disks up after a time.
  • Can spot if Squid is running OK by SNMP.
  • LVS monitor uses SNMP for service upness and
    performance check.
  • What constitutes your service? Can you measure
    its availability automatically?

11
Problem: Overloading
  • Performance metrics available via SNMP already,
    plus addons like df and top.
  • Can also try to use the service, e.g. fetch via
    proxy HTTP and measure performance.
  • Fetch a test URL via each cache at intervals.
  • Consider what you want to do with the info, e.g.
    tune LVS weightings, make case to management for
    more funding :-)
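Fetching a test URL through each cache and timing it can be sketched with Python's standard `urllib`. The proxy addresses and test URL are placeholders to be supplied by the operator, and the function names are hypothetical. What to do with the timings (tuning weightings, plotting trends) is left open, as on the slide.

```python
import time
import urllib.request


def fetch_via_proxy(url, proxy, timeout_s=10):
    """Fetch url through an HTTP proxy such as a Squid cache; return the body."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    with opener.open(url, timeout=timeout_s) as response:
        return response.read()


def timed(fetch):
    """Run fetch(); return (ok, seconds). Any exception counts as a failure."""
    start = time.monotonic()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False
    return ok, time.monotonic() - start


def probe_caches(url, proxies, fetcher=fetch_via_proxy):
    """Map each proxy address to (ok, seconds) for one test fetch of url."""
    return {p: timed(lambda p=p: fetcher(url, p)) for p in proxies}
```

Run at intervals from cron, the results give a user's-eye view of each cache that complements the SNMP metrics.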

12
Solution: SNMP network monitoring
13
Solution: SNMP service monitoring
14
Problem: Filestore check (fsck)
  • Bugbear of traditional Unix systems.
  • After a crash, 6 x 9GB disks can take over half
    an hour to check :-(
  • Possible solution - trialling Linux journalling
    filesystem ReiserFS, which is also a lot faster
    than the conventional ext2 filesystem.
  • Generally useful for server and workstation
    applications. Can be a work-around for other
    problems, e.g. recovery of remote systems much
    less painful after a crash.

15
Tracking changes - manually
  • Web form - who, what, when?
  • Search/browse interface for analysis/reporting.
  • Only requires Unix, HTTP server, Perl.
  • Nightly summary mailshot for management.
  • Also being used by EMMAN and several groups at
    L'boro.
  • Easier to use than paper record and more readily
    available. Structure allows for sensible queries.

16
Solution: Change logging system
17
Tracking changes - automatically
  • Mail from service monitoring script.
  • Urgent warnings (e.g. machine down) gatewayed to
    cellphone using sms_client and a modem.
  • LVS monitor logs incidents with timestamp,
    machine name, and type of problem.
  • Mobile phone (SMS) message size very limited.
    Must be careful not to send too many messages,
    and to provide positive feedback - i.e. that the
    service/machine recovered.
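The advice about limiting message volume and sending a recovery notice can be captured in a small gate placed in front of whatever sends the SMS. This sketch assumes a 160-character limit for a single GSM text message and a 15-minute minimum gap between repeat alerts. `AlertGate` and its `send` callable are hypothetical and do not represent the sms_client interface.

```python
SMS_LIMIT = 160  # characters in a single GSM text message


class AlertGate:
    """Throttle repeat alerts per machine and announce recovery once."""

    def __init__(self, send, min_gap_s=900):
        self.send = send            # callable taking the message text
        self.min_gap_s = min_gap_s  # minimum seconds between repeat alerts
        self.last_alert = {}        # machine -> time the last alert was sent

    def report(self, machine, ok, problem, now):
        if ok:
            # Positive feedback: only if we previously alerted on this machine.
            if machine in self.last_alert:
                del self.last_alert[machine]
                self._send(f"{machine} RECOVERED")
            return
        last = self.last_alert.get(machine)
        if last is None or now - last >= self.min_gap_s:
            self.last_alert[machine] = now
            self._send(f"{machine} DOWN: {problem}")

    def _send(self, text):
        self.send(text[:SMS_LIMIT])  # truncate to fit one message
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the throttling logic easy to test.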

18
Long term stats
  • Daily log file analysis overnight (Calamaris,
    squidtimes, squidclients, our own code).
  • Log file summaries - possible to usefully
    summarise 1GB down to 5MB!
  • Dynamic monitoring of Ethernet traffic levels and
    Squid performance metrics via SNMP and
    MRTG/rrdtool. Stats can hang around forever.
  • 30GB disk for £200! Figure out what to monitor and
    keep historical stats. You won't regret it.
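As a rough idea of how a large access log reduces to a small summary, the sketch below totals requests and bytes per Squid result code and derives a request hit rate. It assumes Squid's native `access.log` layout, in which the fourth field is `code/status` and the fifth is the byte count. It does not stand in for Calamaris or the team's own scripts.

```python
from collections import Counter


def summarise(lines):
    """Return (requests, traffic) Counters keyed by Squid result code."""
    requests = Counter()
    traffic = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        result = fields[3].split("/")[0]  # e.g. TCP_HIT from TCP_HIT/200
        try:
            size = int(fields[4])
        except ValueError:
            continue
        requests[result] += 1
        traffic[result] += size
    return requests, traffic


def hit_rate(requests):
    """Fraction of requests whose result code contains HIT."""
    total = sum(requests.values())
    hits = sum(n for code, n in requests.items() if "HIT" in code)
    return hits / total if total else 0.0
```

Because the function accepts any iterable of lines, an open file object can be passed directly and the log is never held in memory all at once.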

19
WAP - Tomorrow Today :-)
  • Phones buggy - easy to crash, which can require a
    trip to the service centre.
  • Different vendors support different features,
    e.g. Nokia doesn't do tables.
  • Screens far too small for detailed info.
  • Space on "cards" very limited on some phones,
    e.g. Nokia is 1397 characters.
  • But... very easy to create content for!

20
WAP example - LVS stats
  • <!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
    "http://www.wapforum.org/DTD/wml_1.1.xml">
  • Wed May 31 19:25:02 2000
  • babylon
  • kai
  • wilbur
  • ... more cards ...


21
WAP in practice - 1
22
WAP in practice - 2
23
WAP redux
  • Phones use Wireless Markup Language (WML) instead
    of HTML. WML is very simple by comparison.
  • One line tweak to Web server config required for
    serving WML documents.
  • Easy to create WML automatically from monitoring
    scripts.
  • Watch out for bugs and incompatibilities! Use
    Internet emulators to save on phone bill.
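Generating WML from a monitoring script can be as simple as string templating, as the Python sketch below suggests. It uses the WML 1.1 document type and treats the 1397-character figure quoted earlier for Nokia phones as a per-card limit, which is an assumption. The function names and status lines are hypothetical. On Apache, the one-line server tweak would typically be `AddType text/vnd.wap.wml .wml`, assuming that is the server in use.

```python
from xml.sax.saxutils import escape

WML_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
)
CARD_LIMIT = 1397  # assumed per-card character limit (figure quoted for Nokia)


def status_card(card_id, title, lines):
    """Build one WML card with a line break between status lines."""
    body = "<br/>\n".join(escape(line) for line in lines)
    safe_title = escape(title, {'"': "&quot;"})
    card = (
        f'<card id="{card_id}" title="{safe_title}">\n'
        f"<p>\n{body}\n</p>\n</card>\n"
    )
    if len(card) > CARD_LIMIT:
        raise ValueError(f"card {card_id!r} is {len(card)} characters")
    return card


def deck(cards):
    """Wrap cards in a complete WML document."""
    return WML_HEADER + "<wml>\n" + "".join(cards) + "</wml>\n"
```

Raising an error on an oversized card, rather than truncating silently, makes it obvious when the status output needs splitting across several cards.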

24
Current and future developments
  • Two way WAP control for common jobs, e.g. restart
    Squid, take a faulty disk out of service, reboot
    a machine.
  • Failover of load balancers, so cluster survives
    death of primary load balancer.
  • Mirror service integration, so that caches
    automatically find mirrored resources - e.g. from
    the UK Mirror Service.
  • "Cluster digests", to give sites an accurate
    impression of JANET cache hit rates.

25
Closing thoughts...
  • Much of this technology is truly new - didn't
    exist in 1997 when we started the JANET Web Cache
    Service.
  • Perl and cron used extensively to glue other
    tools together.
  • Most of the software used existed already, so it
    wasn't necessary to develop it from scratch.
  • Don't be afraid to lead from the front - JANET
    cache team members have been very active in Web
    caching development internationally.

26
Useful links
  • LVS - http://www.linuxvirtualserver.org/
  • MRTG - http://www.mrtg.org/
  • Perl - http://www.perl.org/
  • ReiserFS - http://devlinux.com/namesys/
  • L'boro change logging system -
    http://lanlord.lboro.ac.uk/martin/change/
  • sms_client - http://www.styx.demon.co.uk/
  • WAP emulator - http://www.gelon.net/