Fermi Linux Server Vendor Qualification


1
Fermi Linux Server Vendor Qualification
  • HEPiX
  • May 21, 2003
  • Steven C. Timm
  • For the Fermi Linux Vendor Qualification Taskforce

2
OUTLINE
  • Fermilab Hardware Procurement Strategy
  • Goals of Qualification
  • Procedures of Qualification
  • Results of Qualification

3
SUMMARY
  • The 2003 Fermi Linux Server Vendor Qualification
    focused on 1U Intel servers.
  • First phase was a technical evaluation which
    identified 18 technically qualified vendors.
  • All these vendors participate in a
    price/performance bid; the top five make the
    vendor list (currently ongoing).
  • We keep track of all technically qualified
    vendors and rotate them in as necessary.
  • We are not making a new qualified desktop vendor
    list at this time.
  • Public web page:
    http://www-oss.fnal.gov/scs/public/qualify2003

4
Members of Fermi Linux Server Vendor Qualification Taskforce
  • The taskforce involved personnel from five
    different departments plus key members of
    management. All major purchasers of server
    hardware were represented, as was the computer
    room logistics staff.
  • Members: Steven Timm (chair), Margaret Greaney,
    Troy Dawson, Lance Weems, Hans Wenzel, Bruce
    Karrels, Don Holmgren, Phil Lutz, Stan Naymola,
    Mark Kaletka, Gerry Bellendir.

5
Fermi Hardware Procurement Strategy
  • Buy a hardware solution as fully integrated as
    possible, including installation.
  • Identify vendors that know Fermilab requirements
    and are willing to work with Fermi Linux.
  • Replacement parts via a 3-year warranty; service
    provided by Fermilab.

6
Fermi Linux Vendor List: History
  • Two previous Fermi Linux qualifications, 1999 and
    2001.
  • 1999: desktops as farm workers, 5 vendors.
  • 2001: separate vendor lists for desktops and 2U
    rackmount servers.
  • Also two special evaluations, for 2U rackmounts
    and AMD.
  • Vendor list used in all major Fermi acquisitions:
    1500 machines from 1999-2002.
  • Also used by outside groups: KEK, INFN,
    Northwestern, MIT, Geneva, Carnegie Mellon,
    Pittsburgh, Edinburgh, others.

7
Evaluation: Performance/Price
  • The overriding goal has been to get the best
    performance possible at the lowest price.
  • We have succeeded well: from 1999 to 2002, Fermi
    Cycles per dollar increased by a factor of 6,
    where Moore's law alone should have given us only
    a factor of four.
  • Users are happy with the quantity of computing
    they got for their money.
  • Even so, in this evaluation we are looking for
    better long-term reliability, not a race to the
    bottom on price alone.
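The factor of four is what a doubling roughly every 18 months gives over the three years from 1999 to 2002. The 18-month doubling period is an assumption here (the slide does not state one); a quick check of the arithmetic:

```python
# Rough check of the "factor of 6 vs. Moore's law factor of 4" comparison.
# ASSUMPTION: an 18-month doubling period; the slide does not give one.

def moores_law_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor expected from doubling once every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)

expected = moores_law_factor(2002 - 1999)  # 3 years -> 2 doublings
observed = 6.0                             # Fermi Cycles per dollar gain, from the slide

print(f"Moore's law: x{expected:.1f}, observed: x{observed:.1f}")
print(f"Observed gain is {observed / expected:.1f}x the Moore's law expectation")
```

Under that assumption the observed gain is about 1.5 times what clock and density scaling alone would predict.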

8
Evaluation: Performance/Price
  • Problem: one node is not the best test of a
    company's long-term price/performance.
  • Small businesses are best able to take the time
    to follow the directions of the evaluation
    process and give support.
  • Small businesses are not always able to deliver
    large orders in a timely manner with good initial
    quality.
  • Single-node prices are not a good predictor of
    the bid level on a real bid, and we shouldn't be
    asking anyway.
  • Addressed by doing the technical qualification
    first, then a price/performance bid.

9
Evaluation: Vendor Attrition
  • Some vendors on the list have gone out of
    business.
  • Others were disqualified for bad performance.
  • Others stopped bidding on their own, or bid
    ridiculously high.
  • Addressed by:
    • Selecting the vendor list on a
      performance/price basis from all those
      technically qualified.
    • Keeping track of all technically qualified
      vendors and adding them to the list if
      necessary.
    • Supplementing the list if special hardware
      (AMD, blades, desktop) is required.

10
Evaluation: Initial Quality
  • Problem: going too low on the price curve.
    Sometimes vendors bid too low and try to deliver
    poor-quality systems.
  • Addressed from the beginning with a tough 30-day
    acceptance test and a lemon law.
  • In various cases, Fermilab has required vendors
    to swap power supplies (PS), cases, motherboards,
    disk drives, and racks across all units.
  • The cost of Fermi labor to resolve the problem
    was less than the difference between the winning
    bid and the next-highest bid.
  • All issues have been resolved through this
    process, and the systems have all had productive
    lives.
  • Now also addressed with references and hard
    numbers on initial quality.

11
Evaluation: Components
  • Problem: rapidly changing components.
  • In the commodity market, components change
    rapidly.
  • From the beginning of the evaluation to issuance
    of the purchase order: about six months.
  • CPU speeds go up, cases change.
  • Impossible to track for laptops, difficult to
    track for desktops.
  • OK for the server market, but results in higher
    heat loads and current draws.
  • Addressed by thermal specs that are broad enough
    that, if there are problems, the vendor still has
    to fix them.

12
Goals
  • We want to identify vendors who are best able to
    deliver rackmounted solutions, and who:
    • Are competent in Linux
    • Build quality 1U servers
    • Can integrate into a rackmount environment with
      good thermals in a timely and professional
      manner
    • Have high performance
    • Have good support and troubleshooting

13
Vendor Selection
  • Existing vendors on Fermi Linux list
  • Sales to other Fermi Departments
  • Advertisements at trade shows
  • Survey of other DOE labs at HEPiX
  • Vendors contacting Fermilab directly and asking
    to participate.

14
Chronology
  • We made contact with 45 vendors in all.
  • 29 vendors attended the Jan. 28 info meeting.
  • 24 vendors submitted an acceptable configuration
    on Feb. 4.
  • 21 vendors submitted acceptable benchmarks and
    were cleared to ship a unit on Mar. 4; all got it
    here by Mar. 11.
  • 18 vendors were identified as technically
    qualified.

15
Specifications
  • 1U dual Intel Xeon, 2.4 GHz or faster
  • 400 MHz front side bus or faster
  • 1 GB RAM (RDRAM or DDR SDRAM)
  • Disks: 1 x ≥20 GB system, 2 x ≥40 GB data
  • 100 Mbit Ethernet
  • Video
  • CD-ROM, floppy
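Because the specification is a list of minimums, a submitted configuration can be screened mechanically. A minimal sketch, assuming a hypothetical `Config` record; the field names, the messages, and the reading of the disk sizes as minimums are illustrative assumptions, not the taskforce's actual tooling:

```python
from dataclasses import dataclass, replace

@dataclass
class Config:
    """HYPOTHETICAL record of a vendor's proposed 1U configuration."""
    form_factor_u: int
    cpu_family: str
    cpu_count: int
    cpu_ghz: float
    fsb_mhz: int
    ram_gb: float
    ram_type: str
    system_disk_gb: int
    data_disks_gb: list[int]
    ethernet_mbit: int
    has_video: bool
    has_cdrom: bool
    has_floppy: bool

def spec_violations(c: Config) -> list[str]:
    """Return the ways `c` misses the 2003 minimum spec (empty list = pass)."""
    problems = []
    if c.form_factor_u != 1:
        problems.append("must be 1U")
    if c.cpu_family != "Xeon" or c.cpu_count != 2:
        problems.append("must be dual Intel Xeon")
    if c.cpu_ghz < 2.4:
        problems.append("CPU must be 2.4 GHz or faster")
    if c.fsb_mhz < 400:
        problems.append("front side bus must be 400 MHz or faster")
    if c.ram_gb < 1 or c.ram_type not in ("RDRAM", "DDR SDRAM"):
        problems.append("needs at least 1 GB of RDRAM or DDR SDRAM")
    if c.system_disk_gb < 20:  # ASSUMPTION: 20 GB is a minimum
        problems.append("system disk must be at least 20 GB")
    if sum(1 for d in c.data_disks_gb if d >= 40) < 2:  # ASSUMPTION: 40 GB minimum
        problems.append("needs two data disks of at least 40 GB")
    if c.ethernet_mbit < 100:
        problems.append("needs 100 Mbit Ethernet")
    if not (c.has_video and c.has_cdrom and c.has_floppy):
        problems.append("needs video, CD-ROM, and floppy")
    return problems

# HYPOTHETICAL submissions for illustration.
candidate = Config(form_factor_u=1, cpu_family="Xeon", cpu_count=2, cpu_ghz=2.8,
                   fsb_mhz=533, ram_gb=1, ram_type="DDR SDRAM", system_disk_gb=40,
                   data_disks_gb=[80, 80], ethernet_mbit=100,
                   has_video=True, has_cdrom=True, has_floppy=True)
slow = replace(candidate, cpu_ghz=2.2)

print(spec_violations(candidate))  # []
print(spec_violations(slow))       # ['CPU must be 2.4 GHz or faster']
```

Returning every violation, rather than stopping at the first, lets a vendor fix all problems in one round.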

16
Why Just 1U Xeon?
  • AMD hardware shows a high initial failure rate,
    high current, and high heat.
  • 1U is the most challenging thermal case: if they
    can build 1U, we believe they can build 2U.
  • Intel chips are supposed to be faster than AMD at
    the moment.
  • Intel chips are supposed to run cooler and draw
    less current.
  • Simplicity: a platform we already mostly
    understand, just one from each vendor.
  • Space: we don't have room for so many 2U units.

17
Linux Competence
  • Vendor identifies hardware that's compatible with
    Linux (much easier than it used to be).
  • Vendor loads Fermi Linux onto the evaluation node.
  • Vendor has to configure lm_sensors on the node.
  • Vendor runs our supplied test to check whether
    they did it right.
  • They are only allowed to ship the unit to
    Fermilab if it passes.
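The supplied test itself is not reproduced in the slides. As a hypothetical sketch of what such a check might verify, the function below parses text in the style printed by the lm_sensors `sensors` command (lines such as `temp1: +38°C` or `fan1: 4560 RPM`) and flags a node that reports no temperatures or fans, or out-of-range readings. The sample text, label patterns, and thresholds are all assumptions:

```python
import re

# Lines like "temp1:   +38°C  (high = +50°C)" or "CPU Temp: +45.0°C".
TEMP_RE = re.compile(r"^(\w[\w ]*?):\s+([+-]?\d+(?:\.\d+)?)\s*°?C", re.MULTILINE)
# Lines like "fan1:   4560 RPM  (min = 2657 RPM, div = 2)".
FAN_RE = re.compile(r"^(\w[\w ]*?):\s+(\d+)\s+RPM", re.MULTILINE)

def check_sensors(output: str, max_temp_c: float = 70.0, min_fan_rpm: int = 1000) -> list[str]:
    """Return problems found in `sensors`-style text; an empty list means pass.

    ASSUMPTION: thresholds are illustrative, not Fermilab's acceptance limits.
    """
    temps = [(name, float(value)) for name, value in TEMP_RE.findall(output)]
    fans = [(name, int(value)) for name, value in FAN_RE.findall(output)]
    problems = []
    if not temps:
        problems.append("no temperature readings: lm_sensors not configured?")
    if not fans:
        problems.append("no fan readings: lm_sensors not configured?")
    problems += [f"{n} too hot: {t} C" for n, t in temps if t > max_temp_c]
    problems += [f"{n} too slow: {r} RPM" for n, r in fans if r < min_fan_rpm]
    return problems

# HYPOTHETICAL sample output, for illustration only.
SAMPLE = """\
w83627hf-isa-0290
Adapter: ISA adapter
VCore 1:   +1.52 V  (min =  +1.42 V, max =  +1.57 V)
fan1:     4560 RPM  (min = 2657 RPM, div = 2)
temp1:       +38°C  (high =   +50°C, hyst =   +45°C)
"""

print(check_sensors(SAMPLE))                   # []
print(check_sensors("Adapter: ISA adapter\n")) # two "not configured?" messages
```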

18
Electrical
  • Electric current measured with an ammeter at
    startup, idle, and full CPU load.
  • Current draw ranges: 2.4 GHz, 1.6-2.0 A;
    2.8 GHz, 2.0-2.3 A; 3.06 GHz, 2.1-2.35 A.
  • It is likely that with the purchase of 2.8 or
    3.06 GHz machines we can only have seven machines
    per circuit, not eight as in the past.
  • Those with higher current draw also tend to have
    more fans and be better cooled internally.
  • Bright side: this current is similar to that of
    the 750 MHz machines bought 3 years ago, giving
    2.5x the performance for the same current.
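The seven-versus-eight estimate comes down to dividing a circuit's usable current by the per-machine draw and rounding down. A minimal sketch, assuming a hypothetical usable budget of 16 A per circuit and taking the midpoint of each measured range as the typical draw; neither figure comes from the slides:

```python
import math

# Measured per-machine current ranges from the slide, in amps, keyed by CPU clock (GHz).
CURRENT_RANGES_A = {2.4: (1.6, 2.0), 2.8: (2.0, 2.3), 3.06: (2.1, 2.35)}

# ASSUMPTION: usable current budget per circuit; the slides do not give one.
CIRCUIT_BUDGET_A = 16.0

def machines_per_circuit(amps_per_machine: float, budget_a: float = CIRCUIT_BUDGET_A) -> int:
    """Largest whole number of machines whose combined draw stays within the budget."""
    return math.floor(budget_a / amps_per_machine)

for ghz, (low, high) in CURRENT_RANGES_A.items():
    typical = (low + high) / 2  # ASSUMPTION: midpoint of the range as typical draw
    print(f"{ghz} GHz: ~{typical:.2f} A each -> {machines_per_circuit(typical)} per circuit")
```

Using the top of each range instead of the midpoint lowers the count for the faster parts, so the answer is sensitive to both assumptions.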

19
Thermal
  • Measured ΔT from front to back of every unit.
  • Used internal temperature probes on each unique
    type of case.
  • All units in the evaluation ran much cooler than
    the 1U units bought in FY2002.
  • This is due to the better thermal characteristics
    of the Intel chips and many more added internal
    fans and blowers.
  • Northbridge chipset chips in some machines ran
    hotter than the CPUs. It is important to watch the
    size of the heatsink on these chips.
  • We are still analyzing the data we took, but are
    confident that all units are acceptable.

20
Thermals (continued)

21
Quality 1U Servers
  • Open each machine to verify the quality of
    construction.
  • Run burn-in on each machine for two weeks.
  • Thermal measurements in a real rack situation.
  • Electrical current measurements.
  • Verify that all components meet specs.

22
Integration Capabilities (continued)
  • Vendors are asked to submit a sample proposal for
    a full rack of systems.
  • The standard Fermi rack configuration is the base
    of the proposal, but they can suggest extras.
  • The goal is to (1) learn whether they can
    integrate and (2) get new ideas on how to improve
    our setup.
  • They must also submit info on clusters they have
    installed before, with real temperature and
    reliability numbers.

23
Performance
  • Vendors are supplied a CD-ROM of the CDF and D0
    benchmarks.
  • Performance is measured in Fermi Cycles, where a
    PIII 1 GHz = 1000 Fermi Cycles.
  • We repeat the test when the machine gets here.
  • The QCD benchmark, seti@home, and tiny are also
    run.
  • It would be ideal to use SPEC CPU2000, but
    published results are not repeatable with the
    compilers used by Fermi.
  • Price doesn't enter into the technical evaluation.
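The slides define the Fermi Cycle scale only by its reference point: a 1 GHz PIII scores 1000. A minimal sketch of one way such a normalization could work, assuming the score is inversely proportional to the wall-clock time of the same benchmark job; both the formula and the runtimes below are hypothetical, not the CDF/D0 benchmark's actual procedure:

```python
REFERENCE_FERMI_CYCLES = 1000.0  # PIII 1 GHz, by definition (from the slides)

def fermi_cycles(ref_runtime_s: float, test_runtime_s: float) -> float:
    """Score a machine relative to the PIII 1 GHz reference.

    ASSUMPTION: score scales with throughput, i.e. inversely with the
    runtime of the same benchmark job on each machine.
    """
    if ref_runtime_s <= 0 or test_runtime_s <= 0:
        raise ValueError("runtimes must be positive")
    return REFERENCE_FERMI_CYCLES * ref_runtime_s / test_runtime_s

# HYPOTHETICAL runtimes: 100 s on the reference, 50 s on the test node.
print(fermi_cycles(100.0, 50.0))  # 2000.0
```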

24
Performance
  • 3 CPU speeds measured: 2.4, 2.8, and 3.06 GHz.
  • 1000 Fermi Cycles = PIII 1 GHz.
  • Average performance: 1779, 2041, and 2223 Fermi
    Cycles, respectively.
  • 400 MHz vs. 533 MHz front side bus is a 2.5%
    effect for farms software, much bigger for QCD.
  • AMD MP2200: 1771 Fermi Cycles.
  • Performance is projected to faster clock speeds
    in anticipation that some vendors will bid faster
    chips.
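The slides do not say how the projection to faster clocks was made. One plausible approach, shown here as an assumption rather than the taskforce's method, is a least-squares straight line through the three measured averages, extrapolated to a hypothetical faster part (3.2 GHz is an illustrative value):

```python
# Average Fermi Cycles at each measured clock speed (from the slide).
CLOCK_GHZ = [2.4, 2.8, 3.06]
FERMI_CYCLES = [1779, 2041, 2223]

def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

a, b = linear_fit(CLOCK_GHZ, FERMI_CYCLES)
projected = a + b * 3.2  # HYPOTHETICAL faster part, outside the measured range
print(f"slope: ~{b:.0f} Fermi Cycles per GHz")
print(f"3.2 GHz projection: ~{projected:.0f} Fermi Cycles")
```

Extrapolating beyond the measured range assumes the rest of the system keeps pace with the CPU clock; the front side bus result above suggests this matters little for farms software but much more for QCD.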

25
Support and Troubleshooting
  • Each vendor gets a software call, related to the
    configuration of Fermi Linux and solvable by
    e-mail or phone.
  • Each vendor gets a hardware call, designed to
    trigger an on-site service call.
  • We manufacture one if necessary.
  • Points for prompt response and correct response.

26
Conclusions
  • 18 technically qualified vendors, in alphabetical
    order:
    • Ace, Angstrom, APPRO, ASA, Aspen, Atipa,
      Concentric, Dell, HP, IBM, Koi, Penguin,
      Promicro, PSSC, Rackable, Racksaver, Richardson,
      Western Scientific
  • The price/performance bid will weed them down to
    five.
  • 21 vendors is too many to bring in; we will be
    more discriminating next time.

27
Component issues
  • Boards OK: Intel SE7501 series, Supermicro X5DPx
    series, Tyan 2721, Tyan 2723.
  • Both the Tyan S2721-533 (Thunder i7501 Pro) and
    Tyan S2723 (Tiger i7501) had issues with 10/100
    Ethernet, resolved by changing a resistor value
    on the board.
  • Some manufacturers offer cold-swap and hot-swap
    capabilities on drives, which is very nice.
  • Issues in the Intel E7501 chipset: slower disk
    throughput than some earlier chipsets, but
    adequate for our needs.

28
Price/performance bid
  • All vendors who pass our technical requirements
    are participating in a price/performance bid on a
    small number of nodes (48).
  • The top five will be the Fermi Linux Qualified
    Vendors.
  • We will keep track of all technically qualified
    vendors to replenish the list if:
    • A vendor goes out of business
    • A vendor stops bidding, or bids consistently
      very high on Fermi RFPs
    • A particular RFP requires special capabilities:
      Myrinet, AMD, blade servers, desktop
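The selection step can be pictured as ranking the technically qualified bids by performance per dollar, keeping the top five, and holding the rest in reserve. A minimal sketch, assuming a single Fermi-Cycles-per-dollar ratio as the score; the real RFP evaluation may weigh other factors, and the vendor names and prices below are hypothetical placeholders (the performance figures reuse the averages reported earlier):

```python
from dataclasses import dataclass

@dataclass
class Bid:
    vendor: str
    fermi_cycles_per_node: float  # measured performance
    price_per_node: float         # bid price, dollars

def top_vendors(bids: list[Bid], k: int = 5) -> list[str]:
    """Rank bids by Fermi Cycles per dollar and return the best k vendors.

    ASSUMPTION: one performance/price ratio is the whole score.
    """
    ranked = sorted(bids, key=lambda b: b.fermi_cycles_per_node / b.price_per_node,
                    reverse=True)
    return [b.vendor for b in ranked[:k]]

# HYPOTHETICAL bids: placeholder names and prices, not real results.
bids = [
    Bid("Vendor A", 2041, 2600), Bid("Vendor B", 2223, 3100),
    Bid("Vendor C", 1779, 2100), Bid("Vendor D", 2041, 2900),
    Bid("Vendor E", 2223, 2800), Bid("Vendor F", 2041, 3400),
    Bid("Vendor G", 1779, 2500),
]

winners = top_vendors(bids)
reserves = [b.vendor for b in bids if b.vendor not in winners]
print("qualified list:", winners)
print("kept in reserve:", reserves)
```

Keeping the reserve list explicit mirrors the plan to replenish the vendor list from the remaining technically qualified vendors.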

29
Future Plans
  • Blade server evaluation coming up.
  • Requires a change in install philosophy: no
    floppy, CD-ROM, or serial console available.
  • Essential to address power and space concerns in
    Feynman and elsewhere.