CS267 Hints and Tricks - PowerPoint PPT Presentation

1
CS267 Hints and Tricks

David Culler, culler_at_cs.berkeley.edu
Eric Fraser, fraser_at_cs.berkeley.edu
Albert Goto, goto_at_cs.berkeley.edu
Matt Massie, massie_at_cs.berkeley.edu
Pete Sakosky, sakosky_at_cs.berkeley.edu
2
Cluster Counts
  • Network Of Workstations (NOW): HP, Sparc,
    UltraSparc clusters retired.
  • Millennium Central Cluster
    • 99 Dell 2300/6350/6450 Xeon dual/quad nodes: 332
      processors
    • Total 211GB memory, 3TB disk
    • Myrinet 2000 and 1000Mb fiber ethernet
  • OceanStore/ROC cluster, Astro cluster, Math
    cluster, Cory cluster, more
  • CITRIS Cluster 1: 3/2002 deployment (Intel
    Donation)
    • 4 Dell Precision 730 Itanium duals: 8 processors
    • Total 20 GB memory, 128GB disk
    • Myrinet 2000 and 1000Mb copper ethernet
  • CITRIS Cluster 2: 2002-2003 deployment (Intel
    Donation)
    • 128 Dell McKinley class duals: 256 processors
    • Total 512GB memory, 8TB disk
    • Myrinet 2000 and 1000Mb copper ethernet

3
Current Gigabit Network
4
Frontend machines
  • Millennium Cluster
    • Napa.millennium.berkeley.edu
    • Sonoma.millennium.berkeley.edu
  • CITRIS pilot Cluster
    • Lime.millennium.berkeley.edu
  • SSH
    • From .berkeley.edu only. Let us know if there
      is somewhere else you need access from.
    • Can't ssh out from here, only in.
  • Job execution
    • Don't run local jobs on frontends.
    • Use gexec and mpirun instead.

5
Cluster Nodes
  • Millennium Cluster
    • mm1 through mm34 have dual processors, each with ½
      GB of memory.
    • mm35 through mm98 have quad processors, each with
      2 or 4 GB of memory.
    • Gb Ethernet and Myrinet 2000 to all machines.
  • CITRIS pilot Cluster
    • lime, lemon, orange, and grapefruit are each dual
      processor Itaniums with 5GB of memory.
    • Gb Ethernet, no Myrinet (yet)
  • High security restrictions on port access outside
    of cluster, but open within clusters.

6
How to be a good citizen
  • 800 users total on central cluster, 75 major
    users for 2/2002; average 65% total CPU
    utilization
  • Jobs are interactive, not batch scheduled
  • Resources are limited
  • So, users need to play fair
  • Run small jobs first, before testing large jobs
  • Look at the state of the cluster before running
    anything.
  • Machines with load higher than the number of
    processors are probably overloaded.
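
The load rule of thumb above can be checked on a node before launching work. This is a minimal Python sketch, assuming the 1-minute load average is the figure of interest (the slide does not say which average to use). The function names are illustrative and not part of any cluster tool.

```python
import os

def is_overloaded(load_average, processor_count):
    """Rule of thumb from the slide: load above the processor count means overloaded."""
    return load_average > processor_count

def local_node_overloaded():
    """Apply the rule to the machine this script runs on (Unix only)."""
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
    one_minute, _, _ = os.getloadavg()
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return is_overloaded(one_minute, os.cpu_count() or 1)
```

Calling local_node_overloaded() on a node gives a quick yes/no answer. For a cluster-wide view, gstat remains the tool the slides recommend.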

7
filesystems
  • Avoid using your home directory for any cluster
    I/O.
  • Use /work for all job staging.
    • Shared
    • ¼ TB
    • 9-day deletion policy, not for storage!
  • Use local /scratch spaces for any big I/O needs.
    • Not shared
    • 9-18GB per node
    • 4-day deletion policy, not for storage!
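
Because both areas are purged on a schedule, it can help to list which of your files are close to deletion. The sketch below assumes the policy is keyed on file modification time, which the slide does not state. The function name and the DELETION_DAYS table are illustrative.

```python
import os
import time

# Deletion windows from the slide, in days.
DELETION_DAYS = {"/work": 9, "/scratch": 4}

def files_past_window(root, max_age_days, now=None):
    """Return paths under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            # Assumption: age is measured from the last modification.
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, files_past_window("/work/user", DELETION_DAYS["/work"] - 1) would list files with less than a day left under that assumption; /work/user stands in for your own staging directory.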

8
Data copying tool
  • The ky tool can be used for moving data to/from
    /scratch
  • Tree copy: I tell 2 friends, and so on, and so
    on.
  • E.g. ky /work/user/mydata/ mm1 mm2 mm3 will
    copy all data inside of /work/user/mydata/ to
    /scratch/user/mydata on mm1, mm2, and mm3.
  • Can be used to bring data back from
    /scratch/user in a random sleep mode.
  • There is also another tool called pcp, which does a
    similar thing.
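
The "I tell 2 friends" idea can be illustrated with a small planner that works out who sends to whom in each round. This is only a sketch of a fan-out-of-two tree copy. The slides do not describe how ky actually schedules its transfers, and tree_copy_plan is a hypothetical name.

```python
from collections import deque

def tree_copy_plan(source, targets, fanout=2):
    """Return a list of rounds, each a list of (sender, receiver) pairs."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    waiting = deque(targets)   # nodes that still need the data
    frontier = [source]        # nodes that received data in the previous round
    rounds = []
    while waiting:
        this_round = []
        next_frontier = []
        for sender in frontier:
            # Each node passes the data on to at most `fanout` new nodes.
            for _ in range(fanout):
                if not waiting:
                    break
                receiver = waiting.popleft()
                this_round.append((sender, receiver))
                next_frontier.append(receiver)
        rounds.append(this_round)
        frontier = next_frontier
    return rounds
```

With a fan-out of two, the number of nodes reached doubles each round, so the number of rounds grows roughly with the logarithm of the node count instead of linearly as it would if one machine sent to every node in turn.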

9
Cluster Status
  • Ganglia Cluster Monitoring Environment
  • gstat displays available machines ordered by
    relative availability (what will be chosen by
    gexec)
  • GUI-based status: link from the picture at
    http://www.millennium.berkeley.edu/

10
Gexec
  • gexec (rexec is obsolete)
  • http://ganglia.sourceforge.net/docs/
  • E.g. gexec -n 10 myprogram arg1 arg2
  • Can specify explicit nodes to circumvent load
    balancing.
  • Virtual node numbers available as ENV variable.
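
A program started by gexec can use its virtual node number to decide which share of the work to take. The sketch below assumes the variables are named GEXEC_MY_VNN and GEXEC_NPROCS. The slide does not give the names, so check them against the gexec documentation before relying on them. my_slice is a hypothetical helper.

```python
import os

# Assumed variable names; verify against the gexec documentation.
VNN_VARIABLE = "GEXEC_MY_VNN"
NPROCS_VARIABLE = "GEXEC_NPROCS"

def my_slice(items, environ=os.environ):
    """Return the items this process should handle, dealt out round-robin."""
    # Outside gexec the variables are unset, so act as the only process.
    vnn = int(environ.get(VNN_VARIABLE, "0"))
    nprocs = int(environ.get(NPROCS_VARIABLE, "1"))
    return items[vnn::nprocs]
```

Under those assumed names, process 1 of 4 would take items 1, 5, 9, and so on, and no coordination between the processes is needed.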

11
MPI
  • Use mpirun
  • Two versions available
    • Gbit ethernet P4 version
    • Myrinet GM version
  • http://www.millennium.berkeley.edu/mpi/
  • E.g. mpirun -np 10 ./myprogram arg1 arg2
  • Uses gexec for remote execution.
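
When launches are scripted, building the command as an argument list avoids quoting mistakes. This is a minimal sketch, assuming the usual MPICH-style -np flag for the process count. mpirun_command is a hypothetical helper, not part of the cluster toolkit.

```python
import shlex

def mpirun_command(process_count, program, *args):
    """Build the argument list for an mpirun launch like the slide's example."""
    if process_count < 1:
        raise ValueError("process_count must be at least 1")
    return ["mpirun", "-np", str(process_count), program, *args]

if __name__ == "__main__":
    argv = mpirun_command(10, "./myprogram", "arg1", "arg2")
    # Print the command for inspection instead of running it.
    print(shlex.join(argv))
```

The returned list can be passed directly to subprocess.run on a frontend once the printed command looks right.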

12
To Do list
  • Integrate Globus toolkit with cluster toolkit
  • Batch scheduler
  • Web-based front-end for job submission/status/output,
    aka Hotpage from SDSC
  • Itanium cluster benchmarking
  • Parallel cluster filesystem!!