Title: Roberto Pereira, Miguel Erazo
PRIME/GreenLight Project Progress Report
- Roberto Pereira, Miguel Erazo
- Florida International University
December 2009
Outline
- Motivation and Objectives
- PRIME overview
- Installation
- Methodology
- Future work
Motivation and Objectives
Motivation
- The information technology industry consumes about as much energy as the airline industry and has roughly the same carbon footprint
- Every dollar spent on power for IT equipment requires another dollar to be spent on cooling
Objectives
- Provide the scientific community with useful guidelines regarding the energy consumption of distributed simulations/emulations of network models
- Develop a large-scale Grid application performance evaluation platform based on PRIME
PRIME overview
The PRIME network simulator
- Simulator/emulator of computer networks based on the Scalable Simulation Framework (SSF) specification
- Able to simulate networks ranging from tens of thousands to millions of nodes
- Emulation is supported via OpenVPN
- Distributed simulation/emulation is supported through MPI
The PRIME network simulator
[Figure: network model, emulation infrastructure, and distributed simulation]
A specific deployment
- The network model: topology, traffic, and applications
- Define alignments, partition the network, and map the partitions to physical machines
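As an illustration of the partition-and-map step, the sketch below assigns alignments to machines with a simple greedy rule (heaviest alignment first, placed on the currently least-loaded machine). This is not PRIME's own partitioner: the per-alignment loads, alignment names, and machine names are hypothetical, and a real partitioning would also weigh the traffic exchanged between alignments.

```python
import heapq
from typing import Dict, List


def map_alignments(loads: Dict[str, float], machines: List[str]) -> Dict[str, List[str]]:
    """Greedily assign alignments to machines, heaviest alignment first.

    loads maps a hypothetical alignment name to an estimated load
    (e.g. simulated nodes or expected events). Returns machine -> alignments.
    """
    if not machines:
        raise ValueError("need at least one machine")
    # Min-heap of (current load, machine index): least-loaded machine pops first.
    heap = [(0.0, i) for i in range(len(machines))]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {m: [] for m in machines}
    # Descending load; ties broken by name so the result is deterministic.
    for name, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        current, idx = heapq.heappop(heap)
        placement[machines[idx]].append(name)
        heapq.heappush(heap, (current + load, idx))
    return placement


if __name__ == "__main__":
    # Illustrative loads only, not measured values.
    example = {"a0": 8.0, "a1": 7.0, "a2": 6.0, "a3": 5.0, "a4": 4.0}
    print(map_alignments(example, ["node1", "node2"]))
```

With these inputs, node1 receives a0, a3, a4 (load 17) and node2 receives a1, a2 (load 13).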
Installation
Platform
- PRIME has been installed on Lincoln, Abe, and QueenBee on TeraGrid
- Simple network models have been run using the PBS scheduler
- Several useful tools were used and tested, e.g., Perfsuite
Perfsuite
- Collection of tools, utilities, and libraries for software performance analysis
- Uses the Performance Application Programming Interface (PAPI)
- Installed on Abe and QueenBee
Utilities
- psrun is used to gather hardware performance information
- psprocess is used to post-process the results of a performance analysis experiment
Methodology
The approach
- Measure the time that an application (here, PRIME) uses each computing resource, then derive its energy consumption from the power signature of each of these resources, as given in their specifications
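In its simplest form, the approach is a weighted sum over resources, E = sum of P_r × t_r, where t_r is the measured time resource r is in use and P_r is its power signature. A minimal sketch, assuming each resource draws constant power while in use; the resource names and wattages below are hypothetical placeholders, not values from any specification.

```python
from typing import Dict


def energy_joules(busy_seconds: Dict[str, float], power_watts: Dict[str, float]) -> float:
    """Return the sum over resources of P_r * t_r, in joules.

    busy_seconds: measured time each resource was in use (s).
    power_watts:  power signature of each resource (W), assumed constant in use.
    """
    missing = set(busy_seconds) - set(power_watts)
    if missing:
        raise KeyError(f"no power signature for: {sorted(missing)}")
    return sum(t * power_watts[r] for r, t in busy_seconds.items())


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    times = {"cpu": 120.0, "memory": 4.5, "disk": 2.0}
    watts = {"cpu": 95.0, "memory": 5.0, "disk": 10.0}
    print(energy_joules(times, watts))  # 120*95 + 4.5*5 + 2*10 = 11442.5
```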
CPU
- We use Perfsuite to measure CPU time
- We consider two states for the CPU
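The slide does not name the two CPU states; assuming they are active (busy) and idle, CPU energy is P_active × t_cpu + P_idle × (t_wall - t_cpu). The sketch below applies this to a single core, where CPU time cannot exceed elapsed time; the power values in the usage line are hypothetical.

```python
def cpu_energy_joules(cpu_seconds: float, wall_seconds: float,
                      p_active_w: float, p_idle_w: float) -> float:
    """Two-state CPU energy for one core (states assumed: active and idle).

    cpu_seconds:  CPU time measured for the application, e.g. with Perfsuite.
    wall_seconds: elapsed time of the run; the remainder counts as idle.
    """
    if not 0.0 <= cpu_seconds <= wall_seconds:
        raise ValueError("CPU time must be between 0 and the elapsed time")
    idle_seconds = wall_seconds - cpu_seconds
    return p_active_w * cpu_seconds + p_idle_w * idle_seconds


if __name__ == "__main__":
    # Hypothetical 90 W active / 30 W idle core, busy 80 s of a 100 s run.
    print(cpu_energy_joules(80.0, 100.0, 90.0, 30.0))  # 7800.0
```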
Memory
[Figure: basic block diagram of a CPU]
Memory
- When there is a cache miss, two things happen:
- 1) The data requested by the CPU is fetched.
- 2) Additional data is also prefetched.
Memory
- If the data/instructions are not found in the caches, main memory is accessed.
- The PAPI event PAPI_PRF_DM (data prefetch cache misses) is not available on the Abe system in TeraGrid
- We therefore compute the memory time from the number of accesses due to L2 cache misses only, so prefetch-induced accesses are not counted
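Under this choice, memory time is the L2 miss count multiplied by a per-access time. A minimal sketch; the miss count would come from the Perfsuite/PAPI measurements, the per-access time is a parameter (for example, the total access time estimated from the DRAM timings), and the numbers in the usage line are illustrative only.

```python
def memory_time_seconds(l2_misses: int, access_time_ns: float) -> float:
    """Estimate main-memory time from L2 cache misses only.

    Each L2 miss counts as one main-memory access lasting access_time_ns.
    Prefetch-triggered accesses are not included (PAPI_PRF_DM is not
    available on Abe), so this tends to underestimate memory time.
    """
    if l2_misses < 0 or access_time_ns < 0:
        raise ValueError("inputs must be non-negative")
    return l2_misses * access_time_ns / 1e9


if __name__ == "__main__":
    # Illustrative inputs: 2 million misses at 50 ns each.
    print(memory_time_seconds(2_000_000, 50.0))  # 0.1
```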
Memory
- We use synchronous DDR2 DRAM (DDR2-667, 667 MHz effective data rate) with internal array cells of 8 bits.
Memory
- DDR2 is the second generation of DDR SDRAM; it improves on DDR with a wider bus width.
Memory
- Array cells of 8 bits.
- Double Data Rate: data is transmitted twice per clock cycle.
- Second generation: bus width of 4.
- Data per access (bits) = (cell width) × (bus width) × (clock multiplier) = 8 × 4 × 2 = 64 bits in our case.
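The data-per-access product can be checked directly and extended to the number of accesses needed to fill one cache line. The 64-byte cache-line size used below is an assumption for illustration, not a figure from the slides.

```python
def bits_per_access(cell_width_bits: int, bus_width: int, clock_multiplier: int) -> int:
    """Data per access (bits) = cell width x bus width x clock multiplier."""
    return cell_width_bits * bus_width * clock_multiplier


def accesses_per_line(line_bytes: int, access_bits: int) -> int:
    """Number of accesses needed to transfer one cache line (rounded up)."""
    line_bits = line_bytes * 8
    return -(-line_bits // access_bits)  # ceiling division on integers


if __name__ == "__main__":
    per_access = bits_per_access(8, 4, 2)      # DDR2 values from the slide
    print(per_access)                          # 64
    print(accesses_per_line(64, per_access))   # 8, assuming 64-byte lines
```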
Memory
1) The correct row is activated.
2) Delay between row activation and column
activation (tRCD).
3) The correct column is activated.
4) The data is retrieved from the array (CL).
5) The data is sent to the memory controller
(tDPD).
Memory
- The manufacturer's bandwidth figure assumes the best case, so we need a more accurate approximation.
- We use the Total Access Time = Address Transport Time + Data Access Time + Data Transport Time.
- The memory is synchronous, so the Address Transport Time equals one clock cycle.
Memory
- tRCD is the row-to-column access delay.
- CL is the column access latency, in clock cycles.
- tAC is the minimum access time.
- tDPD is the data propagation delay.
- BMM is the number of subsequent accesses in burst mode.
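One way to combine these timings into the Total Access Time is sketched below. The split into address transport (one clock), data access (tRCD + CL) and data transport follows the slides; modelling the data transport as tDPD plus BMM further transfers at two per clock is an assumption of this sketch, tAC is not used, and all timing values in the usage lines are hypothetical. The DramTiming class and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DramTiming:
    """Hypothetical container for the DRAM timing parameters."""
    clock_ns: float   # command clock period t_clk, in ns
    trcd_cycles: int  # tRCD: row-to-column access delay, in cycles
    cl_cycles: int    # CL: column access latency, in cycles
    tdpd_ns: float    # tDPD: data propagation delay, in ns
    bmm: int          # BMM: subsequent accesses in burst mode


def total_access_time_ns(t: DramTiming) -> float:
    """Address transport + data access + data transport, in ns."""
    # Synchronous memory: address transport takes one clock cycle.
    address_ns = t.clock_ns
    # Data access: row-to-column delay plus column latency.
    access_ns = (t.trcd_cycles + t.cl_cycles) * t.clock_ns
    # Assumed data transport: propagation delay, then BMM further
    # transfers at two per clock (double data rate).
    transport_ns = t.tdpd_ns + t.bmm * t.clock_ns / 2
    return address_ns + access_ns + transport_ns


def effective_bandwidth_mb_s(t: DramTiming, bits_per_access: int = 64) -> float:
    """Bytes delivered by one burst divided by its total access time."""
    # The first access plus BMM subsequent accesses in the burst.
    burst_bytes = (1 + t.bmm) * bits_per_access / 8
    return burst_bytes / total_access_time_ns(t) * 1e3  # bytes/ns -> MB/s


if __name__ == "__main__":
    # Hypothetical DDR2-667-like values (3 ns clock), for illustration only.
    timing = DramTiming(clock_ns=3.0, trcd_cycles=5, cl_cycles=5, tdpd_ns=1.0, bmm=7)
    print(total_access_time_ns(timing))      # 44.5
    print(effective_bandwidth_mb_s(timing))  # about 1438.2
```

With these example values a burst of 8 × 64 bits takes 44.5 ns, about 1.44 GB/s, well below the nominal best-case rate of roughly 5.3 GB/s for a DDR2-667 module (667 MT/s × 8 bytes).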
Disk
- For the hard disk drive we use the Internal Sustained Transfer Rate (ISTR).
- ISTR depends on the track on which the files are located.
- Transfers are slower if the files are fragmented.
Disk
- Outer tracks have more sectors per track, so at a constant rotational speed they deliver a higher transfer rate.
- We approximate an average track position.
- ISTR is optimal for files in adjacent tracks and sectors.
Disk
- We use the pidstat command from SYSSTAT.
- Its counts include page faults, cache misses, and direct accesses.
- From the total number of bytes read/written and the Internal Sustained Transfer Rate, we can calculate the total disk time.
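Putting the disk slides together: approximate the ISTR at an average track position, then divide the bytes read and written by it. A minimal sketch; taking the midpoint of the outer- and inner-track rates is one simple reading of "average position" and is an assumption here, as are the example rates and byte counts. The byte totals could be obtained from pidstat's per-process I/O statistics (e.g. the kB_rd/s and kB_wr/s columns of pidstat -d, accumulated over the run).

```python
def average_istr_mb_s(outer_mb_s: float, inner_mb_s: float) -> float:
    """ISTR at an 'average' position: midpoint of outer- and inner-track rates.

    Assumption: the files lie, on average, midway between the two zones.
    """
    if inner_mb_s <= 0 or outer_mb_s < inner_mb_s:
        raise ValueError("expected outer >= inner > 0")
    return (outer_mb_s + inner_mb_s) / 2


def disk_time_seconds(bytes_read: int, bytes_written: int, istr_mb_s: float) -> float:
    """Transfer time = total bytes moved / sustained rate (1 MB = 10**6 bytes).

    Ignores seeks, rotational latency and fragmentation, so it tends to
    underestimate the real disk time.
    """
    if istr_mb_s <= 0:
        raise ValueError("transfer rate must be positive")
    return (bytes_read + bytes_written) / (istr_mb_s * 1e6)


if __name__ == "__main__":
    rate = average_istr_mb_s(100.0, 50.0)                     # hypothetical zone rates
    print(rate)                                                # 75.0
    print(disk_time_seconds(600_000_000, 150_000_000, rate))   # 10.0
```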
Future work
Future activities
- Find a suitable methodology for approximating the energy consumption of the network
- Pick a network model to be used for the experiments
- Run the experiments on TeraGrid
Future activities (cont.)
- Process results
- Compose the paper
Timeline
References
- [1] A. Kansal and F. Zhao, "Fine-grained energy profiling for power-aware application design," in Workshop on Hot Topics in Measurement and Modeling of Computer Systems, 2008.
- [2] X. Feng, R. Ge, and K. Cameron, "Power and energy profiling of scientific applications on distributed systems," in Proc. 19th Int'l Parallel and Distributed Processing Symp. (IPDPS '05), Apr. 2005.
- [3] R. Joseph and M. Martonosi, "Run-time Power Estimation in High Performance Microprocessors," in Proc. 2001 Int'l Symp. on Low Power Electronics and Design (ISLPED '01), 2001.
- [4] V. Shnayder, M. Hempstead, B. Chen, G. Werner-Allen, and M. Welsh, "Simulating the power consumption of large-scale sensor network applications," in Proc. Second ACM Conf. on Embedded Networked Sensor Systems (SenSys), Nov. 2004.
- [5] R. Jain, D. Molnar, and Z. Ramzan, "Towards understanding algorithmic factors affecting energy consumption: switching complexity, randomness, and preliminary experiments," in Proc. 2005 Joint Workshop on Foundations of Mobile Computing, ACM, 2005, pp. 70-79.
- [6] F. Bellosa, "The Benefits of Event-Driven Accounting in Power-Sensitive Systems," in Proc. SIGOPS European Workshop, Sept. 2000.
- [7] Perfsuite, http://perfsuite.ncsa.uiuc.edu/
- [8] PAPI, http://icl.cs.utk.edu/papi/
- [9] SYSSTAT, http://pagesperso-orange.fr/sebastien.godard/
- [10] G. Torres, "Understanding RAM Timings," http://www.hardwaresecrets.com/article/26/
- [11] Kingston Memory Module Specification KVR667D2D8F5
- [12] DDR2, http://www.hardwaresecrets.com/article/167 and 10
- [13] SDRAM latency, http://en.wikipedia.org/wiki/SDRAM_latency
- [14] CAS Latency (page 200), http://books.google.com/books?idHLpTtLjEXqcClpgPA200otsAMDTH6D5HUdqSDRAM2020latency20formulapgPA200vonepageqftrue
- [15] Calculating SDRAM cache-line-fill latency, http://www.dewassoc.com/performance/memory/hampel_rambus.htm
- [16] DRAM Normal Access Mode, http://ieeexplore.ieee.org/stamp/stamp.jsp?tparnumber332332isnumber7848
- [17] DRAM Operation, http://www.ece.cmu.edu/ece548/localcpy/dramop.pdf
- [18] DRAM Specifications, http://www.cs.albany.edu/sdc/CSI404/dramperf.pdf
- [19] Hard Disk Performance, http://www.storagereview.com/guide2000/ref/hdd/perf/perf/spec/index.html