Computer Architecture - PowerPoint PPT Presentation

Provided by: jb20
Title: Computer Architecture


1
Computer Architecture
  • Lab 5.1
  • Prof. Jerry Breecher
  • CSCI 240
  • Fall 2001

2
What you will do in this lab.
  • The purpose of this lab is to let you use some of
    the concepts you've acquired about Pipelines.
    You will be examining the code produced by the
    compiler to better understand what it can do.
  • You have only one task before you:
  • Use the tool provided, called soak.c, to
    determine properties of the memory subsystem.
    Using this tool, and a lot of ingenuity, you can
    find out the following information:
  • Size of the L1 cache
  • Size of the L2 cache
  • Data Access speed of the L1 cache
  • Data Access speed of the L2 cache
  • Data Access speed of the Main Memory
  • The level of associativity of the L1 and L2
    caches.
  • The time lost due to a TLB miss.
  • The level of associativity of the TLB.
  • The time required for a mis-aligned data read.

Wow! Is this open ended or what!!
3
What you will do in this lab.
  • What is a verbal lab?
  • You prepare, document and tie up all the pieces
    of your lab just as if you were handing it in.
    Instead you and your teammate talk over the
    results with me. The discussion will be
    professional, the way I would talk about a
    problem with a junior colleague.
  • You are expected to bring to this discussion:
  • All notes you've written about the problem. It
    is NOT acceptable to say "I think the answer was
    42."
  • Many of you at the last verbal discussion said
    things like "I don't know why the answer came out
    the way it did." This again is not the way a
    professional discussion takes place. You are
    expected to have your facts straight and to
    understand what it is you accomplished. Your job
    is to tame this piece of silicon and know what
    it's doing.

4
Where To Get Documentation
  • There is an absolutely stupendous manual devoted
    to the Pentium III architecture (which is what we
    have in the lab)
  • The Intel Architecture Optimization Reference
    Manual
  • http://developer.intel.com/design/pentiumii/manuals/245127.htm
  • Local copy at
  • http://babbage.clarku.edu/jbreecher/docs/Intel Architecture Optimization Reference Manual.pdf
  • For Pentium 4, the manual is
  • Pentium 4 and Xeon Processor Optimization
  • http://developer.intel.com/design/pentium4/manuals/248966.htm
  • Local copy at
  • http://babbage.clarku.edu/jbreecher/docs/Pentium 4 Xenon Processor Optimization.pdf
  • In both these manuals,
  • Chapter 1 contains lots of great information
    about The Pipeline used in the processors.
  • Chapter 2 contains guidelines for Optimizing
    Performance.
  • There are also excellent coding examples
    throughout.

5
Task 1
  • Steps To Accomplish This Task.
  • 1. There is no way you can do this task until you
    have a complete and thorough understanding of the
    memory hierarchy. You can gain this
    understanding in the lecture or by reading the
    book.
  • 2. Develop a plan! This is a big undefined
    project. You need to figure out a plan for each
    of the pieces. I will actually do Parts A and E
    as examples for you.
  • 3. Sit and think. What are the inputs you want
    to use for Part B? Try them. Then sit and think
    some more. Do your results match your picture of
    how the cache works?
  • You will be evaluated on your methodology!

6
About Soak
  • Say "soak" and you will get lots of information
    about the program. Here's a bit about the
    inputs:
  • soak <total_mem> <step> <Mega-touches>
  • Total memory is the span of memory to be touched.
  • Step is the number of bytes jumped on each memory
    touch.
  • Mega-touches is how many million memory reads you
    will do.
  • For example, "soak 128 32 100" touches
    memory locations
  • 0, 32, 64, 96, 0, 32, 64, 96, ... for a total of
    100,000,000 touches. Since there are a total of
    4 touches in a cycle, there will be a total of
    25,000,000 cycles.
  • Here's where to find the code for soak.c:
  • http://babbage.clarku.edu/jbreecher/docs/soak.c
  • You may wish to read this code. It's bigger than
    the throwaway tidbits you've seen so far.

7
About Soak
  • The relevant part of the soak program
  • get_current_time( start_seconds );
  • for ( j = 0; j < iterations; j++ )                     <-- The Outer Loop
  •   for ( i = 0; i < steps_per_iteration; i++ )          <-- The Inner Loop
  •   {
  •     /* A "touch" is defined as one cycle within this loop */
  •     new_ptr = (STRUCT *)( (int)memory_ptr + global + (step_size * i) );
  •     global = new_ptr->trash;
  •   }
  • get_current_time( end_seconds );

8
About Soak
  • The relevant part of the soak program
  • movl global,%ecx
  • movl new_ptr,%eax
  • .p2align 4,,7
  • .L188:
  • leal 1(%edx),%edi
  • cmpl $0,-48(%ebp)
  • jle .L187
  • xorl %ebx,%ebx
  • movl -48(%ebp),%edx
  • .p2align 4,,7
  • .L192:
  • leal (%ecx,%esi),%eax
  • addl %ebx,%eax
  • movl (%eax),%ecx
  • addl -44(%ebp),%ebx
  • decl %edx
  • jnz .L192
  • .L187:
  • movl %edi,%edx
  • cmpl -32(%ebp),%edx
  • jl .L188
  • movl %eax,new_ptr
  • movl %ecx,global

9
About Soak
  • The relevant part of the soak program
  • .L188:
  • leal 1(%edx),%edi        <-- Outer Loop
  • cmpl $0,-48(%ebp)        <-- Outer Loop
  • jle .L187                <-- Outer Loop
  • xorl %ebx,%ebx           <-- Outer Loop
  • movl -48(%ebp),%edx      <-- Outer Loop
  • .L192:
  • leal (%ecx,%esi),%eax    <-- Inner Loop
  • addl %ebx,%eax           <-- Inner Loop
  • movl (%eax),%ecx         <-- Inner Loop <-- Memory Touch
  • addl -44(%ebp),%ebx      <-- Inner Loop
  • decl %edx                <-- Inner Loop
  • jnz .L192                <-- Inner Loop
  • .L187:
  • movl %edi,%edx           <-- Outer Loop
  • cmpl -32(%ebp),%edx      <-- Outer Loop
  • jl .L188                 <-- Outer Loop

10
Other Support Material
  • Useful Rabbit Codes:
  • L2_LINES_IN: Number of lines allocated in (loaded
    into) L2. These are requests that miss the L2
    and go to main memory.
  • L2_RQSTS: Number of requests to the L2 cache.
    This includes both requests that hit the cache
    and those that miss and must then go to main
    memory.
  • INST_RETIRED: Number of instructions retired.
  • MISALIGN_MEM_REF: Number of instructions that
    accessed memory not on the correct mod boundary.
    This causes the hardware to do extra work to
    bring the data in.
  • There are also the various rabbit codes you've
    used in previous labs.
  • "rabbit soak ..." produces all codes. "rabbit -g
    2 soak ..." gets most of these codes.
  • These types are more fully described in
  • IA32 SDM Vol3 System Programmers Guide.pdf
    starting on page A-22.

11
Other Support Material
  • How To Write A Shell Command
  • In planning your tests, it's easier to write a
    shell script as a way of remembering what you did
    and as a way of repeating some or all of an
    experiment. Here are the steps you might follow:
  • vi my_commands
  • cat my_commands
  • soak 2048 32 100
  • soak 4096 32 100
  • chmod 700 my_commands
  • my_commands

12
About The Caches
  • This is what I get on johnson when I run
    arch_params:
  • Cache and TLB
  • Instruction TLB ...... 4 kb pages, 4-way set associative, 32 entries
  • Data TLB ............. 4 kb pages, 4-way set associative, 64 entries
  • L2 cache ............. 256 kb, 8-way set associative, 32 byte line size
  • L1 instruction cache . 16 kb, 4-way set associative, 32 byte line size
  • L1 data cache ........ 16 kb, 4-way set associative, 32 byte line size
  • I believe that at least one of these numbers is
    wrong, though it could simply be the silicon
    outsmarting me.
  • The way that data is replaced in a cache causes
    results to be rather tricky. For example, let's
    suppose that the L1 cache is 16,384 bytes in
    size. When you run a test that asks for exactly
    this much memory (soak 16384 32 100) you get one
    time. In the best of all worlds, running a test
    touching more memory (soak 20000 32 100) would
    give a completely different time. But it's not
    that simple. The reason is that in the 20000
    byte test, only some of the memory may be kicked
    out. So some values hit in the cache and some
    don't, giving an unusual timing.

13
About The Caches
  • Memory replacement is defined as Pseudo-LRU.
    We will talk about this in class.
  • It's possible to get different timings for the
    same test!! This happens when one time the data
    fits in the cache, and another time it doesn't.
    I don't know why this is.
  • So to get proper timings, it's necessary to go to
    a memory size much larger than the next smaller
    cache. What do I mean by that?
  • Doing the TLB test is tricky (as if the other
    tests aren't!). You need to develop an access
    pattern that touches many more pages than are in
    the TLB, and compare that with a run where the
    total pages touched do fit in the TLB.

14
Task 1
  • Here's the way to do Part A
  • How do you tell that you're getting the data from
    one memory level rather than another? Well,
    they take different amounts of time! So try
    touching different amounts of memory. When the
    amount of memory touched no longer fits in the
    cache, the time to get that memory will be
    larger than before.

soak 2048 32 100     5.5 nanoseconds
soak 4096 32 100     5.4 nanoseconds
soak 8192 32 100     5.2 nanoseconds
soak 12288 32 100    5.4 nanoseconds
soak 16384 32 100    5.3 nanoseconds
soak 20480 32 100    9.2 nanoseconds
soak 24576 32 100    9.2 nanoseconds

15
Task 1
  • Here's the way to do Part E
  • In this Part, you're figuring out the cost of
    doing a memory access. Let's run soak in a mode
    that ensures that we miss the L2 cache every time:
    we'll ask for lots of memory.
  • rabbit soak 2000000 32 100
  • Soak Version November 5, 2001
  • Touching 2000000 bytes of memory for 1600 iterations
  • 2000000 bytes allocated at 0x40141000
  • 13.749 seconds elapsed for 100000000 memory touches
  • 137.5 nanoseconds per touch
  • This says there are 100,000,000 memory touches in
    13.7 seconds, or about 7,300,000 touches per second.
  • Event                      Events        Events/sec
  • -------------------------  ------------  ----------------
  • 0x24  36  l2_lines_in      2414030       7323989.60
  • 0x2e  46  l2_rqsts         2406704       7318328.50