Performance Monitoring on Pentium 4 Processor

1
Performance Monitoring on Pentium 4 Processor
  • Nidhi
  • nidhi.nidhi_at_intel.com
  • IA 32 Performance Architect

2
Outline
  • Pentium 4 Processor Performance Monitoring
    features
  • Implementation
  • How Intel uses Performance Monitors
  • Limitations
  • Open issues

3
Feature Overview
  • Counters
  • 18 40-bit programmable counters
  • Events
  • 45 events in various parts of the machine
  • Counter increment control
  • qualification by current privilege level (O/S,
    USER)
  • qualification by hardware thread id
  • edge detection
  • threshold comparison
  • interrupt on counter overflow
  • Interface (x86 instructions to set/read
    counters)
  • WRMSR (write model-specific register)
  • RDMSR (read model-specific register)
  • RDPMC (read performance-monitoring counter)
  • RDTSC (read time-stamp counter)
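
The counter width and the increment controls listed above can be illustrated with a small software model. This is only a sketch under simplifying assumptions, not a description of the hardware: the function names, the "events per cycle" input, and the exact threshold rule (a cycle qualifies when its event count exceeds the threshold) are chosen here for illustration.

```python
COUNTER_BITS = 40                      # counter width stated on the slide
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(start, end):
    """Events counted between two raw reads, assuming at most one wrap."""
    return (end - start) & COUNTER_MASK


def simulate_counter(per_cycle_events, threshold=None, edge=False):
    """Toy model of counter increment control (illustrative only).

    per_cycle_events: number of event occurrences observed in each cycle.
    threshold: if given, a cycle qualifies only when its count exceeds it,
               and the counter then advances by one per qualifying cycle.
    edge: if True, count only transitions from "not qualified" to "qualified".
    """
    count = 0
    was_qualified = False
    for n in per_cycle_events:
        if threshold is None:
            qualified = n > 0
            increment = n
        else:
            qualified = n > threshold
            increment = 1 if qualified else 0
        if edge:
            increment = 1 if (qualified and not was_qualified) else 0
        count = (count + increment) & COUNTER_MASK  # wrap at 40 bits
        was_qualified = qualified
    return count
```

For example, a read of `COUNTER_MASK - 4` followed by a read of `10` corresponds to 15 events, because the counter wrapped once between the two reads.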

4
Feature Overview, cont.
  • Cascading
  • Second counter begins counting when first counter
    overflows
  • For instance, to measure cycles elapsed after the
    first counter overflowed.
  • Tagging
  • Used to get non-speculative event counts
  • Tags micro-ops when they incur an event
  • Counts tagged micro-ops at retirement
  • Three tagging mechanisms: front-end, execution,
    and replay
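
Both ideas on this slide can be sketched in a few lines. The model below is a toy under stated assumptions: `MicroOp`, its two fields, and the per-cycle input to `cascade` are hypothetical names introduced for illustration, and the second counter is assumed to start on the cycle after the first one overflows.

```python
from dataclasses import dataclass

COUNTER_BITS = 40  # counter width from the feature overview


@dataclass
class MicroOp:
    incurred_event: bool  # the micro-op hit the monitored event and was tagged
    retired: bool         # False if it was speculative and later squashed


def speculative_count(uops):
    """Counts every occurrence, including work on squashed paths."""
    return sum(u.incurred_event for u in uops)


def tagged_retired_count(uops):
    """Counts tagged micro-ops only when they retire (non-speculative)."""
    return sum(u.incurred_event and u.retired for u in uops)


def cascade(events_per_cycle, first_preset):
    """First counter counts events; second counts cycles after it overflows."""
    limit = 1 << COUNTER_BITS
    first, second, overflowed = first_preset, 0, False
    for n in events_per_cycle:
        if overflowed:
            second += 1          # second counter runs only after the overflow
        first += n
        if first >= limit:
            first -= limit       # 40-bit wrap
            overflowed = True
    return first, second
```

Presetting the first counter to `(1 << 40) - 3` makes it overflow on the third event, so with one event per cycle over five cycles the second counter ends at 2.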

5
Precise Event Based Sampling
  • Mechanism
  • User allocates a PEBS buffer in memory
  • User programs a counter to tag micro-ops and
    count them as they retire
  • When the counter overflows, the Pentium 4
    Processor's retirement logic forces a microcode
    assist just before the next tagged micro-op
  • Microcode assist copies the program counter and
    GPRs into the PEBS buffer in memory
  • Advantages
  • Precise: the sample is taken at the instruction
    that incurred the event
  • Enables creation of data address profiles, which
    help locate cache lookup patterns and data
    relocation opportunities
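
The mechanism steps above can be sketched as a software simulation. This is a simplified model, not the hardware or microcode behavior: `RetiredUop`, `pebs_sample`, and the `sample_after` and `buffer_capacity` parameters are hypothetical names, and dropping samples when the buffer is full is an assumption made to keep the sketch short (a real tool would drain the buffer instead).

```python
from dataclasses import dataclass, field


@dataclass
class RetiredUop:
    ip: int                                    # program counter of the micro-op
    tagged: bool                               # True if it incurred the event
    regs: dict = field(default_factory=dict)   # register snapshot before it


def pebs_sample(stream, sample_after, buffer_capacity):
    """Record (ip, regs) for the tagged micro-op that follows each overflow."""
    buffer = []
    count = 0
    armed = False                  # set when the counter has overflowed
    for uop in stream:
        if not uop.tagged:
            continue               # untagged micro-ops are never counted
        if armed:
            # Modeled "microcode assist": capture state just before this uop.
            if len(buffer) < buffer_capacity:
                buffer.append((uop.ip, dict(uop.regs)))
            armed = False
            count = 0              # counter is reloaded for the next period
        count += 1
        if count == sample_after:
            armed = True           # overflow: sample the next tagged micro-op
    return buffer
```

Because the record is taken at a tagged micro-op rather than at whatever instruction happens to be executing when an interrupt arrives, every entry in the buffer points at an instruction that actually incurred the event.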

6
Implementation Overview

7
How Intel Uses Performance Monitors
  • Intel uses Performance Monitoring for
  • Performance Analysis
  • Compiler optimizations
  • System level optimizations
  • Performance and functional debug
  • Many tools have been built for collecting and
    analyzing performance monitoring counter data
  • Interval Sampler
  • Profiler

8
Performance Analysis
  • Interval sampler
  • Gives the characteristics of the system
  • VTune Performance Analyzer
  • Event Profiler
  • Gives the distribution of events for the system
    over the whole application run
  • Available at http://www.intel.com/software/products/vtune/
  • The Interval Sampler points out which events to
    look for; VTune event profiles then help find the
    functions, basic blocks, or IPs that have the
    performance problem.
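
The two-step workflow described on this slide can be sketched as follows. This is not how VTune or the Interval Sampler are implemented; the function names, the symbol table layout, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

COUNTER_MASK = (1 << 40) - 1  # 40-bit counters, as in the feature overview


def interval_deltas(readings):
    """Per-interval event counts from successive raw counter readings."""
    return [(b - a) & COUNTER_MASK for a, b in zip(readings, readings[1:])]


def profile_by_function(sample_ips, symbols):
    """Histogram of sampled IPs per function.

    symbols: list of (start_address, name) sorted by start_address.
    Function sizes are not modeled, so any IP at or above the last start
    address is attributed to the last function.
    """
    starts = [start for start, _ in symbols]
    histogram = Counter()
    for ip in sample_ips:
        index = bisect_right(starts, ip) - 1   # last symbol starting at or below ip
        name = symbols[index][1] if index >= 0 else "<unknown>"
        histogram[name] += 1
    return histogram
```

An interval view such as `interval_deltas` shows when an event rate spikes; the histogram then attributes the sampled instruction pointers to the functions responsible.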

9
Limitations
  • Not all counters can count all events.
  • With hyperthreading, the counters may get divided
    among the logical processors.
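
Both limitations turn event selection into a small assignment problem. The sketch below uses a backtracking search; the event names, counter numbers, and the `allowed` table are hypothetical and do not reflect the real Pentium 4 event-to-counter constraints.

```python
def assign_counters(requested, allowed, free_counters):
    """Map each requested event to a distinct counter, or return None.

    requested: list of event names to measure at the same time.
    allowed: dict mapping event name -> set of counters able to count it.
    free_counters: counters available to this logical processor.
    """
    # Place the most constrained events first to fail early.
    ordered = sorted(requested, key=lambda e: len(allowed[e]))
    assignment = {}

    def place(i, available):
        if i == len(ordered):
            return True
        event = ordered[i]
        for counter in sorted(allowed[event] & available):
            assignment[event] = counter
            if place(i + 1, available - {counter}):
                return True
            del assignment[event]      # backtrack and try the next counter
        return False

    return dict(assignment) if place(0, frozenset(free_counters)) else None
```

With a hypothetical table in which `event_b` can only use counter 0 and `event_c` only counters 1 and 2, three events fit on three free counters; if hyperthreading leaves only counters 0 and 1 to this logical processor, the same request returns `None` and the events have to be measured in separate runs.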

10
Open Questions
  • Centralized vs. distributed?
  • Distributed is simpler but less flexible
  • Add new events?
  • New usage models
  • Multicore / Multithread scenarios
  • Feedback is welcome!