Kari Tiensyrjä - PowerPoint PPT Presentation

1
Kari Tiensyrjä, Senior Research Scientist, VTT
  • FP6-2004-IST-4 FET Proactive Initiative ACA
  • SUPERcomputing on a CHIP (SUPERCHIP)
  • Proposal Number 26888

Jesper Larsson Träff, Senior Principal Researcher, NEC Europe
Ian Phillips, Prof., Principal Staff Engineer, ARM
Ben Juurlink, Professor, Delft University of Technology
2
1. Paths to exploitation
  • FET project with potential for application
    breakthroughs in a 10-year horizon
  • Industrial Partners (NEC, ARM, Intel) cover a
    wide spectrum of application domains and provide
      • Steering of scientific and technological research
      • Transfer of knowledge and results to and
        interplay with company design groups
      • Proposition to standardization bodies, where
        relevant (B.3.6)
  • Active promotion of results (T6.1 and T6.2)
      • High-profile scientific and applied conferences
        and journals
      • Organization of workshops
      • PhD courses and summer schools, incorporation
        into advanced curricula
      • Links to NoEs
  • WP6 (led by Intel): dissemination and
    exploitation (also B.3.3, B.4.1.7, and B.8.2.6)
      • T6.3 for technology transfer
      • T6.4 for exploitation

3
2. Target applications
  • A wide range of applications with high
    computational requirements will be considered
  • WP4 will analyse and identify applications, and
    selected sample applications will be implemented
    as proof-of-concept
  • An initial set of applications considered
      • Mobile devices (energy-efficiency)
          • PDA, HDTV
          • Games, virtual reality
      • Desktops and servers (versatility from
        high-performance/single-application to
        high-throughput application suites)
          • Streaming and DSP applications, e.g. video in
            bandwidth constrained active networks and
            embedded 3D graphics
          • Real-time speech recognition and
            videoconferencing
          • Database applications, string processing,
            geographical information processing
      • Supercomputer (high-performance)
          • Vectorised CFD Boltzmann automata
          • MPI-parallelised finite element methods
          • Quantum Chromodynamics

4
3. Leading contenders within the proposal
  • Objectives: boost performance by 2-3 orders of
    magnitude (compared to same transistor count),
    exploit parallelism at all levels, realise an
    easy-to-use strong model of computing, and provide
    scalability, a wide application area and
    power-saving techniques

  • Eclipse: scalable NOC with EREW PRAM model;
    simultaneous ILP-TLP exploitation; cacheless
    memory; regular structure
  • XMT: CMP with PRAM-like but more asynchronous
    model; SMT synchronization mechanism; on-chip
    caches
  • CMP: shared memory using caches; advanced cache
    coherency protocols
  • TTA/PISMA: tiled architecture with virtual shared
    memory communication; very simple and strongly
    decentralized organization
  • TRIPS: single chip reconfigurable processor /
    memory architecture; grids of ALUs connected via
    operand networks; static spatial scheduling
5
3. Leading contenders within the proposal (cont)
  • Initial choice of architectures is partially
    guided by application requirements
      • Eclipse and XMT: general purpose computing,
        embedded computing
      • Advanced CMP: high-throughput desktop and server
        machines
      • TTA/PISMA: streaming/DSP
      • TRIPS: HPC, streaming/DSP, threaded servers
  • Procedure to choose the initial SUPERCHIP
    architecture
      • 1. Develop an architecture evaluation framework
        (T1.1)
      • 2. Develop semi-analytical power/performance/cost
        models (T5.1)
      • 3. Develop/modify existing simulators for the
        architectures (T5.2)
      • 4. Design benchmark programs for the
        architectures (T4.1)
      • 5. Perform evaluation, identify strong/weak
        points, select (T1.1)
  • Preliminary criteria
      • Power, performance, cost (silicon area)
      • Estimated scalability, PRAM-like model support,
        ease of programming
      • Estimated coverage for aimed application area,
        TLP-ILP co-exploitation
      • Potential for solving the rest of the problems
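
One simple way to picture the final "perform evaluation ... select" step is a weighted score over the preliminary criteria. The sketch below is only an illustration: the weights, the criterion keys and the candidate names are hypothetical placeholders, not values or methods from the proposal.

```python
from typing import Dict, List

# Hypothetical weights over some of the preliminary criteria (sum to 1).
WEIGHTS: Dict[str, float] = {
    "power": 0.2,
    "performance": 0.3,
    "cost": 0.2,
    "scalability": 0.15,
    "ease_of_programming": 0.15,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine normalized per-criterion scores (0..1, higher is better)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


def rank(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Return candidate names ordered from best to worst score."""
    return sorted(candidates,
                  key=lambda name: weighted_score(candidates[name]),
                  reverse=True)


if __name__ == "__main__":
    # Placeholder candidates with made-up scores, for illustration only.
    demo = {
        "arch_a": {c: 0.5 for c in WEIGHTS},
        "arch_b": {c: 0.9 for c in WEIGHTS},
    }
    print(rank(demo))
```

In practice the per-criterion scores would come from the models, simulators and benchmarks of steps 1 to 4 rather than being entered by hand.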

6
4. Ensuring HW implementation technologies impact
on choice of scalable architecture
  • Scalability issues are observed in initial
    selection of candidate architectures
      • Mesh-like topologies (providing constant wire
        length links): Eclipse, CMP, TTA, TRIPS
      • Regular structures: Eclipse, CMP, TTA, TRIPS
      • No forwarding networks (Eclipse) or multistage
        forwarding networks (TRIPS)
      • No cache coherency mechanisms: Eclipse
      • Multithreading: Eclipse, XMT
      • Decentralized structure: Eclipse, CMP, TTA, TRIPS
  • Semi-analytical modeling of the architectures and
    candidate techniques (T5.1)
      • Analytical parametric power/performance/cost
        estimation models
      • Hardware implementation parameters are extracted
        from
          • Technology roadmaps, e.g. ITRS
          • Pragmatic experience and knowledge of industrial
            partners
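
As a rough illustration of what an analytical parametric power/performance/cost model might look like, the sketch below uses Amdahl's law for performance and assumes power and silicon area scale linearly with core count. These formulas and every parameter name are assumptions made for the example; the slides do not specify the actual T5.1 models.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ChipParams:
    cores: int                # number of processing cores on the chip
    parallel_fraction: float  # fraction of work that parallelizes (0..1)
    core_power_w: float       # hypothetical per-core power, watts
    core_area_mm2: float      # hypothetical per-core silicon area, mm^2


def speedup(p: ChipParams) -> float:
    """Amdahl's law: 1 / ((1 - f) + f / n)."""
    f, n = p.parallel_fraction, p.cores
    return 1.0 / ((1.0 - f) + f / n)


def estimate(p: ChipParams) -> Dict[str, float]:
    """Return performance, power and cost (area) estimates for one design."""
    return {
        "speedup": speedup(p),
        "power_w": p.cores * p.core_power_w,     # linear scaling assumed
        "area_mm2": p.cores * p.core_area_mm2,   # linear scaling assumed
    }


if __name__ == "__main__":
    # Made-up parameter values; real ones would come from technology
    # roadmaps and partner experience, as the slide states.
    print(estimate(ChipParams(cores=16, parallel_fraction=0.95,
                              core_power_w=0.5, core_area_mm2=2.0)))
```

Sweeping `cores` or `parallel_fraction` over a range gives the kind of parametric comparison the slide describes, without running a simulator.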

7
4. Ensuring HW implementation technology impact
on our choice of scalable architecture (cont)
  • Architectural simulation (T5.2)
  • Develop/modify existing simulators
  • Benchmarks
  • Sample applications
  • Information on execution time, resource
    utilization and power consumption is extracted
  • Modeling of the critical parts of architectures
  • Feasibility analysis of candidate architectures
  • Studies on fault tolerance, clocking schemes,
    on-chip/off-chip communication, power saving and
    other implementation related issues for the
    SUPERCHIP architecture (T5.3)
  • Detailed modeling and feasibility assessment of
    critical parts of the SUPERCHIP architecture
    (T5.4)

8
5. Evolvement of the PRAM model for the candidate
architectures
  • For ease of programming, the SUPERCHIP programming
    model will be based on a PRAM-like model,
    considering
      • Relaxed synchronization (BSP-like)
      • Strong memory semantics (CRCW-like, built-in
        operators)
      • Potential for locality exploitation (memory,
        Hierarchical-PRAM)
  • SUPERCHIP will develop the necessary
    architectural support for this model
  • Architectural requirements
      • Synchronization: implicit after each instruction
      • Bandwidth: high bisection bandwidth to handle
        random communication
      • Latency: communication/memory access latency
        should be hidden
  • SUPERCHIP will not investigate PRAM implementation
    on distributed memory architectures in general
  • Long-term research issue: evolution of
    programming model and architecture to SUPERCHIP
    constellations
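
To make "synchronization implicit after each instruction" concrete, the sketch below simulates a PRAM-style lockstep computation in plain Python: in every step all simulated processors read from the same memory snapshot, then all write. The example algorithm (a Hillis-Steele inclusive prefix sum) and the function name are chosen for illustration and are not taken from the proposal. This particular algorithm has each cell read by up to two processors per step, so it corresponds to a concurrent-read variant rather than strict EREW.

```python
from typing import List


def pram_prefix_sum(values: List[int]) -> List[int]:
    """Inclusive prefix sum in about log2(n) lockstep steps."""
    x = list(values)
    n = len(x)
    stride = 1
    while stride < n:
        # Read phase: every simulated processor sees the same memory
        # state, which models the implicit synchronization per step.
        snapshot = list(x)
        # Write phase: index i stands for one processor; each writes
        # only its own cell, so writes are exclusive.
        for i in range(stride, n):
            x[i] = snapshot[i] + snapshot[i - stride]
        stride *= 2
    return x


if __name__ == "__main__":
    print(pram_prefix_sum([1, 2, 3, 4]))
```

The sequential inner loop stands in for P processors working at once; on a real PRAM-like machine each step would take constant time regardless of n.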

9
5. Evolvement of the PRAM model for the candidate
architectures (cont)
Candidate | Synchronization                              | Bisection bandwidth | Latency hiding                 | Initial model
Eclipse   | synchronization wave, fast barrier mechanism | P/2                 | super-pipelined multithreading | EREW PRAM
XMT       | hardware synchronization                     | ?                   | caches                         | PRAM-like
CMP       | software synchronization                     | square root P       | caches                         | NUMA
TTA/PISMA | software synchronization                     | square root P       | caches                         | NUMA
TRIPS     | software synchronization                     | square root P       | caches                         | NUMA
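
The bisection bandwidth entries above (P/2 for Eclipse, square root P for CMP, TTA/PISMA and TRIPS) can be compared numerically. The sketch below counts bisection width in links and assumes a square 2D mesh as one topology that yields square root P; the function names are hypothetical and the slides do not state which topology underlies each figure.

```python
import math


def mesh_bisection(p: int) -> int:
    """Links cut when halving a sqrt(P) x sqrt(P) 2D mesh (even side)."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("P must be a perfect square")
    return side


def half_p_bisection(p: int) -> int:
    """Bisection width of P/2, the figure listed for Eclipse."""
    return p // 2


if __name__ == "__main__":
    for p in (16, 64, 256, 1024):
        print(p, mesh_bisection(p), half_p_bisection(p))
```

The gap grows with P: for 64 processors the two figures are 8 and 32 links, and for 1024 processors they are 32 and 512, which is why a P/2 bisection copes better with random communication patterns.
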
10
6. Validation and assessment of the performance
scalability of the final choice of HW/SW
architecture
  • Analytically through parametric
    power/performance/cost models
  • Empirically through simulations
  • Benchmark kernels and sample applications
  • Scalable benchmark suite for fine-grained shared
    memory architecture
  • Standard benchmark suites
  • Sample applications
  • Parametric architecture simulations
  • By comparing to future alternative approaches
    (e.g. advanced CMPs) and theoretical machines
    (e.g. ideal PRAM) using the applications and
    benchmarks

11
7. Plan for identifying the requirements for the
OS within the resources of the work plan
  • The goal is to identify requirements and implement
    core OS services to demonstrate the validity of
    the architectural approach, not to develop a
    full-fledged OS (as stated in B.4.1.5)
  • Requirements from underlying architecture and
    applications
  • Resource management (process, thread and memory)
  • Runtime functions and services for applications
  • Input for identifying requirements will come from
    several other tasks including T1.2, T1.3, T2.2
    and T3.3
  • OS is not in charge of supporting distributed
    shared memory
  • Certain OS functionality will be covered by
    the compiler's run-time system
  • Task leader of OS task (T4.3, ULM) has developed
    a distributed operating system (Plurix) which
    provides an excellent basis

12
7. Plan for identifying the requirements for the
OS within the resources of the work plan (cont)
  • Preliminary anticipated OS requirements
      • Dynamic process/thread scheduling
      • Memory management (physical and virtual)
      • Synchronization including inter-process
        communication
      • Support for power management and IO
  • Definition
      • A coarse-grain functional model of the OS will be
        developed and validated through simulation
      • Definition of API in SUPERCHIP language (or
        pseudo-language in the early phase)
  • Implementation
      • Using the SUPERCHIP language and compiler (from
        T2.2 and T3.3)
      • Testing with architecture simulation tools (from
        T5.2)

Feasible with the allocated resources and partners