Transforming a FAST simulator into RTL implementation


1
Transforming a FAST simulator into RTL
implementation
  • Nikhil A. Patil, Derek Chiou
  • FAST Research group,
  • University of Texas at Austin

2
Outline
  • Research Goal
  • Motivation
  • Quick introduction to FAST
  • Going from FAST to RTL
  • Data-path
  • Microcode Compiler
  • Golden Models
  • Optimizing to single-cycle
  • Benefits
  • Conclusions

3
Research Goal
  • Simplify the design, development, and
    verification of computer systems
  • Significantly reduce overall architecture, RTL,
    verification, software effort
  • Eliminate wasted work; enable code-reuse

4
Motivation
[Diagram: traditional design flow, with boxes labeled Architectural
Simulator, RTL, Synthesis Flow, Verification, Compiler, Low Accuracy
Software Simulator, and Software]
  • Information duplication in traditional design
    flow

5
Pre-silicon S-RTL Bugs in Pentium 4
  • Bob Bentley, Validating the Intel Pentium 4
    Microprocessor, DAC 2001

6
Vision of an ideal design flow
[Diagram: ideal design flow, with boxes labeled Architectural /
Micro-architectural Specification, Architectural Simulator, RTL,
Verification, and Software]
  • Shared specification reduces information
    duplication

7
Vision of an ideal design flow
  • Single central source (code-base) for all of
    the following
  • Architectural studies
  • Micro-architectural tuning
  • RTL implementation
  • RTL level power modeling
  • RTL Verification
  • Software development
  • Note: For now, we don't address anything beyond
    synthesizable RTL (physical design, etc.)

8
Overview of FAST

9
Points to note about FAST
  • The functional model (FM) is ISA-specific, but
    micro-architecture agnostic
  • Trace sent from FM to the timing model (TM) is
    ISA-specific, not micro-architecture specific
    (e.g., x86 opcode, not x86 microcode)
  • TM implements a (potentially inaccurate)
    microcode table to decode the meaning of the
    trace
  • For a simpler ISA, the table is an identity mapping
  • Currently, our FM can model x86 and PowerPC
    targets
  • TM is written in Bluespec SystemVerilog
  • TM is composed of modules connected with FAST
    Connectors, which manage latency, throughput, and
    buffering (built upon the theory of Asim A-Ports)
  • FAST methodology itself does not introduce any
    inherent inaccuracies; all inaccuracies are due
    to lower-fidelity models (or bugs)
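
The role of a connector that manages latency, throughput, and buffering can be illustrated with a small software sketch. This is only a toy model under stated assumptions: the class name `Connector`, its methods, and its parameters are hypothetical and are not the FAST Connector or Asim A-Ports interface.

```python
from collections import deque


class Connector:
    """Toy connector between two timing-model modules (hypothetical API)."""

    def __init__(self, latency, throughput, capacity):
        self.latency = latency        # target cycles before a message can be received
        self.throughput = throughput  # max messages sent (or received) per target cycle
        self.capacity = capacity      # max messages buffered inside the connector
        self.queue = deque()          # entries are (ready_cycle, message)
        self.sent_this_cycle = 0
        self.cycle = 0

    def can_send(self):
        # Sending is blocked by a full buffer or by the per-cycle throughput limit.
        return (len(self.queue) < self.capacity
                and self.sent_this_cycle < self.throughput)

    def send(self, msg):
        if not self.can_send():
            return False
        self.queue.append((self.cycle + self.latency, msg))
        self.sent_this_cycle += 1
        return True

    def receive(self):
        """Return up to `throughput` messages whose latency has elapsed."""
        out = []
        while (self.queue and self.queue[0][0] <= self.cycle
               and len(out) < self.throughput):
            out.append(self.queue.popleft()[1])
        return out

    def tick(self):
        # Advance one target cycle and reset the per-cycle send counter.
        self.cycle += 1
        self.sent_this_cycle = 0
```

Because the modules only see `send` and `receive`, the timing parameters can be changed without touching the modules themselves, which is the property the slide attributes to connectors.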

10
Vision for FAST
  • Single central codebase will consist of the
    following three sub-modules
  • ISA simulator (C/C++)
  • Micro-op definition (C/C++)
  • Micro-architectural definition (Bluespec/C)
  • Note that the information contained in each is
    mutually exclusive
  • Eliminates possibility of inconsistency

11
From FAST to RTL
  • Add data-paths to the timing model
  • ALU, cache data-stores, forwarding paths
  • Magically move the ISA from the FM to TM
  • Detach trace-buffers; use internal data-path
  • For each TM module, improve fidelity
  • At 100% fidelity, we have a Golden model
  • For each TM module, improve host/target-cycle ratio
  • At a 1:1 h/t-cycle ratio, we have RTL
  • Will need changes to FAST connector

12
Caveats
  • Fidelity of the simulation models is transferred
    to the implementation
  • Depending on the model fidelity, it may or may
    not be possible to run actual software on the
    implementation
  • Use software that uses only the subset of
    features supported with 100% fidelity, e.g.
  • Self-modifying code
  • Unaligned accesses

13
From FAST to RTL
  • Add Data-path
  • Add Functionality
  • Detach trace-buffers
  • Improve fidelity
  • Improve host performance

14
Data-path
  • Assuming a sufficiently high fidelity model
  • Adding data-path does not change the module
    interfaces significantly
  • It is simple enough to do manually (TASK)
  • This process can sometimes unearth fidelity bugs
    in the simulator, e.g., not accounting for the
    limited number of ports on a register file
  • The data-path can be trivially removed for
    simulation flows
  • Data-path also needed for power modeling of
    certain modules

15
Functionality
  • ISA simulation (in FM) can be summarized as
  • Fetch: fetches instructions, advancing PC
  • Modeled in the TM already (with very high
    fidelity)
  • Decode: identifies an instruction with a function
  • Not modeled in TM at all
  • Can be written manually or auto-generated (TASK)
  • Execute: calls the function
  • Corresponds to target microcode and data-path
  • Microcode needs to be made 100% accurate (TASK)
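
The fetch / decode / execute summary above can be sketched as a tiny interpreter. The three-instruction ISA, the register names, and the function `run` are all made up for illustration; they are not taken from the FAST functional model or from any real ISA simulator.

```python
def run(program, steps):
    """Toy ISA simulator over a hypothetical 3-instruction ISA."""
    regs = {"r0": 0, "r1": 0}
    pc = 0

    # One function per instruction: the "execute" step calls these.
    def op_li(rd, imm):
        regs[rd] = imm

    def op_add(rd, rs):
        regs[rd] = regs[rd] + regs[rs]

    def op_nop():
        pass

    # Decode table: identifies each opcode with its function.
    decode_table = {"li": op_li, "add": op_add, "nop": op_nop}

    for _ in range(steps):
        if pc >= len(program):
            break
        opcode, *operands = program[pc]  # Fetch the instruction at PC
        pc += 1                          # ... and advance PC
        func = decode_table[opcode]      # Decode: instruction -> function
        func(*operands)                  # Execute: call the function
    return regs, pc


program = [("li", "r0", 5), ("li", "r1", 7), ("add", "r0", "r1"), ("nop",)]
regs, pc = run(program, steps=10)
```

In this sketch the decode table is the piece the slide says is missing from the TM, and the per-instruction functions stand in for what would become target microcode plus data-path.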

16
Microcode Compiler
  • Microcode Compiler (MCC) maps each instruction
    onto one or more micro-ops
  • Takes two software (C/C++) simulators as its
    input
  • ISA simulator (currently, Bochs)
  • Micro-op simulator
  • Compiles the specification of each
    instruction/micro-op into a data-flow graph
  • Uses exhaustive search to statically map
    instruction execution onto one or more micro-ops
    based on a cost table
  • In case of a failure, reports why a mapping is
    not possible
  • Work in progress
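
The idea of exhaustively searching for a cheapest micro-op sequence under a cost table can be sketched as follows. This is a simplified stand-in, not the MCC algorithm: it checks equivalence on sample inputs rather than by comparing data-flow graphs, and the micro-ops, their costs, and the function `map_instruction` are hypothetical.

```python
from itertools import product

# Hypothetical micro-ops acting on one value: name -> (behavior, cost).
MICRO_OPS = {
    "inc": (lambda x: x + 1, 1),
    "dbl": (lambda x: x * 2, 1),
    "neg": (lambda x: -x, 2),
}


def map_instruction(spec, max_len=3, samples=range(-4, 5)):
    """Search all micro-op sequences up to max_len for the cheapest match.

    Returns (sequence, cost) on success, or (None, reason) on failure.
    Matching on samples is evidence of equivalence, not a proof.
    """
    best = None
    for length in range(1, max_len + 1):
        for seq in product(MICRO_OPS, repeat=length):
            def apply(x, seq=seq):
                for name in seq:
                    x = MICRO_OPS[name][0](x)
                return x

            if all(apply(x) == spec(x) for x in samples):
                cost = sum(MICRO_OPS[name][1] for name in seq)
                if best is None or cost < best[1]:
                    best = (seq, cost)
    if best is None:
        reason = f"no sequence of at most {max_len} micro-ops matches the specification"
        return None, reason
    return best


mapping, cost = map_instruction(lambda x: 2 * x + 2)
failed, reason = map_instruction(lambda x: x * x)
```

The second call shows the failure path: no sequence of these micro-ops computes a square, so the search returns a reason instead of a mapping.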

17
From FAST to RTL
  • Add Data-path ✓
  • Add Functionality ✓
  • Detach trace-buffers
  • For each TM module, improve fidelity
  • At 100% fidelity, we have a Golden model
  • For each TM module, improve host/target-cycle ratio
  • At a 1:1 h/t-cycle ratio, we have RTL
  • Will need changes to FAST connector

18
Golden models
  • A 100% cycle-accurate model
  • May still take multiple FPGA cycles to model a
    single target cycle
  • It is in fact a legitimate implementation
  • Serves as a golden reference model for the next
    step (optimization) as well as for writing and
    debugging verification suites
  • Traditionally, verification teams have written
    golden models from the architectural specs
  • Likely to use FPGA structures efficiently
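
One way a golden reference model could be used when debugging a later, optimized implementation is a lock-step comparison per target cycle. The sketch below assumes both models expose a step function of the form (state, input) -> (new_state, output); that interface and the two accumulator models are hypothetical examples.

```python
def first_divergence(golden_step, candidate_step, stimuli, init_state=0):
    """Run both models in lock-step, one call per target cycle.

    Returns the index of the first target cycle whose outputs differ,
    or None if the outputs agree for every stimulus.
    """
    g_state = c_state = init_state
    for cycle, stim in enumerate(stimuli):
        g_state, g_out = golden_step(g_state, stim)
        c_state, c_out = candidate_step(c_state, stim)
        if g_out != c_out:
            return cycle
    return None


def golden_acc(state, x):
    """Hypothetical golden model: an 8-bit wrapping accumulator."""
    new = (state + x) & 0xFF
    return new, new


def buggy_acc(state, x):
    """Hypothetical candidate with a bug: it never wraps at 8 bits."""
    new = state + x
    return new, new


cycle = first_divergence(golden_acc, buggy_acc, [100, 100, 100])
```

The comparison is made per target cycle, so it does not matter how many host cycles either model needs internally to produce each output.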

19
Optimizing to single-cycle
  • Automatic transformation of modules may be
    possible for some simple modules using algorithms
    to
  • Unroll a loop in hardware
  • Collapse a multi-state FSM into a single state
  • Can Bluespec help here?
  • Manual optimization is certainly feasible
  • Currently, FAST Connectors don't allow this
    optimization (TASK)
  • Connector interface cannot support modules that
    take exactly 1 host cycle for every target cycle
  • Work in progress
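
The effect of unrolling a loop (or collapsing a multi-state FSM) on the host/target-cycle ratio can be shown with a software analogy. Both classes below are hypothetical: each loop iteration stands in for one FSM state and one host cycle, and the four-operand sum is an arbitrary example operation.

```python
class IterativeSum:
    """Multi-host-cycle model: one addition per host cycle."""

    def __init__(self):
        self.host_cycles = 0

    def target_cycle(self, operands):
        acc = 0
        for x in operands:         # one iteration = one FSM state = one host cycle
            acc += x
            self.host_cycles += 1
        return acc


class UnrolledSum:
    """Single-host-cycle model: the loop unrolled into one expression."""

    def __init__(self):
        self.host_cycles = 0

    def target_cycle(self, operands):
        a, b, c, d = operands      # fixed width: the unrolled form has four inputs
        self.host_cycles += 1      # everything below happens in one host cycle
        return (a + b) + (c + d)   # adder tree instead of a sequential loop


slow, fast = IterativeSum(), UnrolledSum()
inputs = [(1, 2, 3, 4), (5, 6, 7, 8), (0, 0, 0, 1)]
results_match = all(slow.target_cycle(v) == fast.target_cycle(v) for v in inputs)
ratio_slow = slow.host_cycles / len(inputs)  # host cycles per target cycle
ratio_fast = fast.host_cycles / len(inputs)
```

The two models produce the same result each target cycle; only the number of host cycles spent per target cycle changes, from 4 to 1 in this example.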

20
From FAST to RTL
  • Add Data-path ✓
  • Add Functionality ✓
  • Detach trace-buffers ✓
  • For each TM module, improve fidelity ✓
  • At 100% fidelity, we have a Golden model
  • For each TM module, improve host/target-cycle ratio ✓
  • At a 1:1 h/t-cycle ratio, we have RTL
  • Will need changes to FAST connector

21
Alternative path
  • Design the original TM modules as 1-host-cycle
    implementations
  • Automatically convert to n-host-cycle for the
    simulator
  • Using Bluespec?
  • Without automatic conversion, we would end up
    with RTL before FAST simulator!
  • Almost like prototyping

22
Potential benefits
  • Provides a way to verify FAST simulators
  • Golden models can be generated for the
    verification teams
  • Verify resulting implementation
  • Provide working implementation to RTL designers
  • Replace one component at a time
  • Provides a test-rig
  • Runs software
  • Improves communication between teams
  • Eliminates SIM-RTL calibration
  • Potentially faster than the simulator
  • Early versions can be made available to software
    team

23
Conclusions
  • This technology provides a way to use a single
    codebase to meet a variety of needs, from
    simulation to implementation to verification.
  • Single central codebase will consist of the
    following three sub-modules
  • ISA simulator (C/C++)
  • Micro-op definition (C/C++)
  • Micro-architectural definition (Bluespec/C)

24
?