Metropolis: Envisioning the Service-Oriented Enterprise - PowerPoint PPT Presentation

Provided by: pathe7

Title: Metropolis: Envisioning the Service-Oriented Enterprise


1
Hardware-based CIL-machine
Nizhniy Novgorod State University, Russia
Laboratory of Physical Fundamentals and Technologies of Wireless Communications
Reporter: Maxim Shuralev, max_at_wl.unn.ru
Head of the project: Dr. Alexey Umnov, umnov_at_wl.unn.ru
2
Hardware CIL processor project team
  • Hardware: Maxim Shuralev, Maxim Sokolov, Dmitry Mordvinov (NNSU, Wireless Lab)
  • Software, workloads and tools: Andrey Eltsov (NNSU, Wireless Lab), Roman Mitin, Sergey Lyalin, Sergey Galkin, Ilia Golubev (NNSU, IT Lab)
  • Support: Dmitry Golovachev, Svetlana Surova, Elena Pankratova (NNSU, Wireless Lab)
  • Consultants: Aliaksei Chapyzhenka (Intel), Dmitry Ragozin (Intel), Sergey Chernyshov (Nizhniy Novgorod State Technology University)
  • Head of Wireless Lab: Alexey Umnov

3
Agenda
  • Introduction
  • Architecture of the CIL processor
  • Description of the DSP core
  • Description of the CIL core
  • Speed up features of the CIL core
    • a metainformation cache
    • a hardware stack
    • a hardware type control engine
  • Garbage collector implementation
  • Example of DSP workload for the processor
  • Development board for processor implementation
  • HW Implementation results
  • Software support libraries
  • Conclusion and comparison
4
Introduction
  • Port of the .NET engine to an energy-efficient, low-power mobile platform
  • Advantages and disadvantages of a stack-based CIL engine
    • The maximum execution speed cannot exceed one CIL instruction per clock
    • The stack engine is the simplest way to execute machine code, as instruction decoding and the processor structure are very simple
    • Limited ability for parallel instruction execution
    • Low complexity and low power consumption
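
To illustrate why decoding is simple in a stack engine, here is a minimal sketch, not the project's design. The opcode names follow ECMA-335 mnemonics, but the tuple encoding, the `run` helper, and the step counter standing in for clock cycles are assumptions made for this example.

```python
# Minimal sketch of a stack engine. The (opcode, operand) encoding and the
# run() helper are illustrative assumptions, not taken from the slides.

def run(program):
    """Execute (opcode, operand) pairs strictly one after another."""
    stack = []
    steps = 0
    for opcode, operand in program:
        if opcode == "ldc.i4":      # push an integer constant
            stack.append(operand)
        elif opcode == "add":       # pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif opcode == "mul":       # pop two values, push their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        steps += 1                  # models at most one instruction per clock
    return stack, steps


# (2 + 3) * 4 expressed as stack code
program = [
    ("ldc.i4", 2),
    ("ldc.i4", 3),
    ("add", None),
    ("ldc.i4", 4),
    ("mul", None),
]
result, cycles = run(program)
```

Every instruction reads and writes the top of the stack, so each one depends on the previous one. That dependency is what limits parallel execution in this model.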

5
Introduction
Application target and target market
.NET is intended for Web-oriented services, distributed business databases, online transactions, CRM system support, etc.
The CIL processor is not meant to compete with desktop processors and PDAs on performance, but it is well suited to the mobile market and the digital home.
The target is end-user specialized and oriented toward mobile devices, Web terminals, Web browsers, interactive TV, and house control systems.
6
Introduction
  • Requirements for the CIL processor
    • Execute the .NET (CIL) code directly: .NET is native code
    • Consume low power from the power supply: mobile low-power devices
    • Effectively handle DSP tasks: new generation of interactive multimedia mobile devices

7
Architecture
High-level structure of the CIL processor implementation
(Figure: programmer's model)
8
Architecture
High-level hardware structure of the CIL processor
(Figure: hardware structure)
9
Architecture
  • Why DSP-based?
  • Is it a waste of development time or a necessity for the digital home?
  • The CIL processor is an excellent solution for the digital home
  • Pro
    • We have a firmware layer for executing very complex CIL instructions
    • 5-10 times higher performance in multimedia applications
  • Contra
    • Increased development time
    • Only the standard CIL set needs to be implemented, not a DSP

10
Architecture
  • Why DSP-based?
  • Hardware implementation
  • Pro
    • Effective low-power computational kernel
    • Good mapping of CIL instructions to DSP instructions
    • Low power consumption in multimedia tasks
    • Similar technology to the existing and efficient ARM/Java Jazelle
  • Contra
    • Only serial instruction execution (as we have a stack-based CIL instruction set and do not want to use superscalar techniques)

11
Architecture
  • Why DSP-based?
  • 2-in-1: two native instruction sets on board
  • Complex CIL instructions (e.g. type hierarchy checks and safety checks) are simply implemented in firmware as DSP instructions
  • 5x-10x speed improvement for DSP workloads
  • Low overhead in terms of extra transistors on-chip
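
The split between directly mapped instructions and firmware-implemented ones can be sketched as a two-table decoder. This is only an illustration: the CIL opcode names are real ECMA-335 mnemonics, but the DSP mnemonics (`DSP_ADD`, ...) and firmware routine names (`fw_castclass`, ...) are hypothetical, since the slides do not list them.

```python
# Hypothetical mapping tables. The DSP mnemonics and firmware routine names
# are invented for illustration and do not come from the slides.
DIRECT_MAP = {
    "add": "DSP_ADD",
    "sub": "DSP_SUB",
    "mul": "DSP_MUL",
    "and": "DSP_AND",
}
FIRMWARE_MAP = {
    "castclass": "fw_castclass",   # type hierarchy check
    "isinst": "fw_isinst",         # type test
    "newobj": "fw_newobj",         # object allocation
}


def decode(cil_opcode):
    """Return ('hw', dsp_op) for simple opcodes, ('fw', routine) for complex ones."""
    if cil_opcode in DIRECT_MAP:
        return ("hw", DIRECT_MAP[cil_opcode])
    if cil_opcode in FIRMWARE_MAP:
        return ("fw", FIRMWARE_MAP[cil_opcode])
    raise ValueError(f"unknown CIL opcode: {cil_opcode}")


def dispatch_counts(opcodes):
    """Count how many opcodes take the hardware path and how many the firmware path."""
    counts = {"hw": 0, "fw": 0}
    for op in opcodes:
        kind, _ = decode(op)
        counts[kind] += 1
    return counts


counts = dispatch_counts(["add", "mul", "castclass"])
```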

12
Description of DSP core units
AGU-1
AGU-2
ALU
13
Description of CIL core

In CIL execution mode, the programmer sees an exact implementation of the ECMA-335 standard CIL engine
14
Speed up features of CIL core
Metainformation cache
  • Constant table
  • String table
  • Method table
  • Class field table
  • Type table
  • Smart array table
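
As a rough software analogy for a metainformation cache, the sketch below keeps recently resolved metadata entries in a small table keyed by (table name, token). The `MetaCache` class, its FIFO eviction policy, its capacity, and the sample `METADATA` contents are all assumptions for illustration; the slides do not describe the hardware cache's organization.

```python
class MetaCache:
    """Small lookup cache in front of a slow metadata resolver (illustrative only)."""

    def __init__(self, resolve, capacity=4):
        self.resolve = resolve      # slow path: function (table, token) -> value
        self.capacity = capacity    # hypothetical number of cache entries
        self.entries = {}           # (table, token) -> value, in insertion order
        self.hits = 0
        self.misses = 0

    def lookup(self, table, token):
        key = (table, token)
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.resolve(table, token)
        if len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))   # evict the oldest entry (FIFO)
            del self.entries[oldest]
        self.entries[key] = value
        return value


# Hypothetical metadata: a method table entry and a string table entry.
METADATA = {("method", 1): 0x1000, ("string", 7): "hello"}


def slow_resolve(table, token):
    return METADATA[(table, token)]


cache = MetaCache(slow_resolve)
first = cache.lookup("method", 1)    # miss: goes to the slow resolver
second = cache.lookup("method", 1)   # hit: served from the cache
```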

15
Speed up features of CIL core
Hardware typed stack
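
One way to picture a typed stack is that every slot carries a type tag next to its value, so an operation can check operand types as part of the pop. The sketch below is a simplified software model: the `TypedStack` class and `typed_add` helper are assumptions for illustration, and the rule that both operands must have the same tag is stricter than the full ECMA-335 verification rules.

```python
class TypedStack:
    """Evaluation stack whose slots are (type_tag, value) pairs."""

    def __init__(self):
        self.slots = []

    def push(self, tag, value):
        self.slots.append((tag, value))

    def pop(self):
        return self.slots.pop()


NUMERIC_TAGS = ("int32", "int64", "float64")


def typed_add(stack):
    """Pop two operands, check their tags, and push the sum with the same tag."""
    tag_b, b = stack.pop()
    tag_a, a = stack.pop()
    # Simplified rule (assumption): both operands must share one numeric tag.
    if tag_a != tag_b or tag_a not in NUMERIC_TAGS:
        raise TypeError(f"cannot add {tag_a} and {tag_b}")
    stack.push(tag_a, a + b)


ok = TypedStack()
ok.push("int32", 2)
ok.push("int32", 3)
typed_add(ok)

bad = TypedStack()
bad.push("int32", 2)
bad.push("float64", 1.5)
try:
    typed_add(bad)
    rejected = False
except TypeError:
    rejected = True
```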
16
Garbage collector
  • Automatic memory management
  • Division of objects into big and small
  • The generational garbage collector with two
    generations for small objects
  • Separate area of memory for big objects

A special coprocessor, based on a reduced DSP kernel, may be used to process garbage collector tasks
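
The heap layout described above (two generations for small objects, a separate area for big ones) can be sketched as follows. This is a simplified model under stated assumptions: the size threshold is hypothetical, the set of live objects is passed in directly instead of being found by tracing, and the `Heap` class is not the project's collector.

```python
BIG_OBJECT_THRESHOLD = 1024   # bytes; hypothetical cut-off between small and big


class Heap:
    """Toy heap: young and old generations for small objects, one area for big ones."""

    def __init__(self):
        self.young = set()
        self.old = set()
        self.big = set()
        self.sizes = {}       # object id -> size in bytes
        self.next_id = 0

    def allocate(self, size):
        obj = self.next_id
        self.next_id += 1
        self.sizes[obj] = size
        if size >= BIG_OBJECT_THRESHOLD:
            self.big.add(obj)      # big objects bypass the generations
        else:
            self.young.add(obj)    # small objects start in the young generation
        return obj

    def minor_collect(self, live):
        """Collect only the young generation; survivors are promoted to old."""
        survivors = self.young & set(live)
        for obj in self.young - survivors:
            del self.sizes[obj]    # unreachable young objects are freed
        self.old |= survivors
        self.young = set()


heap = Heap()
a = heap.allocate(16)
b = heap.allocate(32)
c = heap.allocate(4096)
heap.minor_collect(live={a})
```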
17
Example of DSP workload
Our CIL processor is an excellent target for
multimedia applications
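
The slide does not name a specific workload, so as an assumed example, a FIR filter is a typical multimedia kernel built entirely from multiply-accumulate (MAC) steps. The `fir` function below is a plain reference sketch, not code from the project.

```python
def fir(samples, taps):
    """Direct-form FIR filter: each output is a sum of coefficient * sample products."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]   # one multiply-accumulate step
        out.append(acc)
    return out


# Two-tap filter with unit coefficients: each output is the sum of the
# current sample and the previous one.
filtered = fir([1, 2, 3, 4], [1, 1])
```

The inner loop is where a DSP core with dedicated MAC units spends its time, which is why this kind of kernel benefits from a DSP-based design.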
18
Development board
  • Virtex-4 FPGA chip
  • 64 MBytes DDR SDRAM
  • 100 MHz clock oscillator
  • Expansion bus, up to 32 I/O lines
  • Stereo AC97 audio codec
  • RS-232 serial port
  • LCD display for debugging messages
  • VGA output (50 MHz 24-bit video DAC)
  • PS/2 mouse and PS/2 keyboard connectors
  • System ACE configuration controller, access to external flash cards
  • 10/100/1000 Mbit Ethernet transceiver for networking
  • USB interface chip
  • Xilinx XC95144XL CPLD for FPGA configuration
  • Xilinx XCF32P Platform Flash configuration
  • JTAG configuration port for design loading or remote debugging from PC

495 USD only
19
Development board
Testing process for processor cores
  • The C model is a full-scale analog of the Verilog HDL model
  • The C model is considered as a reference model
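
Testing against a reference model usually means feeding the same test vectors to both models and comparing the results. The sketch below shows that flow in Python under stated assumptions: `reference_alu` stands in for the C model, `device_alu` stands in for results read back from an HDL simulation, and the three operations and the 32-bit width are chosen only for illustration.

```python
import random

MASK32 = 0xFFFFFFFF


def reference_alu(op, a, b):
    """Stand-in for the reference model: 32-bit wrap-around arithmetic."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    raise ValueError(op)


def device_alu(op, a, b):
    """Stand-in for HDL simulation output (hypothetical, written independently)."""
    if op == "add":
        return (a + b) % (1 << 32)
    if op == "sub":
        return (a + ((~b) & MASK32) + 1) & MASK32   # two's complement subtraction
    if op == "and":
        return a & b
    raise ValueError(op)


def compare(num_vectors=1000, seed=0):
    """Run random vectors through both models and collect any mismatches."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(num_vectors):
        op = rng.choice(["add", "sub", "and"])
        a = rng.getrandbits(32)
        b = rng.getrandbits(32)
        if reference_alu(op, a, b) != device_alu(op, a, b):
            mismatches.append((op, a, b))
    return mismatches


mismatches = compare()
```

An empty mismatch list means the two models agree on every generated vector; any entry is a concrete failing input that can be replayed in the simulator.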
20
Implementation results
Device: Spartan-3
Unit     Slices  Slice Flip-Flops  4-input LUTs  Maximum frequency, MHz
AGU-1    331     220               548           N/A
AGU-2    385     320               543           N/A
ALU      4368    587               7917          N/A
Decoder  1227    60                2139          N/A
DSP      5365    628               9508          46.9

Device: Virtex-4
Unit     Slices  Slice Flip-Flops  4-input LUTs  Maximum frequency, MHz
AGU-1    300     200               560           228
AGU-2    300     200               560           228
ALU      4216    593               8056          55.4
Decoder  1319    40                2303          971
DSP      4981    628               9191          77.8

The ALU consumes most of the FPGA resources.
The DSP core uses only a small part of the Virtex-4 LX25, and the CIL processor implementation takes only up to 5500 cells (35%) of our Virtex-4 FPGA (without optimizations).
21
Implementation results
(Figure: main ALU unit structure)
(Figure: Bit Manipulation Unit, a part of the ALU unit)
(Figure: whole DSP kernel)
22
Implementation results
Moderate detail-level structure of implemented
CIL processor
23
Software support
  • Exception microcode: complex CIL instruction implementation in DSP code
  • Class library: may be ported from PC
  • Supporting system libraries: I/O, memory management
  • Multimedia libraries for the DSP core
  • User applications
  • Just-in-time compiler for CIL code, if necessary
  • Compiler: we are using a retargeted GCC version
  • Assembler / disassembler: retargetable utilities, used with the compiler; they are specially tuned for the CIL core
  • Linker
  • Hardware and software codesign suite (compiler, assembler, disassembler, Verilog instruction decoder generator)

24
Conclusion and comparison
Comparison with an ARM-based software .NET engine for embedded systems (www.dotnetcpu.com)

Hardware-based CIL-machine | ARM-based .NET execution engine
80-100 MHz FPGA implementation | 27 MHz
1-2 CIL operations per cycle (40-50 million CIL operations per second); hardware execution for basic CIL operations; hardware-assisted stack implementation | 450,000 CIL operations per second; interpreted execution of CIL operations
50x faster than interpreted execution | 50x slower than hardware execution of basic operations
Hardware type control | Software type control
Garbage collector may be implemented as a hardware coprocessor or intelligent memory | Software garbage collector
Hardware meta-information cache | Software meta-information processing
DSP core with two memory spaces | ARM core
2 multiply-accumulate instructions and 2 ALU operations per cycle, up to 4 instructions per cycle | 1 ALU operation per cycle
DSP core power consumption is 3-4x less than the ARM core | ARM core power consumption is 3-4x more than the DSP core
25
Conclusion and comparison
  • The CIL processor is not only a software concept; it may be successfully implemented in hardware
  • Our dual-architecture CIL processor, based on a DSP core, enables multimedia applications with low power consumption, so it may be successfully used for the digital home and digital entertainment
  • CIL typed engines are implemented in hardware, which greatly reduces the overhead of run-time type checking
  • The hardware CIL implementation greatly outperforms non-optimized software implementations in both performance and power consumption

26
Project participants
27
Acknowledgements
Microsoft Corporation, for the grant that allowed us to bring people from different faculties of Nizhny Novgorod State University into one team and develop our hardware solution
Laboratory of Physical Foundations and Technologies of Wireless Communications, Nizhny Novgorod State University, which is supported by Intel Corporation, for help during our research activities
Special thanks to Aliaksei Chapyzhenka, Intel Corp., for spending his time advising us on hardware architectures