1
System Software for Embedded Systems
  • Krithi Ramamritham
  • Kavi Arya
  • IIT Bombay

Embedded Systems Workshop 2007
2
Embedded Systems?
3
Embedded Systems
  • Single functional e.g. pager, mobile phone
  • Tightly constrained
  • cost, size, performance, power, etc.
  • Reactive real-time
  • e.g. car's cruise controller
  • delay in computation => failure of system

4
Hardware is not the whole System !!!
  • A Micro-Electronic System is the result of a
    projection of
  • Architecture
  • Hardware
  • Software
  • distinguished by its gross Functional
    Behaviour !
  • Software is an important part of the Product and
    must be part of the Design Process or we are
    only designing a Component of the system.

5
Why Is Embedded Software Not Just Software On
Small Computers?
  • Embedded = Dedicated
  • Interaction with physical processes
  • sensors, actuators, processes
  • Critical properties are not all functional
  • real-time, fault recovery, power, security,
    robustness
  • Heterogeneity
  • hardware/software tradeoffs, mixed architectures
  • Concurrency
  • interaction with multiple processes
  • Reactivity
  • operating at the speed of the environment
  • These features look more like hardware!

Source: Edward A. Lee, UC Berkeley, SRC/ETAB
Summer Study 2001
6
What is Embedded SW?
  • One definition
  • Software that is directly in contact with, or
    significantly affected by, the hardware that it
    executes on, or can directly influence the
    behavior of that hardware.

7
What is Embedded SW?
  • What is it not?
  • Application software can be recompiled and
    executed on any number of hardware platforms so
    long as the basic services/libraries are
    provided.
  • It is divided by vertical market segments
    (application domains)
  • Well-established methodologies, architectures,
  • HW platform independent, highly portable
  • Any SW that has no direct relationship with HW.

8
Embedded System Challenges for HW Folks
  • PARADIGM CHANGE!
  • Designers' main tasks convert from processor
    integration to performance analysis.
    Concentration on functional requirements instead
    of integration work
  • Concentration on architectural exploration
    (including performance analysis)
  • => Re-use and Platform-based design become key!
  • => Early validation of system/solution correctness
  • => Parallel hardware and software development
  • => More effective use of previous work
  • => Faster ways to build new elements of a solution
  • => Ways to test more effectively, efficiently, and
    quickly

9
Software Guys can Learn from Hardware Experts!
  • Concurrency
  • the synchrony abstraction
  • event-driven modeling
  • Reusability
  • cell libraries
  • interface definition
  • Reliability
  • leveraging limited abstractions
  • leveraging verification
  • Heterogeneity
  • mixing synchronous and asynchronous designs
  • resource management

Source: Edward A. Lee, UC Berkeley, SRC/ETAB
Summer Study 2001
10
Trade-offs. Methodology ESW Architectural
specifics
  • Portability
  • ESW itself is intended to provide portability for
    higher SW layers
  • (At least parts of) ESW is by definition not
    portable
  • Real-time
  • Restricted use of standardized Inter-process
    communication (IPC) mechanisms (CORBA, ...) for
    performance reasons
  • Typically hard real-time requirements
  • RTOS dependency
  • Implementation of OS like services
  • Sometimes shielding of the RTOS to higher level
    SW layers
  • Direct dependency on RTOS implementation

11
Functional Design Mapping
Source: Ian Phillips, ARM, VSIA 2001
12
The Embedded Market: Disruptive Change
Source: Jim Ready, President / CEO, MontaVista
Software
  • Traditional Embedded World
  • Never small enough
  • Never fast enough
  • Headless/Character-based
  • Standalone
  • Boot, Run from ROM
  • More Hardware than Software
  • Low-Level Programming Model
  • Application tied to hardware
  • Time to Market Pressures
  • Shortage of Embed. SW Engineers

13
Plan
  • Embedded Systems
  • New Approaches to building ESW
  • New paradigms: Lava, Handel-C
  • Examples: Engineering Returns to Software
  • Build a RISC processor in 48hrs
  • Advantages of reconfigurable hardware.
  • Real-time support for ESW

14
Motorola Software Survey Findings
  • Hardware design is a software task: IC designers
    write code (VHDL, Verilog, Scripting)!
  • We must become a software-intensive embedded
    system solutions company, focused on integrating
    our platforms into users' products - in the
    future we'll be neither a hardware nor a software
    company
  • Focus on developing systems capability, not just
    a software counterpart to our current hardware
    capability (though that's needed too)
  • We should have software content from drivers to
    applications
  • The fundamental goal isn't 70% margin on software
    products, it's helping someone choose your total
    solution
  • Embedded systems platforms and solutions will be
    the key to market differentiation and profitable
    growth

Source: Bob Altizer, BASYS, VSIA 2001
15
Common Design Metrics
  • NRE (Non-recurring engineering) cost
  • Unit cost
  • Size (bytes, gates)
  • Performance (execution time)
  • Power (more power => more heat, less battery
    time)
  • Flexibility (ability to change functionality)
  • Time to prototype
  • Time to market
  • Maintainability
  • Correctness
  • Safety (probability that system won't cause harm)

16
Time to Market Design Metric
  • Simplified revenue model
  • Product life = 2W, peak at W
  • Time of market entry defines a triangle,
    representing market penetration
  • Triangle area equals revenue
  • Loss
  • The difference between the on-time and delayed
    triangle areas
  • Avg. time to market today = 8 mth
  • 1 day delay may amount to $Ms
  • see Sony Playstation vs XBox

Source: Embedded System Design, Frank Vahid / Tony
Givargis (John Wiley & Sons, Inc., 2002)
17
NRE and unit cost metrics
  • Compare technologies by costs -- best depends on
    quantity
  • Technology A: NRE = $2,000, unit = $100
  • Technology B: NRE = $30,000, unit = $30
  • Technology C: NRE = $100,000, unit = $2
  • But, must also consider time-to-market

Source: Embedded System Design, Frank Vahid / Tony
Givargis (John Wiley & Sons, Inc., 2002)
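A minimal sketch of the comparison on this slide, assuming the usual reading that total cost = NRE + unit cost × quantity. The function and dictionary names are illustrative, not from the slides; the NRE and unit figures are the ones listed above.

```python
def total_cost(nre, unit_cost, quantity):
    """Total cost = one-time NRE cost + per-unit cost * number of units."""
    return nre + unit_cost * quantity


# (NRE, unit cost) in dollars, as listed on the slide
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def cheapest(quantity):
    """Return the name of the technology with the lowest total cost."""
    return min(TECHNOLOGIES,
               key=lambda name: total_cost(*TECHNOLOGIES[name], quantity))


for quantity in (100, 1_000, 10_000):
    print(quantity, cheapest(quantity))
```

Under this model the best choice shifts from A to B to C as volume grows, which is the point of "best depends on quantity".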
18
Losses due to delayed market entry
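  • Area $= \tfrac{1}{2} \times \text{base} \times \text{height}$
  • On-time $= \tfrac{1}{2} \times 2W \times W$
  • Delayed $= \tfrac{1}{2} \times (W - D + W) \times (W - D)$
  • Percentage revenue loss $= \dfrac{D(3W - D)}{2W^2} \times 100\%$
  • Try some examples
  • Lifetime $2W = 52$ wks, delay $D = 4$ wks
  • $\dfrac{4 \times (3 \times 26 - 4)}{2 \times 26^2} \times 100\% \approx 22\%$
  • Lifetime $2W = 52$ wks, delay $D = 10$ wks
  • $\dfrac{10 \times (3 \times 26 - 10)}{2 \times 26^2} \times 100\% \approx 50\%$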
  • Delays are costly!

Source: Embedded System Design, Frank Vahid / Tony
Givargis (John Wiley & Sons, Inc., 2002)
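The triangle model can be checked numerically. This is a small sketch, assuming the simplified revenue model from the slide (on-time and delayed revenue are triangle areas); the function name is hypothetical.

```python
def revenue_loss_percent(W, D):
    """Percentage revenue loss for a product of lifetime 2W entering D late.

    Revenue is modelled as a triangle area, as on the slide.
    """
    on_time = 0.5 * (2 * W) * W            # base 2W, height W
    delayed = 0.5 * (W - D + W) * (W - D)  # base 2W - D, height W - D
    return 100.0 * (on_time - delayed) / on_time


for delay in (4, 10):
    print(delay, round(revenue_loss_percent(26, delay)))
```

The difference of the two areas simplifies to D(3W - D)/2, so dividing by the on-time area W² gives the slide's formula.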
19
Trends
  • Moore's Law
  • IC transistor capacity doubles every 18 mths
  • 1981 leading edge chip had 10k transistors
  • 2002 leading edge chip had 150M transistors
  • 2007 leading edge chip has 1000M transistors
    (90nm)
  • Designer productivity has improved due to better
    tools
  • Compilation/Synthesis tools
  • Libraries/IP
  • Test/verification tools
  • Standards
  • Languages and frameworks (Handel-C, Lava,
    Esterel, ...)
  • 1981 designer produced 100 transistors per month
  • 2002 designer produces 5000 transistors per month
  • 2007 ???

20
Our New Understanding
  • We have simultaneous optimisations of competing
    design metrics: speed, size, power, complexity,
    etc.
  • We need a Renaissance Engineer
  • with holistic view of design process and
    comfortable with technologies ranging from
    hardware, software to formal methods
  • Maturation of behavioral synthesis tools and
    other tools has enabled this kind of unified view
    of hardware/ software co-design.
  • Design efforts now focus at higher levels of
    abstraction => abstract specifications now
    refined into programs and then into gates and
    logic.
  • There is no fundamental difference between
    what hardware and software can implement.

21
Designer Productivity
  • The Mythical Man-Month by Frederick Brooks ('75)
  • More designers on team => lower productivity
    because of increasing communication costs between
    groups
  • Consider 1M transistor project
  • Say, a designer has productivity of 5000
    transistors/mth
  • Each extra designer => decrease of 100
    transistors/mth productivity in group due to
    comm. costs
  • 1 designer: 1M / 5000 = 200 mth
  • 10 designers: 1M / (10 × 4100) = 24.3 mth
  • 25 designers: 1M / (25 × 2600) = 15.3 mth
  • 27 designers: 1M / (27 × 2400) = 15.4 mth
  • Need new design technology to shrink the design
    gap

Source: Embedded System Design, Frank Vahid / Tony
Givargis (John Wiley & Sons, Inc., 2002)
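The team-size figures above follow from one small formula. A sketch, assuming each designer's productivity falls by 100 transistors/month for every additional team member (so 4100 at a team of 10, 2600 at 25, 2400 at 27); the function name and defaults are illustrative.

```python
def project_months(team_size, transistors=1_000_000,
                   base_productivity=5000, penalty_per_extra_designer=100):
    """Months needed when communication costs reduce per-designer output."""
    per_designer = (base_productivity
                    - penalty_per_extra_designer * (team_size - 1))
    return transistors / (team_size * per_designer)


for team_size in (1, 10, 25, 27):
    print(team_size, round(project_months(team_size), 2))

# Team size with the shortest schedule among 1..50 designers
best = min(range(1, 51), key=project_months)
print("best team size:", best)
```

Beyond the minimum, adding designers lengthens the schedule, which is the Brooks effect the slide refers to.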
22
Plan
  • Embedded Systems
  • New Approaches to building ESW
  • New paradigms: Lava, Handel-C
  • Examples: Engineering Returns to Software
  • Build a RISC processor in 48hrs
  • Advantages of reconfigurable hardware.
  • Real-time support for ESW

23
Design Productivity Gap
  • Designer productivity has grown over the last
    decade
  • Rate of improvement has not kept pace with the
    chip-capacity growth
  • 1981 leading edge chip
  • 100 designer mth × 100 trans/mth => 10k trans
    complexity
  • 2002 leading edge chip
  • 30k designer mth × 5k trans/mth => 150M trans
    complexity
  • Designers at avg. of $10k p.m. => cost of building
    leading edge chips has gone from $1M in 1981 to
    $300M in 2002
  • Need paradigm shift to cope with the complexities
    of system design
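The cost figures on this slide can be reproduced with a few lines. A sketch, assuming design effort = chip complexity / designer productivity and a flat cost of $10k per designer-month; the names are illustrative.

```python
def designer_months(transistors, transistors_per_month):
    """Design effort in designer-months for a chip of a given complexity."""
    return transistors / transistors_per_month


def design_cost(transistors, transistors_per_month, cost_per_month=10_000):
    """Design cost in dollars at a flat rate per designer-month."""
    return designer_months(transistors, transistors_per_month) * cost_per_month


print(1981, designer_months(10_000, 100), design_cost(10_000, 100))
print(2002, designer_months(150_000_000, 5_000), design_cost(150_000_000, 5_000))
```

Capacity grew 15,000-fold between the two chips while productivity grew 50-fold, hence the 300-fold rise in effort and cost.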

24
Lava
  • Not so much a hardware description language
  • More a style of circuit description
  • Emphasises connection patterns
  • Think of Lego

25
Lava
  • Mary Sheeran, Koen Claessen, Satnam Singh
  • Chalmers University (Sweden)
  • Based on earlier work on MuFP to describe circuit
    functionality and layout in single language
  • Built using functional programming paradigm

26
Behaviour and Structure
[Figure: circuit f feeding circuit g]
f ->- g
27
Lava Properties
  • Higher-order functions
  • Circuits are functions
  • May be passed as arguments to other functions.
  • => Easier to produce parameterized circuits than
    with VHDL.
  • Functions can return circuits as results
  • Circuit combinators take circuits as arguments,
    return circuits as results.
  • => Powerful glue for composing circuits to form
    larger systems.
  • Circuit combinators combine behavior and layout
  • Combinators lay out circuits in rows, columns,
    triangles, trees etc.
  • Performance of circuit
  • Improved by exploring the layout design space by
    experimenting with alternative layout
    combinators.
  • Examples of circuits produced
  • High speed constant coefficient multipliers,
    finite impulse response filters (1D and 2D),
    adder tree networks and sorting butterfly
    networks.

28
Parallel Connection Patterns
f -|- g
29
map f
30
Four Sided Tiles
31
Column
32
Full Adder
[Figure: full adder with inputs cin, a, b and outputs sum, cout]
    fa (cin, (a, b)) = (sum, cout)
      where
        part_sum = xor (a, b)
        sum      = xorcy (part_sum, cin)
        cout     = muxcy (part_sum, (a, cin))
33
Generic Adder
    adder = col fa
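To see what the full adder and the column combinator compute, here is a behavioural model in Python rather than Lava. It is a sketch only: it assumes muxcy selects its carry-in when the select bit is 1 and the other data input otherwise, and it models `col` as a loop that threads the carry from one cell to the next. The helper names `to_bits` and `from_bits` are made up for the example.

```python
def fa(cin, a, b):
    """One full-adder cell built from xor, xorcy and muxcy as on the slide."""
    part_sum = a ^ b
    total = part_sum ^ cin            # xorcy (part_sum, cin)
    cout = cin if part_sum else a     # muxcy (part_sum, (a, cin))
    return total, cout


def adder(cin, a_bits, b_bits):
    """Column of full adders: the carry ripples from bit 0 upwards."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        total, carry = fa(carry, a, b)
        sums.append(total)
    return sums, carry


def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sums, carry = adder(0, to_bits(40000, 16), to_bits(30000, 16))
print(from_bits(sums + [carry]))
```

The model says nothing about layout, which is the other half of what the Lava combinators describe.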
34
Top Level
    adder16Circuit
      = do a <- inputVec "a" (bit_vector 15 downto 0)
           b <- inputVec "b" (bit_vector 15 downto 0)
           (s, carry) <- adder4 (a, b)
           sum <- outputVec "sum" s (bit_vector 16 downto 0)

    ? circuit2VHDL "add16" adder16Circuit
    ? circuit2EDIF "add16" adder16Circuit
    ? circuit2Verilog "add16" adder16Circuit
35
Xilinx FPGA Implementation
  • 16-bit implementation on a XCV300 FPGA
  • Vertical layout required to exploit fast carry
    chain
  • No need to specify coordinates in HDL code

36
16-bit Adder Layout
Source: Mary Sheeran, Nov. 2002
37
Four adder trees
Source: Mary Sheeran, Nov. 2002
38
No Layout Information
Source: Mary Sheeran, Nov. 2002
39
Plan
  • Embedded Systems
  • New Approaches to building ESW
  • New paradigms: Lava, Handel-C
  • Examples: Engineering Returns to Software
  • Build a RISC processor in 48hrs
  • Advantages of reconfigurable hardware.
  • Real-time support for ESW

40
Handel-C
  • Programming language: enables compilation of
    programs into synchronous hardware
  • NOT a Hardware Description Language: it's a prog.
    language aimed at compiling high-level algorithms
    into gate-level hardware
  • Syntax (loosely) based on C
  • Handel-C is to hardware (gates) what C is to
    micro-assembly code

41
Handel-C (cont.)
  • Inventor - Ian Page, Programming Research Group
    (Oxford University/UK)
  • Semantics based on Hoare's Communicating
    Sequential Processes (CSP) model
  • Occam (transputer prog. language)
  • Industry heavyweights using tools: Marconi,
    Ericsson, BAe, Creative Labs, etc.

42
What this means
  • Hardware design produced is exactly the hardware
    specified in source program
  • No intermediate interpreting layer as in
    assembly language targeting general purpose
    microprocessor
  • Logic gates are assembly instructions of Handel-C
    system
  • Design/re-design/optimise at software level!!!

43
What This Means
  • True parallelism
  • not time-shared (interpreted) parallelism of
    gen. purpose computers
  • par { a; b; }
  • instructions executed in parallel (//) at same
    instant of time by 2 sep. pieces of h/w
  • Timing
  • branches that complete early forced to wait for
    slowest branch before continuing

44
Comparison with C
  • Similar
  • Programs inherently sequential
  • Similar control-flow constructs: if-then-else,
    switch, while, for, etc.
  • Dissimilar
  • No malloc / dynamic store allocation
  • No recursion (limited rec. in macros)
  • No nested procedures
  • No stdin/stdout
  • void main()
  • Variable width words
  • par, etc.

45
Handel-C is based on
  • ANSI-standard C without external
    library-functions
  • I/O functions: printf(), putc(), scanf(), ...
  • File functions: fopen(), fclose(), fprintf(), ...
  • String functions: strlen(), strcpy(), strcmp(), ...
  • Math functions: sin(), cos(), sqrt(), ...
  • ...

46
Supported declarations, statements, instructions
  • Main program structure
  • Variables
  • Arrays
  • Switch statement
  • FOR Loop
  • Comments
  • Constants
  • Scope Variable sharing
  • Arithmetic, Relational, Relational Logic ops
  • Conditional Execution
  • While loop
  • Do While Loop

47
Channel Communication
  • link ! v      link ? v
  • channel input is a form of assignment
  • Provides link between parallel (//) branches
  • One // branch outputs data onto channel
  • Other // branch reads data from channel
  • => Synchronisation
  • data transfers only when both processes are ready

48
Additional Features Statements
  • Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;
    c ? a;

49
Additional Features Statements
  • Prialt
    prialt
    {
        case CommsStatement:
            Statement
            break;
        ...
        default:
            Statement
            break;
    }

50
Example 1 (sum)
IMPORTANT width!!
    void main()
    {
        unsigned int 16 sum;    // variable width word
        unsigned int 8 data;
        chanin input;           // input/output
        chanout output;

        sum = 0;
        do
        {
            input ? data;
            sum = sum + (0 @ data);
        } while (data != 0);
        output ! sum;
    }

51
Example 2 (divider)
    #define DATA_WIDTH 16
    void main(void)
    {
        unsigned int DATA_WIDTH a, mult, result;
        unsigned int (DATA_WIDTH*2 - 1) b;
        chanin input;
        chanout output;

        while (1)
        {
            input ? a;
            input ? result;
            b = result @ 0;
            mult = 1 << (DATA_WIDTH-1);
            result = 0;
            <<<<< MAIN LOOP >>>>>
            output ! result;
        }
    }

result = integer(a / b)
52
Example 2 (cont.)
    while (mult != 0)
    {
        if ((0 @ a) >= b)
            par
            {
                a -= b <- width(a);
                result |= mult;
            }
        par
        {
            b = b >> 1;
            mult = mult >> 1;
        }
    }
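The divider is a shift-and-subtract (restoring) division. The Python model below is a sketch of that algorithm, not Handel-C: it assumes the comparison in the main loop is "greater than or equal", that the divisor is non-zero, and it ignores clock-cycle timing. `divide` and its return of the remainder are additions for the example.

```python
DATA_WIDTH = 16


def divide(a, divisor):
    """Shift-and-subtract division of two DATA_WIDTH-bit unsigned values.

    Returns (quotient, remainder). Assumes divisor > 0.
    """
    b = divisor << (DATA_WIDTH - 1)   # b = result @ 0: divisor moved up
    mult = 1 << (DATA_WIDTH - 1)      # quotient bit being tried
    result = 0
    while mult != 0:
        if a >= b:
            a -= b                    # subtract the shifted divisor
            result |= mult            # and set this quotient bit
        b >>= 1
        mult >>= 1
    return result, a


print(divide(100, 7))
```

Each loop iteration decides one quotient bit, so a 16-bit division takes 16 iterations of the main loop.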

53
Example 3
[Figure: three-stage queue, input -> state[0] -> link[0] -> state[1] -> link[1] -> state[2] -> output]
  • Parallel tasks
  • Comm between tasks
  • Array of variables
  • Array of channels
  • Parameterised on width

    void main(void)
    {
        chan unsigned int undefined link[2];
        chanin unsigned int 8 input;
        chanout unsigned int 8 output;
        unsigned int undefined state[3];

        par
        {
            while (1)           // first queue location
            {
                input ? state[0];
                link[0] ! state[0];
            }
            while (1)           // second queue location
            {
                link[0] ? state[1];
                link[1] ! state[1];
            }
            while (1)           // third queue location
            {
                link[1] ? state[2];
                output ! state[2];
            }
        }
    }

54
Additional Features Statements
  • Timing
  • An assignment statement takes exactly one clock
    cycle to execute. Everything else is free

    void main(void)
    {
        unsigned 8 x, y;
        x = x + y;
    }

55
Timing/efficiency issues
  • One clock source for entire program
  • Assignment and delay take one clock cycle
  • Expressions are for free
  • Handel-C designed such that experienced
    programmer can immediately tell which
    instructions execute on which clock cycles
  • Example: both statements take one clock cycle

    x = y;
    x = (((y * z) + (w * v)) << 2) <- 7;

  • Clock runs at rate of longest logic depth =>
    reduce the depth of logic to speed up program =>
    pipelining
56
Porting C to Handel-C
  • Decide how software maps to hardware platform
  • Partition algorithm between multiple FPGAs
  • Port C to Handel-C; use simulator to check
    correctness
  • Modify code to take advantage of extra operators
    in Handel-C - simulate to ensure correctness
  • Add fine-grain parallelism through PAR, parallel
    assignments, or parallelise algorithm - simulate
  • Add hardware interfaces for target architecture;
    map simulator channel communications onto these
    interfaces - simulate
  • Use FPGA place & route tools to generate FPGA
    images

57
Design Flow Overview
Port algorithm to Handel-C
Compile program to .net file for simulator
Modify/ debug program
Use simulator to evaluate and debug design
Add interfaces to external hardware
Use Handel-C compiler to target h/w netlist
Use FPGA tools to place & route netlist
Program FPGA with result of place & route
58
Essence
  • Software approach allows us to rapidly prototype
    applications for a given domain
  • Handel-C provides a seamless approach to derive
    expressive and fast implementations from the
    software level
  • Cost of silicon is falling; shortage of trained
    engineers; high cost of programmer time =>
    Software based, high-level approaches to solving
    problems become increasingly attractive.

59
Handel-C Concepts (Recap)
  • Describes hardware - h/w design produced = h/w
    in source program
  • Logic gates are assembly instructions of Handel-C
    system
  • Real parallelism, not interpreted
  • Assignment, delay take 1 clock cycle; expression
    evaluation is free
  • No side-effects, i.e. a++ is a statement (not an
    expression as in C)
  • Variable width words => great performance
    improvement over software; min. datapath widths
    => minimal h/w usage

60
Additional Features Statements
  • Concurrency
  • ...
  • par

61
Concurrency (example)
    void main(void)
    {
        unsigned 8 x, y;
        unsigned 5 temp1;
        unsigned 4 temp2;
        ...
        temp1 = (0 @ (x <- 4)) + (0 @ (y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
        x = (temp2 + (0 @ temp1[4])) @ temp1[3:0];
    }

62
Additional Features Statements
  • Concurrency
  • ...
    par
    {
        temp1 = (0 @ (x <- 4)) + (0 @ (y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
    }
    x = (temp2 + (0 @ temp1[4])) @ temp1[3:0];
    ...

63
Features Statements (contd.)
  • Delay
  • ...
    par
    {
        x = 1;
        {
            delay;
            x = 2;
        }
    }
    while (x == 0) delay;

64
Additional Features Statements
  • Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;
    c ? a;
  • Single variable must not be accessed by >1 //
    branch =>

    par
    {
        out ! 3;
        out ! 4;
    }   // illegal

65
Features Statements(contd.)
  • Macros(Examples - contd)
  • Combinatorial
    macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a));
    shared expr incwrap(e, m) = (((e) == (m)) ? 0 : (e) + 1);
  • Recursive
    macro expr copy(e, n) =
        select(n == 1, (e), copy(e, n/2) @ copy(e, n - (n/2)));

66
Features Statements(contd)
  • Operators for Bit Manipulation
    z = x <- 2;     // Take least significant bits
    z = y \\ 2;     // Drop least significant bits
    z = x @ y;      // Concatenation
    z = x[3];       // Bit selection
    z = y[2:3];     // Bus selection
    z = width(x);   // Width of expression
  • Note: in the form y[m:n] the order is MSB:LSB
    unsigned int 3 y = 4;
  • y[0] is 0
  • y[2] is 1
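The bit-manipulation operators can be modelled in Python by carrying each value together with its width. This is an illustrative sketch: `Word`, `take`, `drop`, `concat` and `select` are names invented here, standing in for `<-`, `\\`, `@` and `[ ]` respectively.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """An unsigned value together with its width in bits."""
    value: int
    width: int


def take(x, n):
    """Model of `x <- n`: keep the n least significant bits."""
    return Word(x.value & ((1 << n) - 1), n)


def drop(x, n):
    """Model of `x \\\\ n`: drop the n least significant bits."""
    return Word(x.value >> n, x.width - n)


def concat(x, y):
    """Model of `x @ y`: x becomes the most significant part."""
    return Word((x.value << y.width) | y.value, x.width + y.width)


def select(x, msb, lsb=None):
    """Model of `x[msb]` (one bit) or `x[msb:lsb]` (a range, MSB first)."""
    if lsb is None:
        lsb = msb
    width = msb - lsb + 1
    return Word((x.value >> lsb) & ((1 << width) - 1), width)


y = Word(4, 3)                      # unsigned int 3 y = 4  (binary 100)
print(select(y, 0).value, select(y, 2).value)
```

Tracking the width explicitly mirrors how the result width of each operator is fixed at compile time.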

67
Additional Features Statements
  • External RAM / ROM
    ram unsigned int 4 ExtRAM[8] with {offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {"P08"}, oe = {"P09"}, cs = {"P10"}};
    rom unsigned int 4 ExtROM[8] with {offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {}, oe = {"P09"}, cs = {"P10"}};

68
Additional Features Statements
  • Internal RAM / ROM
    ram unsigned int 8 speicher[256];
    rom unsigned int 8 program[] = {1, 2, 3, 4};
    unsigned char i;
    i = 3;
    speicher[i] = 25;
    for (i = 0; i < 4; i++) stdout ! program[i];

69
Recursive Macro Expressions Example
  • Illustrates the generation of large quantities of
    hardware from simple macros.
  • Multiplier whose width depends on the parameters
    of the macro.
  • Starting point for generating large regular
    hardware structures using macros.
  • Single-cycle long multiplication from single
    macro
    macro expr multiply(x, y) = select(width(x) == 0, 0,
        multiply(x \\ 1, y << 1) + (x[0] == 1 ? y : 0));
    a = multiply(b, c);
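The recursion in the macro is ordinary long multiplication, one multiplier bit per level. A Python sketch of the same recursion follows; it passes the width of x explicitly (the macro gets it from `width(x)`) and uses unbounded integers, whereas the hardware result would be truncated to a fixed width.

```python
def multiply(x, x_width, y):
    """Shift-and-add multiplication, recursing once per bit of x."""
    if x_width == 0:                  # select(width(x) == 0, 0, ...)
        return 0
    partial = y if (x & 1) else 0     # x[0] == 1 ? y : 0
    # multiply(x \\ 1, y << 1): drop the bit just used, double y
    return multiply(x >> 1, x_width - 1, y << 1) + partial


print(multiply(13, 4, 11))
```

In hardware every level of the recursion becomes its own adder, so the whole product is formed combinationally in one clock cycle at the cost of a deep logic path.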

70
Timing
71
Additional Features Statements
  • Off-Chip Interface
  • Input, registered Input, latched Input
  • Output
  • Tristate Bus
  • Off-Chip Interface (examples)
    interface bus_in (int 4) InBus() with
        {data = {"P1", "P2", "P3", "P4"}};
    int 4 x;
    x = InBus.in;
    interface bus_out () OutBus (x + y) with
        {data = {"P11", "P12", "P13", "P14"}};

72
Parallel Access to Variables
  • Rules of parallelism: same variable must not be
    accessed from two separate parallel branches
    (to avoid resource conflicts on the variables)
  • Actually, the same variable must not be assigned
    to more than once on the same clock cycle but may
    be read as often as required (see wires!)
  • Allows some useful and powerful programming
    techniques, e.g.

    par
    {
        a = b;
        b = a;
    }
    // swaps values of a and b in single clock cycle.

73
Parallel Access to Variables
  • Four place queue
    while (1)
        par
        {
            int x[3];
            x[0] = in;
            x[1] = x[0];
            x[2] = x[1];    // values at out delayed
            out = x[2];     // by 4 clock cycles
        }
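Both the swap and the queue rely on every right-hand side in a par block reading the values from before the clock edge. A small Python model of that rule, as a sketch: `par_step` and `queue_step` are hypothetical helpers, and registers are held in a dictionary.

```python
def par_step(state, assignments):
    """One clock cycle of a par block.

    Every right-hand side is evaluated against the old state first;
    only then are all targets updated together.
    """
    new_values = {target: rhs(state) for target, rhs in assignments.items()}
    updated = dict(state)
    updated.update(new_values)
    return updated


# Swap: a = b and b = a in the same cycle
swapped = par_step({"a": 1, "b": 2},
                   {"a": lambda s: s["b"], "b": lambda s: s["a"]})
print(swapped)


def queue_step(state, value_in):
    """One cycle of the queue: all four registers shift together."""
    return par_step(state, {
        "x0": lambda s: value_in,
        "x1": lambda s: s["x0"],
        "x2": lambda s: s["x1"],
        "out": lambda s: s["x2"],
    })


state = {"x0": None, "x1": None, "x2": None, "out": None}
seen_at_out = []
for value in [1, 2, 3, 4, 5, 6]:
    state = queue_step(state, value)
    seen_at_out.append(state["out"])
print(seen_at_out)
```

In this model the first input reaches `out` on the fourth step, matching the "delayed by 4 clock cycles" comment.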

74
Time Efficiency of Handel-C Hardware
  • Requirement: clock period for program must be
    longer than longest path thru combinatorial logic
    in whole program.
  • => once FPGA place and route is done, max.
    clock-rate = 1 / longest-path-delay
  • Example: FPGA place and route tools calculate
    that the longest path delay between flip-flops
    in a design is 70 ns.
  • The max. clock rate is 1 / 70 ns = 14.3 MHz.
  • Speed allowed by system: 400 kHz - 100 MHz
  • BUT WHAT IF THIS IS NOT FAST ENOUGH?

75
Improving Time Efficiency
  • Reducing logic depth: avoid multiplication, avoid
    wide adders, reduce complex expressions into
    stages, etc.

    unsigned 8 x;
    unsigned 8 y;
    unsigned 5 temp1;
    unsigned 4 temp2;
    par
    {
        temp1 = (0 @ (x <- 4)) + (0 @ (y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
    }
    x = (temp2 + (0 @ temp1[4])) @ temp1[3:0];

  • Pipelining => increased latency for higher
    throughput
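The split adder can be checked exhaustively in software. This Python sketch models the two stages with bit masks; it assumes the first two assignments happen in one cycle and the recombination in the next, and the function name is made up.

```python
def add8_two_stage(x, y):
    """8-bit addition split into two 4-bit halves to shorten the carry path."""
    # Stage 1: both halves are added in parallel
    temp1 = (x & 0xF) + (y & 0xF)            # 5 bits: low nibbles plus carry
    temp2 = ((x >> 4) + (y >> 4)) & 0xF      # 4 bits: high nibbles
    # Stage 2: fold the low-half carry (temp1[4]) into the high half
    high = (temp2 + (temp1 >> 4)) & 0xF
    return (high << 4) | (temp1 & 0xF)


print(add8_two_stage(200, 100))
```

The result equals (x + y) modulo 256, the same as a single 8-bit adder, but no single stage contains more than a 4-bit carry chain.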

76
Serialisation
  • Multiplication in more than one clock cycle in
    order to save hardware
  • Algorithm is parametrizable by a compile-time
    constant

    macro proc mult_serial(x, y, xy)
    {
        macro expr count_width = 5;
        macro expr steps = 1 << count_width;
        macro expr bits = width(xy) / steps;
        unsigned count_width count;

        par { xy = 0; count = 0; }
        do par
        {
            xy += (0 @ (x <- bits)) * y;
            x >>= bits;
            y <<= bits;
            count++;
        } while (count != 0);
    }
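A Python model of the serial multiplier, as a sketch under assumptions: the accumulate step is taken to be `xy += (low bits of x) * y`, all four updates in the par block read the previous cycle's values, and every register is masked to `width` bits. Parameter names and defaults are illustrative.

```python
def mult_serial(x, y, width=32, count_width=5):
    """Multiply over 2**count_width cycles, consuming a few bits of x per cycle."""
    steps = 1 << count_width
    bits = width // steps                 # bits of x consumed per cycle
    mask = (1 << width) - 1
    xy, count = 0, 0
    while True:
        # par block: the tuple on the right is built from the old values
        xy, x, y, count = (
            (xy + (x & ((1 << bits) - 1)) * y) & mask,
            x >> bits,
            (y << bits) & mask,
            (count + 1) % steps,          # counter wraps to 0 after `steps`
        )
        if count == 0:
            break
    return xy


print(mult_serial(1234, 5678))
```

With a smaller `count_width` the loop runs fewer cycles but each cycle needs a wider partial multiplier, which is the area/time trade-off the slide describes.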

77
Serialisation
  • Gatecount for a 32-bit multiplication

78
Plan
  • Embedded Systems
  • New Approaches to building ESW
  • New paradigms: Lava, Handel-C
  • Examples: Engineering Returns to Software
  • Build a RISC processor in 48hrs
  • Advantages of reconfigurable hardware.
  • Real-time support for ESW

79
RISC-Processor
  • Features
  • 16 instructions
  • 4 bit I/O Ports
  • one accumulator
  • Program memory (16x8 ROM)
  • Data memory (16x4 RAM)
  • Problem: execute a program stored in ROM to
    calculate the first few members of the Fibonacci
    number sequence: 1, 2, 3, 5, 8, 13, 21, 34, ...
  • $\mathrm{fib}(n) = 1$ if $n = 0 \lor n = 1$
  • $\mathrm{fib}(n) = \mathrm{fib}(n-1) + \mathrm{fib}(n-2)$ if $n \ge 2$

80
RISC-Processor
  • Instruction Set

81
RISC-Processor (cont.)
  • Program
    chanin input;
    chanout output;
    // Parameterisation
    #define dw 32       /* Data width */
    #define opcw 4      /* Op-code width */
    #define oprw 4      /* Operand width */
    #define rom_aw 4    /* Width of ROM address bus */
    #define ram_aw 4    /* Width of RAM address bus */
    // The opcodes
    #define HALT 0
    #define LOAD 1
    #define LOADI 2
    #define STORE 3
    #define ADD 4
    #define SUB 5
    #define JUMP 6

82
RISC-Processor(cont.)
  • I/O Interface
    unsigned int dw output;
    interface bus_clock_in (unsigned int 1) reset()
        with {data = reset_pin};
    interface bus_in (unsigned int dw) input()
        with {data = in_pins};
    interface bus_out () out(output)
        with {data = out_pins};
  • Definition of available opcodes
    #define HLD 0
    #define NOP 1
    #define OUT 2
    #define IN 3
    ...
    #define SRA 15

83
RISC-Processor
  • Declaration of FPGA and Pinning
    set family = Altera10K;
    set part = "EPF10K70RC240-3";
    set clock = external "91";
    macro expr in_pins = {"38", "83", "101", "148"};
    macro expr out_pins = {"153", "202", "218", "19"};
    macro expr reset_pin = {"45"};
  • Defining Parameters
    #define dw 4        /* Data width */
    #define opcw 4      /* Op-code width */
    #define oprw 4      /* Operand width */
    #define rom_aw 4    /* Width of ROM addr bus */
    #define ram_aw 4    /* Width of RAM addr bus */

84
RISC-Processor (cont.)
  • Program (cont)
    // ROM program data
    rom unsigned int undefined program[] = {
        _asm_(LOADI, 1),    /* 0 */ /* Get a one */
        _asm_(STORE, 3),    /* 1 */ /* Store this */
        _asm_(STORE, 1),    /* 2 */
        _asm_(INPUT, 0),    /* 3 */ /* Read value from user */
        _asm_(STORE, 2),    /* 4 */ /* Store this */
        _asm_(LOAD, 1),     /* 5 */ /* Loop entry point */
        _asm_(ADD, 0),      /* 6 */ /* Make a fib number */
        _asm_(STORE, 0),    /* 7 */ /* Store it */
        _asm_(OUTPUT, 0),   /* 8 */ /* Output it */
        _asm_(ADD, 1),      /* 9 */ /* Make a fib number */
        _asm_(STORE, 1),    /* a */ /* Store it */
        _asm_(OUTPUT, 0),   /* b */ /* Output it */
        _asm_(LOAD, 2),     /* c */ /* Decrement counter */
        _asm_(SUB, 3),      /* d */
        _asm_(JUMPNZ, 4),   /* e */ /* Repeat if not zero */

85
RISC-Processor (cont.)
  • Program (cont)
    /* RAM for processor */
    ram unsigned int dw data[1 << ram_aw];
    /* Processor registers */
    unsigned int rom_aw pc;         /* Program counter */
    unsigned int (opcw+oprw) ir;    /* Instruction register */
    unsigned int dw x;              /* Accumulator */
    /* Macros to extract opcode and operand fields */
    #define opcode (ir <- opcw)
    #define operand (ir \\ opcw)

86
RISC-Processor (cont.)
  • Program (cont)
    /* Main program */
    void main(void)
    {
        pc = 0;
        // Processor loop
        do
        {
            // fetch
            par
            {
                ir = program[pc];
                pc = pc + 1;
            }
            /* MAIN DECODE/EXECUTE */
        } while (opcode != HALT);
    }   /* main program */

87
RISC-Processor (cont.)
  • Program (cont)
    // decode and execute
    switch (opcode)
    {
        case LOAD:   x = data[operand <- ram_aw]; break;
        case LOADI:  x = 0 @ operand; break;
        case STORE:  data[operand <- ram_aw] = x; break;
        case ADD:    x = x + data[operand <- ram_aw]; break;
        case SUB:    x = x - data[operand <- ram_aw]; break;
        case JUMP:   pc = operand <- rom_aw; break;
        case JUMPNZ: if (x != 0) pc = operand <- rom_aw; break;
        case INPUT:  input ? x; break;
        case OUTPUT: output ! x; break;
        default:     while (1) delay;   // unknown opcode
    }
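To follow what the ROM program does, the fetch/decode/execute loop can be modelled in Python. This is a sketch under stated assumptions, not the Handel-C design: instructions are (mnemonic, operand) pairs instead of packed words, RAM is assumed to start as zeros, and a HALT is appended at address f because the listing shown on the slides ends at address e.

```python
# ROM contents from the slides; the final HALT is an assumption.
FIB_PROGRAM = [
    ("LOADI", 1),   # 0: get a one
    ("STORE", 3),   # 1: data[3] = 1, used to decrement the counter
    ("STORE", 1),   # 2: data[1] = 1
    ("INPUT", 0),   # 3: read value from user
    ("STORE", 2),   # 4: data[2] = loop counter
    ("LOAD", 1),    # 5: loop entry point
    ("ADD", 0),     # 6: make a fib number
    ("STORE", 0),   # 7: store it
    ("OUTPUT", 0),  # 8: output it
    ("ADD", 1),     # 9: make a fib number
    ("STORE", 1),   # a: store it
    ("OUTPUT", 0),  # b: output it
    ("LOAD", 2),    # c: decrement counter
    ("SUB", 3),     # d
    ("JUMPNZ", 4),  # e: repeat if not zero
    ("HALT", 0),    # f: assumed terminator
]


def run(program, inputs, data_width=32, ram_words=16):
    """Interpret the program; returns the list of values sent to output."""
    mask = (1 << data_width) - 1
    data = [0] * ram_words            # assumption: RAM powers up as zeros
    pending = iter(inputs)
    outputs = []
    pc, x = 0, 0
    while True:
        opcode, operand = program[pc]     # fetch
        pc += 1
        if opcode == "HALT":
            return outputs
        if opcode == "LOAD":
            x = data[operand]
        elif opcode == "LOADI":
            x = operand
        elif opcode == "STORE":
            data[operand] = x
        elif opcode == "ADD":
            x = (x + data[operand]) & mask
        elif opcode == "SUB":
            x = (x - data[operand]) & mask
        elif opcode == "JUMP":
            pc = operand
        elif opcode == "JUMPNZ":
            if x != 0:
                pc = operand
        elif opcode == "INPUT":
            x = next(pending) & mask
        elif opcode == "OUTPUT":
            outputs.append(x)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


print(run(FIB_PROGRAM, [3]))
```

Each pass through the loop emits two sequence members, so an input of n yields the first 2n values under these assumptions.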

88
RISC-Processor (cont.)
  • The Final Program!

89
Simulation debugging
  • The simulator is integrated into the compiler.
  • Executing a cycle-based simulation.
  • Variables are traceable at any clock cycle.
  • Port interface will be replaced by standard I/O.
  • Handel-C simulator supports debugging at any
    clock-cycle.
  • Highlighting of characteristic Values e.g. Area
    of any program line.

90
Some Recent Work
  • Customising Graphics Applications: Techniques and
    Programming Interface, Henry Styles and Wayne Luk,
    Proceedings of IEEE Symposium on
    Field-Programmable Custom Computing Machines,
    IEEE Computer Society Press, 2000.
  • Exploit custom data-formats and datapath
    widths to optimise graphics operations such as
    texture mapping and hidden-surface removal.
  • Discusses techniques for balancing graphics
    pipeline
  • Customised architectures captured in
    Handel-C, compiled for Xilinx Virtex FPGAs
  • Handel-C API based on OpenGL standard for
    automatic speedup of graphics applications,
    including the Quake-2 action game.

91
The Graphics Pipeline
92
Performance Case Studies
  • Geometric Visualisation

Nvidia is a 3-D graphics chipset, i.e. a
specialised graphics ASIC. Chart => FPGA platform
fast approaching performance of dedicated
graphics ASIC for general-purpose graphics
applications
93
Performance Case Studies
  • Infrared Simulation requires custom pixel format
    not supported by graphics ASICs

Onyx contains two 180 MHz MIPS processors, two
Geometry Engine processors and two rasteriser
ASICs, with a memory bandwidth of 6.4 GB/sec
(i.e. 10x the cost and memory bandwidth of the FPGA)
94
Performance Case Studies
  • Quake-2 benchmark requires custom pixel format
    not supported by graphics ASICs

Bottleneck is PCI-bus speed limitation. Improve
performance by moving FPGA to AGP slot allowing
1GB/sec transfers between graphics h/w and memory
95
Some Observations
  • FPGA renderer is a low-cost platform for custom
    graphics applications
  • Development time of a customised FPGA renderer
    comparable to optimised software => effective to
    use a reconfigurable platform
  • Good for reconfigurable designs where ASIC is not
    available or too expensive
  • Useful in exploring desirable algorithms and
    architectures for ASICs
  • Hardware renderer may be customised to maximise
    performance for each application

96
Some Features of the Rapid Prototyping Board
  • Full length 32 bit PCI card
  • Virtex XCV1000: 1,000,000 system gates,
  • 131 kBit Block RAM, 393 kBit SelectRAM
  • Programmable clock 400 kHz to 100 MHz
  • 4 banks of fast asynchronous 32 bit wide SRAM,
    each 2 Mbytes
  • PCI interface 32 bit, 33 MHz, 132 Mbytes/sec
    burst
  • 2 x PMC sites for VME grade I/O processing
    modules
  • 50 pin Aux I/O, 8 LEDs

97
Summary
  • Cost of silicon is falling; products are getting
    more complex; time-to-market shrinking rapidly;
    shortage of trained engineers; cost of
    programmer time is major constraint => Software
    based, high-level approaches to solving problems
    become increasingly attractive.
  • New generation of languages let us build systems
    at high level of abstraction.
  • High-density FPGAs and SoCs allow complex
    designs to be rapidly prototyped => reduce the
    development cycle of new technology, perhaps
    even to deploy final product as soft cores.
  • Broader understanding demanded from system
    designer: need Renaissance Engineer with
    equal understanding of hardware and software.

98
Plan
  • Embedded Systems
  • New Approaches to building ESW
  • Real-Time Support
  • Special Characteristics of Real-Time Systems
  • Real-Time Constraints
  • Canonical Real-Time Applications
  • Scheduling in Real-time systems
  • Operating System Approaches

99
What is real about real-time?
  • Computer world, e.g., PC
  • average response for user, interactive
  • occasionally longer
  • reaction: user annoyed
  • computer controls speed of user
  • => "computer time"
  • Real world, e.g., industrial system, airplane
  • events occur in environment at own speed
  • reaction too slow: deadline miss
  • reaction: damage, pot. loss of human life
  • computer must follow speed of environment
  • => "real-time"

100
Then why real-time, why not simply fast?
  • Fast enough: dependent on system and its
    environment
  • turtle: fast enough to eat salad
  • mouse: fast enough to steal cheese
  • fly fast enough to escape

101
  • What if the environment changes? The system may
    no longer be fast enough
  • mouse trap
  • fly-swatter

The time scale depends on, or is dictated by, the
environment. We cannot slow down the environment: it
is the real world.
102
Real-Time Systems
  • A real-time system is a system that reacts to
    events in the environment by performing
    predefined actions within specified time
    intervals.
103
Flight Avionics
  • CLIENT SERVER

Constraints on responses to pilot inputs,
aircraft state updates
104
  • Constraints
  • Keep plastic at proper temperature (liquid, but
    not boiling)
  • Control injector solenoid (make sure that the
    motion of the piston reaches the end of its
    travel)

105
Real-Time Systems Properties of Interest
  • Safety: nothing bad will happen.
  • Liveness: something good will happen.
  • Timeliness: things will happen on time, i.e., by
    their deadlines, periodically, ....

106
In a Real-Time System.
Correctness of results depends on the value and its
time of delivery
  • a correct value delivered too late is incorrect
  • e.g., traffic light: the light must be green when
    crossing; having been green earlier is not enough
  • Real-time
  • (Timely) reactions to events as they occur, at
    their pace: (real-time) system (internal) time
    runs on the same time scale as environment
    (external) time

107
Performance Metrics in Real-Time Systems
  • Beyond minimizing response times and increasing
    the throughput
  • achieve timeliness.
  • More precisely, how well can we predict that
    deadlines will be met?

108
Types of RT Systems
  • Dimensions along which real-time activities can
    be categorized
  • how tight are the deadlines? Deadlines are
    tight when the laxity (deadline minus computation
    time) is small.
  • how strict are the deadlines? what is the value
    of executing an activity after its deadline?
  • what are the characteristics of the environment?
    how static or dynamic must the system be?
  • Designers want their real-time system to be fast,
    predictable, reliable, flexible.

109
Hard, soft, firm
  • Hard: result useless or dangerous if deadline
    exceeded
  • Soft: result of some (lower) value if deadline
    exceeded
  • Firm: value drops to zero at deadline
  • Deadline intervals: result required not later
    and not before
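
The three deadline types can be read as value functions over the completion time. The sketch below is only an illustration: the function name, the infinite penalty for a late hard result, and the halving for a late soft result are assumptions, not part of the slides.

```python
def result_value(kind, finish, deadline, full_value=1.0):
    """Value of a result completed at time `finish` (illustrative model)."""
    if finish <= deadline:
        return full_value
    if kind == "hard":
        # Late result is useless or dangerous: modeled here as a penalty.
        return float("-inf")
    if kind == "firm":
        # Value drops to zero at the deadline.
        return 0.0
    if kind == "soft":
        # Some lower value after the deadline; halving is an arbitrary choice.
        return full_value / 2
    raise ValueError(f"unknown deadline kind: {kind}")
```

With a deadline of 10, a result finished at 11 is worth 0.0 under "firm" and 0.5 under this particular "soft" model.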
110
Examples
  • Hard real time systems
  • Aircraft
  • Airport landing services
  • Nuclear Power Stations
  • Chemical Plants
  • Life support systems
  • Soft real time systems
  • Multimedia
  • Interactive video games

111
Real-Time Items and Terms
  • Task
  • program, perform service, functionality
  • requires resources, e.g., execution time
  • Deadline
  • specified time for completion of, e.g., task
  • time interval or absolute point in time
  • value of result may depend on completion time

112
Plan
  • Special Characteristics of Real-Time Systems
  • Real-Time Constraints
  • Canonical Real-Time Applications
  • Scheduling in Real-time systems
  • Operating System Approaches

113
Timing Constraints
  • Real-time means to be in time. How do we know
    something is in time? How do we express that?
  • Timing constraints are used to specify temporal
    correctness, e.g., finish assignment by 2pm, be
    at station before train departs.
  • A system is said to be (temporally) feasible if
    it meets all specified timing constraints.
  • Timing constraints do not come out of thin
    air: the design process identifies events, then
    derives, models, and finally specifies timing
    constraints

114
  • Periodic
  • activity occurs repeatedly
  • e.g., to monitor environment values, temperature,
    etc.

115
  • Aperiodic
  • can occur any time
  • no arrival pattern given

116
  • Sporadic
  • can occur any time, but
  • minimum time between arrivals
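
The difference between periodic and sporadic arrivals can be sketched in a few lines. The function names and the list-of-timestamps representation are assumptions made for illustration.

```python
def periodic_releases(period, count, first=0):
    """Release times of a periodic activity: first, first + period, ..."""
    return [first + n * period for n in range(count)]

def respects_min_interarrival(arrivals, mint):
    """True if consecutive arrival times are at least `mint` apart (sporadic)."""
    ordered = sorted(arrivals)
    return all(b - a >= mint for a, b in zip(ordered, ordered[1:]))
```

An aperiodic activity gives no such guarantee, so no check of this kind can be stated for it.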

[Figure: sporadic arrivals, at least mint (minimum inter-arrival time) apart]
117
Who initiates (triggers) actions?
  • Example: chemical process
  • controlled so that temperature stays below danger
    level
  • warning is triggered before danger point, so
    that cooling can still occur
  • Two possibilities
  • act whenever the temperature rises above the
    warning level: event triggered
  • look every int time units and act if the
    measured temperature is above the warning level:
    time triggered
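
A small sketch of the two triggering styles, counting how often the control code is invoked for a recorded temperature trace. The trace format and function names are assumptions; a real event-triggered system would use an interrupt or comparator rather than scanning samples.

```python
def event_triggered_invocations(samples, warn):
    """Count upward crossings of the warning level (one action per crossing)."""
    count = 0
    above = False
    for temp in samples:
        if temp > warn and not above:
            count += 1
        above = temp > warn
    return count

def time_triggered_invocations(samples, interval):
    """Count polls when the sensor is read every `interval` samples."""
    return len(range(0, len(samples), interval))
```

For the trace [20, 25, 31, 33, 29, 32, 28] with a warning level of 30, the event-triggered version is invoked twice, while polling every second sample is invoked four times regardless of the values.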

118
[Figure: invocations over time t, time triggered (TT) vs. event triggered (ET)]
119
[Figure: invocations over time t, time triggered (TT) vs. event triggered (ET)]
120
ET vs TT
  • Time triggered
  • Stable number of invocations
  • Event triggered
  • Only invoked when needed
  • High number of invocations and high computation
    demand if the value changes frequently

121
Slow down the environment?
  • Importance
  • which parts of the system are important?
  • importance can change over time, e.g., fuel
    efficiency during emergency landing
  • Flow control: who has control over speed of
    processing, who can slow partner down?
  • environment
  • computer system
  • RT environment cannot be slowed down

122
Other Issues to worry about
  • Meet requirements -- some activities may run
    only
  • after others have completed - precedence
    constraints
  • while others are not running - mutual exclusion
  • within certain times - temporal constraints
  • Scheduling
  • planning of activities, such that required timing
    is kept
  • Allocation
  • where should a task execute?

123
Plan
  • Special Characteristics of Real-Time Systems
  • Real-Time Constraints
  • Canonical Real-Time Applications
  • Scheduling in Real-time systems
  • Operating System Approaches

124
A Typical Real time system
[Figure: Temperature sensor -> Input port -> CPU (with Memory) -> Output port -> Heater]
125
Code for example
while true do
    read temperature sensor
    if temperature too high then
        turn off heater
    else if temperature too low then
        turn on heater
    else
        nothing
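
The loop above could be written as runnable code along the following lines. This is a sketch: `read_sensor`, `set_heater`, the thresholds, and the bounded `steps` count (the slide's loop is infinite) are assumptions introduced so the example can run without hardware.

```python
def control_step(temperature, low, high, heater_on):
    """One pass of the polling loop body; returns the new heater state."""
    if temperature > high:
        return False          # too high: turn heater off
    if temperature < low:
        return True           # too low: turn heater on
    return heater_on          # in band: do nothing

def run_polling(read_sensor, set_heater, low, high, steps):
    """Poll the sensor `steps` times (the slide's loop runs forever)."""
    heater_on = False
    for _ in range(steps):
        heater_on = control_step(read_sensor(), low, high, heater_on)
        set_heater(heater_on)
    return heater_on
```

Feeding the readings 18, 21, 26, 22 with a band of 20 to 25 switches the heater on, keeps it on, switches it off, and keeps it off.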
126
Comment on code
  • Code works by polling the device (temperature sensor)
  • Code is in form of infinite loop
  • No other tasks can be executed
  • Suitable for dedicated system or sub-system only

127
Extended polling example
[Figure: one computer runs Tasks 1 to 4; each Task n has a conceptual link to its own Temperature Sensor n and Heater n]
128
Polling
  • Problems
  • Arranging task priorities
  • Round robin is usual within a priority level
  • Urgent tasks are delayed

129
Interrupt driven systems
  • Advantages
  • Fast
  • Little delay for high priority tasks
  • Disadvantages
  • Programming
  • Code difficult to debug
  • Code difficult to maintain

130
How can we monitor a sensor every 100 ms?
Initiate a task T1 to handle the sensor
T1: Loop
      Do sensor task T2
      Schedule T2 for 100 ms from now
Note that the time could be relative (as here) or
could be an actual time. There would be slight
differences between the methods, due to the
additional time to execute the code.
131
An alternative
Initiate a task T1 to handle the sensor
T1: Do sensor task T2
    Repeat
      Schedule T2 for n x 100 ms
      n = n + 1
There are some subtleties here...
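
One of the subtleties can be shown with simulated time: relative rescheduling drifts by the execution time of the code on every iteration, while scheduling at absolute times n x period does not. The function names and the fixed execution time are assumptions for illustration; no real clock is used.

```python
def relative_schedule(period, exec_time, count):
    """Activation times when each run is scheduled `period` after it finishes."""
    times, now = [], 0
    for _ in range(count):
        times.append(now)
        now += exec_time      # time spent running the sensor task
        now += period         # "schedule for now + period"
    return times

def absolute_schedule(period, count):
    """Activation times when run n is scheduled at the absolute time n * period."""
    return [n * period for n in range(count)]
```

With a 100 ms period and 2 ms of execution time, the relative version activates at 0, 102, 204, 306 ms, whereas the absolute version stays at 0, 100, 200, 300 ms.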
132
Clock, interrupts, tasks
[Figure: Clock, Interrupts, Processor, Job/Task queue (examined by the processor), Tasks 1 to 4]
Tasks schedule events using the clock...
133
Flight Simulator
  • CLIENT SERVER

134
Time Periods to meet Timing Requirements
  • CLIENT SERVER

[Table columns: Requirement, Choice Made, Rationale]
135
Time Periods to meet Timing Requirements
  • CLIENT SERVER

[Table columns: Requirement, Choice Made, Rationale]
136
Time Periods to meet Timing Requirements

137
Controlling a reaction
  • we know
  • if temperature too high, it explodes
  • maximum rate of temperature increase
  • rate of cooling
  • events
  • temperature change
  • temperature > safe threshold
  • we can derive
  • how often we have to check temperature
  • when we have to finish cooling
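
As a sketch of the derivation, assume the temperature is polled, a warning level sits below the danger level, and the temperature rises at most at a known rate. In the worst case the warning level is crossed just after a poll, so the poll period plus the reaction time must fit into the time the temperature needs to climb from warning to danger. All names and the linear-rise model are assumptions.

```python
def max_check_period(warn_temp, danger_temp, max_rise_rate, reaction_time=0.0):
    """Longest polling period that still catches a rise before the danger level."""
    margin = danger_temp - warn_temp
    period = margin / max_rise_rate - reaction_time
    if period <= 0:
        raise ValueError("no safe period: margin too small for this rise rate")
    return period
```

For a warning level of 80, a danger level of 100, a maximum rise of 2 degrees per second, and 4 seconds needed for cooling to take effect, the temperature must be checked at least every 6 seconds.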

138
(No Transcript)
139
Example Injection Molding (cont.)
  • Timing constraints

140
Example Injection Molding (cont.)
  • Concurrent control tasks

141
Plan
  • Special Characteristics of Real-Time Systems
  • Real-Time Constraints
  • Canonical Real-Time Applications
  • Scheduling in Real-time systems
  • Operating System Approaches

142
Why is scheduling important?
  • Definition
  • A real-time system is a system that reacts to
    events in the environment by performing
    predefined actions within specified time
    intervals.

143
Schedulability analysis
  • a.k.a. feasibility checking
  • check whether tasks will meet their timing
    constraints.

144
Scheduling Paradigms
  • Four scheduling paradigms emerge, depending on
  • whether a system performs schedulability
    analysis
  • if it does,
  • whether it is done statically or dynamically
  • whether the result of the analysis itself
    produces a schedule or plan according to which
    tasks are dispatched at run-time.

145
1. Static Table-Driven Approaches
  • Perform static schedulability analysis by
    checking if a schedule is derivable.
  • The resulting schedule (table) identifies the
    start times of each task.
  • Applicable to tasks that are periodic (or have
    been transformed into periodic tasks by well
    known techniques).
  • This is highly predictable, but highly
    inflexible.
  • Any change to the tasks and their characteristics
    may require a complete overhaul of the table.
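
A minimal sketch of table-driven dispatching, assuming a hypothetical table for two periodic tasks: A (period 4, runs for 1 time unit) and B (period 8, runs for 2). The table lists start times within one cycle of length 8 and is simply repeated at run-time.

```python
# Hypothetical one-cycle table of (start time, task); cycle length is 8.
TABLE = [(0, "A"), (1, "B"), (4, "A")]
CYCLE = 8

def dispatch_times(table, cycle, horizon):
    """Expand a one-cycle table into (start, task) entries before `horizon`."""
    out = []
    for base in range(0, horizon, cycle):
        for start, task in table:
            if base + start < horizon:
                out.append((base + start, task))
    return out
```

Adding a third task would mean recomputing `TABLE` offline, which is the inflexibility noted above.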

146
2. Static Priority Driven Preemptive
Approaches
  • Tasks have (systematically assigned) static
    priorities.
  • Priorities take timing constraints into
    account
  • e.g. rate-monotonic
  • the shorter the period, the higher the priority.
  • Perform static schedulability analysis but no
    explicit schedule is constructed
  • RMA: sum of task utilizations <= ln 2 (about
    0.69) is sufficient for any number of tasks.
  • Task utilization = computation time / period
  • At run-time, tasks are executed
    highest-priority-first, with preemptive-resume
    policy.
  • When resources are used, need to compute
    worst-case blocking times.

147
Static Priorities: Rate Monotonic Analysis
  • presented by Liu and Layland in 1973
  • Assumptions
  • Tasks are periodic with deadline equal to
    period. Release time of tasks is the period start
    time.
  • Tasks do not suspend themselves
  • Tasks have bounded execution time
  • Tasks are independent
  • Scheduling overhead negligible

148
RMA Design Time vs. Run Time
  • At Design Time
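  • Task priorities are assigned according to their
    periods: shorter period means higher priority
  • Schedulability test
  • Taskset is schedulable if
    $\sum_{i=1}^{n} C_i / P_i \le n\,(2^{1/n} - 1)$,
    where $C_i$ is the computation time and $P_i$ the
    period of task $i$, and $n$ is the number of tasks
  • Very simple test, easy to implement.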
  • Run-time
  • The ready task with the highest priority is
    executed.

149
  • RMA Example
  • taskset t1, t2, t3, t4, each given as
    (period, computation time)
  • t1 = (3, 1)
  • t2 = (6, 1)
  • t3 = (5, 1)
  • t4 = (10, 2)
  • The schedulability test
  • 1/3 + 1/6 + 1/5 + 2/10 <= 4 (2^(1/4) - 1) ?
  • 0.9 <= 0.757 ?
  • No: the taskset fails the test
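
The utilization test is short enough to write out. The sketch below assumes tasks are given as (period, computation time) pairs, as in the example; the function name is illustrative.

```python
def rm_utilization_test(tasks):
    """tasks: list of (period, computation_time).

    Returns (utilization, bound, passes) for the Liu and Layland test.
    """
    n = len(tasks)
    utilization = sum(c / p for p, c in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization, bound, utilization <= bound
```

For the taskset above, the utilization is about 0.9 and the bound about 0.757, so the test fails. Failing this test does not by itself prove the taskset unschedulable.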

150
  • RMA
  • A schedulability test is
  • Sufficient
  • there may exist tasksets that fail the test,
    but are schedulable
  • Necessary
  • tasksets that fail are (definitely) not
    schedulable
  • The RMA schedulability test is sufficient, but
    not necessary.
  • e.g., when periods are harmonic,
  • i.e., multiples of each other,
    utilization can be 1.

151
Exact RMA
  • by Joseph and Pandya, based on critical instant
    analysis
  • (the longest response time of a task occurs when
    it is released at the same time as all higher
    priority tasks)
  • What is happening at the critical instant?
  • Let T1 be the highest priority task. Its response
    time is
  • R1 = C1, since it cannot be preempted
  • What about T2? R2 = C2 + delays due to
    interruptions by T1.
  • Since T1 has higher priority, it has a
    shorter period. That means it will interrupt T2
    at least once, probably more often. Assume T1 has
    half the period of T2: then R2 = C2 + 2 x C1

152
Exact RMA.
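  • In general
    $R_i^{n+1} = C_i + \sum_{j \in hp(i)} \lceil R_i^n / P_j \rceil \, C_j$
  • $R_i^n$ denotes the nth iteration of the response
    time of task i
  • hp(i) is the set of tasks with higher priority
    than task i
  • $C_j$ and $P_j$ are the computation time and
    period of task j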

153
Example - Exact Analysis
  • Let us look at our example, which failed the pure
    rate monotonic test, although we can schedule it
  • Exact analysis says so.
  • R1 = 1 (easy)
  • R3, second highest priority task: hp(t3) = {T1},
    R3 = 2
154
  • R2, third highest priority task: hp(t2) = {T1, T3},
    R2 = 3

155
  • R4, lowest priority task: hp(t4) = {T1, T3,
    T2}
  • R4 = 9
  • Response times of first instances of all tasks
    <= their periods
  • => taskset feasible under RM scheduling
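
The response times in this example can be reproduced with a short fixed-point iteration. This is a sketch under the same assumptions as the slides (deadline equal to period, rate-monotonic priorities, critical-instant release); the dictionary format and function name are illustrative.

```python
def response_times(tasks):
    """tasks: dict name -> (period, computation_time).

    Returns name -> worst-case response time, or None if it exceeds the period.
    """
    by_priority = sorted(tasks, key=lambda name: tasks[name][0])  # shortest period first
    result = {}
    for index, name in enumerate(by_priority):
        period, cost = tasks[name]
        higher = [tasks[h] for h in by_priority[:index]]
        r = cost
        while True:
            # -(-r // p) is the integer ceiling of r / p
            nxt = cost + sum(-(-r // p) * c for p, c in higher)
            if nxt == r or nxt > period:
                break
            r = nxt
        result[name] = nxt if nxt <= period else None
    return result
```

Called with t1 = (3, 1), t2 = (6, 1), t3 = (5, 1), t4 = (10, 2), it returns response times 1, 3, 2 and 9, matching the values on the slides.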

156
3. Dynamic Planning based Approaches
  • Feasibility is checked at run-time -- a
    dynamically arriving task is accepted only if it
    is feasible to meet its deadline.
  • Such a task is said to be guaranteed to meet its
    time constraints
  • One of the results of the feasibility analysis
    can be a schedule or plan that determines start
    times
  • Has the flexibility of dynamic approaches with
    some of the predictability of static approaches
  • If feasibility check is done sufficiently ahead
    of the deadline, time is available to take
    alternative actions.
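
A minimal sketch of a planning-based admission check, under strong simplifying assumptions that are not from the slides: a single processor, all tasks ready now, and each task running to completion. Under those assumptions, ordering by earliest deadline and checking every finish time gives both the accept/reject decision and a plan of start times.

```python
def admit(accepted, new_task, now=0):
    """accepted: list of (computation_time, absolute_deadline); new_task: one such pair.

    Returns a plan of (start, computation_time, deadline), or None to reject.
    """
    candidates = sorted(accepted + [new_task], key=lambda t: t[1])  # earliest deadline first
    plan, start = [], now
    for cost, deadline in candidates:
        if start + cost > deadline:
            return None           # some deadline would be missed: reject
        plan.append((start, cost, deadline))
        start += cost
    return plan
```

A rejected task is known to be infeasible at arrival time, which leaves room for the alternative actions mentioned above.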

157
4. Dynamic Best-effort Approaches
  • The system tries to do its best to meet
    deadlines.
  • But since no guarantees are provided, a task may
    be aborted during its execution.
  • Until the deadline arrives, or until the task
    finishes, whichever comes first, one does not
    know whether a timing constraint will be met.
  • Permits any reasonable scheduling approach, e.g.,
    EDF, highest-priority, ...

158
Cyclic scheduling
  • Ubiquitous in large-scale dynamic real-time
    systems
  • Combination of both table-driven scheduling and
    priority scheduling.
  • Tasks are assigned one of a set of harmonic
    periods.
  • Within each period, tasks are dispatched
    according to a table that just lists the order in
    which the tasks execute.
  • Slightly more flexible than the table-driven
    approach
  • no start times are specified
  • In many actual applications, rather than making
    worst-case assumptions, confidence in a cyclic
    schedule is obtained by very elaborate and
    extensive simulations of typical scenarios.
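
A sketch of the ordering table used in cyclic scheduling, assuming hypothetical tasks with harmonic periods that are multiples of a base period. Each frame lists only the order of tasks; no start times are stored.

```python
def cyclic_order(tasks, base_period, major_cycle):
    """tasks: ordered list of (name, period); periods are multiples of base_period.

    Returns one list of task names per frame of the major cycle.
    """
    frames = []
    for frame_start in range(0, major_cycle, base_period):
        frames.append([name for name, period in tasks if frame_start % period == 0])
    return frames
```

For tasks A, B, C with periods 10, 20, 40, a base period of 10 and a major cycle of 40, the frames are [A, B, C], [A], [A, B], [A].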

159
Plan
  • Special Characteristics of Real-Time Systems
  • Real-Time Constraints
  • Canonical Real-Time Applications
  • Scheduling in Real-time systems
  • Operating System Approaches

160
Real-Time Operating Systems
  • Support process management and synchronization,
    memory management, interprocess communication,
    and I/O.
  • Three categories of real-time operating systems
  • small, proprietary kernels.
  • e.g. VRTX32, pSOS, VxWorks
  • real-time extensions to commercial timesharing
    operating systems.
  • e.g. RT-Linux, RT-NT
  • research kernels
  • e.g. MARS, ARTS, Spring, Polis

161
Real-Time Applications Spectrum
Hard
Real-Time Operating System
VxWorks, Lynx, QNX, ...
Intime, HyperKernel, RTX
Windows CE
General-Purpose Operating System
Windows NT
Soft
162
Real-Time Applications Spectrum
Hard
Real-Time Operating System
VxWorks, Lynx, QNX, ... Intime, HyperKernel, RTX
Windows CE
Windows NT
General-Purpose Operating System
Soft
163
Embedded (Commercial) Kernels
  • Stripped down and optimized versions of
    timesharing operating systems.
  • Intended to be fast
  • a fast context switch,
  • external interrupts recognized quickly
  • the ability to lock code and data in memory
  • special sequential files that can accumulate
    data at a fast rate
  • To deal with timing requirements
  • a real-time clock with special alarms and
    timeouts
  • bounded execution time for most primitives
  • real-time queuing disciplines such as earliest
    deadline first,
  • primitives to delay/suspend/resume execution
  • priority-driven best-effort scheduling mechanism
    or a table-driven mechanism.
  • Communication and synchronization via mailboxes,
    events, signals, and semaphores.

164
Real-Time Extensions to General Purpose
Operating Systems
  • E.g., extending LINUX to RT-LINUX, NT to RT-NT
  • Advantage
  • based on a set of familiar interfaces
    (standards) that speed development and facilitate
    portability.
  • Disadvantages
  • Too many basic and inappropriate underlying
    assumptions still exist.

165
Using General Purpose Operating Systems
  • GPOS offer some capabilities useful for real-time
    system builders
  • RT applications can obtain leverage from existing
    development tools and applications
  • Some GPOSs accepted as de-facto standards for
    industrial applications

166
Real Time Linux approaches
  • Modify the current Linux kernel to handle RT
    constraints
  • Used by KURT
  • Make the standard Linux kernel run as a task of
    the real-time kernel
  • Used by RT-Linux, RTAI

167
Modifying Linux kernel
  • Advantages
  • Most problems, such as interrupt handling,
    already solved
  • Less initial labor
  • Disadvantages
  • No guaranteed performance
  • RT tasks don't always have precedence over non-RT
    ta