Title: System Software for Embedded Systems
1System Software for Embedded Systems
- Krithi Ramamritham
- Kavi Arya
- IIT Bombay
Embedded Systems Workshop 2007
2Embedded Systems?
3Embedded Systems
- Single functional e.g. pager, mobile phone
- Tightly constrained
- cost, size, performance, power, etc.
- Reactive and real-time
- e.g. a car's cruise controller
- delay in computation => failure of system
4Hardware is not the whole System !!!
- A Micro-Electronic System is the result of a projection of
- Architecture
- Hardware
- Software
- distinguished by its gross Functional Behaviour!
- Software is an important part of the Product and must be part of the Design Process, or we are only designing a Component of the system.
5Why Is Embedded Software Not Just Software On
Small Computers?
- Embedded = Dedicated
- Interaction with physical processes
- sensors, actuators, processes
- Critical properties are not all functional
- real-time, fault recovery, power, security, robustness
- Heterogeneity
- hardware/software tradeoffs, mixed architectures
- Concurrency
- interaction with multiple processes
- Reactivity
- operating at the speed of the environment
- These features look more like hardware!
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001
6What is Embedded SW?
- One definition
- Software that is directly in contact with, or
significantly affected by, the hardware that it
executes on, or can directly influence the
behavior of that hardware.
7What is Embedded SW?
- What is it not?
- Application software can be recompiled and executed on any number of hardware platforms so long as the basic services/libraries are provided.
- It is divided by vertical market segments (application domains)
- Well-established methodologies, architectures,
- HW platform independent, highly portable
- Any SW that has no direct relationship with HW.
8Embedded System Challenges for HW Folks
- PARADIGM CHANGE!
- Designers' main tasks shift from processor integration to performance analysis.
- Concentration on functional requirements instead of integration work
- Concentration on architectural exploration (including performance analysis)
- => Re-use and platform-based design become key!
- => Early validation of system/solution correctness
- => Parallel hardware and software development
- => More effective use of previous work
- => Faster ways to build new elements of a solution
- => Ways to test more effectively, efficiently, and quickly
9Software Guys can Learn from Hardware Experts!
- Concurrency
- the synchrony abstraction
- event-driven modeling
- Reusability
- cell libraries
- interface definition
- Reliability
- leveraging limited abstractions
- leveraging verification
- Heterogeneity
- mixing synchronous and asynchronous designs
- resource management
Source: Edward A. Lee, UC Berkeley, SRC/ETAB Summer Study 2001
10Trade-offs. Methodology ESW Architectural
specifics
- Portability
- ESW itself is intended to provide portability for higher SW layers
- (At least parts of) ESW is by definition not portable
- Real-time
- Restricted use of standardized inter-process communication (IPC) mechanisms (CORBA, ...) for performance reasons
- Typically hard real-time requirements
- RTOS dependency
- Implementation of OS-like services
- Sometimes shielding of the RTOS from higher-level SW layers
- Direct dependency on RTOS implementation
11Functional Design Mapping
Source: Ian Phillips, ARM, VSIA 2001
12The Embedded Market: Disruptive Change
Source: Jim Ready, President/CEO, MontaVista Software
Traditional Embedded World:
- Never small enough
- Never fast enough
- Headless/Character-based
- Standalone
- Boot and run from ROM
- More hardware than software
- Low-level programming model
- Application tied to hardware
- Time to Market Pressures
- Shortage of Embed. SW Engineers
13Plan
- Embedded Systems
- New Approaches to building ESW
- New paradigms Lava, Handel-C
- Examples Engineering Returns to Software
- Build a RISC processor in 48hrs
- Advantages of reconfigurable hardware.
- Real-time support for ESW
14Motorola Software Survey Findings
- Hardware design is a software task: IC designers write code (VHDL, Verilog, scripting)!
- We must become a software-intensive embedded system solutions company, focused on integrating our platforms into users' products; in the future we'll be neither a hardware nor a software company
- Focus on developing systems capability, not just a software counterpart to our current hardware capability (though that's needed too)
- We should have software content from drivers to applications
- The fundamental goal isn't 70% margin on software products, it's helping someone choose your total solution
- Embedded systems platforms and solutions will be the key to market differentiation and profitable growth
Source: Bob Altizer, BASYS, VSIA 2001
15Common Design Metrics
- NRE (Non-recurring engineering) cost
- Unit cost
- Size (bytes, gates)
- Performance (execution time)
- Power (more power => more heat and less battery time)
- Flexibility (ability to change functionality)
- Time to prototype
- Time to market
- Maintainability
- Correctness
- Safety (probability that system won't cause harm)
16Time to Market Design Metric
- Simplified revenue model
- Product life = 2W, peak at W
- Time of market entry defines a triangle, representing market penetration
- Triangle area equals revenue
- Loss
- The difference between the on-time and delayed triangle areas
- Avg. time to market today = 8 mth
- 1 day delay may amount to $Ms
- see Sony Playstation vs XBox
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
17NRE and unit cost metrics
- Compare technologies by costs -- best depends on quantity
- Technology A: NRE = $2,000, unit = $100
- Technology B: NRE = $30,000, unit = $30
- Technology C: NRE = $100,000, unit = $2
- But, must also consider time-to-market
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
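The "best depends on quantity" point can be checked with a small sketch. It assumes the usual model total cost = NRE + unit cost x quantity; the function names are illustrative and not from the slides.

```python
# NRE and unit cost per technology, taken from the slide (currency units omitted).
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}


def total_cost(nre, unit_cost, quantity):
    """Assumed cost model: one-off NRE plus unit cost times units built."""
    return nre + unit_cost * quantity


def cheapest(quantity):
    """Return the technology with the lowest total cost at this quantity."""
    return min(
        TECHNOLOGIES,
        key=lambda name: total_cost(*TECHNOLOGIES[name], quantity),
    )


if __name__ == "__main__":
    for quantity in (100, 1_000, 10_000):
        costs = {n: total_cost(*TECHNOLOGIES[n], quantity) for n in TECHNOLOGIES}
        print(quantity, costs, "cheapest:", cheapest(quantity))
```

Under this model A wins for small runs, B for medium runs, and C only once volumes are high enough to amortise its NRE.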
18Losses due to delayed market entry
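- Area = $\frac{1}{2} \times \text{base} \times \text{height}$
- On-time = $\frac{1}{2} \times 2W \times W$
- Delayed = $\frac{1}{2} \times (W - D + W) \times (W - D)$
- Percentage revenue loss = $\frac{D(3W - D)}{2W^2} \times 100\%$
- Try some examples
- Lifetime $2W = 52$ wks, delay $D = 4$ wks
- $\frac{4 \times (3 \times 26 - 4)}{2 \times 26^2} \approx 22\%$
- Lifetime $2W = 52$ wks, delay $D = 10$ wks
- $\frac{10 \times (3 \times 26 - 10)}{2 \times 26^2} \approx 50\%$
- Delays are costly!
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)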
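As a quick check of the triangle model, the sketch below computes the loss from the two triangle areas and compares it with the closed form. The function names are illustrative assumptions, not part of the source.

```python
def revenue_loss_percent(w, d):
    """Loss from the triangle areas: W is half the product life, D the delay."""
    on_time = 0.5 * (2 * w) * w
    delayed = 0.5 * (w - d + w) * (w - d)
    return (on_time - delayed) / on_time * 100


def revenue_loss_closed_form(w, d):
    """Closed form from the slide: D(3W - D) / (2 W^2) * 100."""
    return d * (3 * w - d) / (2 * w ** 2) * 100


if __name__ == "__main__":
    for delay in (4, 10):
        print(delay, round(revenue_loss_percent(26, delay), 1))
```

Both forms agree, and the two slide examples (W = 26 weeks, D = 4 and D = 10 weeks) come out at about 22% and 50%.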
19Trends
- Moore's Law
- IC transistor capacity doubles every 18 mths
- 1981: leading edge chip had 10k transistors
- 2002: leading edge chip had 150M transistors
- 2007: leading edge chip has 1000M transistors (90nm)
- Designer productivity has improved due to better tools
- Compilation/Synthesis tools
- Libraries/IP
- Test/verification tools
- Standards
- Languages and frameworks (Handel-C, Lava, Esterel, ...)
- 1981: designer produced 100 transistors per month
- 2002: designer produces 5000 transistors per month
- 2007: ???
20Our New Understanding
- We have simultaneous optimisation of competing design metrics: speed, size, power, complexity, etc.
- We need a "Renaissance Engineer"
- with a holistic view of the design process and comfortable with technologies ranging from hardware and software to formal methods
- Maturation of behavioral synthesis tools and other tools has enabled this kind of unified view of hardware/software co-design.
- Design efforts now focus at higher levels of abstraction => abstract specifications are now refined into programs and then into gates and logic.
- There is no fundamental difference between what hardware and software can implement.
21Designer Productivity
- The Mythical Man-Month by Frederick Brooks ('75)
- More designers on team => lower productivity because of increasing communication costs between groups
- Consider a 1M transistor project
- Say a designer has a productivity of 5000 transistors/mth
- Each extra designer => decrease of 100 transistors/mth productivity in group due to comm. costs
- 1 designer: $1\text{M} / 5000 = 200$ mth
- 10 designers: $1\text{M} / (10 \times 4100) \approx 24.3$ mth
- 25 designers: $1\text{M} / (25 \times 2600) \approx 15.3$ mth
- 27 designers: $1\text{M} / (27 \times 2400) \approx 15.4$ mth
- Need new design technology to shrink the design gap
Source: Embedded System Design, Frank Vahid / Tony Givargis (John Wiley & Sons, Inc., 2002)
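The team-size arithmetic can be reproduced with a short sketch. It assumes, as the slide does, that every designer loses 100 transistors/month for each additional team member; the function and parameter names are illustrative.

```python
def project_months(transistors, designers, base_rate=5000, penalty=100):
    """Months to finish when each extra designer costs everyone `penalty` output."""
    per_designer = base_rate - penalty * (designers - 1)
    return transistors / (designers * per_designer)


if __name__ == "__main__":
    for team in (1, 10, 25, 27):
        print(team, round(project_months(1_000_000, team), 2))
```

The unrounded results are 200, about 24.39, about 15.38 and about 15.43 months, so the slide's figures appear to be truncated to one decimal. In this model, adding designers beyond roughly 25 makes the project take longer.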
22Plan
- Embedded Systems
- New Approaches to building ESW
- New paradigms Lava, Handel-C
- Examples Engineering Returns to Software
- Build a RISC processor in 48hrs
- Advantages of reconfigurable hardware.
- Real-time support for ESW
23Design Productivity Gap
- Designer productivity has grown over the last decade
- Rate of improvement has not kept pace with the chip-capacity growth
- 1981 leading edge chip
- 100 designer mth x 100 trans/mth => 10k trans complexity
- 2002 leading edge chip
- 30k designer mth x 5k trans/mth => 150M trans complexity
- Designers at avg. of $10k p.m. => cost of building leading edge chips has gone from $1M in 1981 to $300M in 2002
- Need paradigm shift to cope with the complexities of system design
24Lava
- Not so much a hardware description language
- More a style of circuit description
- Emphasises connection patterns
- Think of Lego
25Lava
- Mary Sheeran, Koen Claessen, Satnam Singh (Chalmers University, Sweden)
- Based on earlier work on MuFP to describe circuit functionality and layout in a single language
- Built using functional programming paradigm
26Behaviour and Structure
g
f
f ->- g
27Lava Properties
- Higher-order functions
- Circuits are functions
- May be passed as arguments to other functions.
- => Easier to produce parameterized circuits than with VHDL.
- Functions can return circuits as results
- Circuit combinators take circuits as arguments, return circuits as results.
- => Powerful glue for composing circuits to form larger systems.
- Circuit combinators combine behavior and layout
- Combinators lay out circuits in rows, columns, triangles, trees etc.
- Performance of circuit
- Improved by exploring the layout design space by experimenting with alternative layout combinators.
- Examples of circuits produced
- High speed constant coefficient multipliers, finite impulse response filters (1D and 2D), adder tree networks and sorting butterfly networks.
28Parallel Connection Patterns
f -- g
29map f
30Four Sided Tiles
31Column
32Full Adder
cout
b
sum
a
cin
fa (cin, (a,b)) = (sum, cout)
  where
    part_sum = xor (a, b)
    sum = xorcy (part_sum, cin)
    cout = muxcy (part_sum, (a, cin))
33Generic Adder
adder = col fa
34Top Level
adder16Circuit
  = do a <- inputVec "a" (bit_vector 15 downto 0)
       b <- inputVec "b" (bit_vector 15 downto 0)
       (s, carry) <- adder4 (a, b)
       sum <- outputVec "sum" s (bit_vector 16 downto 0)

? circuit2VHDL "add16" adder16Circuit
? circuit2EDIF "add16" adder16Circuit
? circuit2Verilog "add16" adder16Circuit
35Xilinx FPGA Implementation
- 16-bit implementation on a XCV300 FPGA
- Vertical layout required to exploit fast carry chain
- No need to specify coordinates in HDL code
3616-bit Adder Layout
Source Mary Sheeran Nov.2002
37Four adder trees
Source Mary Sheeran Nov.2002
38No Layout Information
Source Mary Sheeran Nov.2002
39Plan
- Embedded Systems
- New Approaches to building ESW
- New paradigms Lava, Handel-C
- Examples Engineering Returns to Software
- Build a RISC processor in 48hrs
- Advantages of reconfigurable hardware.
- Real-time support for ESW
40Handel-C
- Programming language: enables compilation of programs into synchronous hardware
- NOT a Hardware Description Language: it's a prog. language aimed at compiling high-level algorithms into gate-level hardware
- Syntax (loosely) based on C
- Handel-C is to hardware (gates) what C is to micro-assembly code
41Handel-C (cont.)
- Inventor: Ian Page, Programming Research Group (Oxford University, UK)
- Semantics based on Hoare's Communicating Sequential Processes (CSP) model
- and on the Occam transputer prog. language
- Industry heavyweights using tools: Marconi, Ericsson, BAe, Creative Labs, etc.
42What this means
- Hardware design produced is exactly the hardware specified in source program
- No intermediate "interpreting" layer as in assembly language targeting general purpose microprocessor
- Logic gates are assembly instructions of Handel-C system
- Design/re-design/optimise at software level!!!
43What This Means
- True parallelism
- not time-shared (interpreted) parallelism of gen. purpose computers
- PAR { a; b }
- instructions executed in // at same instant of time by 2 sep. pcs of hw
- Timing
- branches that complete early forced to wait for slowest branch before continuing
44Comparison with C
- Similar
- Programs inherently sequential
- Similar control-flow constructs: if-then-else, switch, while, for, etc.
- Dissimilar
- No malloc / dynamic store allocation
- No recursion (limited rec. in macros)
- No nested procedures
- No stdin/stdout
- void main()
- variable width words
- PAR, etc.
45Handel-C is based on
- ANSI-standard C without external library functions
- I/O functions: printf(), putc(), scanf(), ...
- File functions: fopen(), fclose(), fprintf(), ...
- String functions: length(), strcpy(), strcmp(), ...
- Math functions: sin(), cos(), sqrt(), ...
- ...
46Supported declarationsstatements instructions
- Main program structure
- Variables
- Arrays
- Switch statement
- FOR Loop
- Comments
- Constants
- Scope Variable sharing
- Arithmetic, Relational, Relational Logic ops
- Conditional Execution
- While loop
- Do While Loop
47Channel Communication
- link!v link?v
- channel input is form of assignment
- Provides link between parallel (//) branches
- One // branch outputs data onto channel
- Other // branch reads data from channel
- => Synchronisation
- data transfers only when both processes are ready
48Additional Features Statements
- Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;
    c ? a;
49Additional Features Statements
- Prialt
    prialt
    {
        case CommsStatement:
            Statement
            break;
        ...
        default:
            Statement
            break;
    }
50Example 1 (sum)
IMPORTANT: width!!
    void main(void)
    {
        unsigned int 16 sum;    // variable width word
        unsigned int 8 data;
        chanin input;           // input/output
        chanout output;

        sum = 0;
        do
        {
            input ? data;
            sum = sum + (0 @ data);
        } while (data != 0);
        output ! sum;
    }
51Example 2 (divider)
    #define DATA_WIDTH 16
    void main(void)
    {
        unsigned int DATA_WIDTH a, mult, result;
        unsigned int (DATA_WIDTH*2 - 1) b;
        chanin input;
        chanout output;

        while (1)
        {
            input ? a;
            input ? result;
            b = result @ 0;
            mult = 1 << (DATA_WIDTH-1);
            result = 0;
            /* <<<<< MAIN LOOP >>>>> */
            output ! result;
        }
    }
result = integer(a / b)
52Example 2 (cont.)
    while (mult != 0)
    {
        if ((0 @ a) >= b)
            par
            {
                a -= b <- width(a);
                result |= mult;
            }
        par
        {
            b = b >> 1;
            mult = mult >> 1;
        }
    }
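The divider's main loop is a shift-and-subtract division. The sketch below is a plain software model of that loop, not Handel-C; it uses ordinary Python integers with explicit masking to mimic the fixed widths. The function name and the masking details are assumptions made for illustration.

```python
DATA_WIDTH = 16


def divide(a, divisor, width=DATA_WIDTH):
    """Software model of the shift-and-subtract loop: returns a // divisor."""
    mask = (1 << width) - 1
    a &= mask
    # b starts as the divisor shifted left by width-1 bits (divisor @ 0).
    b = (divisor & mask) << (width - 1)
    mult = 1 << (width - 1)
    result = 0
    while mult != 0:
        if a >= b:
            a = (a - b) & mask   # subtract; keep only `width` bits
            result |= mult       # set the matching quotient bit
        b >>= 1
        mult >>= 1
    return result


if __name__ == "__main__":
    print(divide(100, 7))   # quotient of 100 / 7
```

Each iteration decides one quotient bit, so a 16-bit division needs 16 passes through the loop. In this model a divisor of zero sets every quotient bit rather than raising an error.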
53Example 3
Diagram labels: input, Link0, Link1, output; State0, State1, State2
Features: parallel tasks, comm between tasks, array of variables, array of channels, parameterised on width
    void main(void)
    {
        chan unsigned int undefined link[2];
        chanin unsigned int 8 input;
        chanout unsigned int 8 output;
        unsigned int undefined state[3];
        par
        {
            while (1)   // first queue location
            {
                input ? state[0];
                link[0] ! state[0];
            }
            while (1)   // second queue location
            {
                link[0] ? state[1];
                link[1] ! state[1];
            }
            while (1)   // third queue location
            {
                link[1] ? state[2];
                output ! state[2];
            }
        }
    }
54Additional Features Statements
- Timing
- An assignment statement takes exactly one clock cycle to execute. Everything else is "free"
    void main(void)
    {
        unsigned 8 x, y;

        x = x + y;
    }
55Timing/efficiency issues
- One clock source for entire program
- Assignment and delay take one clock cycle
- Expressions are "for free"
- Handel-C designed such that an experienced programmer can immediately tell which instructions execute on which clock cycles
- Example:
    x = y;
    x = (((y * z) + (w * v)) << 2) <- 7;
- both statements take one clock cycle
- Clock runs at longest logic depth => reduce the depth of logic to speed up program => pipelining
56Porting C to Handel-C
- Decide how software maps to hardware platform
- Partition algorithm between multiple FPGAs
- Port C to Handel-C; use simulator to check correctness
- Modify code to take advantage of extra operators in Handel-C - simulate to ensure correctness
- Add fine-grain parallelism through PAR, parallel assignments, or parallelise algorithm - simulate
- Add hardware interfaces for target architecture; map simulator channel communications onto these interfaces - simulate
- Use FPGA place and route tools to generate FPGA images
57Design Flow Overview
Port algorithm to Handel-C
Compile program to .net file for simulator
Modify/ debug program
Use simulator to evaluate and debug design
Add interfaces to external hardware
Use Handel-C compiler to target h/w netlist
Use FPGA tools to place and route netlist
Program FPGA with result of place and route
58Essence
- Software approach allows us to rapidly prototype applications for a given domain
- Handel-C provides a seamless approach to derive expressive and fast implementations from the software level
- Cost of silicon is falling; shortage of trained engineers; high cost of programmer time => software-based, high-level approaches to solving problems become increasingly attractive.
59Handel-C Concepts (Recap)
- Describes hardware: h/w design produced = h/w in source program
- Logic gates are assembly instructions of Handel-C system
- Real parallelism, not interpreted
- Assignment and delay take 1 clock cycle; expression evaluation is free
- No side-effects, i.e. a++ is a statement (not an expression as in C)
- Variable width words => great performance improvement over software
- Min. datapath widths => minimal h/w usage
60Additional Features Statements
61Concurrency (example)
    void main(void)
    {
        unsigned 8 x, y;
        unsigned 5 temp1;
        unsigned 4 temp2;
        ...
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
        x = (temp2 + (0@temp1[4])) @ temp1[3:0];
    }
62Additional Features Statements
- Concurrency
    ...
    par
    {
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
    }
    x = (temp2 + (0@temp1[4])) @ temp1[3:0];
    ...
63Features Statements (contd.)
- Delay
    ...
    par
    {
        x = 1;
        {
            delay;
            x = 2;
        }
    }
    while (x == 0) delay;
64Additional Features Statements
- Channel
    unsigned int 8 a;
    chan unsigned int 8 c;
    c ! 5;
    c ? a;
- Single variable must not be accessed by >1 // branch =>
    par
    {
        out ! 3;
        out ! 4;
    }   // illegal
65Features Statements(contd.)
- Macros (Examples - contd)
- Combinatorial
    macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a));
    shared expr incwrap(e, m) = (((e) == (m)) ? 0 : (e) + 1);
- Recursive
    macro expr copy(e, n) =
        select(n == 1, (e), copy(e, n/2) @ copy(e, n-(n/2)));
66Features Statements(contd)
- Operators for Bit Manipulation
    z = x <- 2;      // Take least significant bits
    z = y \\ 2;      // Drop least significant bits
    z = x @ y;       // Concatenation
    z = x[3];        // Bit selection
    z = y[2:3];      // Bus selection
    z = width(x);    // Width of expression
- Note: in the form y[m:n] the order is MSB:LSB
    unsigned int 3 y = 4;
- y[0] is 0
- y[2] is 1
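For readers without a Handel-C toolchain, the bit-manipulation operators can be modelled on plain integers. The sketch below is a software approximation: the helper names are invented for illustration, and widths must be passed explicitly because Python integers carry no width.

```python
def take_lsbs(value, n):
    """Model of `value <- n`: keep the n least significant bits."""
    return value & ((1 << n) - 1)


def drop_lsbs(value, n):
    """Model of `value \\\\ n`: drop the n least significant bits."""
    return value >> n


def concat(high, low, low_width):
    """Model of `high @ low`: high bits followed by `low_width` low bits."""
    return (high << low_width) | low


def bit(value, index):
    """Model of `value[index]`: select a single bit."""
    return (value >> index) & 1


def bits(value, msb, lsb):
    """Model of `value[msb:lsb]`: select a bit range, MSB first."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


if __name__ == "__main__":
    y = 4  # 3-bit value 0b100, as in the slide
    print(bit(y, 0), bit(y, 2))
```

With y = 4 the model reproduces the slide's note: bit 0 is 0 and bit 2 is 1.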
67Additional Features Statements
- External RAM / ROM
    ram unsigned int 4 ExtRAM[8] with {offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {"P08"}, oe = {"P09"}, cs = {"P10"}};
    rom unsigned int 4 ExtROM[8] with {offchip = 1,
        data = {"P01", "P02", "P03", "P04"},
        addr = {"P05", "P06", "P07"},
        we = {}, oe = {"P09"}, cs = {"P10"}};
68Additional Features Statements
- Internal RAM / ROM
    ram unsigned int 8 speicher[256];
    rom unsigned int 8 program[] = {1, 2, 3, 4};
    unsigned char i;
    i = 3;
    speicher[i] = 25;
    for (i = 0; i < 4; i++) stdout ! program[i];
69Recursive Macro Expressions Example
- Illustrates the generation of large quantities of hardware from simple macros.
- Multiplier whose width depends on the parameters of the macro.
- Starting point for generating large regular hardware structures using macros.
- Single-cycle long multiplication from single macro
    macro expr multiply(x, y) = select(width(x) == 0,
        0, multiply(x \\ 1, y << 1) + (x[0] == 1 ? y : 0));
    a = multiply(b, c);
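The recursion in the multiply macro is ordinary long multiplication, unrolled at compile time into one adder stage per bit of x. A software model is sketched below; it passes the width of x as an explicit parameter, which is an assumption needed because Python integers have no width.

```python
def multiply(x, y, x_width):
    """Model of the recursive macro: one partial product per bit of x."""
    if x_width == 0:
        return 0
    partial = y if (x & 1) == 1 else 0
    # Recurse on x with its lowest bit dropped and y shifted left by one.
    return multiply(x >> 1, y << 1, x_width - 1) + partial


if __name__ == "__main__":
    print(multiply(13, 11, 4))
```

Unlike the hardware version, this model never truncates the result, since Python integers grow as needed. The recursion depth equals the width of x, which corresponds to the number of adder stages the macro would generate.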
70Timing
71Additional Features Statements
- Off-Chip Interface
- Input, registered Input, latched Input
- Output
- Tristate Bus
- Off-Chip Interface (examples)
    interface bus_in (int 4) InBus() with
        {data = {"P1", "P2", "P3", "P4"}};
    int 4 x;
    x = InBus.in;
    interface bus_out () OutBus (x+y) with
        {data = {"P11", "P12", "P13", "P14"}};
72Parallel Access to Variables
- Rules of parallelism: the same variable must not be accessed from two separate parallel branches (to avoid resource conflicts on the variables)
- Actually, the same variable must not be assigned to more than once on the same clock cycle but may be read as often as required (see wires!)
- Allows some useful and powerful programming techniques, e.g.
    par
    {
        a = b;
        b = a;
    }   // swaps values of a and b in a single clock cycle
73Parallel Access to Variables
- Four place queue
    while (1)
    {
        par
        {
            int x[3];
            x[0] = in;
            x[1] = x[0];
            x[2] = x[1];   // values at out delayed
            out = x[2];    // by 4 clock cycles
        }
    }
74Time Efficiency of Handel-C Hardware
- Requirement: clock period for program must be longer than longest path thru combinatorial logic in whole program.
- => once FPGA place and route is done, max. clock rate = 1/longest-path-delay
- Example: FPGA place and route tools calculate that the longest path delay between flip-flops in a design is 70 ns.
- The max. clock rate is 1/70 ns = 14.3 MHz.
- Speed allowed by system: 400 kHz - 100 MHz
- BUT WHAT IF THIS IS NOT FAST ENOUGH?
75Improving Time Efficiency
- Reducing logic depth: avoid multiplication, avoid wide adders, reduce complex expressions into stages, etc.
    unsigned 8 x;
    unsigned 8 y;
    unsigned 5 temp1;
    unsigned 4 temp2;
    par
    {
        temp1 = (0@(x <- 4)) + (0@(y <- 4));
        temp2 = (x \\ 4) + (y \\ 4);
    }
    x = (temp2 + (0@temp1[4])) @ temp1[3:0];
- Pipelining => increased latency for higher throughput
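The split-adder trick replaces one 8-bit addition with two 4-bit additions that run in parallel, followed by a short carry fix-up. The sketch below is a software model that checks the split form against a direct 8-bit add; the function name is an illustrative assumption.

```python
def add8_split(x, y):
    """Model of the two-stage 8-bit add built from 4-bit halves."""
    temp1 = (x & 0xF) + (y & 0xF)          # low nibbles, 5-bit result with carry
    temp2 = ((x >> 4) + (y >> 4)) & 0xF    # high nibbles, truncated to 4 bits
    high = (temp2 + (temp1 >> 4)) & 0xF    # add the carry out of the low half
    return (high << 4) | (temp1 & 0xF)     # concatenate high and low nibbles


if __name__ == "__main__":
    print(add8_split(200, 100), (200 + 100) & 0xFF)
```

The result matches an 8-bit wrap-around addition for every input pair. The trade-off is the one named on the slide: shallower logic per clock cycle at the cost of an extra cycle.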
76Serialisation
- Multiplication in more than one clock cycle in order to save hardware
- Algorithm is parametrizable by a compile-time constant
    macro proc mult_serial(x, y, xy)
    {
        macro expr count_width = 5;
        macro expr steps = 1 << count_width;
        macro expr bits = width(xy) / steps;
        unsigned count_width count;
        par { xy = 0; count = 0; }
        do par
        {
            xy += (0 @ (x <- bits)) * y;
            x >>= bits;
            y <<= bits;
            count++;
        }
        while (count != 0);
    }
77Serialisation
- Gatecount for a 32-bit multiplication
78Plan
- Embedded Systems
- New Approaches to building ESW
- New paradigms Lava, Handel-C
- Examples (Engineering Returns to Software
- Build a RISC processor in 48hrs
- Advantages of reconfigurable hardware.
- Real-time support for ESW
79RISC-Processor
- Features
- 16 instructions
- 4 bit I/O Ports
- one accumulator
- Program memory (16x8 ROM)
- Data memory (16x4 RAM)
- Problem: execute a program stored in ROM to calculate the first few members of the Fibonacci number sequence: 1, 2, 3, 5, 8, 13, 21, 34, ...
- fib(n) = 1 if n = 0 or n = 1
- fib(n) = fib(n-1) + fib(n-2) if n >= 2
80RISC-Processor
81RISC-Processor (cont.)
- Program
    chanin input;
    chanout output;
    // Parameterisation
    #define dw 32       /* Data width */
    #define opcw 4      /* Op-code width */
    #define oprw 4      /* Operand width */
    #define rom_aw 4    /* Width of ROM address bus */
    #define ram_aw 4    /* Width of RAM address bus */
    // The opcodes
    #define HALT 0
    #define LOAD 1
    #define LOADI 2
    #define STORE 3
    #define ADD 4
    #define SUB 5
    #define JUMP 6
82RISC-Processor(cont.)
- I/O Interface
    unsigned int dw output;
    interface bus_clock_in (unsigned int 1) reset() with {data = reset_pin};
    interface bus_in (unsigned int dw) input() with {data = in_pins};
    interface bus_out () out(output) with {data = out_pins};
- Definition of available opcodes
    #define HLD 0
    #define NOP 1
    #define OUT 2
    #define IN 3
    ...
    #define SRA 15
83RISC-Processor
- Declaration of FPGA and Pinning
    set family = Altera10K;
    set part = "EPF10K70RC240-3";
    set clock = external "91";
    macro expr in_pins = {"38", "83", "101", "148"};
    macro expr out_pins = {"153", "202", "218", "19"};
    macro expr reset_pin = {"45"};
- Defining Parameters
    #define dw 4        /* Data width */
    #define opcw 4      /* Op-code width */
    #define oprw 4      /* Operand width */
    #define rom_aw 4    /* Width of ROM addr bus */
    #define ram_aw 4    /* Width of RAM addr bus */
84RISC-Processor (cont.)
- Program (cont)
    // Rom program data
    rom unsigned int undefined program[] =
    {
        _asm_(LOADI, 1),    /* 0 */ /* Get a one */
        _asm_(STORE, 3),    /* 1 */ /* Store this */
        _asm_(STORE, 1),    /* 2 */
        _asm_(INPUT, 0),    /* 3 */ /* Read value from user */
        _asm_(STORE, 2),    /* 4 */ /* Store this */
        _asm_(LOAD, 1),     /* 5 */ /* Loop entry point */
        _asm_(ADD, 0),      /* 6 */ /* Make a fib number */
        _asm_(STORE, 0),    /* 7 */ /* Store it */
        _asm_(OUTPUT, 0),   /* 8 */ /* Output it */
        _asm_(ADD, 1),      /* 9 */ /* Make a fib number */
        _asm_(STORE, 1),    /* a */ /* Store it */
        _asm_(OUTPUT, 0),   /* b */ /* Output it */
        _asm_(LOAD, 2),     /* c */ /* Decrement counter */
        _asm_(SUB, 3),      /* d */
        _asm_(JUMPNZ, 4),   /* e */ /* Repeat if not zero */
85RISC-Processor (cont.)
- Program (cont)
    /* RAM for processor */
    ram unsigned int dw data[1 << ram_aw];
    /* Processor registers */
    unsigned int rom_aw pc;         /* Program counter */
    unsigned int (opcw+oprw) ir;    /* Instruction register */
    unsigned int dw x;              /* Accumulator */
    /* Macros to extract opcode and operand fields */
    #define opcode (ir <- opcw)
    #define operand (ir \\ opcw)
86RISC-Processor (cont.)
- Program (cont)/ Main program /
    void main(void)
    {
        pc = 0;
        // Processor loop
        do
        {
            // fetch
            par
            {
                ir = program[pc];
                pc = pc + 1;
            }
            /* MAIN DECODE/EXECUTE */
        } while (opcode != HALT);
    }   /* main program */
87RISC-Processor (cont.)
- Program (cont)
    // decode and execute
    switch (opcode)
    {
        case LOAD   : x = data[operand <- ram_aw]; break;
        case LOADI  : x = 0 @ operand; break;
        case STORE  : data[operand <- ram_aw] = x; break;
        case ADD    : x = x + data[operand <- ram_aw]; break;
        case SUB    : x = x - data[operand <- ram_aw]; break;
        case JUMP   : pc = operand <- rom_aw; break;
        case JUMPNZ : if (x != 0) pc = operand <- rom_aw; break;
        case INPUT  : input ? x; break;
        case OUTPUT : output ! x; break;
        default     : while (1) delay;   // unknown opcode
    }
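To see what the ROM program computes, the fetch/decode/execute loop can be modelled in software. The sketch below is not Handel-C. It makes three labelled assumptions that the slides do not state: opcodes are represented as strings, the ROM entry at address f is a HALT, and RAM starts out zeroed.

```python
PROGRAM = [
    ("LOADI", 1),   # 0: get a one
    ("STORE", 3),   # 1: keep it as the constant for the decrement
    ("STORE", 1),   # 2
    ("INPUT", 0),   # 3: read loop count from user
    ("STORE", 2),   # 4: store counter
    ("LOAD", 1),    # 5: loop entry point
    ("ADD", 0),     # 6: make a fib number
    ("STORE", 0),   # 7
    ("OUTPUT", 0),  # 8
    ("ADD", 1),     # 9: make a fib number
    ("STORE", 1),   # a
    ("OUTPUT", 0),  # b
    ("LOAD", 2),    # c: decrement counter
    ("SUB", 3),     # d
    ("JUMPNZ", 4),  # e: repeat if not zero
    ("HALT", 0),    # f: ASSUMPTION, the slide listing stops at address e
]


def run(program, inputs, data_width=32, ram_words=16):
    """Accumulator machine model: returns the list of OUTPUT values."""
    mask = (1 << data_width) - 1
    data = [0] * ram_words      # ASSUMPTION: RAM starts zeroed
    pending = list(inputs)
    outputs = []
    pc, x = 0, 0
    while True:
        opcode, operand = program[pc]            # fetch
        pc = (pc + 1) % len(program)
        if opcode == "HALT":
            return outputs
        if opcode == "LOAD":
            x = data[operand]
        elif opcode == "LOADI":
            x = operand
        elif opcode == "STORE":
            data[operand] = x
        elif opcode == "ADD":
            x = (x + data[operand]) & mask
        elif opcode == "SUB":
            x = (x - data[operand]) & mask
        elif opcode == "JUMP":
            pc = operand
        elif opcode == "JUMPNZ":
            if x != 0:
                pc = operand
        elif opcode == "INPUT":
            x = pending.pop(0) & mask
        elif opcode == "OUTPUT":
            outputs.append(x)
        else:
            raise ValueError(f"unknown opcode: {opcode}")


if __name__ == "__main__":
    print(run(PROGRAM, [4]))
```

Each pass through the loop emits two sequence members, so an input of 4 yields the eight values listed in the problem statement.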
88RISC-Processor (cont.)
89Simulation debugging
- The simulator is integrated into the compiler.
- Executing a cycle-based simulation.
- Variables are traceable at any clock cycle.
- Port interface will be replaced by standard I/O.
- Handel-C simulator supports debugging at any clock-cycle.
- Highlighting of characteristic values, e.g. area of any program line.
90Some Recent Work
- "Customising Graphics Applications: Techniques and Programming Interface", Henry Styles and Wayne Luk, Proceedings of IEEE Symposium on Field-Programmable Custom Computing Machines, IEEE Computer Society Press, 2000.
- Exploit custom data-formats and datapath widths to optimise graphics operations such as texture mapping and hidden-surface removal.
- Discusses techniques for balancing graphics pipeline
- Customised architectures captured in Handel-C, compiled for Xilinx Virtex FPGAs
- Handel-C API based on OpenGL standard for automatic speedup of graphics applications, including the Quake-2 action game.
91The Graphics Pipeline
92Performance Case Studies
Nvidia is a 3-D graphics chipset, i.e. a specialised graphics ASIC
Chart => FPGA platform fast approaching performance of dedicated graphics ASIC for gen. purpose graphics applications
93Performance Case Studies
- Infrared Simulation requires custom pixel format not supported by graphics ASICs
Onyx contains two 180 MHz MIPS processors, two Geometry Engine processors and two rasteriser ASICs, with a memory bandwidth of 6.4 GB/sec (i.e. 10X cost and mem. b/w of FPGA)
94Performance Case Studies
- Quake-2 benchmark requires custom pixel format
not supported by graphics ASICs
Bottleneck is PCI-bus speed limitation. Improve
performance by moving FPGA to AGP slot allowing
1GB/sec transfers between graphics h/w and memory
95Some Observations
- FPGA renderer is a low-cost platform for custom graphics applications
- Development time of a customised FPGA renderer is comparable to optimised software => effective to use a reconfigurable platform
- Good for reconfigurable designs where ASIC is not available or too expensive
- Useful in exploring desirable algorithms and architectures for ASICs
- Hardware renderer may be customised to maximise performance for each application
96Some Features of the Rapid Prototyping Board
- Full length 32 bit PCI card
- Virtex XCV1000: 1,000,000 system gates,
- 131 kBit Block RAM, 393 kBit SelectRAM
- Programmable clock: 400 kHz to 100 MHz
- 4 banks of fast asynchronous 32 bit wide SRAM, each 2 Mbytes
- PCI interface: 32 bit, 33 MHz, 132 Mbytes/sec burst
- 2 x PMC sites for VME grade I/O processing modules
- 50 pin Aux I/O, 8 LEDs
97Summary
- Cost of silicon is falling; products are getting more complex; time-to-market is shrinking rapidly; shortage of trained engineers; cost of programmer time is a major constraint => software-based, high-level approaches to solving problems become increasingly attractive.
- New generation of languages let us build systems at high level of abstraction.
- High-density FPGAs and SoCs allow complex designs to be rapidly prototyped => reduce the development cycle of new technology, perhaps even to deploy final product as soft cores.
- Broader understanding demanded from system designer: need a "Renaissance Engineer" with equal understanding of hardware and software.
98Plan
- Embedded Systems
- New Approaches to building ESW
- Real-Time Support
- Special Characteristics of Real-Time Systems
- Real-Time Constraints
- Canonical Real-Time Applications
- Scheduling in Real-time systems
- Operating System Approaches
99What is real about real-time?
computer world | real world
e.g., PC | e.g., industrial system, airplane
average response for user, interactive | events occur in environment at own speed
occasionally longer reaction: user annoyed | reaction too slow: deadline miss; damage, pot. loss of human life
computer controls speed of user | computer must follow speed of environment
"computer time" | "real-time"
100They Why real-time, why not simply fast?
- "Fast enough" depends on the system and its environment
- turtle: fast enough to eat salad
- mouse: fast enough to steal cheese
- fly: fast enough to escape
101- what if environment changes? systems not fast enough
- mouse trap
time scale depends on - or is dictated by - the environment; cannot slow down the environment, it is the real world
102Real-Time Systems
- A real-time system is a system that reacts to
events in the environment by performing
predefined actions
within specified time intervals.
103Flight Avionics
Constraints on responses to pilot inputs,
aircraft state updates
104- Constraints
- Keep plastic at proper temperature (liquid, but not boiling)
- Control injector solenoid (make sure that the motion of the piston reaches the end of its travel)
105Real-Time Systems Properties of Interest
- Safety Nothing bad will happen.
- Liveness Something good will happen.
- Timeliness Things will happen on time -- by
their deadlines, periodically, ....
106In a Real-Time System.
Correctness of results depends on value and its time of delivery
- correct value delivered too late is incorrect
- e.g., traffic light: light must be green when crossing, not enough before
- Real-time
- (Timely) reactions to events as they occur, at their pace
- (real-time) system (internal) time: same time scale as environment (external) time
107Performance Metrics in Real-Time Systems
- Beyond minimizing response times and increasing the throughput
- achieve timeliness.
- More precisely, how well can we predict that deadlines will be met?
108Types of RT Systems
- Dimensions along which real-time activities can be categorized
- how tight are the deadlines? -- deadlines are tight when the laxity (deadline - computation time) is small.
- how strict are the deadlines? what is the value of executing an activity after its deadline?
- what are the characteristics of the environment? how static or dynamic must the system be?
- Designers want their real-time system to be fast, predictable, reliable, flexible.
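Laxity, as defined above, is simply the deadline minus the computation time. A minimal sketch follows; the task names and numbers are hypothetical and serve only to show how laxity ranks tasks by tightness.

```python
def laxity(deadline, computation_time):
    """Slack left if the task starts immediately (same time unit for both)."""
    return deadline - computation_time


def tightest(tasks):
    """Return the task name with the smallest laxity.

    `tasks` maps a name to a (deadline, computation_time) pair.
    """
    return min(tasks, key=lambda name: laxity(*tasks[name]))


if __name__ == "__main__":
    # Hypothetical task set, times in milliseconds.
    tasks = {"sensor_poll": (10, 7), "logger": (100, 5)}
    for name, (deadline, cost) in tasks.items():
        print(name, laxity(deadline, cost))
    print("tightest:", tightest(tasks))
```

A task with small laxity leaves the scheduler little room to delay it, which is what makes its deadline tight.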
109Hard, soft, firm
- Hard: result useless or dangerous if deadline exceeded
- Soft: result of some (lower) value if deadline exceeded
- Firm
- If value drops to zero at deadline
- Deadline intervals: result required not later and not before
110Examples
- Hard real time systems
- Aircraft
- Airport landing services
- Nuclear Power Stations
- Chemical Plants
- Life support systems
- Soft real time systems
- Multimedia
- Interactive video games
111Real-Time Items and Terms
- Task
- program, perform service, functionality
- requires resources, e.g., execution time
- Deadline
- specified time for completion of, e.g., task
- time interval or absolute point in time
- value of result may depend on completion time
112Plan
- Special Characteristics of Real-Time Systems
- Real-Time Constraints
- Canonical Real-Time Applications
- Scheduling in Real-time systems
- Operating System Approaches
113Timing Constraints
- Real-time means to be in time: how do we know
something is in time? How do we express that?
- Timing constraints are used to specify temporal
correctness, e.g., finish assignment by 2pm, be
at station before train departs.
- A system is said to be (temporally) feasible if
it meets all specified timing constraints.
- Timing constraints do not come out of thin
air: the design process identifies events, derives,
models, and finally specifies timing constraints.
114- Periodic
- activity occurs repeatedly
- e.g., to monitor environment values, temperature,
etc.
(Figure: timeline of arrivals, one per period)
115- Aperiodic
- can occur any time
- no arrival pattern given
(Figure: timeline of aperiodic arrivals)
116- Sporadic
- can occur any time, but
- minimum time between arrivals
(Figure: timeline of sporadic arrivals separated by at least mint, the minimum time between arrivals)
117Who initiates (triggers) actions?
- Example: chemical process
- controlled so that temperature stays below danger
level
- warning is triggered before danger point
- so that cooling can still
occur
- Two possibilities
- action whenever temp rises above warn: event
triggered
- look every int time intervals, act if temp
measures above warn: time triggered
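The two possibilities can be compared on a recorded temperature trace. This is a minimal sketch: the function names, the trace values, and the use of list indices as time steps are all made up for illustration.

```python
def time_triggered(samples, warn, interval):
    """Look every `interval` steps; return the steps at which action is taken."""
    return [t for t in range(0, len(samples), interval) if samples[t] > warn]


def event_triggered(samples, warn):
    """Act only at the steps where the temperature rises above `warn`."""
    actions = []
    above = False
    for t, temp in enumerate(samples):
        if temp > warn and not above:
            actions.append(t)             # the crossing itself is the event
        above = temp > warn
    return actions


trace = [20, 22, 25, 31, 33, 29, 28, 32, 35, 36]   # hypothetical readings
print(time_triggered(trace, warn=30, interval=3))   # [3, 9]
print(event_triggered(trace, warn=30))              # [3, 7]
```

With this trace the time-triggered version looks 4 times regardless of what the temperature does and notices the second excursion two steps late, while the event-triggered version reacts at each crossing.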
118(Figure: t plotted over time, with TT and ET marked)
119(Figure: t plotted over time, with TT and ET marked)
120ET vs TT
- Time triggered
- Stable number of invocations
- Event triggered
- Only invoked when needed
- High number of invocations and high computation demands
if value changes frequently
121Slow down the environment?
- Importance
- which parts of the system are important?
- importance can change over time, e.g., fuel
efficiency during emergency landing
- Flow control: who has control over speed of
processing, who can slow partner down?
- environment
- computer system
- RT: environment cannot be slowed down
122Other Issues to worry about
- Meet requirements: some activities may run
only
- after others have completed: precedence
constraints
- while others are not running: mutual exclusion
- within certain times: temporal constraints
- Scheduling
- planning of activities, such that required timing
is kept
- Allocation
- where should a task execute?
123Plan
- Special Characteristics of Real-Time Systems
- Real-Time Constraints
- Canonical Real-Time Applications
- Scheduling in Real-time systems
- Operating System Approaches
124A Typical Real time system
(Figure: Temperature sensor -> Input port -> CPU with Memory -> Output port -> Heater)
125Code for example
while true do
    read temperature sensor
    if temperature too high then
        turn off heater
    else if temperature too low then
        turn on heater
    else
        nothing
126Comment on code
- Code works by polling the device (temperature sensor)
- Code is in form of infinite loop
- No other tasks can be executed
- Suitable for dedicated system or sub-system only
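A runnable sketch of the polling loop in Python follows. The thresholds, the `read_sensor` and `set_heater` callables, and the bounded `iterations` count (the slide's loop is infinite) are assumptions made so the example can be executed without hardware.

```python
def control_step(temperature, low, high, heater_on):
    """One pass of the polling loop: return the new heater state."""
    if temperature > high:
        return False              # too high: turn heater off
    if temperature < low:
        return True               # too low: turn heater on
    return heater_on              # in range: nothing to do


def poll(read_sensor, set_heater, low, high, iterations):
    """Poll the sensor `iterations` times and drive the heater."""
    heater_on = False
    for _ in range(iterations):
        heater_on = control_step(read_sensor(), low, high, heater_on)
        set_heater(heater_on)
    return heater_on


readings = iter([17.0, 19.5, 23.0, 21.0])    # hypothetical sensor values
states = []
poll(lambda: next(readings), states.append, low=18.0, high=22.0, iterations=4)
print(states)   # [True, True, False, False]
```

The loop does nothing but poll, which is the point made on the slide: no other task can run while it executes.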
127Extended polling example
(Figure: a Computer running Task 1 to Task 4; each Task i has a conceptual link to Temperature Sensor i and Heater i)
128Polling
- Problems
- Arranging task priorities
- Round robin is usual within a priority level
- Urgent tasks are delayed
129Interrupt driven systems
- Advantages
- Fast
- Little delay for high priority tasks
- Disadvantages
- Programming
- Code difficult to debug
- Code difficult to maintain
130How can we monitor a sensor every 100 ms
Initiate a task T1 to handle the sensor
T1: Loop
      Do sensor task T2
      Schedule T2 for 100 ms
Note that the time could be relative (as here) or
could be an actual (absolute) time. There would be
slight differences between the methods, due to the
additional time to execute the code.
131An alternative
Initiate a task to handle the sensor T1
T1: Do sensor task T2
    Repeat
      Schedule T2 for n * 100 ms
      n = n + 1
There are some subtleties here...
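The difference between relative and absolute scheduling can be seen by computing activation times. This is a simulation sketch only: the function names and the 2 ms execution time are assumptions, and no real clock is involved.

```python
def relative_schedule(period_ms, exec_ms, count):
    """Each activation is scheduled period_ms after the previous run finishes."""
    starts, now = [], 0
    for _ in range(count):
        starts.append(now)
        now += exec_ms            # time spent executing the sensor task
        now += period_ms          # "schedule for 100 ms" counted from now
    return starts


def absolute_schedule(period_ms, count):
    """Activation n is scheduled for the absolute time n * period_ms."""
    return [n * period_ms for n in range(count)]


print(relative_schedule(100, 2, 4))   # [0, 102, 204, 306]
print(absolute_schedule(100, 4))      # [0, 100, 200, 300]
```

With relative delays the execution time accumulates as drift, while absolute times keep the 100 ms grid as long as each run finishes within its period.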
132Clock, interrupts, tasks
(Figure: the Clock interrupts the Processor; the Processor examines the Job/Task queue holding Task 1 to Task 4)
Tasks schedule events using the clock...
133 Flight Simulator
134Time Periods to meet Timing Requirements
Requirement
Choice Made
Rationale
135 Time Periods to meet Timing Requirements
Requirement
Choice Made
Rationale
136Time Periods to meet Timing Requirements
137Controlling a reaction
- we know
- if temperature too high, it explodes
- maximum rate of temperature increase
- rate of cooling
- events
- temperature change
- temperature > safe threshold
- we can derive
- how often we have to check temperature
- when we have to finish cooling
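One way to derive the checking period, under simplifying assumptions that are not in the slides: the temperature rises at most linearly at the maximum rate, and cooling stops the rise a fixed delay after the warning is detected. In the worst case the warning threshold is crossed just after a check, so the temperature keeps rising for one full period plus the cooling delay. That gives: period <= (danger - warn) / max rate - cooling delay. All names below are hypothetical.

```python
def max_check_period(warn_temp, danger_temp, max_rise_rate, cooling_delay):
    """Longest checking period that still keeps the temperature below danger.

    Assumes a linear worst-case rise of max_rise_rate (degrees per second)
    and that cooling halts the rise cooling_delay seconds after detection.
    """
    margin = danger_temp - warn_temp
    period = margin / max_rise_rate - cooling_delay
    if period <= 0:
        raise ValueError("warning threshold is too close to the danger level")
    return period


# Hypothetical numbers: warn at 80, danger at 100, rise of 2 degrees/s,
# cooling effective 4 s after detection.
print(max_check_period(80.0, 100.0, 2.0, 4.0))   # 6.0
```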
138(No Transcript)
139Example Injection Molding (cont.)
140Example Injection Molding (cont.)
141Plan
- Special Characteristics of Real-Time Systems
- Real-Time Constraints
- Canonical Real-Time Applications
- Scheduling in Real-time systems
- Operating System Approaches
142Why is scheduling important?
- Definition
- A real-time system is a system that reacts to
events in the environment by performing
predefined actions within specified time
intervals.
143Schedulability analysis
- a.k.a. feasibility checking
- check whether tasks will meet their
timing constraints.
144Scheduling Paradigms
- Four scheduling paradigms emerge, depending on
- whether a system performs schedulability
analysis
- if it does,
- whether it is done statically or dynamically
- whether the result of the analysis itself
produces a schedule or plan according to which
tasks are dispatched at run-time.
145 1. Static Table-Driven Approaches
- Perform static schedulability analysis by
checking if a schedule is derivable.
- The resulting schedule (table) identifies the
start times of each task.
- Applicable to tasks that are periodic (or have
been transformed into periodic tasks by well
known techniques).
- This is highly predictable but highly
inflexible.
- Any change to the tasks and their characteristics
may require a complete overhaul of the table.
146 2. Static Priority Driven Preemptive
Approaches
- Tasks have (systematically assigned) static
priorities.
- Priorities take timing constraints into
account
- e.g. rate-monotonic
- the shorter the period, the higher the priority.
- Perform static schedulability analysis but no
explicit schedule is constructed
- RMA: sum of task utilizations < ln 2 (about 0.69).
- Task utilization = computation time / period
- At run-time, tasks are executed
highest-priority-first, with preemptive-resume
policy.
- When resources are used, need to compute
worst-case blocking times.
147Static Priorities: Rate Monotonic Analysis
- presented by Liu and Layland in 1973
- Assumptions
- Tasks are periodic with deadline equal to
period. Release time of tasks is the period start
time.
- Tasks do not suspend themselves
- Tasks have bounded execution time
- Tasks are independent
- Scheduling overhead negligible
148RMA Design Time vs. Run Time
- At Design Time
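- Task priorities are assigned according to their
periods: shorter period means higher priority
- Schedulability test
- Taskset of n tasks is schedulable if
$\sum_{i=1}^{n} C_i / P_i \le n (2^{1/n} - 1)$,
where $C_i$ is the computation time and $P_i$ the period of task i
- Very simple test, easy to implement.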
- Run-time
- The ready task with the highest priority is
executed.
149- RMA Example
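- taskset t1, t2, t3, t4, each given as (period, computation time)
- t1 (3, 1)
- t2 (6, 1)
- t3 (5, 1)
- t4 (10, 2)
- The schedulability test
- $1/3 + 1/6 + 1/5 + 2/10 \le 4 (2^{1/4} - 1)$ ?
- $0.9 \le 0.757$ ?
- No: the taskset fails the test, so the test does not establish schedulability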
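The utilization test from this example can be checked with a few lines of Python. The function names are illustrative; tasks are written as (period, computation time) pairs, matching the slide.

```python
def utilization(tasks):
    """Total utilization: sum of computation time / period."""
    return sum(c / p for p, c in tasks)


def liu_layland_bound(n):
    """Utilization bound n * (2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_rm_test(tasks):
    """True if the sufficient rate-monotonic utilization test is passed."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


tasks = [(3, 1), (6, 1), (5, 1), (10, 2)]    # (period, computation time)
print(round(utilization(tasks), 3))          # 0.9
print(round(liu_layland_bound(4), 3))        # 0.757
print(passes_rm_test(tasks))                 # False
```

A `False` result only means the test is inconclusive for this taskset, since the test is sufficient but not necessary.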
150- RMA
- A schedulability test is
- Sufficient
- there may exist tasksets that fail the test,
but are schedulable
- Necessary
- tasksets that fail are (definitely) not
schedulable
- The RMA schedulability test is sufficient, but
not necessary.
- e.g., when periods are harmonic,
i.e., multiples of each other,
utilization can be 1.
151Exact RMA
- by Joseph and Pandya, based on critical instance
analysis
- (longest response time of task, when it is
released at same time as all higher priority
tasks)
- What is happening at the critical instance?
- Let T1 be the highest priority task. Its response
time
- R1 = C1, since it cannot be preempted
- What about T2? R2 = C2 + delays due to
interruptions by T1.
- Since T1 has higher priority, it has
shorter period. That means it will interrupt T2
at least once, probably more often. Assume T1 has
half the period of T2: R2 = C2 + 2 x C1
152Exact RMA.
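- In general
$R_i^{n+1} = C_i + \sum_{j \in hp(i)} \lceil R_i^n / P_j \rceil C_j$
- $R_i^n$ denotes the nth iteration of the response
time of task i; $P_j$ is the period of task j
- hp(i) is the set of tasks with higher priority than
task i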
153Example - Exact Analysis
- Let us look at our example, which failed the pure
rate monotonic test, although we can schedule it
- Exact analysis says so.
- R1 = 1: easy
- R3, second highest priority task: hp(t3) = {T1},
R3 = 2
154- R2, third highest priority task: hp(t2) = {T1, T3},
R2 = 3
155- R4, lowest priority task: hp(t4) = {T1, T3,
T2}
- R4 = 9
- Response times of first instances of all tasks
< their periods
- => taskset feasible under RM scheduling
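The exact analysis can be reproduced by iterating the response time until it stops changing. The sketch below is one possible implementation, not the authors' code; tasks are (period, computation time) pairs with integer values, and the function names are illustrative.

```python
def response_time(task, higher_priority):
    """Iterate R = C + sum(ceil(R / P_j) * C_j) over higher-priority tasks.

    Returns the fixed point, or None if it would exceed the task's period.
    """
    period, cost = task
    r = cost
    while r <= period:
        # -(-r // p) is integer ceiling division, i.e. ceil(r / p)
        nxt = cost + sum(-(-r // p) * c for p, c in higher_priority)
        if nxt == r:
            return r
        r = nxt
    return None


def rm_response_times(tasks):
    """Response time of every task under rate-monotonic priorities."""
    ordered = sorted(tasks)       # shorter period first = higher priority
    return [(t, response_time(t, ordered[:i])) for i, t in enumerate(ordered)]


tasks = [(3, 1), (6, 1), (5, 1), (10, 2)]    # t1, t2, t3, t4 from the example
for task, r in rm_response_times(tasks):
    print(task, r)
```

For the example taskset this yields response times 1, 2, 3 and 9 for the tasks with periods 3, 5, 6 and 10, matching R1, R3, R2 and R4 above.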
156 3. Dynamic Planning based Approaches
- Feasibility is checked at run-time: a
dynamically arriving task is accepted only if it
is feasible to meet its deadline.
- Such a task is said to be guaranteed to meet its
time constraints
- One of the results of the feasibility analysis
can be a schedule or plan that determines start
times
- Has the flexibility of dynamic approaches with
some of the predictability of static approaches
- If feasibility check is done sufficiently ahead
of the deadline, time is available to take
alternative actions.
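A minimal sketch of a run-time feasibility check that also produces a plan. It assumes a single processor, tasks that are all ready at time `now`, and tasks given as (name, computation time, absolute deadline); ordering by earliest deadline is one reasonable planning choice, not necessarily what any particular system uses. All names are hypothetical.

```python
def plan(tasks, now=0):
    """Order tasks by deadline and assign start times.

    Returns a list of (name, start_time), or None if some deadline is missed.
    """
    schedule, t = [], now
    for name, cost, deadline in sorted(tasks, key=lambda task: task[2]):
        if t + cost > deadline:
            return None           # this task cannot be guaranteed
        schedule.append((name, t))
        t += cost
    return schedule


def try_accept(accepted, new_task, now=0):
    """Return the new plan if new_task can be guaranteed, else None."""
    return plan(accepted + [new_task], now)


accepted = [("A", 2, 5), ("B", 3, 10)]
print(try_accept(accepted, ("C", 4, 9)))   # [('A', 0), ('C', 2), ('B', 6)]
print(try_accept(accepted, ("D", 6, 8)))   # None: B would miss its deadline
```

A rejected task is known to be infeasible at arrival time, which leaves room for the alternative actions mentioned above.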
157 4. Dynamic Best-effort Approaches
- The system tries to do its best to meet
deadlines.
- But since no guarantees are provided, a task may
be aborted during its execution.
- Until the deadline arrives, or until the task
finishes, whichever comes first, one does not
know whether a timing constraint will be met.
- Permits any reasonable scheduling approach: EDF,
highest-priority, ...
158Cyclic scheduling
- Ubiquitous in large-scale dynamic real-time
systems
- Combination of both table-driven scheduling and
priority scheduling.
- Tasks are assigned one of a set of harmonic
periods.
- Within each period, tasks are dispatched
according to a table that just lists the order in
which the tasks execute.
- Slightly more flexible than the table-driven
approach
- no start times are specified
- In many actual applications, rather than making
worst-case assumptions, confidence in a cyclic
schedule is obtained by very elaborate and
extensive simulations of typical scenarios.
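A sketch of the dispatch order produced by such a cyclic schedule. It assumes periods are expressed as multiples of a base frame and that shorter-period tables run first within a frame; the task names and the `cyclic_order` function are hypothetical.

```python
def cyclic_order(tables, frames):
    """Return, for each frame, the tasks dispatched in order.

    tables maps a harmonic period (in frames) to the ordered list of task
    names for that period; only the order is listed, no start times.
    """
    timeline = []
    for frame in range(frames):
        order = []
        for period in sorted(tables):         # shorter periods go first
            if frame % period == 0:
                order.extend(tables[period])
        timeline.append(order)
    return timeline


tables = {1: ["read_sensors"], 2: ["control_law"], 4: ["log"]}
for frame, order in enumerate(cyclic_order(tables, 4)):
    print(frame, order)
```

Frame 0 runs all three tasks, frames 1 and 3 run only `read_sensors`, and frame 2 runs `read_sensors` followed by `control_law`.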
159Plan
- Special Characteristics of Real-Time Systems
- Real-Time Constraints
- Canonical Real-Time Applications
- Scheduling in Real-time systems
- Operating System Approaches
160Real-Time Operating Systems
- Support process management and synchronization,
memory management, interprocess communication,
and I/O.
- Three categories of real-time operating systems
- small, proprietary kernels.
- e.g. VRTX32, pSOS, VxWorks
- real-time extensions to commercial timesharing
operating systems.
- e.g. RT-Linux, RT-NT
- research kernels
- e.g. MARS, ARTS, Spring, Polis
161Real-Time Applications Spectrum
(Figure: spectrum of real-time applications from Hard to Soft. Real-Time Operating System: VxWorks, Lynx, QNX, ...; Intime, HyperKernel, RTX; Windows CE. General-Purpose Operating System: Windows NT.)
162Real-Time Applications Spectrum
(Figure: spectrum of real-time applications from Hard to Soft. Real-Time Operating System: VxWorks, Lynx, QNX, ...; Intime, HyperKernel, RTX; Windows CE; Windows NT. General-Purpose Operating System.)
163Embedded (Commercial) Kernels
- Stripped down and optimized versions of
timesharing operating systems.
- Intended to be fast
- a fast context switch,
- external interrupts recognized quickly
- the ability to lock code and data in memory
- special sequential files that can accumulate
data at a fast rate
- To deal with timing requirements
- a real-time clock with special alarms and
timeouts
- bounded execution time for most primitives
- real-time queuing disciplines such as earliest
deadline first,
- primitives to delay/suspend/resume execution
- priority-driven best-effort scheduling mechanism
or a table-driven mechanism.
- Communication and synchronization via mailboxes,
events, signals, and semaphores.
164Real-Time Extensions to General Purpose
Operating Systems
- E.g., extending LINUX to RT-LINUX, NT to RT-NT
- Advantage
- based on a set of familiar interfaces
(standards) that speed development and facilitate
portability.
- Disadvantages
- Too many basic and inappropriate underlying
assumptions still exist.
165Using General Purpose Operating Systems
- GPOSs offer some capabilities useful for real-time
system builders
- RT applications can obtain leverage from existing
development tools and applications
- Some GPOSs are accepted as de-facto standards for
industrial applications
166Real Time Linux approaches
- Modify the current Linux kernel to handle RT
constraints
- Used by KURT
- Make the standard Linux kernel run as a task of
the real-time kernel
- Used by RT-Linux, RTAI
167Modifying Linux kernel
- Advantages
- Most problems, such as interrupt handling,
already solved
- Less initial labor
- Disadvantages
- No guaranteed performance
- RT tasks don't always have precedence over non-RT
ta