Title: Distributed Embedded Systems
1Distributed Embedded Systems
2Distributed Embedded Systems
- A system implemented on several processing elements (PEs) connected by one or more networks that allow them to communicate
3Distributed Embedded Architectures
- Many alternatives
- Basic units: processing elements and the network
4Why Distributed?
- The devices that PEs communicate with may be physically separated
- One part of the system can be used to help diagnose problems in another part
- Distributing work across multiple PEs may be more cost-effective than one powerful PE
- Some off-the-shelf components may have embedded processors pre-designed for network interfacing
5Network Abstractions
- In initial stages, there was chaos
- Solution: the ISO's OSI model
- ISO: International Organization for Standardization
- OSI: Open Systems Interconnection model
7Physical Layer
- Defines the basic properties of the interface between systems, including:
- the physical connections (plugs and wires),
- electrical properties,
- basic functions of the electrical and physical components, and
- the basic procedures for exchanging bits
8Data Link Layer
- Primary purpose: error detection and control across a single link
- If the network requires multiple hops over several data links, the data link layer does not define the mechanism for data integrity between hops, only within a single hop
9Network Layer
- Defines the basic end-to-end data transmission service
- Particularly important in multihop networks
10Transport Layer
- Defines connection-oriented services that ensure that data are delivered in the proper order and without errors across multiple links
- May also try to optimize network resource utilization
11Session Layer
- Provides mechanisms for controlling the
interaction of end-user services across a
network, such as data grouping and checkpointing
12Presentation Layer
- Defines data exchange formats and provides
transformation utilities to application programs
13Application Layer
- Provides the application interface between the
network and end-user programs
14Hardware and Software Architectures
- The architecture is a function of the type of interconnection network
15Point-to-Point
- A connection between exactly two processing
elements
17Buses
- Interconnection for multiple processing elements (PEs)
- Requires message formatting
18The transmitting PE is responsible for dividing data into PACKETS
The receiving PE is responsible for reassembling the message from the PACKETS
19ARBITRATION SCHEME
- Necessary when multiple devices may try to access the bus at the same time
20Fixed-Priority Arbitration
- Always gives priority to competing devices in the same way
- If a high-priority and a low-priority device both have long data transmissions ready at the same time, it is quite possible that the low-priority device will not be able to transmit until the high-priority device has sent all of its data packets
21Fair Arbitration
- Makes sure that no device is starved
- Round-robin arbitration: the most commonly used of the fair arbitration schemes
- PCI bus: requires that the arbitration scheme used on the bus be fair but does not specify a particular arbitration scheme
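The round-robin idea can be sketched in C as follows. This is a minimal illustration, not the arbiter of any particular bus: the function name `round_robin_grant`, the request array, and the device count of 4 are all assumptions made for the example.

```c
#include <assert.h>

#define NUM_DEVICES 4   /* assumed number of devices on the bus */

/* Returns the index of the device granted the bus, or -1 if no device is
   requesting. The search starts just after the previous winner, so the
   most recent winner has the lowest priority in the next round and no
   requesting device can be starved. */
int round_robin_grant(const int requesting[NUM_DEVICES], int last_granted)
{
    assert(last_granted >= -1 && last_granted < NUM_DEVICES);
    for (int offset = 1; offset <= NUM_DEVICES; offset++) {
        int candidate = (last_granted + offset) % NUM_DEVICES;
        if (requesting[candidate])
            return candidate;
    }
    return -1;
}
```

If devices 0, 2, and 3 keep requesting and the search begins at device 0, successive calls grant the bus in the order 0, 2, 3, 0, 2, 3, so each requester waits at most one full round.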
22Crossbar Network
- Not only allows any input to be connected to any output, but also allows all combinations of input/output connections to be made
- A crosspoint is a switch that connects an input to an output
- The major drawback of the crossbar network is expense
- The size of the network grows as the square of the number of inputs
23For example, the crossbar can simultaneously connect in1 to out4, in2 to out3, in3 to out2, and in4 to out1, or any other combination of inputs and outputs.
Multicast connections can also be made from one input to several outputs.
24Multistage Networks
25- Blocking => there are some combinations of sources and destinations for which messages cannot be delivered simultaneously
- Bus => maximally blocking
- An alternative to a nonbus network is to use multiple networks
- It may be cheaper to use two slow, inexpensive networks than a single high-performance, expensive network
26Sending A Message
- C Procedure
- send_packet(address, data)
- Code segment
- for (i = 0; i < message.length; i = i + packet_size)
-     send_packet(address, &message.data[i]);
- uses the loop to break up an arbitrary-length
message into packet-size chunks
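A slightly fuller sketch of the same loop, including the receiving side's reassembly, might look like the following. Everything beyond the `send_packet` name is an assumption for illustration: the packet size of 4 bytes, the extra length argument (so the last packet can be short), and the in-memory `rx_buffer` that stands in for the receiving PE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PACKET_SIZE 4   /* assumed payload size in bytes */

/* Hypothetical link: "sent" packets are appended to this buffer, which
   plays the role of the receiving PE's reassembly buffer. */
unsigned char rx_buffer[64];
size_t rx_length = 0;
int packets_sent = 0;

/* Stand-in for the slide's send_packet(address, data). The explicit
   length argument is an addition so the final packet can be short. */
void send_packet(int address, const unsigned char *data, size_t len)
{
    (void)address;                             /* one receiver in this sketch */
    assert(rx_length + len <= sizeof rx_buffer);
    memcpy(rx_buffer + rx_length, data, len);  /* receiver reassembles */
    rx_length += len;
    packets_sent++;
}

/* Break an arbitrary-length message into packet-size chunks. */
void send_message(int address, const unsigned char *msg, size_t length)
{
    for (size_t i = 0; i < length; i += PACKET_SIZE) {
        size_t remaining = length - i;
        size_t chunk = remaining < PACKET_SIZE ? remaining : PACKET_SIZE;
        send_packet(address, &msg[i], chunk);
    }
}
```

With these assumed sizes, an 11-byte message goes out as three packets of 4, 4, and 3 bytes, and the receive buffer ends up holding the original bytes in order.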
27Receiving A Message
- Reception of a packet will probably be implemented with interrupts
- The simplest procedural interface will simply check to see whether a received message is waiting in a buffer
- In a more complex RTOS-based system, reception of a packet may enable a process/task for execution
28Communication may be blocking or nonblocking
- The simplest implementation of message passing is blocking, with the routine not returning until it has transmitted or received
- A nonblocking network interface requires a queue of data to be sent, with the network driver sending packets off the head of the queue and placing received packets on the tail of the queue
- A nonblocking communication mechanism makes sense only when concurrency is available between computing and data transfer
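The queue behind a nonblocking interface can be sketched as a small ring buffer. This is an illustrative assumption, not a specific driver's design: the names `packet_queue`, `queue_put`, and `queue_get`, the capacity of 8, and the use of an `int` to stand in for a packet are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8   /* assumed queue depth */

typedef struct {
    int packets[QUEUE_CAPACITY];  /* each int stands in for one packet */
    size_t head;                  /* index of the next packet to remove */
    size_t count;                 /* packets currently queued */
} packet_queue;

/* Nonblocking: returns false instead of waiting when the queue is full. */
bool queue_put(packet_queue *q, int packet)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->packets[(q->head + q->count) % QUEUE_CAPACITY] = packet;  /* tail */
    q->count++;
    return true;
}

/* Nonblocking: returns false instead of waiting when the queue is empty. */
bool queue_get(packet_queue *q, int *packet)
{
    assert(q != NULL && packet != NULL);
    if (q->count == 0)
        return false;
    *packet = q->packets[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

The application calls `queue_put` and returns immediately, while the driver drains the head with `queue_get`. If the driver side runs in an interrupt handler, the two routines would also need to be protected against concurrent access, which this sketch leaves out.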
29Networks for Embedded Systems
- Several system buses
- Multibus and VME bus were developed by Intel and Motorola, respectively, for multicard computer systems and have been widely used in industrial applications
- ISA bus has been used to support many I/O cards for PC-based embedded systems
- PCI bus has replaced ISA for high-speed interfaces in PC-based applications
30Network Examples
- I2C Bus
- ETHERNET
- INTERNET
31I2C BUS
- Commonly used to link microcontrollers
- Multiple nodes may be bus masters
- Some nodes may be slaves -> respond only
32Serial Data Line (SDL)
Serial Clock Line (SCL)
33Electrical Interface
34Operation
- The bus master is responsible for generating the SCL clock
- As a result, no global master is needed to generate the clock
- Each bus master must listen to the bus while transmitting to be sure there is no interference
- If a different value is received than is being sent => interference
35Data Link Layer
- Each device has a 7-bit address
- Address 0000000: broadcast
- Address 11110XX: extended 10-bit address
- Bus transaction: a series of one-byte transmissions
- An address followed by one or more data bytes
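The address rules above translate into a few bit masks. The sketch below is illustrative only; the function names are hypothetical, and the read/write direction bit appended to the address comes from the I2C specification rather than from this slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 0000000: the broadcast address. */
bool i2c_is_broadcast(uint8_t addr7)
{
    return (addr7 & 0x7F) == 0x00;
}

/* 11110XX: marks an extended 10-bit address; XX carries the two most
   significant bits of the 10-bit address. */
bool i2c_is_extended_prefix(uint8_t addr7)
{
    return (addr7 & 0x7C) == 0x78;
}

/* First byte on the bus: 7 address bits followed by the direction bit
   (0 = write, 1 = read). The direction bit is taken from the I2C
   specification, not from the slide. */
uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    assert(addr7 <= 0x7F);
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```

For instance, a device at the assumed address 0x50 would be addressed with the byte 0xA0 for a write and 0xA1 for a read, which is why the address occupies 8 bits on the wire.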
37Bus Transaction
- Initiated by a start signal and completed with a stop signal
- Start is signaled by leaving SCL high and sending a 1-to-0 transition on SDL
- Stop is signaled by setting SCL high and sending a 0-to-1 transition on SDL
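A receiver that samples both lines could recognize these two conditions roughly as follows. This is a sketch under the assumption that the lines are polled as 0/1 values; the type `i2c_event` and the function `i2c_detect` are hypothetical names.

```c
#include <assert.h>

typedef enum { I2C_NONE, I2C_START, I2C_STOP } i2c_event;

/* Classify one pair of successive SDL samples taken at a given SCL level.
   A transition on SDL only counts as start or stop while SCL is high. */
i2c_event i2c_detect(int scl, int sdl_prev, int sdl_now)
{
    assert((scl == 0 || scl == 1) &&
           (sdl_prev == 0 || sdl_prev == 1) &&
           (sdl_now == 0 || sdl_now == 1));
    if (scl == 1 && sdl_prev == 1 && sdl_now == 0)
        return I2C_START;   /* 1-to-0 transition with SCL high */
    if (scl == 1 && sdl_prev == 0 && sdl_now == 1)
        return I2C_STOP;    /* 0-to-1 transition with SCL high */
    return I2C_NONE;        /* ordinary data change or no change */
}
```

Because data bits are only allowed to change while SCL is low, any SDL transition seen while SCL is high can be treated as a start or stop marker.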
41Bus Arbitration
- When sending, devices listen to the bus
- If a device is trying to send a logic 1 but hears a logic 0, it immediately stops transmitting and gives the other sender priority
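The arbitration rule can be simulated in a few lines of C. This is a simplified model, not driver code: it assumes three masters that start transmitting one byte at the same time, most significant bit first, on a bus where any device sending 0 pulls the line to 0. The name `arbitrate_byte` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_MASTERS 3   /* assumed number of competing masters */

/* Returns the index of the one master still transmitting after 8 bits,
   or -1 if several masters sent identical bytes and nobody backed off. */
int arbitrate_byte(const uint8_t byte[NUM_MASTERS])
{
    bool active[NUM_MASTERS];
    for (int m = 0; m < NUM_MASTERS; m++)
        active[m] = true;

    for (int bit = 7; bit >= 0; bit--) {          /* MSB first */
        /* The line reads 1 only if no active master is sending a 0. */
        int line = 1;
        for (int m = 0; m < NUM_MASTERS; m++)
            if (active[m] && ((byte[m] >> bit) & 1) == 0)
                line = 0;
        /* A master that sent 1 but hears 0 stops transmitting. */
        for (int m = 0; m < NUM_MASTERS; m++)
            if (active[m] && ((byte[m] >> bit) & 1) == 1 && line == 0)
                active[m] = false;
    }

    int winner = -1, remaining = 0;
    for (int m = 0; m < NUM_MASTERS; m++)
        if (active[m]) { winner = m; remaining++; }
    assert(remaining >= 1);   /* a master sending 0 never backs off */
    return remaining == 1 ? winner : -1;
}
```

With the bytes 0xA4, 0xA0, and 0xB0, the third master drops out at bit 4 and the first at bit 2, leaving the master that sent 0xA0. In this model the numerically lowest byte always wins, and the winner's transmission is never corrupted.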
43Microcontroller Implementation
- The I2C interface on a microcontroller can be implemented with varying percentages of the functionality in software and hardware
- One of the microcontroller's timers is typically used to control the length of a bit on the bus
- Interrupts may be used to recognize bits
44ETHERNET: a bus organization, but NOT synchronized like I2C or CAN => it cannot detect a collision within a single bit time as I2C or CAN can
45Arbitration Scheme
- CSMA/CD: Carrier Sense Multiple Access with Collision Detection
- A node that has a message waits for the bus to become silent and then starts transmitting
- It simultaneously listens, and if it hears another transmission that interferes with its transmission, it stops transmitting and waits to retransmit
- The waiting time is random, but weighted by an exponential function of the number of times the message has been aborted
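One common form of this rule is truncated binary exponential backoff, sketched below. The slide does not give the exact weighting, so the details here are assumptions: the wait is drawn from 0 to 2^k - 1 slot times, where k is the number of aborted attempts capped at 10, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BACKOFF_EXPONENT 10   /* assumed cap on the exponent */

/* Upper bound, in slot times, of the random wait after `collisions`
   aborted attempts: 2^min(collisions, cap) - 1. */
unsigned backoff_limit(unsigned collisions)
{
    unsigned k = collisions < MAX_BACKOFF_EXPONENT ? collisions
                                                   : MAX_BACKOFF_EXPONENT;
    return (1u << k) - 1u;
}

/* Pick a wait in the range 0..backoff_limit(collisions). rand() is used
   for brevity; a real controller would use its own random source. */
unsigned backoff_slots(unsigned collisions)
{
    unsigned limit = backoff_limit(collisions);
    assert(limit <= (unsigned)RAND_MAX);
    return (unsigned)rand() % (limit + 1u);
}
```

After one collision the node waits 0 or 1 slots, after three collisions anywhere from 0 to 7, and so on. The widening range is what spreads out retransmissions under heavy load, and it is also why the delay cannot be bounded tightly in advance.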
47Randomness makes performance analysis difficult and almost precludes hard real-time control
49- The maximum length of an Ethernet is limited by a node's ability to detect collisions
- Worst case: two nodes at opposite ends
- Length: several hundred meters
50Internet
- Protocol: Internet Protocol (IP)
- Provides connectionless, packet-based communication
- Corresponds to the OSI network layer
52- IP creates the packets
- Router: a node that transmits data (packets) among different types of networks
- As data pass through several layers of the protocol stack, the IP packet data are encapsulated in packet formats appropriate to each layer
53Addresses: originally 32 bits (IPv4), now 128 bits (IPv6)
Packet: < 64K in length
54- IP address format: XXX.XX.XX.XX
- The Domain Name System (DNS) translates names such as YY.ZZ.COM into IP addresses
- IP at the network layer => no guarantee that a packet is delivered to its destination, or that packets take the same path and arrive in the same order
- These guarantees are the responsibility of higher layers
55Higher level services are built on top of IP
- TCP (Transmission Control Protocol): the best-known example
- Provides connection-oriented service that ensures that data arrive and are placed in the appropriate order
- Uses an acknowledgement protocol to ensure that packets arrive
56Other Services
- FTP (File Transfer Protocol): batch file transfers
- HTTP (Hypertext Transfer Protocol): World Wide Web
- SMTP (Simple Mail Transfer Protocol): email
- Telnet: virtual terminals
- UDP (User Datagram Protocol): basis for network management
- SNMP (Simple Network Management Protocol)
58Network-Based Design
- Requires analysis and design tasks similar to those for accelerator-based systems
- Must schedule computations in time and allocate them to processing elements on the network
- The network can become the bottleneck in the system
59Communication Analysis
- To determine the delay incurred by transmitting messages
- Message delay ($t_m$) for a single message with no contention:
- $t_m = t_x + t_n + t_r$
- $t_x$: transmitter-side overhead
- $t_n$: network transmission time
- $t_r$: receiver-side overhead
60Example Simple Message Delay for an I2C Message
- Assume that an I2C bus runs at the rate of 100 kilobits per second and that we need to send one 8-bit byte
- $n_{packet}$ = start bit + address + data + stop bit = 1 + 8 + 8 + 1 = 18 bits
- $t_n = n_{packet} \times t_{bit} = 18 \times 10^{-5}\ \text{s} = 1.8 \times 10^{-4}$ s
61- The loops that send bytes to and receive bytes from the network interface will run concurrently with the message transmission
- If we assume that 20 instructions outside of these loops are executed by the transmitter and receiver, overhead on an 8 MHz microcontroller would be as follows:
- $t_x = t_r = 20 \times 0.125 \times 10^{-6} = 2.5 \times 10^{-6}$ s
- $t_m = 2.5 \times 10^{-6} + 1.8 \times 10^{-4} + 2.5 \times 10^{-6} = 1.85 \times 10^{-4}$ s
- Overhead is less than 3%
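The same arithmetic can be written as a small C sketch, which makes it easy to try other bit rates or clock speeds. The function names are hypothetical, and the overhead estimate assumes one instruction per clock cycle, as the 0.125 microsecond figure for an 8 MHz part implies.

```c
#include <assert.h>

/* Network time for one packet: number of bits divided by the bit rate. */
double network_time(int bits, double bit_rate_hz)
{
    assert(bit_rate_hz > 0.0);
    return bits / bit_rate_hz;
}

/* Software overhead: instruction count divided by the clock rate,
   assuming one instruction per clock cycle. */
double overhead_time(int instructions, double clock_hz)
{
    assert(clock_hz > 0.0);
    return instructions / clock_hz;
}

/* Message delay with no contention: t_m = t_x + t_n + t_r. */
double message_delay(double tx, double tn, double tr)
{
    return tx + tn + tr;
}
```

Calling `network_time(1 + 8 + 8 + 1, 100e3)` and `overhead_time(20, 8e6)` and passing the results to `message_delay` reproduces the figures above: the network time dominates, and the two software overheads together account for under 3% of the total.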
62- If messages can interfere with each other in the network, analyzing communication delay becomes difficult
- Message delay is
- $t_y = t_d + t_x$
- where $t_d$ is the network availability delay incurred waiting for the network to become available
63The main problem is calculating td
- If the network uses fixed-priority arbitration, the network availability delay is unbounded for all but the highest-priority device
- If the network uses fair arbitration, the network availability delay is bounded
- In the case of round-robin arbitration, if there are $N$ devices, then the worst-case network availability delay is $N(t_x + t_{arb})$, where $t_{arb}$ is the delay incurred for arbitration
- $t_{arb}$ is usually small compared to transmission time
64Example
- Adjusting Messages to Improve Network Delay
65Assume we want to implement the following task
graph
66Assume also that we want to implement the task
graph on the following network
67- Allocate
- P1 -> M1: requires 3 units
- P2 -> M2: requires 3 units
- P3 -> M3: requires 4 units
- Transmission times
- d1 = d2 = 4 units
68The simplest implementation transmits all the required data in one large message.
Total schedule length = 3 + 4 + 4 + 4 = 15 units
69- Suppose P3 is redesigned so that it doesn't require all of both messages to begin processing
- The program can be modified so that it reads one packet of data each from d1 and d2 and starts computing on that
- If it finishes what it can do on that data before the next packets from d1 and d2 arrive, it waits; otherwise, it picks up the packets and keeps computing
70Total schedule length = 3 + 8 + 1 = 12 units
71- Adding acknowledgement and data corruption to the analysis requires complex probability analysis
72Arbitration
- A form of prioritization
- Hence techniques such as those in an RTOS can be employed, e.g., rate-monotonic scheduling
- Since packet transmission, in general, cannot be interrupted, networks exhibit priority inversion
- When a low-priority message is on the network, higher-priority messages are effectively blocked
- Deadlock cannot occur
73System Performance Analysis
- Very difficult for distributed embedded systems
- Consider a simple case
74Two processes and one n-packet data communication: worst-case time to completion is $t_{p1} + n t_x + t_{p2}$
If the situation allows computations and communications to interfere, complexity increases
75Consider this example: the task graph is superimposed on the target architecture
76- The data dependency from P1 to P2 translates any uncertainty in the execution time of P1 into uncertainty about the start time of P2
- Collocation of P2 and P3 on processing element M2 means that variations in the ready time of P2 can affect the completion time of P3
- Variations in the execution time of P2 can also affect P3
77- The data dependency from P3 to P4 translates variations in the completion time of P3 to the start time of P4
- Therefore, even though P2 and P3 are separate tasks, the fact that they are allocated to the same PE causes them to interact with each other in ways that affect the completion time of every task in the system
78- Complex distributed embedded systems require CAD tools to accurately analyze performance
- If you don't have tools to help you analyze performance, care is essential when hand-designing a system that has to meet hard real-time deadlines
- It is important to make sure that only one task, the critical task, is active
- When there are several critical tasks that must occur simultaneously, hand design requires allocating them to share nothing: neither PEs nor communication links
79Hardware Platform Design, Allocation, and Scheduling
- Design choices
- number of PEs required
- types of all PEs
- number of networks required
- types (and data rates) of the networks
80Two Strategies
- For I/O-intensive systems
- start with the I/O devices and their associated processing
- For computation-intensive systems
- start with the processes/tasks
81I/O Intensive System Design
- Inventory the required I/O devices
- Determine which processing has deadlines short enough that they cannot be met by any network within your price range
- I/O devices that do not require local processing may be attached to the network with the simplest available interface
- Determine which devices can share a processing element or network interface
82I/O Intensive System Design
- Analyze communication times to determine whether critical communications may interfere with each other
- Determine whether a complex network or multiple networks may be required to satisfy communication deadlines
- Allocate the minimum required PE to go with each I/O device
- Design the rest of the system using the procedure for computation-intensive systems
83Computation Intensive System Design
- Start with the tasks with the shortest deadlines
- The shorter the deadline for a task, the more likely it is to require its own processing element or elements
- If a high-priority task shares a PE with a low-priority task, not only will a more expensive PE be required, but scheduling overhead will be paid for at a nonlinear rate
84Computation Intensive System Design
- Analyze communication time to determine whether critical communications may interfere with each other
- Allocate lower-priority tasks to shared PEs where possible
- After designing a basic system that meets performance goals, improve it to satisfy power consumption or other requirements