Inter-Processor Communication for Heterogeneous Dual Core Systems

Slides: 110

Provided by: chunmin

1
Inter-Processor Communication for Heterogeneous
Dual Core Systems
2006/09/27

Chun-Ming Huang, Ph.D.
National Chip Implementation Center (CIC)
cmhuang_at_cic.org.tw

2
Agenda

IPC Overview
IPC Schemes
Nokia DSP Gateway
TI DSP/BIOS Link
IPC Hardware Architecture
Conclusions

3
IPC Overview
4
What is IPC?

Inter-Process Communication
Inter-Processor Communication

Single-Chip Multi-Chip
Single-Core
Multi-Core
How to provide inter-process communication
services for multi-core systems?
5
Independent and Cooperating Processes

Processes executing concurrently in the
multitasking environment may be either
independent processes or cooperating processes.
A process is independent if it cannot affect or
be affected by the other processes executing in
the system; any process that does not share data
with any other process is independent.
A process is cooperating if it can affect or be
affected by the other processes executing in the
system; any process that shares data with other
processes is a cooperating process.

Silberschatz, et al., Operating System
Principles, Seventh Edition
6
Why Allow Process Cooperation?

Information sharing
Computation speedup
Modularity
Convenience
Cooperating processes require an inter-process
communication (IPC) mechanism that will allow
them to exchange data and information

Silberschatz, et al., Operating System
Principles, Seventh Edition
7
IPC Example

Unix pipe
ls -l | grep 2005 | wc
2 19 98
The grep utility searches text files for a
pattern and prints all lines that contain that
pattern.
The wc utility displays a count of lines, words
and characters in a text file.
Data exchange
Synchronization
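
The two roles of the pipe on this slide, data exchange and synchronization, can be sketched in a few lines. This is a minimal illustration, assuming a POSIX system where os.fork() is available; the function name pipeline_count and the sample lines are hypothetical. The child stands in for the producer (ls -l), and the parent plays both grep and wc.

```python
import os


def pipeline_count(lines, pattern):
    """Child writes lines into a pipe; parent filters them and counts."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the producer. It only needs the write end.
        os.close(read_fd)
        with os.fdopen(write_fd, "w") as writer:
            for line in lines:
                writer.write(line + "\n")
        os._exit(0)

    # Parent: close the unused write end so that EOF can be detected.
    os.close(write_fd)
    n_lines = n_words = n_chars = 0
    with os.fdopen(read_fd) as reader:
        # Reading blocks until the child has written something:
        # this is the implicit synchronization a pipe provides.
        for line in reader:
            if pattern in line:          # the "grep" step
                n_lines += 1             # the "wc" step
                n_words += len(line.split())
                n_chars += len(line)
    os.waitpid(pid, 0)
    return n_lines, n_words, n_chars
```

The parent never polls or sleeps; the kernel suspends it on the read until data or end-of-file arrives.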

8
Operating System Kernel Components

Process scheduler
determines when and for how long a process
executes on a processor
Memory manager
determines when and how memory is allocated to
processes and what to do when memory becomes full
I/O manager
services input and output requests from and to
hardware devices
Inter-process communication (IPC) manager
allows processes to communicate with one another
File system manager
organizes named collections of data on storage
devices and provides an interface for accessing
data on those devices

Deitel, et al., Operating Systems, Third Edition
9
Linux Kernel 2.6.17.11
Top-level source directories:
drwxr-xr-x arch
drwxr-xr-x block
drwxr-xr-x crypto
drwxr-xr-x drivers
drwxr-xr-x fs
drwxr-xr-x include
drwxr-xr-x init
drwxr-xr-x ipc
drwxr-xr-x kernel
drwxr-xr-x lib
drwxr-xr-x mm
drwxr-xr-x net
drwxr-xr-x scripts
drwxr-xr-x security
drwxr-xr-x sound
drwxr-xr-x usr
Contents of ipc:
-rw-r--r-- Makefile
-rw-r--r-- compat.c
-rw-r--r-- compat_mq.c
-rw-r--r-- mqueue.c
-rw-r--r-- msg.c
-rw-r--r-- msgutil.c
-rw-r--r-- sem.c
-rw-r--r-- shm.c
-rw-r--r-- util.c
-rw-r--r-- util.h
http://www.kernel.org
10
Machine-Independent SW in the FreeBSD Kernel
Category                         Lines of Code   Percentage of Kernel (%)
Headers                                 38,158    4.8
initialization                           1,663    0.2
kernel facilities                       53,805    6.7
generic interfaces                      22,191    2.8
interprocess communication              10,019    1.3
terminal handling                        5,798    0.7
virtual memory                          24,714    3.1
vnode memory                            22,764    2.9
local filesystem                        28,067    3.5
miscellaneous filesystems (19)          58,753    7.4
network filesystem                      22,436    2.8
network communication                   46,570    5.8
Internet V4 protocols                   41,220    5.2
Internet V6 protocols                   45,527    5.7
IPsec                                   17,956    2.2
netgraph                                74,338    9.3
cryptographic support                    7,515    0.9
GEOM layer                              11,563    1.4
CAM layer                               41,805    5.2
ATA layer                               14,192    1.8
ISA bus                                 10,984    1.4
PCI bus                                 72,366    9.1
pccard bus                               6,916    0.9
Linux compatibility                     10,474    1.3
Total Machine Independent              689,794   86.4
McKusick and Neville-Neil, The Design and
Implementation of the FreeBSD Operating System
11
Homogeneous vs. Heterogeneous
Sun
TI OMAP 5910
12
Multiprocessor OS Organizations

Can classify systems based on how processors
share operating system responsibilities
Three types
Master/slave
Separate kernels
Symmetrical organization

Deitel, et al., Operating Systems, Third Edition
13
Master/Slave

Master/Slave organization
Master processor executes the operating system
Slaves execute only user processes
Hardware asymmetry
Low fault tolerance
Good for computationally intensive jobs
Example: nCUBE system

Deitel, et al., Operating Systems, Third Edition
14
Separate Kernels

Separate kernels organization
Each processor executes its own operating system
Some globally shared operating system data
Loosely coupled
Catastrophic failure unlikely, but failure of one
processor results in termination of processes on
that processor
Little contention over resources
Example: Tandem system

Deitel, et al., Operating Systems, Third Edition
15
Symmetrical Organization

Symmetrical organization
Operating system manages a pool of identical
processors
High amount of resource sharing
Need for mutual exclusion
Highest degree of fault tolerance of any
organization
Some contention for resources
Example: BBN Butterfly

Deitel, et al., Operating Systems, Third Edition
16
Memory Access Architectures

Memory access
Can classify multiprocessors based on how
processors share memory
Goal: fast memory access from all processors to
all memory
Contention in large systems makes this impractical

Deitel, et al., Operating Systems, Third Edition
17
Uniform Memory Access

Uniform memory access (UMA) multiprocessor
All processors share all memory
Access to any memory page is nearly the same for
all processors and all memory modules
(disregarding cache hits)
Typically uses shared bus or crossbar-switch
matrix
Also called symmetric multiprocessing (SMP)
Small multiprocessors (typically two to eight
processors)

Deitel, et al., Operating Systems, Third Edition
18
Uniform Memory Access
Deitel, et al., Operating Systems, Third Edition
19
Non-Uniform Memory Access

Non-uniform memory access (NUMA) multiprocessor
Each node contains a few processors and a portion
of system memory, which is local to that node
Access to local memory faster than access to
global memory (rest of memory)
More scalable than UMA (fewer bus collisions)

Deitel, et al., Operating Systems, Third Edition
20
Non-Uniform Memory Access
Deitel, et al., Operating Systems, Third Edition
21
Cache-Only Memory Architecture

Cache-only memory architecture (COMA)
multiprocessor
Physically interconnected as a NUMA is
Local memory vs. global memory
Main memory is viewed as a cache and called an
attraction memory (AM)
Allows system to migrate data to node that most
often accesses it at granularity of a memory line
(more efficient than a memory page)
Reduces the number of cache misses serviced
remotely
Overhead
Duplicated data items
Complex protocol to ensure all updates are
received at all processors

Deitel, et al., Operating Systems, Third Edition
22
Cache-Only Memory Architecture
Deitel, et al., Operating Systems, Third Edition
23
No Remote Memory Access

No-remote-memory-access (NORMA) multiprocessor
Does not share physical memory
Some implement the illusion of shared physical
memory: shared virtual memory (SVM)
Loosely coupled
Communication through explicit messages
Distributed systems
Not networked system

Deitel, et al., Operating Systems, Third Edition
24
No Remote Memory Access
Deitel, et al., Operating Systems, Third Edition
25
Four Possible Cases
                      Symmetrical OSs            Asymmetrical OSs
Homogeneous Cores     CPU_A(OS_X) CPU_A(OS_X)    CPU_A(OS_X) CPU_A(OS_Y)
Heterogeneous Cores   CPU_A(OS_X) CPU_B(OS_X)    CPU_A(OS_X) CPU_B(OS_Y)
26
IPC Schemes
27
Communication via Files

Communication via files is in fact the oldest way
of exchanging data between programs. Program A
writes data to a file and Program B reads it. In
a system in which only one program can be run at
any given time, this does not present any
problem.
In a multitasking system, however, both programs
could run as processes at least quasi-parallel
to each other. Race conditions then usually
produce inconsistencies in the file data, which
result from one program reading a data area
before the other has finished modifying it, or
from both processes modifying the same area
at the same time.

28
Communication via Files

Locking entire files
lock file
fcntl( ) (POSIX), flock( ) (BSD 4.3)
Locking file areas (record locking)
Deadlock
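
Whole-file locking with flock() can be sketched as follows. This is a minimal illustration, assuming a Linux or BSD system and a local temporary directory; the function name try_lock_twice is hypothetical. Two independent opens of the same file stand in for Program A and Program B.

```python
import fcntl
import tempfile


def try_lock_twice():
    """Show that a second exclusive flock() fails while the first is held."""
    with tempfile.NamedTemporaryFile() as tmp:
        first = open(tmp.name, "r+")     # stands in for Program A
        second = open(tmp.name, "r+")    # stands in for Program B
        try:
            fcntl.flock(first, fcntl.LOCK_EX)
            try:
                # LOCK_NB: report contention instead of blocking forever.
                fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
                contended = False
            except BlockingIOError:
                contended = True
            # Once A unlocks, B can take the lock.
            fcntl.flock(first, fcntl.LOCK_UN)
            fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired_after_release = True
        finally:
            first.close()
            second.close()
    return contended, acquired_after_release
```

These locks are advisory: they protect the file only if every cooperating program takes the lock before reading or writing.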

29
Process Communication Models

Message passing
Shared memory

Silberschatz, et al., Operating System
Principles, Seventh Edition
30
IPC for Linux

Linux IPC
Many IPC mechanisms derived from traditional UNIX
IPC
Allow processes to exchange information
Some are better suited for particular
applications
For example, those that communicate over a
network or exchange short messages with other
local applications

Deitel, et al., Operating Systems, Third Edition
31
IPC for Linux

Signal
Pipe
Message queue
Shared memory
System V Semaphores
Sockets

32
Signals

Signals
One of the first interprocess communication
mechanisms available in UNIX systems
Kernel uses them to notify processes when certain
events occur
Do not allow processes to specify more than a
word of data to exchange with other processes
Created by the kernel in response to interrupts
and exceptions; sent to a process or thread
as a result of executing an instruction (such as
a segmentation fault),
from another process (such as when one process
terminates another), or
from an asynchronous event

Deitel, et al., Operating Systems, Third Edition
33
POSIX Signals
Deitel, et al., Operating Systems, Third Edition
34
Signals

A process/thread can handle a signal in one of three ways
Ignore the signal: processes can ignore all but
the SIGSTOP and SIGKILL signals.
Catch the signal: when a process catches a signal,
it invokes its signal handler to respond to the
signal.
Execute the default action that the kernel
defines for that signal
Default actions
Abort: terminate immediately
Memory dump: copies execution context before
exiting
Ignore
Stop (i.e., suspend)
Continue (i.e., resume)

Deitel, et al., Operating Systems, Third Edition
35
Signals

Signal blocking
A process or thread can block a signal
Signal is not delivered until process/thread
stops blocking it
While a signal handler is running, signals of
that type are blocked by default
Still possible to receive signals of a different
type
Common signals are not queued
Real-time signals provide signal queuing
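
Catching, blocking and the lack of queuing for common signals can be sketched together. This is a minimal illustration, assuming a POSIX system and a single-threaded program whose main thread runs the code; the function name demo_blocking and the choice of SIGUSR1 are hypothetical.

```python
import os
import signal
import time


def demo_blocking():
    """Block SIGUSR1, send it twice, then unblock and observe one delivery."""
    log = []

    def handler(signum, frame):
        log.append("handled")

    previous = signal.signal(signal.SIGUSR1, handler)   # catch the signal
    try:
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
        os.kill(os.getpid(), signal.SIGUSR1)
        os.kill(os.getpid(), signal.SIGUSR1)   # merged with the first one
        pending = signal.SIGUSR1 in signal.sigpending()
        log.append("pending" if pending else "not pending")
        # Delivery happens only once the signal is unblocked.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
        for _ in range(100):                   # give the handler time to run
            if "handled" in log:
                break
            time.sleep(0.01)
    finally:
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGUSR1, restore)
    return log
```

Although the signal is sent twice while blocked, the handler runs once, because a common (non-real-time) signal is recorded as a single pending bit rather than queued.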

Deitel, et al., Operating Systems, Third Edition
36
Pipes

Pipes
Producer process writes data to the pipe, after
which the consumer process reads data from the
pipe in first-in-first-out order
When pipe is created, an inode that points to
pipe buffer (page of data) is created
Access to pipes is controlled by file descriptors
Can be passed between related processes (e.g.,
parent and child)
Named pipes (FIFOs)
Can be accessed via the directory tree
Limitation: fixed-size buffer

Deitel, et al., Operating Systems, Third Edition
37
Message Queues

Message queues
Allow processes to transmit information that is
composed of a message type and a variable-length
data area
Stored in message queues, remain until a process
is ready to receive them
Related processes can search for a message queue
identifier in a global array of message queue
descriptors
Message queue descriptor contains
Queue of pending messages
Queue of processes waiting for messages
Queue of processes waiting to send messages
Data describing the size and contents of the
message queue

Deitel, et al., Operating Systems, Third Edition
38
Shared Memory

Shared memory protection schemes
Advantages
Improves performance for processes that
frequently access shared data
Processes can share as much data as they can
address
Standard interfaces
System V shared memory
POSIX shared memory
Does not allow processes to change privileges for
a segment of shared memory
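
Sharing a region of memory between two processes can be sketched with an anonymous shared mapping. This is a minimal illustration, assuming a POSIX system with os.fork(); it uses mmap rather than the System V or POSIX shared memory interfaces named above, and the function name shared_counter is hypothetical.

```python
import mmap
import os
import struct


def shared_counter(increments):
    """Child increments a counter in shared memory; parent reads the result."""
    # fileno -1 gives an anonymous mapping; on Unix it is MAP_SHARED by
    # default, so the pages stay shared with the child after fork().
    region = mmap.mmap(-1, 8)
    struct.pack_into("q", region, 0, 0)

    pid = os.fork()
    if pid == 0:
        for _ in range(increments):
            (value,) = struct.unpack_from("q", region, 0)
            struct.pack_into("q", region, 0, value + 1)
        os._exit(0)

    # Waiting for the child is the only synchronization used here.
    os.waitpid(pid, 0)
    (result,) = struct.unpack_from("q", region, 0)
    region.close()
    return result
```

No data is copied through the kernel once the mapping exists, which is where the performance advantage comes from. If both processes wrote concurrently, a semaphore or lock would be needed around the read-modify-write.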

Deitel, et al., Operating Systems, Third Edition
39
System V Shared Memory System Calls
Deitel, et al., Operating Systems, Third Edition
40
Shared Memory

Shared memory implementation
Treats region of shared memory as a file
Shared memory page frames are freed when file is
deleted
Tmpfs (temporary file system) stores such files
Tmpfs pages are swappable
Permissions can be set
File system does not require formatting

Deitel, et al., Operating Systems, Third Edition
41
System V Semaphores

System V semaphores
Designed for user processes to access via the
system call interface
Semaphore arrays
Protect a group of related resources
Before a process can access resources protected
by a semaphore array, the kernel requires that
there be sufficient available resources to
satisfy the process's request
Otherwise, kernel blocks requesting process until
resources become available
Preventing deadlock
When a process exits, the kernel reverses all the
semaphore operations it performed to allocate its
resources

Deitel, et al., Operating Systems, Third Edition
42
Sockets

Sockets
Allows pairs of processes to exchange data by
establishing direct bidirectional communication
channels
Primarily used for bidirectional communication
between multiple processes on different systems,
but can be used for processes on the same system
Stored internally as files
File name used as socket's address, accessed via
the VFS

Deitel, et al., Operating Systems, Third Edition
43
Sockets

Stream sockets
Implement the traditional client/server model
Data is transferred as a stream of bytes
Use TCP to communicate, so they are more
appropriate for reliable communication
Datagram sockets
Faster, but less reliable communication
Data is transferred using datagram packets
Socketpairs
Pair of connected, unnamed sockets
Limited to use by processes that share file
descriptors
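
A socketpair, the simplest of the three kinds above, can be sketched as follows. This is a minimal illustration in which both ends live in one process for brevity; in practice one end would be inherited by a child process. The function name echo_over_socketpair is hypothetical.

```python
import socket


def echo_over_socketpair(message):
    """Send a request one way and an upper-cased reply the other way."""
    left, right = socket.socketpair()   # connected, unnamed stream sockets
    try:
        left.sendall(message)           # request: left -> right
        request = right.recv(1024)
        right.sendall(request.upper())  # reply: right -> left
        reply = left.recv(1024)
    finally:
        left.close()
        right.close()
    return reply
```

The same channel carries traffic in both directions, which is the property that distinguishes a socket from a pipe.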

Deitel, et al., Operating Systems, Third Edition
44
sf01acmhuang/ ipcs
IPC status from <running system> as of Thu Sep 21 14:35:30 CST 2006
T         ID   KEY         MODE         OWNER     GROUP
Message Queues:
Shared Memory:
m          1   0x50000d1d  --rw-r--r--  root      root
m          2   0xabbaca01  --rw-rw-rw-  pc62      TR
m       3103   0           --rw-rw-rw-  cmhuang   DSD
m       1404   0           --rw-rw-rw-  root      root
Semaphores:
s          0   0x1         --ra-ra-ra-  root      root
s    2031617   0           --ra-ra-ra-  cmhuang   DSD
s     917506   0           --ra-ra-ra-  cmhuang   DSD
45
IPC for WinXP

Data oriented
Pipes
Mailslots (message queues)
Shared memory
Procedure oriented / object oriented
Remote procedure calls
Microsoft COM objects
Clipboard
GUI drag-and-drop capability

Deitel, et al., Operating Systems, Third Edition
46
Pipes

Manipulated with file system calls
Read
Write
Open
Pipe server
Process that creates pipe
Pipe clients
Processes that connect to pipe
Modes
Read: pipe server receives data from pipe clients
Write: pipe server sends data to pipe clients
Duplex: pipe server sends and receives data

Deitel, et al., Operating Systems, Third Edition
47
Pipes

Anonymous Pipes
Unidirectional
Between local processes
Synchronous
Pipe handles, usually passed through inheritance
Named Pipes
Unidirectional or bidirectional
Between local or remote processes
Synchronous or asynchronous
Opened by name
Byte stream vs. message stream
Default mode vs. write-through mode

Deitel, et al., Operating Systems, Third Edition
48
Mailslots

Mailslot server creates mailslot
Mailslot clients send messages to mailslot
Communication
Unidirectional
No acknowledgement of receipt
Local or remote communication
Implemented as files
Two modes
Datagram for small messages
Server Message Block (SMB) for large messages

Deitel, et al., Operating Systems, Third Edition
49
Shared Memory

File mapping
Processes map their virtual memory to same page
frames in physical memory
Multiple processes access same file
No synchronization guaranteed
File mapping object
Maps file to main memory
File view
Maps a process's virtual memory to main memory
mapped by file mapping object

Deitel, et al., Operating Systems, Third Edition
50
Nokia DSP Gateway
51
Nokia DSP Gateway Overview

Supports TI OMAP1510, 1610, 5910, 5912, 2410, and
2412.
GPP side
Linux kernel 2.6.6?
Linux device driver
Access DSP through normal system calls such as
read() and write()
DSP side
TI DSP/BIOS
DSP kernel library (tokliBIOS) and API

http://dspgateway.sourceforge.net/pub/index.php
52
Nokia DSP Gateway Overview

Current version 3.3.1 (2006-09-13)
Open source software
Current license state

Release License License
1.0 GPL GPL
2.X GPL GPL
3.X ARM pack DSP pack
3.X GPL BSD
53
TI OMAP 1610
54
Summary of changes from v2.6.5 to v2.6.6

<tony_at_com.rmk.(none)>
[ARM PATCH] 1777/1: Add TI OMAP support to ARM core files
Patch from Tony Lindgren
This patch updates the ARM Linux core files to add
support for Texas Instruments OMAP-1510, 1610, and
730 processors. OMAP is an embedded ARM processor
with integrated DSP. OMAP-1610 has hardware support
for USB OTG, which might be of interest to Linux
developers. OMAP-1610 could easily be used as a
development platform to add USB OTG support to Linux.
This patch is an updated version of an earlier patch
1767/1 with the dummy Kconfig added for OMAP as
suggested by Russell King here:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=1767/1
This patch is brought to you by various linux-omap
developers.
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.6
55
TI DSP/BIOS

Scalable real-time kernel
Real-time scheduling and synchronization
Host-to-target communication
Real-time instrumentation
Preemptive multi-threading
Hardware abstraction
Real-time analysis and configuration tools
Application programs use DSP/BIOS by making calls
to the API
All DSP/BIOS modules provide C-callable interfaces

56
DSP Gateway System Architecture
57
Mailbox in OMAP1

Each set of mailbox registers consists of two
16-bit registers and a 1-bit flag register.
The interrupting processor can use one 16-bit
register to pass a data word to the interrupted
processor and the other 16-bit register to pass a
command word.
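
One set of OMAP1 mailbox registers can be modelled in software to make the handshake concrete. This is a minimal sketch, not the hardware's exact behaviour: the class name OmapMailbox, the method names, and the rule that the flag is set on posting and cleared on reading are assumptions made for illustration.

```python
class OmapMailbox:
    """Software model of one mailbox set: two 16-bit registers and a flag."""

    def __init__(self):
        self.command = 0   # 16-bit command word
        self.data = 0      # 16-bit data word
        self.flag = 0      # 1-bit flag: 1 while a message is unread (assumed)

    def post(self, command, data):
        """Interrupting processor: write both words and raise the flag."""
        if self.flag:
            raise RuntimeError("mailbox busy: previous message not yet read")
        for word in (command, data):
            if not 0 <= word <= 0xFFFF:
                raise ValueError("mailbox registers are 16 bits wide")
        self.data = data
        self.command = command
        self.flag = 1      # in hardware this is where the interrupt fires

    def take(self):
        """Interrupted processor: read both words and clear the flag."""
        if not self.flag:
            return None
        self.flag = 0
        return self.command, self.data
```

With only 32 bits per message, the mailbox is suited to commands and small arguments; bulk data has to travel through shared buffers.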

58
Mailbox in OMAP2

6 sets of mailbox registers, and each message
register can carry 32 bits of data
Two mailbox queues are reserved: MAILBOX_0 for
the ARM to DSP direction and MAILBOX_1 for the
DSP to ARM direction

59
Mailbox Command and Data Register

Command register bit definitions
Data register bit definitions

60
Mailbox Command Definition
61
Mailbox Command Sequence

Configuration sequence
System configuration
Task configuration
Task add/delete
Data transfer sequence
ARM to DSP transfer
DSP to ARM transfer
Task control
Read/write DSP register
Read/write DSP system parameters

62
System Configuration Sequence
63
DSPCFG Command
64
ARM to DSP Passive Word Receiving
65
ARM to DSP Active Word Receiving
66
ARM to DSP Passive Block Receiving
67
IPC Buffer

It is unrealistic to transfer a large amount of
data between two processors with only mailbox
registers. Therefore, IPBUF (Inter-Processor
Buffer) is introduced for the large block data
transfer.
There are three types of IPBUFs
Global IPBUF
Private IPBUF
System IPBUF

68
Global IPBUF

The Global IPBUFs are defined for the block data
transfer between ARM and DSP.
The Global IPBUF lines are identified by BID
(Buffer ID), and all tasks can share them.
The maximum line size is 64k words (128k bytes).

69
Global IPBUF
70
DSP Gateway Linux Device Interfaces
71
DSP Gateway Linux APIs
72
Passive Receiving Task
73
Active Receiving Task
74
TI DSP/BIOS Link
75
TI DSP/BIOS Link

For TI OMAP5910/5912, DaVinci, and DM642 devices.
DSP/BIOS Link is a no-charge, royalty-free
product and is provided in C source code form.
Current version 1.30.06 (Nov. 22, 2005)
Portable across different operating systems.
OS (GPP) DSP/BIOS (DSP)

http://focus.ti.com/dsp/docs/dspsupportatn.tsp?sectionId=3&tabId=477&familyId=44&toolTypeId=5
76
DSP/BIOS Link Supported Platforms

DaVinci running Montavista Linux Pro 4.0 or
PrKernel v4.1 on ARM
OMAP5912 running Montavista Linux Pro 3.1 on ARM
DA300 running PrKernel v4.1 on ARM
DM642 connected to a PC running Red Hat Linux 9.0
or Red Hat Enterprise Linux 4.0

77
Software Architecture of DSP/BIOS Link
78
On the GPP Side

The OS ADAPTATION LAYER encapsulates the generic
OS services that are required by the other
components of DSP/BIOS LINK. This component
exports a generic API that insulates the other
components from the specifics of an OS. All other
components use this API instead of direct OS
calls. This makes DSP/BIOS LINK portable across
different operating systems.
The LINK DRIVER encapsulates the low-level
control operations on the physical link between
the GPP and DSP. This module is responsible for
controlling the execution of the DSP and data
transfer using defined protocol across the
GPP-DSP boundary.

79
On the GPP Side

The PROCESSOR MANAGER maintains book-keeping
information for all components. It also allows
different boot-loaders to be plugged into the
system. It exposes the control operations
provided by the LINK DRIVER to the user through
the API layer.
The DSP/BIOS LINK API is the interface for all
clients on the GPP side. This is a very thin
component and usually doesn't do any more
processing than parameter validation. The API
layer can be considered as skin on the muscle
mass contained in the PROCESSOR MANAGER and LINK
DRIVER.

80
On the DSP Side

The LINK DRIVER is one of the drivers in
DSP/BIOS. This driver specializes in
communicating with the GPP over the physical
link.
There is no specific DSP/BIOS LINK API on the
DSP. The communication (data/message transfer) is
done using the DSP/BIOS modules - SIO/GIO/MSGQ.

81
DSP/BIOS Link Key Components

PROC
This component represents the DSP processor in
the application space.
This component provides services to
Initialize the DSP and make it available for access
from the GPP.
Load code on the DSP.
Start execution from the run address specified in
the executable.
Read from or write to DSP memory.
Stop execution.
Additional platform-specific control actions.
In the current version, only one processor is
supported. However, the APIs are designed to
support multiple DSPs and hence they accept a
processorID argument to support this future
enhancement.

82
DSP/BIOS Link Key Components

CHNL
This component represents a logical data transfer
channel in the application space.
CHNL is responsible for the data transfer across
the GPP and DSP.
CHNL is an acronym for channel.
A channel (when referred to in the context of
DSP/BIOS LINK) is
A means of transferring data across GPP and DSP.
A logical entity mapped over a physical
connectivity between the GPP and DSP.
Uniquely identified by a number within the range
of channels for a specific physical link towards
a DSP.
Unidirectional. The direction of a channel is
decided at run time based on the attributes
passed to the corresponding API.

83
DSP/BIOS Link Key Components

MSGQ
This component represents queue based messaging
This component is responsible for exchanging
short messages of variable length between the GPP
and DSP clients. It is based on the MSGQ module
in DSP/BIOS.
The messages are sent and received through
message queues.
A reader gets the message from the queue and a
writer puts the message on a queue. A message
queue can have only one reader and many writers.
A task may read from and write to multiple
message queues.
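
The one-reader, many-writers rule can be sketched with threads standing in for GPP and DSP clients. This is a minimal illustration built on Python's queue module and does not use the DSP/BIOS Link MSGQ API; the function name run_writers is hypothetical.

```python
import queue
import threading


def run_writers(n_writers, per_writer):
    """Several writers put messages on one queue; a single reader drains it."""
    msgq = queue.Queue()

    def writer(writer_id):
        for seq in range(per_writer):
            msgq.put((writer_id, seq))    # message = (source, sequence number)

    threads = [threading.Thread(target=writer, args=(w,))
               for w in range(n_writers)]
    for thread in threads:
        thread.start()

    # The calling thread is the queue's only reader.
    received = [msgq.get() for _ in range(n_writers * per_writer)]
    for thread in threads:
        thread.join()
    return received
```

Messages from different writers may interleave in any order, but each writer's own messages arrive in the order they were sent. Carrying the source in the message is what lets the reader tell the writers apart.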

84
DSP/BIOS Link Key Components

POOL
This component provides APIs to open and close
memory pools, which are used by the CHNL and MSGQ
components for allocating the buffers used in data
transfer and messaging respectively.
This component is responsible for providing a
uniform view of different memory pool
implementations, which may be specific to the
hardware architecture or OS on which DSP/BIOS
LINK is ported. This component is based on the
POOL interface in DSP/BIOS.

85
Initialization Phase API

PROC
PROC_Setup()
PROC_Attach()
PROC_Load()
CHNL
CHNL_Create()
CHNL_AllocateBuffer()
MSGQ
MSGQ_TransportOpen()
MSGQ_Open()
MSGQ_SetErrorHandler()
MSGQ_Locate()
POOL
POOL_Open()

86
Execution Phase API

PROC
PROC_Start()
PROC_Read()
PROC_Write()
PROC_Stop()
CHNL
CHNL_Issue()
CHNL_Reclaim()
MSGQ
MSGQ_Alloc()
MSGQ_Put()
MSGQ_Get()
MSGQ_GetSrcQueue()
MSGQ_Free()

87
Finalization Phase API

PROC
PROC_Detach()
PROC_Destroy()
CHNL
CHNL_FreeBuffer()
CHNL_Delete()
MSGQ
MSGQ_Release()
MSGQ_TransportClose()
MSGQ_Close()
POOL
POOL_Close()

88
IPC Hardware Architecture
89
Tightly Coupled vs. Loosely Coupled Systems

Tightly coupled systems
Processors share most resources including memory
Communicate over shared buses using shared
physical memory
Loosely coupled systems
Processors do not share most resources
Most communication through explicit messages or
shared virtual memory (although not shared
physical memory)
Comparison
Loosely coupled systems more flexible, fault
tolerant, scalable
Tightly coupled systems more efficient, less
burden to operating system programmers

Deitel, et al., Operating Systems, Third Edition
90
Tightly Coupled Systems
Deitel, et al., Operating Systems, Third Edition
91
Loosely Coupled Systems
Deitel, et al., Operating Systems, Third Edition
92
Processor Interconnection Schemes

Interconnection scheme
Describes how the system's components, such as
processors and memory modules, are connected
Consists of nodes (components or switches) and
links (connections)
Parameters used to evaluate interconnection
schemes
Node degree
Bisection width
Network diameter
Cost of the interconnection scheme
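
Three of these parameters can be computed directly for a small network. This is a minimal sketch using a hypercube as the example; the function names are hypothetical, and the bisection width is found by brute force over every balanced split, which is practical only for a handful of nodes.

```python
from collections import deque
from itertools import combinations


def hypercube(dim):
    """Adjacency list: nodes are linked when their labels differ in one bit."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(1 << dim)}


def node_degree(graph):
    """Largest number of links attached to any node."""
    return max(len(neighbours) for neighbours in graph.values())


def network_diameter(graph):
    """Longest shortest path between any two nodes (breadth-first search)."""
    def eccentricity(start):
        dist = {start: 0}
        todo = deque([start])
        while todo:
            v = todo.popleft()
            for n in graph[v]:
                if n not in dist:
                    dist[n] = dist[v] + 1
                    todo.append(n)
        return max(dist.values())
    return max(eccentricity(v) for v in graph)


def bisection_width(graph):
    """Fewest links that must be cut to split the nodes into equal halves."""
    nodes = sorted(graph)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        cut = sum(1 for v in side for n in graph[v] if n not in side)
        if best is None or cut < best:
            best = cut
    return best
```

For the 3-dimensional hypercube this gives node degree 3, network diameter 3 and bisection width 4.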

Deitel, et al., Operating Systems, Third Edition
93
Processor Interconnection Schemes
Shared bus multiprocessor organization.
Deitel, et al., Operating Systems, Third Edition
94
Processor Interconnection Schemes
Crossbar-switch matrix multiprocessor
organization.
Deitel, et al., Operating Systems, Third Edition
95
Processor Interconnection Schemes
4-connected 2-D mesh network.
Deitel, et al., Operating Systems, Third Edition
96
Processor Interconnection Schemes
3- and 4-dimensional hypercubes.
Deitel, et al., Operating Systems, Third Edition
97
Processor Interconnection Schemes
Multistage baseline network.
Deitel, et al., Operating Systems, Third Edition
98
A Simple IPC Architecture

ARM writes command in shared memory
ARM interrupts DSP
DSP responds to interrupt and reads command in
shared memory
DSP executes a task based on the command
DSP interrupts ARM upon completion of the task
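
The five steps above can be sketched with two threads, a dictionary standing in for shared memory, and events standing in for the interrupts. This is a minimal simulation, not code for the actual hardware; the function name run_command and the "square" command are hypothetical.

```python
import threading


def run_command(command, operand):
    """Simulate one ARM-to-DSP command round trip."""
    shared = {}                          # stands in for shared memory
    irq_to_dsp = threading.Event()       # ARM -> DSP interrupt
    irq_to_arm = threading.Event()       # DSP -> ARM interrupt

    def dsp():
        irq_to_dsp.wait()                            # DSP responds to interrupt
        cmd, arg = shared["command"], shared["operand"]   # reads the command
        if cmd == "square":                          # executes the task
            shared["result"] = arg * arg
        else:
            shared["result"] = None
        irq_to_arm.set()                             # interrupts ARM when done

    dsp_thread = threading.Thread(target=dsp)
    dsp_thread.start()

    shared["command"] = command          # ARM writes command in shared memory
    shared["operand"] = operand
    irq_to_dsp.set()                     # ARM interrupts DSP
    irq_to_arm.wait()                    # ARM waits for the completion interrupt
    dsp_thread.join()
    return shared["result"]
```

The interrupts carry no data of their own; they only tell the other side when the shared memory contents are valid to read.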

TMS320DM644x DMSoC ARM Subsystem Reference Guide
(SPRUE14)
99
TI OMAP5910
100
OMAP5910 IPC Architecture

Mailbox registers
Each direction: 32 bits x 2
Interrupt occurrence
MPU interface (MPUI)
MPU accesses DSP memory space directly
Shared memory
Arrangement with the Traffic Controller
3 types of memory
Best suited to sharing large amounts of data

101
Traffic Controller (TC)

The IMIF allows access to the 192K bytes of
on-chip SRAM.
The EMIFS interface provides 16-bit-wide access
to asynchronous or synchronous memories.
The EMIFF interface provides 16-bit-wide access
to standard SDRAM memories.
The TC provides the functions of
arbitrating contending accesses to the same
memory interface from different initiators (MPU,
DSP, System DMA, Local Bus),
synchronization of accesses due to the initiators
and the memory interfaces running at different
clock rates,
and the buffering of data allowing burst access
for more efficient multiplexing of transfers from
multiple initiators to the memory interfaces.
The TC's architecture allows simultaneous
transfers between initiators and different memory
interfaces without penalty. For instance, if the
MPU is accessing the EMIFF at the same time as the
DSP is accessing the IMIF, the transfers may occur
simultaneously since there is no contention for
resources.

102
ARM IPCM Module

The IPCM provides up to 32 mailboxes with control
logic and interrupt generation to support
inter-processor communication.
An AHB interface enables access from source and
destination cores.
The IPCM
sends interrupts to other cores
passes small amounts of data to other cores.
A source core can have multiple mailboxes and
send messages in parallel (multitasking).

PrimeCell Inter-Processor Communications Module
Technical Reference Manual
103
IPCM Components

1-32 programmable mailboxes, each comprising
a single 1-32-bit Mailbox Source Register
a single 1-32-bit Mailbox Destination Register
a single 2-bit Mailbox Mode Register
a single 1-32-bit Mailbox Mask Register
a single 2-bit Mailbox Send Register
0-7 32-bit data registers to store the message.
1-32 sets of read-only interrupt status
registers, one for each interrupt, each
comprising
1-32-bit Raw Interrupt Status Register (each bit
corresponds to each mailbox)
1-32-bit Masked Interrupt Status Register (each
bit corresponds to each mailbox).
A 32-bit Configuration Status Register

104
IPCM Functional Block
PrimeCell Inter-Processor Communications Module
Technical Reference Manual
105
IPCM Example
106
IPCM Example

Core0 has a message to send to Core1. Core0
claims the mailbox by setting bit 0 in the
Mailbox Source Register. Core0 then sets bit 1 in
the Mailbox Destination Register, enables the
interrupts and programs the message into the
Mailbox Data Registers. Finally, Core0 sends the
message by writing 01 to the Mailbox Send
Register. This asserts the interrupt to Core1.
When Core1 is interrupted, it reads the Masked
Interrupt Status Register for IPCMINT1 to
determine which mailbox contains the message.
Core1 reads the message in that mailbox, then
clears the interrupt and asserts the acknowledge
interrupt by writing 10 to the Mailbox Send
Register.
Core0 is interrupted with the acknowledge
message, completing the operation. Core0 then
decides whether to retain the mailbox to send
another message or release the mailbox, freeing
it up for other cores in the system to use it.
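
The send, acknowledge and release sequence can be sketched as a small register-level simulation. This is a minimal model written from the description above, not from the hardware manual: the class and function names are hypothetical, and the way the interrupt status is derived from the Send, Source, Destination and Mask registers is an assumption.

```python
class Mailbox:
    """One IPCM mailbox, reduced to the registers the example touches."""

    def __init__(self):
        self.source = 0       # bit of the core that has claimed the mailbox
        self.dest = 0         # bit per destination core
        self.mask = 0         # bit per core whose interrupt is enabled
        self.send = 0b00      # 0b01 = message sent, 0b10 = acknowledge
        self.data = [0] * 7   # up to seven 32-bit data registers

    def release(self):
        """Free the mailbox for other cores (all registers cleared)."""
        self.__init__()


def masked_status(mailboxes, core):
    """Bit i is set when mailbox i currently interrupts `core`."""
    bit = 1 << core
    status = 0
    for index, mb in enumerate(mailboxes):
        to_dest = mb.send == 0b01 and mb.dest & bit
        to_source = mb.send == 0b10 and mb.source & bit
        if (to_dest or to_source) and mb.mask & bit:
            status |= 1 << index
    return status


def send(mb, src_core, dst_core, words):
    """Source core: claim the mailbox, program it, and send."""
    if mb.source:
        raise RuntimeError("mailbox already claimed")
    if len(words) > len(mb.data):
        raise ValueError("message does not fit in the data registers")
    mb.source = 1 << src_core
    mb.dest = 1 << dst_core
    mb.mask = mb.source | mb.dest                  # enable both interrupts
    mb.data[:len(words)] = [w & 0xFFFFFFFF for w in words]
    mb.send = 0b01                                 # interrupts the destination


def receive_and_ack(mailboxes, core):
    """Destination core: find the mailbox, read it, and acknowledge."""
    status = masked_status(mailboxes, core)
    if status == 0:
        return None
    index = status.bit_length() - 1                # highest pending mailbox
    mb = mailboxes[index]
    words = list(mb.data)
    mb.send = 0b10                                 # interrupts the source
    return index, words
```

After the acknowledge arrives, the source core either programs a new message into the same mailbox or calls release() so that another core can claim it.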