Title: iSCSI Extensions for RDMA (iSER)
1. iSCSI Extensions for RDMA (iSER)
- draft-ko-iwarp-iser-02
- Mike Ko
- IBM
- August 2, 2004
2. Agenda
- What is iSER?
- iSER connection setup
  - Open issues
- iSER flow control
  - Open issues
3. iSCSI Datamover with RDMA Extensions
- The Datamover Architecture defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol
  - Allows a datamover protocol layer to offload the tasks of data movement and placement from the iSCSI layer
- The iSCSI Extensions for RDMA (iSER) protocol is one such datamover protocol
  - Applies the Datamover Architecture in extending the data transfer capabilities of iSCSI to include RDMA (Remote Direct Memory Access) as defined in the iWARP protocol suite
  - Allows iSCSI implementations to have data transfers which achieve true zero copy behavior using generic RDMA network interface controllers (RNICs)
[Figure: protocol stack, top to bottom: SCSI, iSCSI, Datamover Interface, iSER, Verbs, iWARP (RDMAP, DDP, MPA), TCP]
4. Connection Setup for iSER-assisted Mode at the Initiator
- Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
- Before sending the final Login Request, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
- After the target returns the final Login Response, the iSCSI layer at the initiator invokes the Enable_Datamover Operational Primitive to request the iSER layer to transition into iSER-assisted mode
- The first message sent by the iSER layer at the initiator to the target is the iSER Hello Message
5. Connection Setup for iSER-assisted Mode at the Target
- Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
- Before sending the final Login Response, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
- The iSCSI layer invokes the Enable_Datamover Operational Primitive, qualified with the final Login Response PDU, to enable iSER-assisted mode
- The iSER layer sends the final Login Response PDU in byte stream mode and then transitions into iSER-assisted mode
- After receiving the iSER Hello Message from the initiator, the iSER layer at the target responds by sending the iSER HelloReply Message
6. Example of Successful iSER Connection Setup
A. iSCSI Login Request PDU with RDMAExtensions=Yes
B. iSCSI Login Response PDU with RDMAExtensions=Yes
C. Optional Notice_Key_Values to pass values of negotiated keys
D. Allocate_Connection_Resources to set up iWARP resources
E. iSCSI Login Request PDU with T=1 and NSG=FullFeaturePhase
F. Enable_Datamover to go into iSER mode (send last iSCSI PDU in byte stream mode)
G. iSCSI Login Response PDU in byte stream mode with T=1 and NSG=FullFeaturePhase
H. iWARP Send Message containing iSER Hello
J. iWARP Send Message containing iSER HelloReply
[Figure: message sequence diagram between the initiator and the target, each with an iSCSI Layer and an iSER Layer; the steps appear in the order A, B, ..., C, D, E, C, D, F, G, F, H, J]
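The ordering of steps A through J can be sketched as a small simulation. This is only an illustration under stated assumptions: the class `IserLayer`, its method names (which mirror the Operational Primitives), and the trace strings are hypothetical and are not an API defined by the draft. The attribution of the repeated steps C, D, and F to the initiator or the target follows the two preceding connection setup slides.

```python
class IserLayer:
    """Hypothetical stand-in for an iSER layer; every call is logged to a shared trace."""

    def __init__(self, role, trace):
        self.role = role
        self.trace = trace
        self.keys = {}
        self.resources_allocated = False
        self.iser_assisted = False

    def notice_key_values(self, keys):
        # Step C (optional): take note of the negotiated key values.
        self.keys = dict(keys)
        self.trace.append(f"{self.role}: Notice_Key_Values")

    def allocate_connection_resources(self):
        # Step D: set up the iWARP resources for the connection.
        self.resources_allocated = True
        self.trace.append(f"{self.role}: Allocate_Connection_Resources")

    def enable_datamover(self, final_login_response=None):
        # Step F: switch to iSER-assisted mode. At the target the call is
        # qualified with the final Login Response, which goes out in byte
        # stream mode (step G) before the mode switch.
        if not self.resources_allocated:
            raise RuntimeError("Allocate_Connection_Resources was not invoked")
        self.trace.append(f"{self.role}: Enable_Datamover")
        if final_login_response is not None:
            self.trace.append(
                f"{self.role}: send {final_login_response} in byte stream mode"
            )
        self.iser_assisted = True


def setup_connection():
    trace = []
    initiator = IserLayer("initiator", trace)
    target = IserLayer("target", trace)
    keys = {"RDMAExtensions": "Yes"}

    trace.append("initiator: send Login Request (RDMAExtensions=Yes)")    # A
    trace.append("target: send Login Response (RDMAExtensions=Yes)")      # B
    initiator.notice_key_values(keys)                                     # C
    initiator.allocate_connection_resources()                             # D
    trace.append("initiator: send final Login Request (T=1)")             # E
    target.notice_key_values(keys)                                        # C
    target.allocate_connection_resources()                                # D
    target.enable_datamover("final Login Response (T=1)")                 # F, G
    initiator.enable_datamover()                                          # F
    trace.append("initiator: send iSER Hello")                            # H
    trace.append("target: send iSER HelloReply")                          # J
    return trace, initiator, target
```

Calling `setup_connection()` returns the ordered trace together with both layer objects, which makes it easy to check that resources are allocated before either side enables the datamover.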
7. Negotiation of RDMAExtensions in Leading Connection Only
- From section 2.3 of the iSER draft: "iSER-assisted mode is negotiated during the iSCSI Login for each connection, but an entire iSCSI session MUST operate in one mode ..."
- Question: Since RDMAExtensions is leading-only, this statement is incorrect
- Proposed change
  - Replace the sentence with: "iSER-assisted mode is negotiated during the iSCSI Login for each session, and an entire iSCSI session MUST operate in one mode ..."
8. CRC32C Protection in the Layer Below iSER
- From section 5.1 of the iSER draft: "when the RDMAExtensions key is negotiated to 'Yes', the HeaderDigest and the DataDigest keys MUST be negotiated to 'None' ... because ... the iWARP protocol suite provides a CRC32c-based error detection for all iWARP Messages"
- Recent updates to the MPA draft render the use of CRC optional
  - Disabling of CRCs should only be done when it is clear that the connection through the network has data integrity at least as good as a CRC
  - The RDDP WG's position is that all ULPs can assume CRC-level or equivalent data protection
- Proposed change: Add the explicit requirement that end-to-end CRC32C-based error detection or equivalent be provided in a layer below iSER
9. Order of RDMAExtensions Key Negotiation and Allocate_Connection_Resources
- From section 5.1.1 (and similarly for section 5.1.2): "If the outcome of the iSCSI negotiation is to enable iSER-assisted mode, then on the initiator side, ... the iSCSI Layer MUST invoke the Allocate_Connection_Resources Operational Primitive"
- Question: The alternative approach of invoking Allocate_Connection_Resources before negotiating for iSER-assisted mode should be allowed
  - Current approach results in the connection being torn down if the required resources cannot be allocated
  - Alternative approach avoids this problem
  - Resources must be deallocated if login fails
  - Resources may have to be deallocated if the negotiated values are less than the allocated values
- Proposed change: Update the draft to allow the alternative approach, with the proviso that it is the responsibility of the implementation to deallocate the resources if the login fails or if the negotiated values are less than the allocated values
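The alternative approach and its cleanup obligations can be sketched as follows. This is a minimal sketch, not the draft's method: the callables `allocate`, `deallocate`, and `negotiate` are hypothetical placeholders, and releasing and reallocating at the smaller size is just one possible policy for the case where the negotiated value is less than the allocated value.

```python
def login_with_early_allocation(allocate, deallocate, negotiate, requested):
    """Allocate before negotiating, then clean up as needed.

    allocate(n) returns a resource handle or raises, deallocate(handle)
    releases it, and negotiate() returns the negotiated value, or None if
    the login fails. All three callables are hypothetical placeholders.
    """
    # An allocation failure surfaces here, before iSER-assisted mode has
    # been negotiated, so no established connection has to be torn down.
    handle = allocate(requested)

    negotiated = negotiate()
    if negotiated is None:
        # Login failed: the implementation must release what it allocated.
        deallocate(handle)
        return None

    if negotiated < requested:
        # Over-allocated: release and allocate again at the negotiated size
        # (an assumed policy; an implementation might shrink in place).
        deallocate(handle)
        handle = allocate(negotiated)

    return handle
```

The trade-off is that resources are held for the duration of the login even when the peer ends up declining iSER-assisted mode.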
10. Clarification on the Usage of the Notice_Key_Values Primitive
- From section 5.1.1: "Optionally, the iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive before invoking the Allocate_Connection_Resources Operational Primitive"
- Question: The word "optionally" is ambiguous
  - Could mean the iSCSI layer may choose to invoke the primitive
  - Or the iSCSI layer may choose to use that primitive, or some other defined or undefined primitive
- Proposed change: Remove the word "optionally"
11. Requiring the Use of the Notice_Key_Values Primitive
- From section 5.1.1: "The iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive to request the iSER Layer to take note of the negotiated values of the iSCSI keys for the Connection"
- Question: The word "MAY" should be replaced with "MUST" to enforce the invocation of the primitive
- Proposed change: None
  - If the default values are accepted for all the negotiated keys, then there is no new information to be passed from the iSCSI layer to the iSER layer
  - Requiring a "MUST" instead of a "MAY" would require this primitive to be invoked even though it is not necessary
  - Also, it is not architecturally required for the iSCSI layer to issue the Notice_Key_Values primitive
12. HeaderDigest, DataDigest, OFMarker, IFMarker in iSER-assisted Mode
- From sections 6.1 and 6.6: These 4 keys must be negotiated to "none" or "no" if the RDMAExtensions key is negotiated to "yes"
- Question: Draft seems to imply that these 4 keys must be negotiated even for the defaults
- Suggestion: A negotiation resulting in RDMAExtensions=Yes for a session implies HeaderDigest=None, DataDigest=None, OFMarker=No, and IFMarker=No on all connections in that session
  - Override both the default and explicit settings
- Proposed change: Update the draft to reflect the suggested change
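The suggested rule can be sketched as a function that derives the effective key values for a connection. This is a minimal illustration of the suggestion, not text from the draft; the function name `effective_keys` and the dictionary representation of negotiated keys are assumptions made for the example.

```python
# Values implied by RDMAExtensions=Yes under the suggested change.
IMPLIED_BY_RDMA_EXTENSIONS = {
    "HeaderDigest": "None",
    "DataDigest": "None",
    "OFMarker": "No",
    "IFMarker": "No",
}


def effective_keys(negotiated):
    """Return the key values in effect for a connection (hypothetical helper).

    `negotiated` maps key names to their default or explicitly negotiated
    values. When RDMAExtensions is "Yes", the four implied values override
    both defaults and explicit settings.
    """
    keys = dict(negotiated)
    if keys.get("RDMAExtensions", "No") == "Yes":
        keys.update(IMPLIED_BY_RDMA_EXTENSIONS)
    return keys
```

For example, an explicit `HeaderDigest` of `CRC32C` is replaced by `None` once `RDMAExtensions` is `Yes`, while a session without RDMA extensions keeps its negotiated values untouched.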
13. Scope of RDMAExtensions Key
- From section 6.3: RDMAExtensions key has session-wide scope
- Question: Should iSER support mixed mode sessions?
- Argument for
  - Open an iSCSI connection when there are insufficient resources to support an iSER-assisted connection in allegiance reassignment and the session is in iSER-assisted mode
  - Flexibility on general principles
- Argument against
  - RFC 3720 assumes homogeneous connections in a session
  - Introducing mixed mode sessions would require that the RFC 3720 semantics be carefully thought through to ensure correctness
  - The task states maintained by an iSCSI connection may be different from those for an iSER-assisted connection
  - An iSER-assisted connection may require different LO key values for optimization compared with an iSCSI connection
  - Test and debug effort will increase 2x to 3x for mixed mode support
- Proposed change: None
14. Clarification on the Order of RDMAExtensions Key Negotiation
- From section 6.3: "If the RDMAExtensions key is to be negotiated, it must be offered only on the initial Login Request PDU or Login Response PDU of the leading connection, and if offered, the response must be sent in the immediately following Login Response or Login Request PDU respectively."
- Question: Clarify when the negotiation response is to be returned if the key is offered in a PDU where the C-bit is set
- Question: Clarify that the negotiation takes place in the LoginOperationalNegotiation stage of the leading connection
- Question: Section 5.2.2 of RFC 3720 states that a response is optional if the Boolean function is "AND" and the value "No" is received
  - iSER draft always requires a response to be returned
  - However, since the default for RDMAExtensions is "no", it is unlikely that the key=value pair of RDMAExtensions=no will be offered
15. Clarification on the Order of RDMAExtensions Key Negotiation (cont.)
- Proposed change: Replace the sentence with: "However, if the RDMAExtensions key is to be negotiated, an initiator MUST offer the key on the first Login Request PDU in the LoginOperationalNegotiation stage of the leading connection, and a target MUST offer the key on the first Login Response PDU with which it is allowed to do so (i.e., the first Login Response issued after the first Login Request with the C bit set to 0) in the LoginOperationalNegotiation stage of the leading connection. In response to the offered key=value pair of RDMAExtensions=yes, an initiator MUST respond on the next Login Request PDU with which it is allowed to do so, and a target MUST respond on the next Login Response PDU with which it is allowed to do so."
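The target's offer rule in the proposed text can be illustrated with a short helper. This is a sketch under assumptions: the function name `first_offer_opportunity` is hypothetical, and the Login Requests of the LoginOperationalNegotiation stage are modeled simply as a list of their C bit values in the order received.

```python
def first_offer_opportunity(request_c_bits):
    """Find where a target may first offer RDMAExtensions (hypothetical helper).

    `request_c_bits` lists the C bit (0 or 1) of each Login Request in the
    LoginOperationalNegotiation stage, in order. The return value is the
    index of the first Login Request with C=0; the Login Response issued
    after that request is the first one on which the target may offer the
    key. Returns None if no such request has been seen yet.
    """
    for index, c_bit in enumerate(request_c_bits):
        if c_bit == 0:
            return index
    return None
```

With C bits of 1, 1, 0 the helper returns index 2, meaning the target waits through two continued requests and offers the key in its response to the third.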
16. Order of RDMAExtensions Key Negotiation Response
- From section 6.3: RDMAExtensions key must be offered for negotiation in the first PDU in which a node is allowed to do so, and the response must be returned in the immediately following PDU in which a node is allowed to respond
- Question: Why must the RDMAExtensions key be negotiated first?
  - Negotiating the RDMAExtensions key first allows a node to optimally negotiate the value of other keys
  - Certain iSCSI keys such as MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, ImmediateData, etc., may have different optimization points depending on whether iSER-assisted mode is to be enabled in the iSCSI session
- Proposed change: Update the draft to include the rationale for the order requirement
17. Key Ordering Within a PDU
- From section 6.3: "The RDMAExtensions key must precede any other login keys which may be affected by the outcome of the negotiation of the RDMAExtensions key"
- Question: This can be interpreted as requiring key ordering within a PDU, which is contrary to RFC 3720
- Proposed change: Remove the sentence from the draft
18. iSER Flow Control
- For RDMA Send Type Messages
  - The iSER protocol does not provide additional flow control beyond that provided by the iSCSI layer on control-type PDUs
  - An implementation should be able to take advantage of iWARP Verbs mechanisms such as the Shared Receive Queue mechanism to effectively address the Send Message flow control question
- For RDMA Read Resources
  - In the iSER Hello Message, the iSER layer at the initiator declares the maximum number of RDMA Read Requests that the initiator can receive on the particular RDMAP Stream (iSER-IRD) to the target
    - This allows the iSER layer at the target to adjust its resources if it can issue more RDMA Read Requests than the initiator can handle
  - In the iSER HelloReply Message, the iSER layer at the target declares the maximum number of RDMA Read Requests that the target can issue on a particular RDMAP Stream (iSER-ORD) to the initiator
    - This allows the iSER layer at the initiator to adjust its resources if it can handle more RDMA Read Requests than the target can issue
  - The iSER layer at the target will flow control the RDMA Read Request Messages to not exceed iSER-ORD
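The RDMA Read resource exchange and the target-side limit can be sketched as follows. This is a minimal sketch under stated assumptions: the names `settle_read_limits` and `ReadRequestGate` are hypothetical, and "adjust its resources" is interpreted here as each side lowering its value to the smaller of the two, which the slide does not spell out.

```python
def settle_read_limits(initiator_ird, target_max_ord):
    """Return (iSER-IRD, iSER-ORD) after the Hello/HelloReply exchange.

    Assumption: the target lowers its outbound depth to what the initiator
    declared in the Hello Message, and the initiator then lowers its inbound
    depth to what the target declared in the HelloReply Message.
    """
    iser_ord = min(target_max_ord, initiator_ird)
    iser_ird = min(initiator_ird, iser_ord)
    return iser_ird, iser_ord


class ReadRequestGate:
    """Target-side counter that keeps outstanding RDMA Read Requests
    at or below iSER-ORD (hypothetical helper)."""

    def __init__(self, iser_ord):
        self.iser_ord = iser_ord
        self.outstanding = 0

    def try_issue(self):
        # Refuse to issue another RDMA Read Request once the limit is reached;
        # the caller is expected to queue the request and retry later.
        if self.outstanding >= self.iser_ord:
            return False
        self.outstanding += 1
        return True

    def complete(self):
        # Called when an RDMA Read completes, freeing one slot.
        if self.outstanding == 0:
            raise RuntimeError("no outstanding RDMA Read Request to complete")
        self.outstanding -= 1
```

A target that could issue 8 reads against an initiator declaring an iSER-IRD of 2 would settle on a limit of 2 and gate its requests accordingly.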
19. Flow Control for Control-Type PDU
- From section 8.1: "The iSER Layer SHOULD provision enough Untagged buffers for handling incoming RDMAP Send Message Types to prevent a buffer underrun condition"
- Question: Should some form of send side flow control be established for iSCSI control-type PDUs?
  - Latest DDP draft, draft-ietf-rddp-ddp-02, no longer mandates that a DDP stream be disabled for a buffer underrun condition
- Proposed change: Further discussion is needed