Scribed by Gregory Chanan
Joy Jiang and Claudio DeSanti, The role of FCoE in I/O consolidation, Proceedings of the 2008 International Conference on Advanced Infocomm Technology
Summary:
This paper describes the trend towards data center I/O consolidation. Conventionally, there are different networks running in parallel, each optimized for a different use: Ethernet for LAN, Fibre Channel for SAN and InfiniBand for IPC. This setup has downsides compared to a unified network: (1) more hardware components, since a separate adapter is needed for each network, (2) higher power requirements to run the separate adapters, and (3) different and more numerous cables to support the different networks.
Unifying the network by running Fibre Channel over Ethernet (FCoE) solves these issues. Using Ethernet as the underlying network makes sense because (1) many applications already assume they are running on Ethernet, (2) 10G Ethernet has enough bandwidth to carry traffic in the consolidated network, and (3) the price of 10G Ethernet has come down.
FCoE is superior to the existing iSCSI mainly because an FCoE gateway is stateless, while an iSCSI gateway is stateful and thus is a single point of failure. An FCoE gateway can be stateless because it runs over loss-less Ethernet, so Fibre Channel frames can simply be encapsulated in Ethernet frames. Loss-less Ethernet is achieved by implementing a ‘Pause’ frame, which tells the sender to pace its frames so that the receiver's buffers do not overflow.
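The pause mechanism can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class name, the high/low watermarks, and the single-frame-at-a-time timing are all hypothetical and not taken from the paper. Real pause frames also need buffer headroom for frames already in flight, which this model ignores.

```python
class PausingReceiver:
    """Toy receiver that asserts 'pause' before its buffer can overflow.

    All names and thresholds are illustrative assumptions, not the paper's.
    """

    def __init__(self, capacity, high_water, low_water):
        if not 0 <= low_water < high_water <= capacity:
            raise ValueError("need 0 <= low_water < high_water <= capacity")
        self.capacity = capacity
        self.high_water = high_water
        self.low_water = low_water
        self.buffer = []
        self.paused = False  # True models an outstanding pause request

    def accept(self, frame):
        if len(self.buffer) >= self.capacity:
            raise OverflowError("buffer overflow: frame would be dropped")
        self.buffer.append(frame)
        if len(self.buffer) >= self.high_water:
            self.paused = True  # ask the sender to stop

    def drain(self):
        frame = self.buffer.pop(0)
        if len(self.buffer) <= self.low_water:
            self.paused = False  # let the sender resume
        return frame


def send_all(frames, rx):
    """Send frames in order, honoring the receiver's pause state."""
    pending = list(frames)
    delivered = []
    while pending or rx.buffer:
        if pending and not rx.paused:
            rx.accept(pending.pop(0))
        else:
            delivered.append(rx.drain())
    return delivered
```

In this model the sender never triggers OverflowError, because it stops as soon as the receiver reports that it is paused; every frame is delivered, in order.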
Discussion:
We discussed the history and size ($2.5) of the Fibre Channel market. Originally, writing to permanent storage in the data center was done via SCSI cables. Eventually the demand for this sort of operation grew from a one-to-one problem (one CPU writing to one hard disk) into a many-to-many (CPU-to-disk) problem, which required a networking solution. Thus, Fibre Channel was born to make network writes look like SCSI writes.
Fibre Channel is a very structured protocol: each address has a domain ID, area ID, etc. This makes forwarding decisions easy, but limits the size and topology of some data centers. In this sense, it is similar to PortLand.
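To make the structured addressing concrete, here is a minimal sketch that splits an address into its fields and makes a forwarding decision from the domain alone. It assumes the usual 24-bit Fibre Channel address layout of one byte each for domain, area and port; the discussion only mentioned "domain ID, area ID, etc.", so the exact split, the function names and the routing table are assumptions for illustration.

```python
from typing import NamedTuple


class FcAddress(NamedTuple):
    domain: int  # identifies the switch (assumed top byte)
    area: int    # identifies a group of ports on that switch (assumed middle byte)
    port: int    # identifies the end device (assumed low byte)


def parse_fc_id(fc_id: int) -> FcAddress:
    """Split a 24-bit address into (domain, area, port)."""
    if not 0 <= fc_id <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    return FcAddress((fc_id >> 16) & 0xFF, (fc_id >> 8) & 0xFF, fc_id & 0xFF)


def next_hop(fc_id: int, local_domain: int, domain_routes: dict) -> str:
    """Hypothetical forwarding: only the domain byte is needed to pick a link."""
    addr = parse_fc_id(fc_id)
    if addr.domain == local_domain:
        return f"local area {addr.area} port {addr.port}"
    return domain_routes[addr.domain]
```

Because the forwarding table is keyed by a single byte, lookups are trivial, but the same byte also caps how many switches a fabric can contain, which is the size limit noted above.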
Fibre Channel does flow control via a packet-based buffer-to-buffer credit. This avoids dropping of packets, so in theory Fibre Channel is a lossless network (though bit errors can occur at rates of 10^-17 in practice).
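The credit scheme can be sketched as a toy model, assuming one credit per receive buffer and one credit returned each time the receiver frees a buffer (in Fibre Channel this return is signalled with an R_RDY primitive). The class and method names are hypothetical.

```python
from collections import deque


class CreditedLink:
    """Toy model of buffer-to-buffer credit flow control (illustrative names)."""

    def __init__(self, receiver_buffers):
        self.capacity = receiver_buffers
        self.credits = receiver_buffers  # sender's count of free receiver buffers
        self.rx_queue = deque()

    def send(self, frame):
        """Transmit only if a credit is available; otherwise the sender waits."""
        if self.credits == 0:
            return False  # frame is held back, never dropped
        self.credits -= 1
        self.rx_queue.append(frame)
        return True

    def receiver_drain(self):
        """Receiver frees one buffer and returns a credit to the sender."""
        frame = self.rx_queue.popleft()
        self.credits += 1
        return frame
```

For example, with two buffers a third send() returns False until receiver_drain() has been called once. Unlike a pause signal, the sender here never transmits a frame the receiver has not already promised room for.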
The advantage of Fibre Channel over other reliable networks is its simple protocol and low overhead. This means that it can be implemented in hardware. In theory, this simplicity should mean that it is cheaper and can be provided by multiple vendors as a commodity product. This hasn’t been the case in practice, however. Part of the problem is that the standards are written loosely, which makes interoperability a problem. This has led to a market in which EMC does all the service, and sells others’ hardware as part of a bundled solution. Thus, the market is small and Fibre Channel has not benefited from the economies of scale that Ethernet has.
Why is I/O consolidation only happening now? Part of the reason is that Ethernet continues to get faster, so there is now enough bandwidth for the different networks to run together without violating QoS.
We discussed why a pause signal was added to Ethernet rather than a buffer credit system. The reasons given were mostly political: the IEEE will not approve a buffer-to-buffer credit system, since it already has pause proposals.
Tom concluded by discussing the FCoE frame format. Several students noted that there are a large number of reserved bytes. The reason given was that the frame length must be known ahead of time in order to support a cut-through switch, which sends out the head of a frame before the tail is received. This is not possible to support with a length field.
Opinion:
I was a bit disappointed by this paper. While the reasons for adopting FCoE are well argued, I felt there was a lack of critical analysis. For example, there was no discussion about whether loss-less Ethernet is a good idea in a consolidated network. There are certainly end-to-end argument concerns with such a proposal (recall network transparency in the Active Networking paper), but these are not addressed. Overall, this just felt more like a marketing paper than an academic analysis, which I guess is to be expected given the authors work at Finisar and Cisco.