Building a memory bus to switch several full-duplex 10 Gb/s switches is not trivial; it is perhaps a network problem in itself. It makes one appreciate the fixed cell sizes of ATM. A switching fabric for large fixed-size packets, each consisting of many 1024-bit words, could use centralized combinational logic to plan routing over a distance of only a few cm. This logic could plan several nanosecond clock cycles ahead.

The CrossBar

Knuth’s “Sorting and Searching” describes arrangements of about 1.5·n·log(n) conditional swappers on signal pairs. Applied serially and in parallel to a bundle of n signal paths, these provide all n! permutations of those paths. Each swapper either passes its two signals straight through or swaps them. Our ability to plan several clock cycles ahead makes these hardware primitives inviting. For example, here is the equivalent of a 16-element crossbar in 9 logic levels, built from 61 two-element crossbars (page 231). Signals are numbered in hexadecimal, 0 through F:

  1. 01, 23, 45, 67, 89, AB, CD, EF
  2. 02, 13, 46, 57, 8A, 9B, CE, DF
  3. 04, 15, 26, 37, 8C, 9D, AE, BF
  4. 08, 19, 2A, 3B, 4C, 5D, 6E, 7F
  5. 12, 3C, 48, 5A, 69, 7B, DE
  6. 14, 28, 3A, 59, 6C, 7D, BE
  7. 24, 35, 68, 79, AC, BD
  8. 36, 58, 7A, 9C
  9. 34, 56, 78, 9A, BC
Text such as “45” means that at logic level 1 we swap signals 4 and 5 if signal 4 is destined for a larger slot than signal 5 is. With 4-way fan-out and fan-in, any two consecutive levels of 2-element crossbars can be replaced by one logic level with 4-element crossbars. A software test of this network runs at 2,000,000 permutations per second, but 16! ≈ 2.1×10¹³ is still too big to test exhaustively. At that rate it would take about 10⁷ seconds, roughly four months. Sorting 10⁸ random permutations works.
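The listing can be exercised in software. The Python sketch below was written for this note. It is not taken from Knuth or from the test mentioned above. It encodes the nine levels exactly as listed and treats each pair as a swapper that compares destination slots. Instead of sampling permutations, it uses the zero-one principle, which Knuth also proves in “Sorting and Searching”: a network of comparators sorts every input if and only if it sorts every input made of 0s and 1s. That means 2¹⁶ = 65,536 cases can stand in for all 16! permutations. The names `route`, `check_levels` and `sorts_every_input` are illustrative.

```python
import random
from itertools import product

# The 61 two-element crossbars listed above, one string per logic level.
# "45" means: compare the signals on wires 4 and 5 (wire numbers are hex digits).
LEVELS_TEXT = [
    "01 23 45 67 89 AB CD EF",
    "02 13 46 57 8A 9B CE DF",
    "04 15 26 37 8C 9D AE BF",
    "08 19 2A 3B 4C 5D 6E 7F",
    "12 3C 48 5A 69 7B DE",
    "14 28 3A 59 6C 7D BE",
    "24 35 68 79 AC BD",
    "36 58 7A 9C",
    "34 56 78 9A BC",
]
LEVELS = [[(int(p[0], 16), int(p[1], 16)) for p in row.split()]
          for row in LEVELS_TEXT]
N = 16


def check_levels(levels, n=N):
    """Within one level every wire feeds at most one swapper, so all of a
    level's swappers can act in parallel."""
    for level in levels:
        wires = [w for pair in level for w in pair]
        assert len(wires) == len(set(wires)), level
        assert all(0 <= i < j < n for i, j in level), level


def route(signals, dest, levels=LEVELS):
    """Pass signals through the network, swapping pair (i, j) whenever the
    signal on wire i is bound for a larger slot than the one on wire j.
    Returns the signals and their destinations in output (slot) order."""
    sig, d = list(signals), list(dest)
    for level in levels:
        for i, j in level:
            if d[i] > d[j]:
                d[i], d[j] = d[j], d[i]
                sig[i], sig[j] = sig[j], sig[i]
    return sig, d


def sorts_every_input(levels=LEVELS, n=N):
    """Zero-one principle: a comparator network sorts all inputs iff it sorts
    all 2**n inputs made of 0s and 1s (65,536 cases for n = 16, not 16!)."""
    for bits in product((0, 1), repeat=n):
        _, out = route(range(n), bits, levels)
        if any(out[k] > out[k + 1] for k in range(n - 1)):
            return False
    return True


if __name__ == "__main__":
    check_levels(LEVELS)
    print(len(LEVELS), sum(len(level) for level in LEVELS))  # 9 61
    print(sorts_every_input())
    dest = list(range(N))
    random.shuffle(dest)                 # input k is bound for slot dest[k]
    out, _ = route(range(N), dest)
    print(all(dest[out[s]] == s for s in range(N)))  # every slot got its signal
```

`route` returns the signals in slot order, so `out[s]` is the input that was bound for slot `s`. The exhaustive check costs about four million swapper evaluations, which is small even for an interpreter.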

Perhaps the data stream from a fiber should go into consecutive DRAM addresses, and a gather process orchestrated by the software should dispatch the data to the outgoing fiber. I imagine a path from fiber to DRAM with little switching. This means that a particular DRAM receives data from a particular fiber, and perhaps only from a given color (wavelength) range on that fiber. Packet headers would be delivered to the cache of the controlling CPU. The expensive memory crossbar would be traversed only on the way to the outgoing link, after the software had identified that link and built the appropriate headers in cache.
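The following is a minimal Python model of this receive-then-gather idea. It assumes fixed 4-byte headers and a software-built list of gather descriptors. `FiberBuffer`, `gather`, the descriptor layout and the routing rule in the demo are all hypothetical. Each fiber appends to its own buffer with no switching, and only the header is handed to the CPU. The copy that would cross the memory crossbar happens once, when the new header and the stored payload are gathered onto an outgoing link.

```python
from collections import defaultdict
from dataclasses import dataclass, field

HEADER_LEN = 4  # assumption: fixed-size header, chosen only for illustration


@dataclass
class FiberBuffer:
    """Hypothetical DRAM region fed by one fiber: data lands at consecutive
    addresses with no switching on the way in."""
    data: bytearray = field(default_factory=bytearray)

    def receive(self, packet: bytes):
        """Append a packet; return its (offset, length) and a copy of its
        header, which is all the controlling CPU needs to see."""
        offset = len(self.data)
        self.data += packet
        return offset, len(packet), bytes(packet[:HEADER_LEN])


def gather(buffers, plan):
    """plan entries: (out_link, fiber, offset, length, new_header), built by
    software. Each entry is copied once, on the way out: the new header
    followed by the payload read from the input buffer."""
    links = defaultdict(bytearray)
    for out_link, fiber, offset, length, new_header in plan:
        payload = buffers[fiber].data[offset + HEADER_LEN:offset + length]
        links[out_link] += new_header + payload
    return dict(links)


if __name__ == "__main__":
    buffers = {0: FiberBuffer(), 1: FiberBuffer()}
    arrivals = [(0, b"\x01AAAhello"), (1, b"\x00BBBworld"), (0, b"\x00CCC!")]
    plan = []
    for fiber, packet in arrivals:
        offset, length, header = buffers[fiber].receive(packet)
        out_link = header[0]  # hypothetical routing rule: first header byte
        plan.append((out_link, fiber, offset, length, b"NEW" + bytes([fiber])))
    print(gather(buffers, plan))
```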

Early Bell Labs papers on non-blocking crossbars showed elegant solutions that sometimes could not be deployed. In some cases, a new call between two customers whose lines were idle could not be accommodated without rerouting calls already in progress. The crossbars were ultimately built from relays, and changing several relays to form a new path would cause intolerable noise on ongoing calls. That problem does not affect us, because we are moving digital signals that are discrete in time. The electronic crossbar we imagine here is a sort of systolic array. Its controls can be produced by a systolic array of comparators that work only on the very small interface numbers, and this control array can run a few clocks ahead of the crossbar proper. It takes a lot of circuitry and the layout may be difficult, but the logic is simple.
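The division of labor described above can be sketched as follows: a comparator array plans, and a crossbar merely obeys. This is a software model under stated assumptions, not a circuit. To stay short, it substitutes the standard 4-input sorting network (5 swappers in 3 levels) for the 16-input one and represents 1024-bit words as Python integers. The names `plan`, `switch` and `run`, and the `lookahead` parameter, are hypothetical. The control side sees only the small destination tags and emits one swap bit per two-element crossbar. The data side replays those bits a few clocks later without examining the words.

```python
import random
from collections import deque

# Assumption: a standard 4-input sorting network (5 swappers, 3 levels) stands
# in for the 16-input one, only to keep the example short.
LEVELS = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]


def plan(tags, levels=LEVELS):
    """Control array: compares only the small destination tags and records
    one swap bit per two-element crossbar, level by level."""
    tags = list(tags)
    settings = []
    for level in levels:
        bits = []
        for i, j in level:
            swap = tags[i] > tags[j]
            if swap:
                tags[i], tags[j] = tags[j], tags[i]
            bits.append(swap)
        settings.append(bits)
    return settings


def switch(words, settings, levels=LEVELS):
    """Data crossbar: replays precomputed swap bits on wide words without
    looking at their contents."""
    words = list(words)
    for level, bits in zip(levels, settings):
        for (i, j), swap in zip(level, bits):
            if swap:
                words[i], words[j] = words[j], words[i]
    return words


def run(cells, lookahead=3):
    """cells[t] = (tags, words) for clock t. Settings for a cell are computed
    `lookahead` clocks before its words enter the crossbar."""
    pending = deque()
    switched = []
    for t in range(len(cells) + lookahead):
        if t < len(cells):
            pending.append(plan(cells[t][0]))      # control runs ahead
        if t >= lookahead:
            switched.append(switch(cells[t - lookahead][1], pending.popleft()))
    return switched


if __name__ == "__main__":
    rng = random.Random(0)
    cells = []
    for _ in range(5):
        tags = list(range(4))
        rng.shuffle(tags)                          # word k is bound for slot tags[k]
        words = [rng.getrandbits(1024) for _ in range(4)]
        cells.append((tags, words))
    for (tags, words), out in zip(cells, run(cells)):
        print(all(out[tags[k]] == words[k] for k in range(4)))
```

In this model every cell time gets a fresh set of swap bits. Rearranging paths between cells therefore costs nothing, which is what the relay crossbars could not offer.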

Packaging

How do you package these crossbars? Still a difficult problem, deferred for now.

I need to understand this. There is a great deal of jargon until you get to section 2 (page 3), which describes a fairly concrete configuration. A “Bernoulli random process” means that in each time slot a cell (of fixed size) arrives at each input port with probability p, and the ports are statistically independent. At least the model rules are well stated. I don't think I buy the model, because it presumes that the data is buffered twice, and I aspire to buffering it just once. Their paper presumes independent Bernoulli inputs. I don't think it shows that the outputs of such switches have the same property, which would help in thinking about a fabric built of such crossbars.
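The Python sketch below makes the arrival model concrete. The uniform choice of output port, the 16-port size and the function names are assumptions for illustration. With n inputs, each active with probability p, and uniform destinations, each output is offered p cells per slot on average. The count is binomial with parameters n and p/n. In many slots, two or more cells want the same output, and that is where the buffering question arises.

```python
import random
from collections import Counter


def bernoulli_slot(n_ports, p, rng):
    """One time slot: each input independently holds a new cell with
    probability p. Assumption: that cell's output port is uniform.
    Returns the list of requested output ports."""
    return [rng.randrange(n_ports) for _ in range(n_ports) if rng.random() < p]


def simulate(n_ports, p, slots, seed=0):
    """Return (mean cells offered per output per slot, fraction of slots in
    which two or more cells want the same output, so something must wait)."""
    rng = random.Random(seed)
    total = 0
    clash_slots = 0
    for _ in range(slots):
        dests = bernoulli_slot(n_ports, p, rng)
        total += len(dests)
        if max(Counter(dests).values(), default=0) > 1:
            clash_slots += 1
    return total / (n_ports * slots), clash_slots / slots


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.9):
        load, clash = simulate(16, p, 10_000)
        print(f"p={p}: load per output {load:.3f}, slots with contention {clash:.1%}")
```

The sketch only generates arrivals. It says nothing about the departure process of a switch fed this way, which is the property the paper leaves open.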

See Color switching. Several leads from that note suggest schemes for mechanically separating optical signals, with the steering changing from moment to moment. They give no indication, however, that a signal can then be amplified and sent on its way, at the same color, on a different fiber. Lambda Switches are closer to what I had in mind.