The task was to allocate a single processor to pushing character streams thru a network of processing steps (nodes). The application required low latency and thus single character processing, and there was no buffering except as explicitly provided by one of the nodes. There were a dozen or so stream processing tasks that we needed and we foresaw a slowly growing set of new processes. The re-plumbing of these processing elements was to be dynamic.

I describe the implementation ideas in machine language for that is how they were conceived. I suspect that implementing these ideas in C might cost very substantial performance. I would love to be proven wrong. (2004: Perhaps I was wrong.) It is not clear to me how best to express this in C except to simulate a machine obeying pseudo machine language. I see now that the scheme amounts to an efficient use of continuations. Perhaps Scheme continuations could do the trick, but again not efficiently. Perhaps we could describe this code plan as heavy use of continuations in a context without garbage collection. And in Keykos?

As in SVR4 streams, the idea is to provide a small fixed set of stream transformations but with the ability to dynamically compose these into data-flow like networks of nodes. Each node in the net performs some particular transformation from the fixed set. Each node is implemented. as a small memory block with:

The format of a particular node must conform to the transformation code located by the first word of the node.

The Varian Data Machines 620 had three registers, A, B and X which fit our scheme as if it had been designed for us. The entry point for each transformation routine assumed that X would locate the node, Register A would hold the next character in the input stream to be processed by the node and B would locate the node to pull whenever more characters from the stream were required. In other words A is the first character and B is a promise for the rest. B is sort of like Scheme’s continuation.

Each node pointer within a node would correspond to a conceptual character stream. Each time that pointer was used, by branching to the code that it located, the next character of that stream had its fleeting existence.

So far we have described the situation for nodes that produce one output for each input but we have not described “pull”. Some routines have pull entry points that assume no input character. A node that produces an infinite stream of ‘a’s will have only a pull entry point. Branching to that entry point produces yet another ‘a’. At such an entry point neither A nor B are meaningful. In all cases the first word of block for a node will locate the entry point for some transformation. Both push and pull entry points assume that X locates a node which will know where any yield is to be delivered.

The above is the idea in a nut-shell. Below are some examples of stream processing. Dispatch is merely shared code.

Convert to upper case
For this transformation the node comprises: the code address (see below) and the addresses of the downstream node. The code is:
  1. if A holds a lower case letter, put corresponding upper case letter in A.
  2. Go to (not call) Dispatch. Note that B does not change.
Stream of ‘a’s (no input).
  1. Put ‘a’ in A.
  2. Copy X into B.
  3. Go to Dispatch.
Discard Stream (no output)
  1. Copy B to X.
  2. Go to Dispatch.
Stutter (Make “axb” into “aaxxbb”.)
The words of this node are
0: code address for transformation just below,
1: downstream node address,
2: place to keep character to be repeated,
3: place to keep promise for stream remainder,
4: code address to produce second instance of repeated character.
  1. Store B at X + (3 words).
  2. Put X + (4 words) into B.
  3. Store A at X + (2 words).
  4. Go to Dispatch.
The code whose address is located at X + (4 words) does:
  1. Subtract 4 words from X.
  2. Load A from X + (2 words).
  3. Load B from X + (3 words).
  4. Go to Dispatch.
Discard ‘x’s
  1. If A doesn’t hold ‘x’, go to Dispatch.
  2. Copy B to X.
  3. Go to Dispatch.
Duplicate Stream
Just like stutter but there are two distinct down stream node pointers.
Merge characters alternately from two streams
Node state is address of pull node for “other input stream”.
  1. Exchange B with word 2 of node.
  2. Go to Dispatch.
Dispatch
  1. Locate the address of the downstream node from X + (1 word), leaving new node address in X.
  2. Locate the entry point of the new node, using (new) X.
  3. Go to the new code.
Note that those transformations that go to Dispatch do so with the next character in A, the promise in B but the current node address in X. All nodes that point to such code have the downstream node address in word 1. (Most such nodes have an entry point address in word 0.)

There are probably several bugs in the above “code”. Note that most of the steps amount to one machine instruction. We average about 7 machine instructions per node per character processed, even on a severely “register starved” machine. Note also that in contrast to the presumably more efficient batching of blocks of characters, this scheme runs the character stream thru the processing nodes without placing them in RAM or even cache for each node.

Note that the code is entirely reentrant and uses no stack. It might be described as using register B as a link register and doing tail recursion. I don’t know how to code the stutter routine that way however as that seems to require explicitly naming the continuation.

Control

Streams would be explicitly built be normal program logic, either nearby or remote. Changes in the plumbing in mid stream would not be supported except perhaps by in band signaling within the stream and with the explicit cooperation of one of the processing elements.

At the edge of the network of nodes would be buffers for comm links in the case of Tymnet. These buffers are to improve link efficiency at minimal cost to latency.