I think out loud here about providing a way to define new objects based on Tahoe technology.

This morning (2010 Feb 12) Tyler mentioned a form of Tahoe file that begins with a list of Tahoe file handles, followed by a list-end marker, followed by arbitrary data. I think that provoked several notions in people's heads. Here is what remains in my head a few hours later.
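A minimal sketch of such a file layout follows. It assumes, hypothetically, that each handle is a UTF-8 string on its own line and that an empty line serves as the list-end marker; the actual marker and encoding were not specified, and the `URI:CHK:` strings used below are made-up placeholders.

```python
from typing import List, Tuple

END_MARKER = b"\n"  # hypothetical: an empty line ends the handle list


def build_o_file(handles: List[str], payload: bytes) -> bytes:
    """Serialize handles (one per line), then an empty line, then arbitrary data."""
    for h in handles:
        if not h or "\n" in h:
            raise ValueError("handles must be non-empty and contain no newlines")
    head = "".join(h + "\n" for h in handles).encode("utf-8")
    return head + END_MARKER + payload


def parse_o_file(blob: bytes) -> Tuple[List[str], bytes]:
    """Split a blob back into (handles, payload); the payload may contain anything."""
    handles: List[str] = []
    pos = 0
    while True:
        nl = blob.index(b"\n", pos)  # raises ValueError if the marker is missing
        line = blob[pos:nl]
        pos = nl + 1
        if not line:  # the empty line is the list-end marker
            return handles, blob[pos:]
        handles.append(line.decode("utf-8"))
```

Because the payload is only ever sliced off after the marker, it may itself contain newlines or empty lines without confusing the parser.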

Suppose that you have a Tahoe file handle to some large file and you wish to produce a view of that file thru some obscuring filter of your design. You also want to make this view highly available, in the sense that Tahoe keeps the file accessible during failure of system components. The filter is generally expressed in code, and one or both of the following preclude the simple notion of running the filter on the original file to produce a full filtered version of the file as a new Tahoe file:

The obvious way to do this is to create a web server with the custom filter programmed in, and endow that server with the file handle to the original file. The next obvious question is how to handle the next such problem.
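Such a server need not materialize the whole filtered file when the filter can be applied piecewise. The sketch below assumes a hypothetical byte-wise filter (masking ASCII digits) and a `read_original(offset, length)` callable that stands in for a ranged read of the original Tahoe file.

```python
from typing import Callable


def obscure(chunk: bytes) -> bytes:
    """Hypothetical obscuring filter: replace every ASCII digit with '#'."""
    return bytes(ord("#") if 0x30 <= b <= 0x39 else b for b in chunk)


def serve_range(read_original: Callable[[int, int], bytes],
                offset: int, length: int) -> bytes:
    """Return bytes [offset, offset + length) of the filtered view.

    Because this filter maps each byte independently, only the requested
    range of the original is read; no full filtered copy is ever built.
    """
    return obscure(read_original(offset, length))
```

A filter whose output at one position depends on earlier bytes (a running cipher state, say) would need to read more than the requested range.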

Skipping a few steps, we arrive at the following architecture. Build a distributed class of fungible agents, called O-agents here, that act as web servers, each with a copy of the same secret RSA key Sx. When our client wants access to the filtered file, he presents a Tahoe immutable file handle OH (for an ‘O-file’) to one of the O-agents. That agent fetches the bits of the O-file using Tahoe. The file is enciphered with the public key that corresponds to Sx, so having Sx allows the agent to decipher it and see the bits, which are indeed a list of Tahoe file handles and also some code in some language. The agent runs the code with access to:

In the case at hand, the OH file contains a file handle to the original unfiltered file, perhaps its only file handle.
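The O-agent's side of this exchange (fetch the O-file, decipher it with Sx, run its code with access to the handles it lists) might be sketched as below. Everything here is an assumption for illustration: the file layout (handles one per line, an empty line, then Python source), the `view(handles, get, request)` entry point, and the `tahoe_get` and `decipher_with_sx` callables, which stand in for a Tahoe download and RSA decryption.

```python
from typing import Any, Callable, Dict, List, Tuple


def split_o_file(plain: bytes) -> Tuple[List[str], str]:
    """Assumed layout: one handle per line, an empty line, then UTF-8 source code."""
    lines = plain.split(b"\n")
    end = lines.index(b"")  # first empty line ends the handle list
    handles = [line.decode("utf-8") for line in lines[:end]]
    code = b"\n".join(lines[end + 1:]).decode("utf-8")
    return handles, code


def handle_request(oh: str, request: Dict[str, int],
                   tahoe_get: Callable[[str], bytes],
                   decipher_with_sx: Callable[[bytes], bytes]) -> bytes:
    """Serve one client request presented with the O-file handle `oh`."""
    handles, code = split_o_file(decipher_with_sx(tahoe_get(oh)))
    namespace: Dict[str, Any] = {}
    # exec is NOT a sandbox; a real O-agent would need a confined interpreter.
    exec(code, namespace)
    view = namespace["view"]  # hypothetical entry-point convention
    # The O-file's code sees its handles, a Tahoe fetch function, and the request.
    return view(handles, tahoe_get, request)
```

Nothing in `handle_request` depends on which agent runs it, which is what lets any O-agent holding Sx serve any request.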

As yet there is no reason to segregate the handles within the OH file; indeed, such segregation seems impossible to enforce.

You (as in the earlier paragraph) had to build and encipher the OH file. That was easy, for the public RSA key is well known, at least among the users of the O-agents. You rely on the O-agents not to use or disseminate the deciphered OH files except as described here. Your customer relies on you, Tahoe, and the O-agents to see the filtered file. There is no single point of run-time failure. You are in a position to learn your client's access patterns to the filtered data.
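The asymmetry that makes this easy (anyone with the public key can encipher, only holders of Sx can decipher) is shown below with a deliberately toy key pair. This is a sketch, not a usable scheme: textbook RSA on single bytes is just a substitution cipher, and a real deployment would more plausibly wrap a symmetric key with a standard padding such as RSA-OAEP. The O-file layout and the names `build_o_file`, `encipher`, and `decipher` are hypothetical.

```python
# Toy RSA key pair (p = 61, q = 53); far too small to be secure.
N, E = 3233, 17                       # public key, known to all O-agent users
SX = pow(E, -1, (61 - 1) * (53 - 1))  # private exponent held only by O-agents (Python 3.8+)


def build_o_file(handles: list, code: str) -> bytes:
    """Hypothetical layout: one handle per line, an empty line, then code."""
    return "".join(h + "\n" for h in handles).encode() + b"\n" + code.encode()


def encipher(plain: bytes, n: int = N, e: int = E) -> bytes:
    """Anyone with the public key can do this: textbook RSA, one byte per block."""
    return b"".join(pow(m, e, n).to_bytes(2, "big") for m in plain)


def decipher(cipher: bytes, n: int = N, d: int = SX) -> bytes:
    """Only a holder of Sx can undo it; each 2-byte block decodes to one byte."""
    return bytes(pow(int.from_bytes(cipher[i:i + 2], "big"), d, n)
                 for i in range(0, len(cipher), 2))
```

The author stores `encipher(build_o_file(...))` in Tahoe as an immutable file and hands its handle OH to the client.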

This use case is an easy one, in which access is provided to virtual immutable data. Successive queries can feasibly be served by different O-agents. Stateful objects built this way would require O-agents to instantiate an object that persists thru a “session”. Stateful objects may be useful even in this case, for efficiency. If the state is small, the O-agent can encrypt a new O-file and return the file, or a new OH handle for that file. This is reminiscent of Actors. With this, a “session” may outlast the availability of any particular O-agent.
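A minimal sketch of the small-state case: the agent keeps nothing between calls and instead returns the updated state sealed, so the next step of the session may go to a different agent. It assumes JSON state and hypothetical `seal`/`unseal` functions standing in for encryption under the O-agents' shared key pair; the XOR used in the usage below is a placeholder, not encryption. Returning the sealed bytes directly is one option; storing them as a new O-file and returning its handle is the other.

```python
import json
from typing import Callable, Tuple

Seal = Callable[[bytes], bytes]


class OAgent:
    """One of several fungible agents; all share the same seal/unseal functions."""

    def __init__(self, name: str, seal: Seal, unseal: Seal):
        self.name, self.seal, self.unseal = name, seal, unseal

    def step(self, sealed_state: bytes, request: str) -> Tuple[str, bytes]:
        """Advance the session by one request; return (reply, new sealed state).

        No state is retained in the agent, so any agent can take the next step.
        """
        if sealed_state:
            state = json.loads(self.unseal(sealed_state))
        else:
            state = {"count": 0, "log": []}  # fresh session
        state["count"] += 1
        state["log"].append(request)
        reply = f"{self.name}: request #{state['count']}"
        return reply, self.seal(json.dumps(state).encode())
```

For example, a session begun on one agent can continue on another, since the client carries the sealed state between them.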

The presentation of this immutable view is, alas, not polymorphic with access to immutable Tahoe files. Perhaps this can be fixed. The stability of such views relies on the integrity of the O-agents and on either the determinism of the code in the O-file, or the determinacy of the language in which the code is expressed together with lack of access to external signals such as a clock.
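One way to withhold external signals is to control what the O-file's code can reach. The sketch below runs code with a hypothetical whitelist of builtins, so pure code gives the same answer on every agent while an attempt to import a clock fails. It illustrates the idea only: restricting `__builtins__` in CPython is not a security boundary, and a real O-agent would want a language designed for confinement and determinism.

```python
# Hypothetical whitelist: pure helpers only; no __import__, open, or clock.
SAFE_BUILTINS = {"bytes": bytes, "len": len, "range": range, "min": min, "max": max}


def run_view(code: str, data: bytes) -> bytes:
    """Run O-file code that defines view(data), with only SAFE_BUILTINS visible."""
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(code, namespace)  # `import time` inside `code` raises ImportError here
    return namespace["view"](data)
```

If every agent runs the same code on the same inputs under such restrictions, any two agents should produce the same view, which is the stability property described above.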

This plan contravenes an implicit Tahoe dictum: “Trust no mechanism outside your physical control.” The O-agents of a given RSA key pair rely on either tamper resistance or, more likely, physical protection. I think that Tahoe also violates this dictum in its scheme for mutable objects. (I am not sure.) Such violation also seems necessary for revocation services. In all three cases, trust in externally instantiated objects affects only those who specifically rely on them.

With tamper resistance on the client machine (and appropriate kernel support), one could implement the O-agent there. This probably requires attestation. This seems to me farther afield from today’s hardware and software technology than remote, physically protected, shared O-agents.

Further Ideas