Collaboration by Memory Snapshots

Sort of Sharing RAM

During the communications commencing here I tried to understand some problems with collaboration between suspicious users sharing mutable data in real RAM, even cache. A discussion with Kevin Reid crystalized a technique and a term that makes several techniques easy to understand and implement. The term is “snapshots”, in this case virtual. The connotations of the word do almost all the invention work. The benefit is that mostly no real copy is necessary but there is indeed a virtual copy.

The general scenario supported here supposes programs that read data structures whose format they know and protect themselves against corruption in those structures, but cannot protect themselves against data that change while they are reading them. See TOCTTOU.

During the discussion it emerged gradually that there are two security problems with naïve sharing of RAM, as provided by mmap of Unix. I (at least) was confusing the problems. The scenario is that X builds a large data structure D in RAM in an agreed-upon format, and needs to show D to Y. X and Y have independent integrity responsibilities and neither is in a position to rely on the other. Both X and Y must tolerate malformed shared data while preserving their integrity. The cost of copying the data is too large. There are several sub-cases all covered here:

Two things can go wrong:
  1. X may modify D while Y is reading it. It is much more difficult to maintain your integrity while reading changing data than it is to do so while reading constant data. If Y is assured that Y’s copy of D is mutable only by Y then the bothersome TOCTTOU vulnerabilities are avoided.
If it were possible for X to send a virtual snapshot of D to Y such that Y can verify that the snapshot is indeed immutable, except perhaps by Y then both problems are solved.

This also solves a scheduling problem: it may be very inconvenient for X to wait while Y reads.

Conventional hardware and some kernels provide virtual snapshots in order to support the Unix fork system call where the magic is COW. How much harder would it be to apply COW logic to mmap?

I don’t know the cost of the fork system call but today it is common to have gigabytes of memory in one’s address space upon fork. In such cases whole levels in the memory map as well as the pages are shared between subsequent sibling spaces and such mapping tables are shared between the sibling spaces, initially all of them. As sections of a sibling space are modified at most log512(memory size) mapping tables must be modified. (512 in X86 architectures) All the time most of the shared data proper stays uncopied in DRAM and even cache, or even on disk in the case of Keykos.

The same pattern is used in the OCaml Set module which implements constant finite sets in a balanced tree. An operation to add an element to such a set costs log(set size) time and returns a new set that shares almost all the storage of the old unmodified set which persists, as long as anyone retains a reference to it. Memory maps are hierarchical in the same sense and I presume that kernels exploit that; Keykos does.

An operation to report the differences between a segment and one of its descendants costs in proportion to the size of differences, not the size of the segment, just as in the Set module, at least if the two sets have a recent ancestor.

This pattern is easily extended over a communications link where only pages that Y reads are transmitted, but then the pattern is feasible only if it is feasible to transmit the read set. Another drawback is the sequential latencies implied by “transmit on page fault”.

What has all this to do with capabilities? Access to the virtual snapshot is via a capability.

The Keykos VCSK segment can be used to provide this service. X could pass the factory of snapshots to Y who could verify that it was such a factory and request a snapshot.

VCSK lacks the logic and coordination to recover pages no longer used. The authority necessary for this coordination defeats VCSK’s confinement properties that its original users required.

These ideas are natural when you view the memory map as capability hardware.

These ideas do not solve the concurrent data base update problem; there is no general melding of descendent segments combining the modifications. By faulting reads as well as writes it would be possible to detect read sets and write sets of the respective clients and sometimes deduce that neither actor affected the other, so as to allow a meld, but those tricks have been tried elsewhere and I do not know the results.

COW logic puts a membrane of sorts between the program and part of its memory. ‘Membrane’ here is in the sense used by the capability community. Sometimes the state of a data base is represented in mutable things that are not memory pages, such as the objects that the system defines. If these are reliably duplicatable like pages, then the same sort of membrane can be extended to include those in the data base state.

Alas these ideas are not conformable with these and I think that both are important!

Some details
Sample C code demonstrating RAM snapshots via the fork kernel call which provides circumstantial evidence that the children get their own access to the malloc data.

These folk already had a similar idea.
A similar ploy for OCaml


Here is a very similar plan serving the same general class of problems and involving similar low level hackery. Suppose that the command to produce a virtual snapshot actually severed the segment from the producer’s address space, conceptually zeroing those addresses at which the segment had resided. A capability to the severed segment could be sent elsewhere in a message where it could be inserted into another (or same) address space at an address of the recipient’s choosing. If the sender needs the data from the sent segment, perhaps to send a similar snapshot in another message, then some of the real cost of the copy must be paid, depending on in how many pages the snapshots differ. If the recipients need to modify the received segments the same costs apply. This is especially easy in Keykos and the data of the segment may remain unmoved in cache or DRAM and even the page tables for 64Kb segments would remain untouched. Making it manifest to the recipient that he has unique access to the segment is also possible.

I suspect that these two plans can be unified. Both these plans have interactions with memory ownership and space responsibility—even in Keykos.