Some notes as I read Jonathan’s large memories, mappings and SMP.

Ways of Zeroing Memory

Regarding cleaning: I recall seeing a hardware mechanism, on some machine to which we did not port, that would zero a page asynchronously. Several kinds of mechanism could learn this trick easily; DMA is one. Such a scheme would presumably clean pages on speculation, so that a clean page would usually be available already when needed. This requires more code hair than having the CPU map and clean the page itself. The mechanism might be too slow to keep up with demand, in which case the CPU could clean pages too. It would also be less portable.
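Here is a minimal sketch of cleaning on speculation, written in Python purely as a model. Every name in it (PagePool, background_clean, alloc_zeroed, clean_target) is hypothetical and comes from no real kernel. The background_clean call stands in for whatever asynchronous agent (DMA or otherwise) does the zeroing, and alloc_zeroed shows the CPU fallback when that agent has not kept up.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed page size for this sketch


class PagePool:
    """Toy free-page pool that keeps some pages zeroed ahead of demand."""

    def __init__(self, n_pages, clean_target):
        # Free pages start out holding stale data (0xff here).
        self.dirty = deque(bytearray(b"\xff" * PAGE_SIZE) for _ in range(n_pages))
        self.clean = deque()              # pages already zeroed on speculation
        self.clean_target = clean_target  # how many clean pages to keep ready
        self.sync_zeroes = 0              # times the CPU had to zero on demand

    def background_clean(self, budget):
        """Stand-in for the asynchronous zeroing agent; zero up to `budget` pages."""
        done = 0
        while done < budget and self.dirty and len(self.clean) < self.clean_target:
            page = self.dirty.popleft()
            page[:] = bytes(PAGE_SIZE)
            self.clean.append(page)
            done += 1
        return done

    def alloc_zeroed(self):
        """Return a zeroed page, preferring one that was cleaned in advance."""
        if self.clean:
            return self.clean.popleft()
        if not self.dirty:
            raise MemoryError("no free pages")
        # The cleaner did not keep up: the CPU zeroes the page itself.
        page = self.dirty.popleft()
        page[:] = bytes(PAGE_SIZE)
        self.sync_zeroes += 1
        return page

    def free(self, page):
        """Return a page to the pool; it holds stale data until cleaned again."""
        self.dirty.append(page)
```

The extra hair is visible even in the toy: two queues, a target level, and a fallback path, where the CPU-only design needs just the fallback path.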

We never did a detailed plan for such a mechanism.

Some CPU caches have an architected way of storing virtual zeros by manipulation of cache tags. This may only postpone the real work of setting RAM to 0, but if the cleaned page will soon be used to store data, then the work is avoided to the extent that new data is produced for the page while the corresponding cache line remains in cache. This trick probably costs cache space, and the only time saved is that of filling such real cache lines with real zeros. While this puts a load on the cache, it is probably no worse than the alternatives. If the new page is to be shown to a bus master that does not go thru the cache, then difficulties arise. Such precautions are already necessary, however, quite apart from zeroing memory.

The Harvest computer did a trick like this in 1961. There was no cache; instead there was an extra bit per word in the real core that meant virtual zero regardless of the other bits in the word. Extra core wiring could set that bit for predefined blocks of memory words in one memory cycle. It was reset by a normal write into the word. Upon read, a zero word was presented to the reader when the bit was set. Here is a justifying application: We have 1000 20-bit numbers. Find any duplicates. Set out an array of 2^20 zero bits. For each 20-bit number, set the corresponding bit to 1, but report if it was already 1. The normal machine would spend more time initializing the array than in the productive loop. I first heard of this class of problem from Lehmer in 1953, who was studying collisions in 3D random walks.
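The duplicate search can be modeled in a few lines of Python. This is a toy simulation, not a description of the real hardware: the class and function names are hypothetical, and the 64-bit word and 1024-word block size are assumptions of the sketch. It counts memory cycles so that the cost of clearing can be compared with the cost of the productive loop.

```python
WORD_BITS = 64  # assumed word size for this sketch


class VirtualZeroMemory:
    """Toy memory with one extra 'virtual zero' flag per word."""

    def __init__(self, n_words, block_words):
        self.block_words = block_words
        self.words = [0xDEADBEEF] * n_words  # stale contents, never overwritten by a clear
        self.vzero = bytearray(n_words)      # the extra bit per word
        self.cycles = 0                      # memory cycles spent so far

    def clear_block(self, block):
        """Set the flag for every word of one block, charged as a single cycle."""
        start = block * self.block_words
        end = min(start + self.block_words, len(self.words))
        self.vzero[start:end] = b"\x01" * (end - start)
        self.cycles += 1

    def read(self, addr):
        """A flagged word reads as zero regardless of its stored bits."""
        self.cycles += 1
        return 0 if self.vzero[addr] else self.words[addr]

    def write(self, addr, value):
        """A normal write stores the value and resets the flag."""
        self.cycles += 1
        self.words[addr] = value
        self.vzero[addr] = 0


def find_duplicates(numbers, bits=20, block_words=1024):
    """Return (duplicates, cycles) using a bitmap of 2**bits bits."""
    n_words = max(1, (1 << bits) // WORD_BITS)
    mem = VirtualZeroMemory(n_words, block_words)
    # Clearing costs one cycle per block instead of one per word.
    for block in range((n_words + block_words - 1) // block_words):
        mem.clear_block(block)
    duplicates = []
    for n in numbers:
        addr, bit = divmod(n, WORD_BITS)
        mask = 1 << bit
        word = mem.read(addr)
        if word & mask:
            duplicates.append(n)  # bit was already 1: n has been seen before
        else:
            mem.write(addr, word | mask)
    return duplicates, mem.cycles
```

Under these assumed sizes the bitmap occupies 16384 words, so clearing takes 16 cycles in the model where a conventional memory would need 16384 writes, against at most 2000 reads and writes in the loop for 1000 numbers.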


VCSK.
Talk of the tree-like zeroing strategy as defined by Keykos segment keepers.