GNOSIS: A Prototype Operating System for the 1990's

Bill Frantz
Norm Hardy
Jay Jonekait
Charlie Landau

Tymshare Inc.
Cupertino, California

Copyright © Key Logic, Inc., 1979. All rights reserved. Permission to reproduce and redistribute this document in paper or electronic form is hereby granted, provided that this copyright notice remains intact.

This session offered a brief introduction to the GNOSIS operating system being developed by TYMSHARE, INC. to run on 370 architecture hardware. GNOSIS offers new approaches to solving the inherent problems of security, auditability, and reliability which are a result of the basic design of current systems.

The presentation covered the motivation for building the system, the key architectural features which distinguish it from current systems and an introduction to the implementation and application conversion concepts.

The text of the presentation follows.

This session is being sponsored by the Basic Systems Division, by the LSRAD task force as a kind of consciousness raising session. Gnosis, which stands for the Great New Operating System In the Sky, is an example of a completely different kind of operating system. We would like for you to leave this presentation with the thought that all operating systems do not need to be alike. Perhaps Gnosis is not the best of all possible worlds for all users, but it does demonstrate that there are other ways to solve problems.

Gnosis is being developed by TYMSHARE as a proprietary control program. We will develop proprietary application packages to run on it, as will our customers. My management has asked me to emphasize that we won't be giving Gnosis away, and SHARE management has asked me not to say that it is for sale.

We will have three short presentations this morning, to be given by the other individuals who, along with myself, constitute the entire Gnosis design, implementation, testing, documentation, and for now, marketing group. Norm Hardy, the senior architect who is solely responsible for getting the project started, will speak first on the reasons we are building Gnosis. Then Bill Frantz will explain the design concepts of Gnosis. Finally, Charlie Landau will give a very brief introduction to the implementation of Gnosis.

1. Introduction and Background

Norm Hardy

1.1. Why do we start another operating system?

We don't know well enough how our current operating systems work! Even VM, which is much smaller than MVS, is not clearly organized around principles that allow one to understand all of the ramifications of a change. There certainly were principles in the designers' minds. They were highly experienced and excellent designers, but these principles were unrecorded and the system has been modified by people who were unaware of those design principles. They can only be guessed by reading code. If guessing wrong causes quick crashes you can guess again. Otherwise you just contribute to the gradual deterioration of a large system by enhancing its function or even fixing bugs. This is one explanation of the suspected phenomenon where fixing a bug in MVT tends to increase the number of bugs.

The systems that we build today are not reliable enough. The problems that we have ascribed to operating systems apply equally to large application programs. We must change software to add new function and we cannot tolerate resulting unreliability.

Some applications are hard or infeasible now. During the past ten years in the computer service business we have missed many opportunities because of the lack of operating system features. Let me give you a couple of examples:

Imagine an owner of a data base that provides a number of vital statistics for each company in the United States with sales larger than $50,000 per year.
This is very valuable data. It is very expensive to produce and maintain. Unrestricted access to this data commands a high price. There are a few very large users that will pay a high price for virtually unlimited access to the data. There are also a large number of users who cannot afford that price but will pay a much smaller price for limited views of the data. Thus less extensive data and summary information have a smaller market value but can be derived from the complete data base. (This is an example of segmentation of the data market.)

If we want to limit one user to seeing certain fields, IBM's IMS system can do this by limiting a program to these fields. This limited information may command a smaller price due to competition.

If we want to restrict another user to seeing records for companies with sales over $300,000 we must do something else [Figure A1].

Figure A1

An example of summary information is the total amount of sales of all companies with headquarters in a given zip code.

If the data user needs to write programs to process these totals by zip-code he will need to call a program that computes a total for a particular zip-code. Such a program needs to read the data base but its caller must be unable to do so. Current operating systems don't allow for such programs with different authority to call each other.
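
The zip-code example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the names make_zip_total_service and total_for_zip are invented here, and a Python closure merely models the separation of authority; it does not enforce it the way separate domains would.

```python
def make_zip_total_service(records):
    """Build the owner's program. It keeps the records to itself and hands
    back a single callable that answers only zip-code totals."""
    private = [dict(r) for r in records]  # owner's copy; never returned to callers

    def total_for_zip(zip_code):
        # The caller learns one number per call, not the underlying records.
        return sum(r["sales"] for r in private if r["zip"] == zip_code)

    return total_for_zip


# Hypothetical sample data, invented for this sketch.
sample = [
    {"company": "A", "zip": "95014", "sales": 120000},
    {"company": "B", "zip": "95014", "sales": 80000},
    {"company": "C", "zip": "10001", "sales": 300000},
]
total_for_zip = make_zip_total_service(sample)
cupertino_total = total_for_zip("95014")
unknown_total = total_for_zip("00000")
```

The customer's program holds only total_for_zip. What the sketch cannot show is the enforcement: in the system we want, the caller would have no other path to the records.
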
Here is a second lost business opportunity [Figure A2]. Imagine another company (which we call the owner) that creates a data base about chemical patents for its own use. Suppose that the company then decides that it is good business to sell access to this data to a competitor. The competitor has a program to examine entries in the data base. This competitor is vitally concerned that the nature of his searches be kept secret. In particular he wishes to keep this information from the owner of the data base. How do we get these two guys together?

Figure A2

If the search is executed on the data base owner's machine the owner might discover the nature of the competitor's searches.

The owner wants more money for the whole data base than the competitor can afford. Thus the search cannot be done on the user's machine. (Besides, the data is changing.)

The owner and his competitor agree to run on a computer utility (which is trusted by both). They agree on a price based on the number of entries read. The program that counts the records must be trusted by the owner to send the access count to him, and it must be trusted by the user not to send anything else (such as which records were read).
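
A minimal sketch of such a counting program follows, assuming a simple keyed collection of entries. The class and method names (MeteredReader, read, usage_report) are hypothetical. The point is the shape of the two interfaces: the competitor sees entries, and the owner sees only a count.

```python
class MeteredReader:
    """Stands between the competitor's search program and the data base."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self._count = 0

    def read(self, key):
        """Used by the competitor: return one entry and count the access."""
        entry = self._entries.get(key)
        if entry is not None:
            self._count += 1  # only entries actually delivered are billed
        return entry

    def usage_report(self):
        """Sent to the owner: the number of entries read, and nothing else."""
        return self._count


# Hypothetical patent entries, invented for this sketch.
reader = MeteredReader({"p1": "polymer process", "p2": "catalyst"})
reader.read("p1")
reader.read("p2")
reader.read("p1")
reader.read("no-such-patent")
billable_reads = reader.usage_report()
```

Both parties must trust that this small program does what its interface says, which is far less to audit than a whole operating system.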

Alternatively, the owner and his competitor may agree that the competitor can provide a program that has access to the whole base and the price depends on the amount of the data returned to the competitor. Current operating systems do not provide a way to install such programs. We have proposed prices but no place to put the programs that administer them.

Tymshare wants to generate business in markets such as these, but finds we can not accomplish this with current operating systems. In our search for a solution we discerned a gap between operating system design in the laboratory and in the commercial world.

When we looked at these kinds of business opportunities and tried to isolate the underlying weaknesses in our current systems we came up with the following list of problem areas.

Authority

Once an operating system prevents a program from doing something that another program can do, some concept of authority has been introduced.

A general problem with current operating systems is that to call a program is to give all of your authority to that program. You must trust all of the programs that you call.

If I write a generally useful program and you use it, my program can delete all of your files when you call it because my program runs with your authority when you call it.

If I delete your files you will soon learn not to call my program. However if my program copies your files into a place where I can read them, you will be none the wiser.

If I know the format of your files I may subtly sabotage them for my own ends.

This is called the Trojan horse problem.

Another problem in most operating systems is that a user cannot install programs that have more authority than their caller. You can write a program that reads your proprietary data base but if you give me that program it can only read my files. If your program must read your files when I run it, you must give me access to your data base and then your program can read your data base when I call your program. Unfortunately I can then read your data base unsupervised by your program.

We have solved some of these problems in VM by putting these programs in different virtual machines but signalling between virtual machines is very expensive.

Pricing flexibility

Sellers of programs and data need more flexible ways of pricing their wares. We have already mentioned the problem of pricing access to data bases.

Copyrights do not protect against all (or even most) program theft and seem entirely inadequate for protecting data bases that change. Detecting and prosecuting violators is difficult, uncertain and seldom directly productive. If the violator is at the same time a valued customer it may be unprofitable to prosecute. (But the loss is still real.)

There are two main ways of pricing data bases currently:

One approach is to offer the base for $10,000 / month. The user can use the data on his own machine and promises not to copy it for others.
Another approach is to keep the data on the data owner's machine. The user may or may not be able to provide programs to access the data in this case. The user who needs to write a program that depends on proprietary data bases owned by different owners has no place to run his program.

Most data base owners will need to segment their market and sell different products at different prices to the different segments. In order to avoid maintaining and storing several related data bases the owner needs to dynamically derive these other products from his main base. The market will require pricing in relation to the degree of access to the data. If the data owner can install programs that have the authority to access the entire data base and report the degree of usage, the data base customer can then write programs that call the owner's programs but cannot read the complete data base directly.

Some data base systems (such as IMS) provide for limiting access to certain fields from certain programs with access defined by other programs. However, many other useful kinds of limitations can be provided:

Access programs can provide information derived (irreversibly) from the data base that can be sold at a smaller price.

One example is the totals by zip-code that we have mentioned.

Access programs can measure the degree of access and bill accordingly, perhaps merely counting accessed records.

Another example is the company names (without the gross revenue) of every company in a data base whose gross is more than a given constant.

Many other schemes have been discussed for deriving limited data bases for smaller prices.

It can even be arranged that the data base customer can write his own program if that program is run in a compartment where it is prevented from bypassing the billing program by storing the results where the customer can see them.

Reliability (in the midst of change)

Historically adding new functions has meant less reliability in old functions. We must not be forced to trade off extensibility for reliability. While we can't expect new software to be perfect we think that we should be able to keep it from clobbering the other smoothly running function that is serving real customers and producing revenue.

THE LARGEST PART OF THE COST OF NEW FUNCTION IN THESE SYSTEMS IS THE CRASHES CAUSED BY THE NEW SOFTWARE THAT IMPLEMENTS THE NEW FUNCTION.

We have already mentioned the problems of reliable operating systems. More generally we need to build applications that are more reliable than we know how to build with current methods. For this we need not only more reliable operating systems but ways of organizing application logic so as to isolate bugs in the application logic as well as those in the operating system logic.

In current system design an application is dependent on vast amounts of software working correctly. We call this software the security kernel of the application. Typically the great bulk of the security kernel is code that the application does not even functionally depend on. With current system design an on-line application can be killed by the malfunction of some component that the on-line application is unconcerned with. (If the job scheduling crashes the operating system crashes and the application goes along with it.) How do we isolate bugs so as to limit their effect to those areas that really depend on the code with the bug?

Another way an application may fail is to need a resource that is being monopolized by another program (such as channels in MVS or VM). IBM calls these problems "denial of resources" and does not promise to fix them.

Sharing

We need to share access to data and programs in a very flexible manner.

We need to share user data and code in core as well as on disk.

These requirements are well known and we will not discuss them further.

These may sound like problems that are unique to commercial timesharing companies. We think that they are not.

I understand that General Motors seriously considered a Multics system for its corporate financial programs because the respective divisions of GM were extremely concerned that their financial information not be available to other divisions.

The coincident pressures of new application function and privacy laws may impact a system's reliability severely if these problems are not solved.

1.2 What are we doing about it?

We are writing a completely new operating system designed to the state of the art as we understand it.

Gnosis provides a way to structure systems of programs. As in structured programming, it is possible to know relationships between the programs in the system without examining the programs but only the structure that connects them.

The ideas to which we are referring have been under development in the laboratory since 1965 when Jack Dennis wrote some papers that led to the design of the PDP-1 system in the electrical engineering lab at MIT. These systems are now generically known as ``Capability Based Operating Systems.'' These ideas are perhaps best described in an issue of Computing Surveys (December 1976, Vol. 8, #4) which is devoted to the issue of reliable software.

In designing a capability based operating system we have taken more ideas from new programming languages than from new operating systems. Programming languages have recently been designed to provide the same sort of separation for symbolic programs that we are trying to provide for machine language programs. We have taken more ideas from SIGPLAN than SIGOPS.

1.3 Gnosis is written for the IBM 370. Why the 370?

The 370 instruction set is less than ideal and the memory protection hardware is awkward, but the 370 architecture has become a standard:

There is widely available hardware, even better in the future, we expect. There are second sources. A wide range of CPU sizes is available. Gnosis is small enough to run on the smallest 370 compatible machine that IBM has (4331?) and seems appropriate for the largest (3033?). (But it has not yet been modified to use the sectorized disks.)
Another reason is the widely available software; we hope to do as well as CMS in running programs that were designed for OS. Programs that use simple OS facilities and are designed for arms-length relations with the OS can be moved to Gnosis with little or no modification (as with CMS). If these programs are stable there is perhaps little reason to convert them to use the Gnosis facilities. Compilers tend to be in this category. Most data base systems are not. Some application systems are in this category.
The third reason is that we do not manufacture hardware and so we cannot control its architecture. The 370 architecture has an inertia of its own. It will take years for Gnosis to fully bloom. We anticipate continued availability of the current architecture (perhaps enhanced).

Tymshare runs VM/370 which serves as a marvelous womb for a new operating system.

1.4 This project has been going on for nearly three years now.

Most of the privileged code has been written and debugged.

A dozen modules, normally considered an integral part of the operating system, but here written in problem state, have been finished.

The CMS editor and an extensive machine language debugger (DOT) have been imported and specialized for Gnosis.

No module yet written for Gnosis has been larger than a (core) page of code. The editor and debugger which are much larger were imported.

1.5 Why is Tymshare doing all of this?

We see a developing market that we want to be strong in. This is the data market and to a lesser extent the program market.

It is a substantial job to write an operating system. This approach of investing a lot of effort in a long range project is the same approach that we used with Tymnet. That gamble paid off well for Tymshare. While the ideas of Tymnet had not been tested, the ideas of Gnosis have in large part been tested in the laboratory. We don't know whether we will succeed again but this is how we made the case to the corporation.

We believe that:

The advantages of the Gnosis architecture are sufficient to allow us to compete in some markets with organizations that apply several times the technical resources to a given application area that we do.
Some markets are vitally dependent on features such as those in Gnosis and thus will not exist without such systems.
There will be a large data market that can only be served by a centralized data market place.
The data market may be much larger if architectural features such as those in Gnosis are available for use there.
These design ideas will allow a small group to attack critical applications soon while at the same time creating the basis for a powerful and general system later.

We are going to describe these ideas in more detail now. Bill Frantz is going to describe how Gnosis addresses these problems.

2. Gnosis Design Concepts

Bill Frantz

What is Gnosis? How do we attack the problems of authority, pricing flexibility, reliability amid change, and sharing that Norm has described? Let's look at the key ideas.

2.1 Firewalls

The first key idea I wish to discuss is ``firewalls''. The familiar expression ``good fences make good neighbors'' applies to programs as well as to people [Figure B1].

Figure B1

Firewalls are barriers that prevent one section of an application from affecting other sections except through well defined interfaces. The most common kind of firewall is the kind provided by OS between jobs or by VM between virtual machines. Gnosis provides this kind of protection between parts of an application as well as between applications. Gnosis does this by allowing a program to be broken up into many pieces which we call ``domains'' [Figure B2]. Each domain has its own address space, registers, PSW, and authority. The authority of a domain is represented by tokens called ``capabilities''. These capabilities are kept in operating system space and manipulated by system calls.

Figure B2

Capabilities represent such diverse authority as the authority to expend resources, the authority to access a particular piece of data, and the authority to call a particular program. Domains communicate with each other by invoking their capabilities with system calls. The programs in the domains may be written in Fortran, Cobol, PL/I, Assembler, Pascal, Algol68 or any other language appropriate to the application.

2.1.1 Comparison with Other Systems

Let us compare Gnosis firewalls with the firewalls in other existing systems.

Figure B3

[Figure B3] The SVC instruction in OS lets a program efficiently call another program that has more authority than it has. Since, even with APF, there are only two authority levels (user and God), it is not possible to separate sections of an application from each other as can be done in Gnosis. There is no easy way to call a program with less authority than the calling program.

Figure B4

[Figure B4] The VMCF facility in VM/370 allows a program to invoke a function in a different virtual machine. This other virtual machine can have less, more, or in general different authority than the calling virtual machine. In this sense a virtual machine is like a Gnosis domain; however VM does not allow the passing of authority from one virtual machine to another. The analogous Gnosis facility is also fifty to one hundred times faster than VMCF.

2.1.2 What do these firewalls gain us?

Figure B5

[Figure B5] Let us take as an example a reservation system. The reservation system will have different clients, each with its own set of files for such things as accounting data. The files for each client must be protected from access by other clients because the clients are competitors and their data is proprietary. Each client may have several terminals in his office. The programs controlling these terminals should be isolated from each other to allow them to be separately scheduled, and for conceptual simplicity. Each of these clients accesses the main reservation database which must be updated in only certain controlled ways. Let us see how Gnosis lets us structure this application in new ways and the advantages that accrue when we do this.

Figure B6

[Figure B6] The first thing we gain is enhanced protection.

The main reservation database, along with the code that maintains it, can be protected from the code that accesses it. This forces all accesses to comply with the required procedures. This also protects the database from failures in the client code and protects the client's code from failures in the database code. This kind of assured protection greatly aids in locating the failing module when a failure occurs.

The second thing we gain is isolation which prevents entangling alliances between domains.

Each of the client databases is isolated from the code that performs various client oriented applications (e.g. accounting). This in turn is isolated from the code that drives the terminals. These domains can not read or write each other's memory. They can only communicate through the capabilities they hold to each other. This prevents the large number of implicit interactions between modules that we see in current monolithic applications.

Figure B7

[Figure B7] In CICS the implicit, but undocumented, interactions between modules allow a relatively unimportant transaction with a bug in it to do a wild store and take down the whole CICS region. This affects those transactions on which the organization is vitally dependent. In Gnosis, on the other hand, the transaction processing modules can be written to execute in different domains from each other and from the general support modules. This will prevent wild stores from affecting the whole application.

The third thing we gain is auditability by external monitors [Figure B8].

Figure B8

When the external interfaces of a module, application, or even an application complex are well defined, an auditor can go in and monitor the information passing over the interfaces and have some assurance that the programs are behaving as they do in the real production environment. In our example, an auditor might want to monitor the transactions to one client's database. An authorized user can insert a monitor module between the client's application software and the database domain.

Another auditing benefit is that when many small modules are coded instead of a few large ones it is much easier to verify, by either formal or informal means, that each module does what it is supposed to do. Gnosis enforces the interface between domains making clear what is part of the interface and what is not.

The fourth thing that we gain from this structure is better machine resource monitoring.

When it comes time to attempt to tune this sample application for performance, Gnosis allows machine resource usage to be monitored at the module level. This provides a very powerful tool for performance auditing of the system.

The fifth thing that we gain is conceptual simplicity. This means that you only need to know the external specifications [Figure B9].

Figure B9

A very successful example of conceptual simplicity is the 370 microcode. The microcode is complex internally, but the interface definition is simple. The user of the 370 does not need to know about how the microcode implements the instruction set; he only needs to know the external specifications in Principles of Operation, a much easier task. Gnosis programs have the same feature. They can be either simple like the model 115 microcode, or complex like the 3033 microcode. It doesn't matter to the user of the program; he sees only the externally defined interface.

2.2. Hiding Things

The second important idea in Gnosis that I want to talk about is ``hiding things.''

In Gnosis it is possible to hide the details of the implementation of certain functions from the programs that use those functions.

Figure B10

[Figure B10] Hiding things allows us to reduce the conceptual complexity of certain functions. In our example the complexity of the database can be hidden behind an interface that just provides for retrieval, update, insertion, and deletion of reservations. This allows the programmer who is manipulating the data to ignore all the many facets of the database, for example performance options, that are not relevant to his current problem.

Figure B11

Hiding things allows the easy upgrading of functions in a system. A function can start off as a scaffold. This has already proven useful in the Gnosis development project. We were able to move the CMS editor to Gnosis before there was a file system for it by building a scaffold module that met the necessary interface [Figure B11]. This allowed us to debug the editor module, including its file system calls, before we could store files. We can later develop the real file system code. After the basic file system is running we can consider an extension that allows files to be stored in compressed format. This can be a front end module which meets the same interface as the file system but compresses the data going into the file system and expands it again coming out. Note that this enhancement requires no changes to either the editor or the file system.
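
The scaffold and the compression front end can be sketched as follows. This is an illustration only: the two-call write/read interface and all the class names are assumptions made for the sketch, not the actual Gnosis file interface, and Python's zlib stands in for whatever compression a real front end would use.

```python
import zlib


class ScaffoldFileSystem:
    """Stand-in that meets the file interface but keeps data only in memory."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        self._files[name] = bytes(data)

    def read(self, name):
        return self._files[name]


class CompressingFrontEnd:
    """Meets the same interface; compresses going in and expands coming out."""

    def __init__(self, backing):
        self._backing = backing

    def write(self, name, data):
        self._backing.write(name, zlib.compress(data))

    def read(self, name):
        return zlib.decompress(self._backing.read(name))


def editor_save_and_reload(fs, name, text):
    """A toy 'editor' that knows only the write/read interface."""
    fs.write(name, text.encode("utf-8"))
    return fs.read(name).decode("utf-8")


text = "reservation " * 200
plain_store = ScaffoldFileSystem()
packed_store = ScaffoldFileSystem()
plain_result = editor_save_and_reload(plain_store, "notes", text)
packed_result = editor_save_and_reload(CompressingFrontEnd(packed_store), "notes", text)
stored_size = len(packed_store.read("notes"))
```

Neither editor_save_and_reload nor ScaffoldFileSystem changes when the front end is added, which is the property described above.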

Figure B12

Hiding things can be used to allow debugging or monitoring interfaces [Figure B12]. We saw earlier how a module can be inserted between two domains to monitor all the information that goes back and forth on the interface. The programs involved can not notice that this has been done and so can not change their behavior. This can be a very powerful tool for program audit or debugging.
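
A monitor of this kind might look like the sketch below. The single invoke(operation, ...) interface and the class names are hypothetical, chosen only to show that the client program is written the same way whether it holds the database or the monitor.

```python
class ReservationDatabase:
    """Toy database domain with one entry point."""

    def __init__(self):
        self._seats = {}

    def invoke(self, operation, *args):
        if operation == "insert":
            name, seat = args
            self._seats[name] = seat
            return "ok"
        if operation == "retrieve":
            (name,) = args
            return self._seats.get(name)
        raise ValueError("unknown operation: " + operation)


class Monitor:
    """Presents the same interface as its target and records the traffic."""

    def __init__(self, target):
        self._target = target
        self.log = []

    def invoke(self, operation, *args):
        result = self._target.invoke(operation, *args)
        self.log.append((operation, args, result))
        return result


def client_program(database):
    """Knows only the interface; cannot tell what is behind it."""
    database.invoke("insert", "Hardy", "12A")
    return database.invoke("retrieve", "Hardy")


direct_answer = client_program(ReservationDatabase())
monitor = Monitor(ReservationDatabase())
monitored_answer = client_program(monitor)
```

After the run, monitor.log holds every request and reply that crossed the interface, while the client received the same answers in both cases.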

Figure B13

Hiding things is useful for implementing a distributed processing system. In Gnosis distributed systems can be implemented by a program that exports capabilities over TYMNET to other computers on the network [Figure B13]. In our example there is no need for the reservation database to be on the same machine as the client's processing programs. This is hidden in such a way that a program need not be aware whether another program it calls is on the same computer or another, geographically separate, computer.

Hiding things is normally implemented in Gnosis by a technique called ``procedural access to data or capabilities.'' This is a technique where a separate program is called to use the data and/or capabilities. When it is used with data it allows the source and storage format of the data to vary without impacting the program that is using the data.

Figure B14

[Figure B14] In our example we could allow programs to access many different kinds of reservation databases, each with its own access procedures and protocols. These protocols would be translated to a common protocol by an interface module. The use of this interface module would be transparent to the client's processing programs.
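
As a rough sketch of such interface modules, assume two reservation databases with different access protocols. Both databases, their protocols, and every name below are invented for the illustration.

```python
class AirlineDatabase:
    """Hypothetical back end keyed by upper-case passenger names."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, key):
        return self._rows.get(key)


class HotelDatabase:
    """Hypothetical back end that answers query strings such as 'GUEST=name'."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, text):
        field, _, value = text.partition("=")
        return self._rows.get(value) if field == "GUEST" else None


class AirlineInterface:
    """Translates the common lookup(name) call to the airline protocol."""

    def __init__(self, database):
        self._database = database

    def lookup(self, name):
        return self._database.fetch(name.upper())


class HotelInterface:
    """Translates the common lookup(name) call to the hotel protocol."""

    def __init__(self, database):
        self._database = database

    def lookup(self, name):
        return self._database.query("GUEST=" + name)


def client_lookup(interfaces, name):
    """Client code sees only the common protocol."""
    return [interface.lookup(name) for interface in interfaces]


airline = AirlineInterface(AirlineDatabase({"LANDAU": "seat 3C"}))
hotel = HotelInterface(HotelDatabase({"Landau": "room 204"}))
answers = client_lookup([airline, hotel], "Landau")
```

Adding a third kind of database means writing one more interface module; client_lookup is untouched.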

2.3. User Replaceable Units

The last key idea I wish to discuss today is ``user replaceable units.'' This idea leads directly to the end of monolithic operating systems.

Figure B15

Operating systems have had the problem of trying to control size and complexity [Figure B15]. MVS, for example, has divided the privileged code into seven protection keys as an attempt to isolate parts of the operating system from each other. VM has divided its operating system function into the protected security kernel which is called CP, and the unprotected function, which includes directories and virtual storage management, which is called CMS.

Gnosis is built with a small amount of supervisor state code, about 50K. This kernel provides, among other things, the virtual memory and rudimentary dispatching, and it enforces the definitions of domains and capabilities. All the rest of the system is built out of domains, many more than the seven protection key domains in MVS. In fact all the parts of the system that we are talking about today are implemented in domains. If you don't like the function of the domains provided with the system you can replace them. With proper authority, you can replace such things as command language interpreters and spooling systems with new or different ones. This may be done for one user or for groups of users. It may be done either by users or by systems people. There may be several versions of, for example, command languages on one system to serve the different needs of groups such as secretaries and programmers.

The end of monolithic operating systems leads naturally to the idea of the end of monolithic applications [Figure B16].

Figure B16

Gnosis will support applications designed to use the OS interfaces in a manner similar to the way CMS does. We expect the conversion process to be very similar to an OS to CMS conversion. When a monolithic application is first converted to Gnosis it will not automatically gain any of the advantages we have talked about. It will run as it ran under OS or CMS. If it is desired the application may later be broken into several domains. Whether the application is split into multiple domains or not, it can be treated as a single ``black box'' by other domains in the system.

For new applications Gnosis offers application programmers new ways to solve the problems of size and complexity. Traditional applications have many holes in the walls between modules. While they may look simple, the implicit interactions mean that you have to read every line of code in order to find what interfaces were actually implemented. Under Gnosis the same tools that are used to control the size and complexity of the operating system, debug the operating system, and monitor the performance of the operating system are also available to the application programmer. As we have seen from our example, by coding applications in many small domains that are protected from each other it is much easier to add new function or to enhance existing function without introducing bugs in remote areas of the application.

I would like now to introduce Charlie Landau, who will describe in more detail how domains are organized and how they communicate with each other.

3. The Implementation of Gnosis

Charlie Landau

I want to describe in more detail now the mechanisms in Gnosis that we use to implement the features you've just heard about.

3.1. Domains

The most important basic concept in Gnosis is that of a domain. The word comes from the phrase ``protection domain.'' A domain is the thing which is surrounded by firewalls. All the programs I'm going to talk about run in domains.

The important thing to remember about a domain is that it has its own PSW and registers, its own address space, and its own set of capabilities.

3.2. Capabilities

Capabilities are the tokens of authority. They are the doors in the firewalls. Owning a capability gives a domain the right to do something.

The principal mechanism for using capabilities is called ``invocation.'' When a domain executes a certain SVC, one of the capabilities owned by the domain is invoked. That means that something is done with the capability. What is done depends on the capability and on the parameters passed by the invoking domain. The domain can pass as parameters some data and a few capabilities.

Note that a domain has no way to do anything with a capability unless it owns it. It can't even name a capability it doesn't own.

Note also that it makes no difference what domain invokes the capability. A capability is the same in the hands of any domain.

Thus, owning a capability not only gives you the right to use the capability, it gives you the power to confer that right on others by giving them the capability. Every capability can be copied.
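The rules above can be illustrated with a small sketch. This is not Gnosis code: the classes `Capability` and `Domain`, the slot-number naming, and the `give` helper are all hypothetical, chosen only to show that a domain can act solely through capabilities it holds and that a copied capability behaves identically for any holder.

```python
class Capability:
    """Hypothetical token of authority; `action` is what invoking it does."""
    def __init__(self, action):
        self._action = action

    def invoke(self, data, caps=()):
        return self._action(data, caps)


class Domain:
    """A domain can refer to capabilities only by slot number in its own list."""
    def __init__(self, n_slots=16):
        self._slots = [None] * n_slots

    def install(self, slot, cap):
        self._slots[slot] = cap

    def invoke(self, slot, data, pass_slots=()):
        cap = self._slots[slot]
        if cap is None:
            raise PermissionError("no capability in slot %d" % slot)
        # Parameters are some data plus a few capabilities the caller holds.
        passed = tuple(self._slots[s] for s in pass_slots)
        return cap.invoke(data, passed)

    def give(self, slot, other, other_slot):
        """Copy a capability we hold into another domain (a simplification:
        the text describes passing capabilities as invocation parameters)."""
        other.install(other_slot, self._slots[slot])


upper = Capability(lambda data, caps: data.upper())
a, b = Domain(), Domain()
a.install(0, upper)
a.give(0, b, 3)          # every capability can be copied
print(a.invoke(0, "hello"), b.invoke(3, "hello"))   # same result for either holder
```

In this sketch domain `b` has nothing in slot 0, so `b.invoke(0, ...)` fails: there is no way for it to name a capability it does not hold.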

The most interesting type of capability is the entry capability [Figure C1].

Figure C1

The circles are domains, and the arrow represents an entry capability.

An entry capability points to a domain. An entry capability to a domain is like ``call only'' access to the domain. The caller and callee have different address spaces and different sets of capabilities. The called domain may have less authority than its caller or it may have more authority, or the authorities of the two programs may be unrelated.

When an entry capability is invoked, that domain is started up and it is given the parameters that were passed by the caller. By this mechanism, ``what is done'' when the capability is invoked is under control of a program and therefore can be completely general.

Any typical user can write programs that are run when entry capabilities are invoked. The mechanism is not restricted to system programmers.

When a domain is called via an entry capability, it gains the authority to return to its caller. Naturally, this authority is represented by a capability, called an exit capability. Exit capabilities can be stored, passed, and shared like any other capability. There is no call/return stack in the kernel.
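A rough sketch of the entry/exit relationship follows. The class names and the `server` program are hypothetical, and the object handed back to the caller exists only so the demo can observe whether the caller has been resumed. The point it illustrates is that the right to return is itself an object the callee can hold on to, which is why no call/return stack is needed.

```python
class ExitCapability:
    """Hypothetical: the authority to return a result to one waiting caller."""
    def __init__(self):
        self.returned = False
        self.result = None

    def invoke(self, result):
        self.returned, self.result = True, result


class EntryCapability:
    """Hypothetical: 'call only' access to the program running in a domain."""
    def __init__(self, program):
        self._program = program

    def invoke(self, data):
        exit_cap = ExitCapability()
        self._program(data, exit_cap)   # the callee decides when to return
        return exit_cap


pending = []  # exit capabilities the server has stored for later use

def server(data, exit_cap):
    if data == "later":
        pending.append(exit_cap)        # keep the right to return
    else:
        exit_cap.invoke(data[::-1])     # return immediately

entry = EntryCapability(server)
first = entry.invoke("abc")
second = entry.invoke("later")
print(first.result, second.returned)    # second caller is still waiting
pending.pop().invoke("done")            # the stored exit capability is used later
print(second.returned, second.result)
```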

Bill mentioned some applications, such as auditing and debugging, which involve hiding things from the user of a capability. This is possible because there is no way to identify a capability that you own. A program knows what to expect of its capabilities only because it knows how it got them. What is actually on the other side of the capability is completely hidden from the user of the capability.

This makes it possible to substitute one capability for another without having any effect on the program using the capability. This is one reason why almost any part of the system can be replaced by a properly authorized user.

3.3. Firewalls on implicit actions

We have seen how capabilities can be used to control the explicit interactions of a program with its environment. There are several ways in which a program interacts implicitly with its environment. In Gnosis there are firewalls on all these interactions as well, and the interactions can be controlled by the program designer using capabilities. There are three classes of implicit interactions.

3.3.1. Program Checks

The first class is that of program checks.

When a program causes a program check, something happens. What happens? Well, you usually don't find out what happens by reading the program listing in the vicinity of the trapping instruction. The program takes some action that is implicit.

Figure C2

[Figure C2] In Gnosis, each domain holds a capability which is designated the ``domain keeper.'' When a program causes a program check, the domain keeper is invoked and is passed information relating to the problem. Because the domain keeper can be an entry capability to a domain which is executing a user-written program, any recovery algorithm can be implemented. Probably most domain keepers will be entry capabilities to a debugger.
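As an illustration only, the domain keeper idea can be sketched as follows. A Python exception stands in for a 370 program check, and the names `Domain` and `substituting_keeper` are hypothetical. The recovery policy shown (log the check and supply a default value) is just one of the arbitrary algorithms a keeper could implement.

```python
class Domain:
    def __init__(self, program, keeper):
        self.program = program
        self.keeper = keeper  # stand-in for the domain keeper capability

    def run(self, *args):
        try:
            return self.program(*args)
        except ArithmeticError as check:
            # A Python exception stands in for a program check; the keeper
            # is invoked and passed information about the problem.
            return self.keeper(self, check)


log = []

def substituting_keeper(domain, check):
    """One possible recovery policy: record the check and supply a default."""
    log.append(type(check).__name__)
    return 0


d = Domain(lambda x, y: x // y, substituting_keeper)
print(d.run(7, 2), d.run(7, 0), log)
```

Swapping in a different keeper (a debugger, say) changes what happens on a fault without touching the program itself.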

3.3.2. Resource Exhaustion

The second class of implicit interactions has to do with usage of certain resources.

As a program runs it is using resources such as CPU time.

Because the policies for allocating such resources will probably vary from time to time and from application to application, Gnosis provides a mechanism for implementing such policies using capabilities.

For example, there could be a policy to give a particular application one hour of CPU time between the hours of 5 PM and 9 PM every Friday.

In Gnosis this could be implemented this way [Figure C3].

Figure C3

A domain has a capability called a meter capability which governs its use of CPU time. When the meter is turned on, the domain can run. When the meter is turned off, the domain is suspended until the meter is turned on again. Turning a meter off stops a domain non-destructively. That is, the domain can be stopped and restarted without influencing the logic inside the domain.

The meter keeper is the program that has the authority to throw the switch. This program can turn the meter on during the times it wants to allow the domain to run.

There is also a counter built into the meter that turns the meter off after a certain amount of CPU time has been used. The meter keeper can set this to limit the amount of resources used by the domain.

The meter capability held by the domain allows it to use resources from the meter but does not give it the authority to change the limit or throw the switch.
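The meter arrangement might be modelled along these lines. Everything here is an assumption made for illustration: the class and method names, the use of abstract "ticks" in place of CPU time, and the keeper function that encodes the Friday-evening policy. Python cannot enforce the separation of authority, so `MeterCapability` merely models the narrower interface the domain is given.

```python
from datetime import datetime


class Meter:
    """Hypothetical meter: an on/off switch plus a countdown of CPU ticks."""
    def __init__(self):
        self._on = False
        self._remaining = 0

    # Authority reserved for the meter keeper.
    def set_limit(self, ticks):
        self._remaining = ticks

    def switch(self, on):
        self._on = on

    # Authority available through the domain's meter capability.
    def consume(self, ticks=1):
        """Return True if the domain may run; False means it is suspended."""
        if not self._on or self._remaining < ticks:
            self._on = False          # the counter turns the meter off
            return False
        self._remaining -= ticks
        return True


class MeterCapability:
    """What the domain holds: it can draw from the meter and nothing else."""
    def __init__(self, meter):
        self.consume = meter.consume


def friday_evening_keeper(meter, now, limit):
    """Meter keeper policy: allow `limit` ticks on Fridays from 5 PM to 9 PM.
    A real keeper would set the limit once per window, not on every call."""
    in_window = now.weekday() == 4 and 17 <= now.hour < 21
    if in_window:
        meter.set_limit(limit)
    meter.switch(in_window)


meter = Meter()
cap = MeterCapability(meter)
friday_evening_keeper(meter, datetime(2024, 1, 5, 18, 0), limit=3)   # a Friday
friday_runs = [cap.consume() for _ in range(5)]
friday_evening_keeper(meter, datetime(2024, 1, 6, 18, 0), limit=3)   # a Saturday
saturday_runs = [cap.consume() for _ in range(2)]
print(friday_runs, saturday_runs)
```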

3.3.3. Memory References

The third class of implicit interactions is memory references.

A domain owns a special capability which defines its memory space. This capability will be the root of a tree of capabilities. At the leaves of the tree are capabilities to pages.

Let me describe what can be done with memory trees by giving a couple of examples of how they can be used.

The first is an example of sharing memory.

Consider a database system. It has several clients, each with his own database. Each client can be making several transactions at once.

Here is how the memory structure for such a system might look [Figure C4].

Figure C4

We have a domain for each transaction that could be going on at once.

The arrow from domain 1 is the capability which defines that domain's address space.

That arrow points to a structure which defines the address space as consisting of two parts. One part is a collection of pages labeled 'P1'. The information in P1 is private to domain 1 because no other domain has access to it. Domain 1 could store there the information pertaining to the transaction it was processing. Domain 2 likewise has its own private storage, P2.

The other part of domain 1's address space is shared with domain 2. It in turn consists of two parts. The first part consists of the information in database A. Domains 1 and 2 cooperate to access and update this information.

The last part of the address space is the code of the database system, which is shared among all four domains. We assume that the code is reentrant. Access to the code is restricted to read-only. This eliminates the possibility of a bug, exercised by database B, clobbering code which would affect database A.
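One way to picture the structure just described is the sketch below. It is a reading of the description, not the actual Gnosis data structure: the `Page` and `Node` classes, the slot layout (slot 0 private, slot 1 shared; within the shared part, slot 0 the database and slot 1 the code), and the tiny page size are all assumptions. Addresses are given as paths of slot numbers from the root.

```python
class Page:
    def __init__(self):
        self.data = bytearray(16)   # tiny page, for the demo only


class Node:
    """Interior node of a memory tree; each slot is (child, read_only)."""
    def __init__(self, *slots):
        self.slots = slots


def resolve(root, path):
    """Walk slot numbers down from the root; return (page, read_only)."""
    target, read_only = root, False
    for index in path:
        target, ro = target.slots[index]
        read_only = read_only or ro
    return target, read_only


def store(root, path, offset, value):
    page, read_only = resolve(root, path)
    if read_only:
        raise PermissionError("store through read-only capability")
    page.data[offset] = value


def fetch(root, path, offset):
    page, _ = resolve(root, path)
    return page.data[offset]


code, db_a, db_b = Page(), Page(), Page()
shared_a = Node((db_a, False), (code, True))    # database A plus read-only code
shared_b = Node((db_b, False), (code, True))    # database B plus read-only code
roots = {n: Node((Page(), False), (shared, False))
         for n, shared in [(1, shared_a), (2, shared_a), (3, shared_b), (4, shared_b)]}

store(roots[1], (1, 0), 0, 42)        # domain 1 updates database A
store(roots[1], (0,), 0, 7)           # domain 1 writes its private pages
print(fetch(roots[2], (1, 0), 0),     # domain 2 sees the database update
      fetch(roots[2], (0,), 0),       # but not domain 1's private data
      fetch(roots[3], (1, 0), 0))     # database B is untouched
```

An attempt such as `store(roots[3], (1, 1), 0, 1)` fails in this model, which corresponds to the protection of the shared code against a bug exercised through database B.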

The second example illustrates what we call a ``virtual copy.''

Let's suppose now that we have an application which for some reason was not written in reentrant code. If we have several people running their own copies of the application, we must make several copies of the code, even though in many cases only part of the code is impure. In other words, we may in fact be able to share parts of the code, but we don't want to examine all the code to find out which parts are pure and which get modified.

We can solve this problem with the following structure. 1 and 2 are two domains using the impure code. The rectangles define a memory map that gives each domain read-only access to the code [Figure C5].

Figure C5

The triangle at the bottom of the map has a special meaning. It means that if any access violation occurs through that map, then the accessing domain will not be given a program check. Rather, the capability leading to the domain labeled ``segment keeper'' is invoked, and is passed information on the access violation.

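Now let's observe what happens when domain 1 tries to store into the code. Because its access is read-only, an access violation occurs. The segment keeper is called. It makes a private copy of the page being stored into and changes the memory map for domain 1 to refer to that copy, so the store can then proceed.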

The memory map for domain 2 did not get modified, nor was the code modified, so domain 2 is unaffected. All the remaining code is still shared.

The segment keeper that performs the virtual copy function is a domain that we supply with Gnosis. Of course, users can write their own segment keepers to perform different functions.
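The sketch below shows one plausible reading of the virtual copy function, assuming the supplied segment keeper behaves like what is now usually called copy-on-write: on a store into a read-only page it gives the faulting domain a private writable copy of just that page. The names `MemoryMap` and `virtual_copy_keeper` and the four-byte "pages" are hypothetical.

```python
class MemoryMap:
    """Hypothetical per-domain map: page number -> (page bytes, writable?)."""
    def __init__(self, shared_pages, keeper):
        self.entries = [(page, False) for page in shared_pages]  # read-only
        self.keeper = keeper    # stand-in for the segment keeper capability

    def fetch(self, page_no, offset):
        return self.entries[page_no][0][offset]

    def store(self, page_no, offset, value):
        page, writable = self.entries[page_no]
        if not writable:
            # Access violation: invoke the segment keeper instead of
            # giving the domain a program check, then retry the store.
            self.keeper(self, page_no)
            page, writable = self.entries[page_no]
            if not writable:
                raise PermissionError("segment keeper did not grant access")
        page[offset] = value


def virtual_copy_keeper(memory_map, page_no):
    """Replace the shared page with a private, writable copy in this map only."""
    original, _ = memory_map.entries[page_no]
    memory_map.entries[page_no] = (bytearray(original), True)


code = [bytearray(b"AAAA"), bytearray(b"BBBB")]
map1 = MemoryMap(code, virtual_copy_keeper)
map2 = MemoryMap(code, virtual_copy_keeper)

map1.store(0, 0, ord("z"))            # domain 1 stores into the "code"
print(chr(map1.fetch(0, 0)), chr(map2.fetch(0, 0)), code[0].decode())
```

Only the page that was stored into is duplicated; page 1 is still the same object in both maps.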

3.4. The Programmer's View

So far I've talked about the technical details of Gnosis as viewed by a program running on the system. I'd like now to mention some aspects of Gnosis from the point of view of the programmer who is using the system. Our philosophy has been to keep things simple for the user, and here are some ways in which we do that.

3.4.1. Single-Level Store

In Gnosis all data is organized in what IBM would call a single level store. You use the same means for storing two bytes of data that you use for storing two megabytes of data - namely, you put it in a virtual memory. In fact, there is a mechanism for constructing virtual memories larger than you can address at once with the hardware, so even the largest collections of data can use this one mechanism.

3.4.2. Perpetual Programs

Another interesting feature of Gnosis is perpetual programs.

In most systems, after a crash it suffices to save the file system structure.

In Gnosis, there is no file directory that is an obvious candidate for saving, because there is no single file system. As often as not, the structures that are important to save are built and maintained by user programs.

Our approach to deciding what to save after a crash is very simple. We save everything. We save all data, all capabilities, the complete state of all domains, everything.

This means that a program can run forever - through system crashes, through scheduled maintenance, through upgrades of the operating system or even the CPU.

The effect of this is that we have provided in essence a single level store in the time scale. You use exactly the same technique to install a program whether it is going to be run once and then deleted, or whether you want it to still be running a year from now.

Very briefly, the way we save everything is to take a checkpoint, or snapshot, of the entire system every few minutes. When recovering after a crash, everything is restored to its state as of the last checkpoint. The restart is transparent to most user programs.
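The effect of checkpointing on a running program can be sketched in a few lines. This is a toy model under stated assumptions: the whole "system" is a dictionary of domain states, a checkpoint is a deep copy of it, and the names `System`, `checkpoint`, and `crash_and_restart` are invented for the example. It says nothing about how Gnosis actually takes its snapshots.

```python
import copy


class System:
    def __init__(self):
        self.domains = {}        # name -> complete state of that domain
        self._checkpoint = {}

    def checkpoint(self):
        """Snapshot everything: all data and the full state of all domains."""
        self._checkpoint = copy.deepcopy(self.domains)

    def crash_and_restart(self):
        """Everything returns to its state as of the last checkpoint."""
        self.domains = copy.deepcopy(self._checkpoint)


def step(state):
    """One instruction of a perpetual program that sums 0, 1, 2, ..."""
    state["total"] += state["pc"]
    state["pc"] += 1


system = System()
system.domains["counter"] = {"pc": 0, "total": 0}

for _ in range(3):
    step(system.domains["counter"])
system.checkpoint()
for _ in range(2):                       # work done after the checkpoint
    step(system.domains["counter"])

system.crash_and_restart()
print(system.domains["counter"])         # state as of the checkpoint
step(system.domains["counter"])          # the program simply carries on
print(system.domains["counter"])
```

The program contains no recovery logic of its own; work done since the last checkpoint is redone after the restart.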

3.4.3. Interrupts

Another way we keep things simple for the user is in the area of interrupts.

Interrupts are a source of complexity in many applications, such as the reservation system mentioned earlier. Any structured programming enthusiast will tell you that if GO TO's are bad, interrupts are worse. If you have worked in this area you have probably verified this from your own experience.

In Gnosis we eliminate interrupts completely [Figure C6].

Figure C6

We have a rule that a domain can be doing only one thing at a time. If it is waiting for input from the terminal, then it is not sending output to the printer and it is not computing. When the input from the terminal arrives, the domain will go on to the next instruction of its program, and not until.

What do we do in place of interrupts? If an application needs to be doing several things at once, it is programmed using several domains. Each domain can be running its own parallel process. Domains are cheap, and creating parallel processes is cheap. The domains communicate with each other using capabilities and shared memory. In this way the interactions of the parts of the application become explicit and that makes the application easier to understand.
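As a loose analogy, the style described here resembles a pipeline of sequential workers, each of which waits on exactly one thing at a time. In the sketch below Python threads stand in for domains and queues stand in for the capabilities they use to communicate; both substitutions, and all the names, are assumptions made for illustration.

```python
import queue
import threading


def reader(terminal, to_worker):
    """Waits only for terminal input; does nothing else meanwhile."""
    for line in iter(terminal.get, None):     # None marks end of input
        to_worker.put(line)
    to_worker.put(None)


def worker(from_reader, to_printer):
    """Waits only for work from the reader."""
    for line in iter(from_reader.get, None):
        to_printer.put(line.upper())
    to_printer.put(None)


def printer(from_worker, printed):
    """Waits only for output to print."""
    for line in iter(from_worker.get, None):
        printed.append(line)


terminal, a, b = queue.Queue(), queue.Queue(), queue.Queue()
printed = []
threads = [
    threading.Thread(target=reader, args=(terminal, a)),
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=printer, args=(b, printed)),
]
for t in threads:
    t.start()
for line in ["one", "two"]:
    terminal.put(line)
terminal.put(None)
for t in threads:
    t.join()
print(printed)
```

No routine here is ever entered asynchronously; each piece reads top to bottom, and every interaction between pieces is an explicit put or get.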

3.4.4. Device Independence

In Gnosis we encourage true ``device'' independence.

Because it is so easy to hide things behind entry capabilities in Gnosis, we make a practice of shielding the user from the idiosyncrasies of hardware devices. For example, there is a single domain which owns the capability to operate the network interface. That domain provides a standardized interface to the programs that call it.

3.5. Summary

I hope by now we've given you the idea that Gnosis is a different operating system. I want to emphasize the two main points that make Gnosis different.

Programs under Gnosis are built out of protection domains with firewalls between them. Domains are small, simple, and cheap.

Domains communicate through doors in the firewalls, called capabilities. Capabilities are a simple, uniform, efficient means of representing authority.

Thank you all. Now, before I open the floor for questions, I would like to add a few comments on how and why we think four people can build an operating system. When I first joined the project I thought that it was a pretty ambitious project. There are several significant factors which make it possible.

First, and foremost, the Gnosis concept of distinct domains without implicit interactions between them results in simpler programs. Because of this, we have had to spend a great deal of time designing the interfaces between these domains to ensure that adequate function exists in each; but perhaps even that is a benefit since we will know exactly how the system goes together. The basic design of Gnosis will ensure that no compromises to the design occur during the implementation.

Second, because individual components are completely isolated from each other, except for the prescribed interfaces, it is a simple matter to implement each domain independently of the remainder of the operating system. Very little scaffolding is required. We went to install the CMS editor in Gnosis and noted all of the things we thought ought to be there as co-requisites, things like a command language to call the editor, a file system, a loader, catalog facilities, and so on. To our surprise, we discovered that we didn't need any of those facilities. We could just connect the editor directly to the terminal handler and test it. This made development go much quicker.

Third, we have been able to coexist with, and take advantage of, CMS during the early going. We expect to use CMS services for quite some while for compiling programs and so forth. Thus our ``critical mass'' of code is very much smaller than it would otherwise be.

Fourth, the basic design of Gnosis allows us to write most of the operating system as user code, which means we will be able to eliminate a lot of duplication of effort in terms of testing tools, etc. The system will also be much simpler because all of the details of the hardware are masked in the kernel. Consequently no domain programmer need ever deal with them, which makes the domains simpler, and also greatly reduces the impact of any hardware changes.

We have tended to follow the advice of Fred Brooks in The Mythical Man-Month, where he suggests ``be prepared to throw the first one away.'' We have implemented each domain with the simplest possible algorithms in order to test the design. Later we will have to discard many of these domains and rewrite them with high performance algorithms which obey the same interface specifications. Most of these first attempt domains can be implemented in a matter of days.

Last, but certainly not least, we have a relatively high technology ``office of the future'' system called AUGMENT which we are using to keep all of our design notes as well as our user documentation. The use of this system will save us a significant amount of labor as we develop a user community over the next several years.

The combination of these facilities has made it possible for us to implement a great deal of function very quickly. As Norm mentioned earlier, we have only just started running our first domains recently. Yet we expect to be able to have a significant on-line database application operational within a year.

That concludes the formal part of this presentation. I will now open the floor for questions.

Addendum to the HTML Version

The original version of this document was created using Augment, a hypertext system built by Doug Engelbart while at Tymshare. Augment, and its predecessor NLS (also by Engelbart), were the very first on-line hypertext systems. Unfortunately, the electronic original of this document has been lost. This text was re-keyed from a paper copy. I attempted to correct spelling errors in the original as I found them, so any typos you find in this version are probably my fault.

In the course of re-keying the document, it became painfully clear that Augment's use of indentation did not translate well into HTML. Eventually, I gave up on it. I have added headings, converted things to bulleted and numbered lists in some places, and removed some paragraph breaks for the sake of better flow. The end result conveys the same content, but it reads more like a paper document than like the Augment version. Since the format has changed drastically in any case, I've also inlined the figures, which were shown as slides at the presentation. The figures were scanned from an nth generation photocopy of the presentation materials. If some kind soul feels like regenerating them I'd be delighted to adopt clearer versions.

-- shap

Addendum to the Addendum

I've cleaned up the graphics and style-sheeted the whole thing into a more readable font. I've also spell-checked and generally done an editing pass for readability. It's still the same document, but it should look much nicer now.

--Matt