Agorics Inc.- Proving Properties of Instruction-Level Programs

Proving Properties of
Instruction-Level Programs

Preface

Copyright 1995 Agorics, Inc.

While Agorics, Inc. reserves all copyrights, the information in this document is to be considered public information and is available for use without restriction. Agorics disclaims any warranty as to the utility, accuracy or effectiveness of the information contained in this document and specifically disclaims any liability for consequential damages that may result directly or indirectly from use of the information in this document.

DRAFT

Many programming languages reduce each module of program code to a set of assembly language-level instructions (bytecodes or native assembler instructions). When the time comes to load a given module, its instructions must be checked to make sure they will not violate certain properties of the higher-level language, such as memory boundaries. This paper presents a technique for implementing such load-time checking, as exemplified by the Bouncer; component of Joule.

While Joule applies this bouncing technique to bytecodes, the mechanism is as applicable to other software-implemented instruction sets and to native machine instructions. The technique is equally appropriate for strongly typed languages such as Ada, Pascal, Modula 3, C, and C++, and for latently typed languages such as Smalltalk, Scheme, and Joule.

The "bouncing" technique described here is not new, but is rather derived straightforwardly from previous work, including:

type state checking, by which a variable's value can be statically determined as unreferenced and therefore deletable.
abstract interpretation, in which properties of programs (deadlock freedom, computational complexity, etc.) are computed by interpreting instructions using only abstract properties of the actual data values.
a machine-code verifier created by L. Peter Deutsch for the 940 operating system to verify security properties of instruction-level programs.

This document contains the following sections and subsections:

1.1. About Modules
1.2. The Bouncer as Load-Time Checker
          1.2.1. Bouncer Rules
          1.2.2. Bouncer State
          1.2.3. Control-Flow Checking
          1.2.4. Representation Type Checking
1.3. Summary
1.4. Acknowledgments

1.1 About Modules

In this paper, a "module" is the smallest unit of compiled code that can be loaded into a runtime environment. In the Joule language, this unit of code corresponds to the Module form. In Joule, a module of code is reduced to bytecodes by the compiler.

1.2 The Bouncer as Load-Time Checker

The Bouncer is a program that statically checks each module before it is loaded, rejecting any module that does not pass specific static checks. The name "Bouncer" is inspired by the doorman of a nightclub who keeps out unsavory characters.

1.2.1 Bouncer Rules

The Bouncer maintains internal state which changes as it processes each instruction. First, it checks each instruction to make sure the preconditions for that instruction have been established by prior instructions in the module. Then it alters its state to reflect the mutating impact of the instruction, and finally tests to make sure its own invariants are intact.

Thus, for each type of instruction, a set of rules have been specified for comparing the state of the abstract machine against known requirements. The rules for checking the preconditions of an instruction are called its "test" rules. The rules for checking the state-mutating impact of the instruction are called its "mutate" rules.

In this way, the Bouncer protects Joule's memory safety and security. A module that fails to satisfy the rules is rejected, or "bounced." Acceptable code is passed by the Bouncer to be interpreted or translated by other components of the Joule abstract machine.

The Bouncer does not attempt to prove the correctness of every possible correct program, which is theoretically impossible. Rather, it proves a verifiable subset of programs and rejects all others. In this respect, it is similar to type checking in languages such as Pascal. No loss of expressive power results from this constriction of possibilities, though the set of alternative implementations of a given solution may be reduced.

1.2.2 Bouncer State

The Bouncer state machine has a set of static registers. These exist at bounce time, not at run time; they represent the state of what the Bouncer knows about the actions of the program code before and after each instruction. Some of the static Bouncer registers are:

pc--The current value of the Joule abstract machine's program counter.
numLocals--The number of local variables allocated in the current activation frame.
writable[numLocals]--An array of booleans indicating, for each local variable, whether it is legal to assign it.
types[numLocals]--An array of static types, providing knowledge about the runtime representation type ("uninitialized", "tagged pointer", "facet pointer", etc.) of each local variable of the current Activation. The actual local variables are in the array locals[numLocals], which exist only at run time, not at bounce time.

In these terms, the instruction

OP_MOVE (3, 5)

means to move the value in locals[5] into the position locals[3]. It is subject to this set of rules:

Test writable[3]

Mutate types[3] := types[5];

This tests that local variable number three can be assigned to, and specifies how to modify the typestate used to test the next instruction. All of the array indexing is implicitly bounds checked, so if numLocals is less than 6, this load attempt is also bounced.

1.2.3 Control-Flow Checking

First, the Bouncer derives a model of the state of the abstract machine before and after each instruction in the module being checked that is consistent with all forks and joins called for by the instructions.

1.2.3.1 Forking

A program fork results from the evaluation of some Boolean expression, which may be true, false, or a pointer to some other value.

The instruction that implements forking,

OP_TEST_IF (cond, onFalse, onFail)

falls through to the next instruction if locals[cond] is primitively true, branches to onFalse if locals[cond] is primitively false, and branches to onFail otherwise. In the Bouncer, this means that we propagate the current Bouncer state to all three locations.

1.2.3.2 Joining

Following a fork, it may be necessary to interpret the re-joining of the two execution paths into a single consistent state.

Joining

Figure 1: Joining -- L and R represent the two (possibly inconsistent) prior states of the Bouncer; J represents the resulting state.

Treating this join as if it were an instruction, the Bouncer applies these rules (where the subscripts L and R designate the static values in each of the two prior states):

Test   numLocalsL== numLocalsR

Mutate for (i = 0; i < numLocals; i++) {
                 if (writableL[i] == writableR[i]) {  
                       writableJ[i] := writableL[i];

                 } else {
                       writableJ[i] := FALSE;
                 }

                 if (typesL[i] == typesR[i]) {
                       typesJ[i] := typesL[i];
                 } else {
                       typesJ[i] := VOID;

                 } 
              }

That is, if the two prior states agree on the type of a slot, the slot is considered to be of that type in the state representation J; if not, the slot is considered uninitialized, and cannot be read by code following the join until it is assigned.

1.2.3.3 Looping

If any loops occur in the module, the instructions within the loop again need to be checked against the Bouncer state as it exists at the time of the backward jump. If multiple possible states loop back to the same point, they are treated as joins.

The proliferation of states resulting from multiple forks and joins is controlled because inconsistent slots collapse to "uninitialized" when joined. This means that, on any iteration through the module, there will be strictly fewer joins required than on the previous iteration, because any inconsistencies previously encountered have collapsed the affected slots to "uninitialized".

The Bouncer inevitably either rejects the program for being unsafe or converges to a single consistent model describing the state of the Joule abstract machine prior to each instruction in the module. This representation must converge to a fixed-point state of the Bouncer because the collapse of inconsistent slots to "uninitialized" is monotonic, and the number of slots is finite.

1.2.4 Representation Type Checking

There are two Bouncer phases. After the Bouncer has derived a single consistent state history of the Joule abstract machine based on the control-flow checking, the Bouncer then passes through the instructions of the module and applies the test and mutate rules to each, based on the state arrived at in the control-flow check for each instruction. If any instruction generates an unacceptable state of the Bouncer by failing to conform to the rules, the module is rejected and an exception is signaled.

1.3 Summary

The smallest unit of program code that can be loaded is called a module. Each module consists of a set of assembly-level instructions, which may be either bytecodes (as in Joule) or native instructions.

When a module is loaded, its instructions must be checked to make sure they will not violate certain properties of the higher-level language, such as memory boundaries. This paper presented a technique for implementing such load-time checking, using the Bouncer component of the Joule abstract machine as an example.

While Joule applies this bouncing technique to bytecodes, the mechanism is equally applicable to native assembler instructions. The technique can be extended to verify more sophisticated properties, such as statically known reference counts, static array bounds checking, type structure for objects in the heap, and real-time constraints.

1.4 Acknowledgments

Future drafts will include the many references appropriate to credit the early development of these ideas.

[ Agorics Home Page ] [ Technical Library ] [ Info on Joule ]

Last updated: 20 June, 2001