Groping for Marshaling

Contrasting Marshaling and Argument Collection

I presume here that the problem is to gather some data that is conveniently described in some programming language and put it on a bit-serial communications link, so that a peer program can take those bits and make the data conveniently accessible again in the same or a similar language.

This is exactly what a compiler does at call sites of routines whose parameters are passed by value. How it does so is not part of the language specification, but it must be specified if code from one compiler is to call code from another. These days processor manuals (or their ABI supplements) often specify how C programs must pass values.
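To make the analogy concrete, here is a minimal C sketch of by-value marshaling. The record type `struct sample`, its field sizes, and the wire layout (fields in declaration order, little-endian, no padding) are my own assumptions, not any real calling convention. The sender copies values in order, as a call site does, and the receiver finds them by position, as parameter access sites do, because both were compiled from the same declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type. Both peers compile against this same
   declaration, as caller and callee share a parameter list. */
struct sample {
    int32_t  id;
    uint16_t flags;
    int64_t  value;
};

/* Wire size: the fields back to back, with no padding. */
enum { SAMPLE_WIRE_SIZE = 4 + 2 + 8 };

/* Store the low n bytes of v at p, least significant byte first. */
static unsigned char *put_le(unsigned char *p, uint64_t v, int n) {
    for (int i = 0; i < n; i++)
        *p++ = (unsigned char)(v >> (8 * i));
    return p;
}

/* Read n bytes at p as a little-endian unsigned number. */
static uint64_t get_le(const unsigned char *p, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* The "call site": copy each value into buf in declaration order.
   buf must hold SAMPLE_WIRE_SIZE bytes. */
size_t sample_marshal(const struct sample *s, unsigned char *buf) {
    unsigned char *p = buf;
    p = put_le(p, (uint32_t)s->id, 4);
    p = put_le(p, s->flags, 2);
    p = put_le(p, (uint64_t)s->value, 8);
    assert((size_t)(p - buf) == (size_t)SAMPLE_WIRE_SIZE);
    return (size_t)(p - buf);
}

/* The "parameter access site": offsets follow from the declaration,
   so no names or type codes were sent. */
void sample_unmarshal(const unsigned char *buf, struct sample *s) {
    s->id    = (int32_t)(uint32_t)get_le(buf, 4);
    s->flags = (uint16_t)get_le(buf + 4, 2);
    s->value = (int64_t)get_le(buf + 6, 8);
}
```

Only the 14 bytes of values cross the link; names and types stay behind in the shared declaration.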

The languages C and C++ have no adequate string type that can be passed by value. PL/I has a very good string type but lacks even C’s structure types. Algol 68 has both, but its compilers are exceedingly rare, and I have not seen documents describing how they pass values.

PL/I has some useful patterns for passing arguments, which I will describe here. Call sites and parameter access sites were compiled in view of the same parameter declarations. A two-dimensional array of integers could be a parameter. Since the array bounds need not be specified in the parameter declaration, the implementers decided always to supply the bounds explicitly at each call site, even when the declaration did specify them. The declaration of an array parameter always gave the number of dimensions and the type of the base element. Only the information missing from the declaration was made available to the compiled code of the routine; this information was called the dope vector. The dope vector could be constructed at compile time for arrays with constant bounds.

There were also routines that took values of arbitrary types as arguments. Such routines could not be defined in PL/I, but calls to them could be. IBM supplied useful routines with such parameters, such as Print, and documented how to write assembler routines with such parameters. Call sites were compiled with coded information describing the real types of the arguments. Dope vectors were provided as necessary in this case as well.
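A rough C sketch of a call to such an any-type routine follows. The type codes, the `struct arg_desc` layout and the `print_args` routine are hypothetical stand-ins, not IBM's documented format. The point is only that each argument arrives by reference together with a code for its real type and, for a string, the length its declaration could not supply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type codes, standing in for the coded descriptions of
   argument types that a call site supplied. */
enum arg_type { ARG_INT, ARG_DOUBLE, ARG_STRING };

/* One descriptor per argument: its real type, its address (arguments
   are passed by reference), and, for strings, the length that the
   declaration did not supply, a minimal dope vector. */
struct arg_desc {
    enum arg_type type;
    const void   *addr;
    size_t        len;   /* meaningful only for ARG_STRING */
};

/* A generic "Print" that learns each argument's type only from its
   descriptor. Writes into out (capacity cap, always NUL-terminated)
   and returns the number of characters stored. */
size_t print_args(char *out, size_t cap, const struct arg_desc *args, size_t nargs) {
    assert(out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < nargs; i++) {
        size_t room = cap - used;
        int n;
        switch (args[i].type) {
        case ARG_INT:
            n = snprintf(out + used, room, "%d ", *(const int *)args[i].addr);
            break;
        case ARG_DOUBLE:
            n = snprintf(out + used, room, "%g ", *(const double *)args[i].addr);
            break;
        case ARG_STRING:
            /* Not NUL-terminated: the length comes from the descriptor. */
            n = snprintf(out + used, room, "%.*s ", (int)args[i].len,
                         (const char *)args[i].addr);
            break;
        default:
            n = snprintf(out + used, room, "? ");
            break;
        }
        if (n < 0) {                /* encoding error: stop here */
            out[used] = '\0';
            return used;
        }
        if ((size_t)n >= room)      /* output was truncated to fit */
            return cap - 1;
        used += (size_t)n;
    }
    return used;
}
```

Since the argument types are known at the call site, a compiler could build such a descriptor array at compile time.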

The PL/I dope vector for an n-dimensional array on the 360 series was:

24 bits: the virtual origin of the array (the address A[0, 0, …, 0] would have, even when 0 was not a valid subscript),
n 32-bit strides,
n pairs of 16-bit signed numbers giving the upper and lower bounds of the respective subscripts.

See page 169.
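The layout above can be sketched in C. This is an approximation under stated assumptions, not the 360 byte format: strides and offsets are `ptrdiff_t` rather than 32 bits, elements are stored row-major (as PL/I stores arrays), `MAX_DIMS` is an arbitrary limit of mine, and the virtual origin is kept as a byte offset from the first stored element, because forming a pointer outside the array is undefined in C. The payoff is the same: an element address is the virtual origin plus one multiply-add per subscript, with no lower bounds subtracted at access time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIMS 4   /* arbitrary limit for this sketch */

/* Dope vector after the 360 PL/I layout: virtual origin, one stride
   per dimension, and the lower/upper bounds of each subscript. */
struct dope {
    char     *first;             /* address of the first stored element */
    ptrdiff_t vo;                /* virtual origin, as a byte offset from first */
    int       ndims;
    ptrdiff_t stride[MAX_DIMS];  /* bytes between successive subscript values */
    int16_t   lb[MAX_DIMS], ub[MAX_DIMS];
};

/* Build a row-major dope vector for elements of elem_size bytes.
   With constant bounds, a compiler could do this at compile time. */
void dope_init(struct dope *d, void *first, size_t elem_size, int ndims,
               const int16_t *lb, const int16_t *ub) {
    assert(ndims > 0 && ndims <= MAX_DIMS);
    d->first = first;
    d->ndims = ndims;
    d->vo = 0;
    ptrdiff_t s = (ptrdiff_t)elem_size;
    for (int k = ndims - 1; k >= 0; k--) {   /* last subscript varies fastest */
        assert(lb[k] <= ub[k]);
        d->lb[k] = lb[k];
        d->ub[k] = ub[k];
        d->stride[k] = s;
        d->vo -= (ptrdiff_t)lb[k] * s;       /* so vo + sum(lb * stride) == 0 */
        s *= (ptrdiff_t)(ub[k] - lb[k] + 1);
    }
}

/* Address of A(sub[0], ..., sub[n-1]): the virtual origin plus one
   multiply-add per dimension, after checking each subscript. */
void *dope_elem(const struct dope *d, const int *sub) {
    ptrdiff_t off = d->vo;
    for (int k = 0; k < d->ndims; k++) {
        assert(sub[k] >= d->lb[k] && sub[k] <= d->ub[k]);
        off += (ptrdiff_t)sub[k] * d->stride[k];
    }
    return d->first + off;
}
```

For A(1:3, -2:2) of 4-byte integers, the strides come out as 20 and 4 bytes and the virtual origin as -12, so A(2, 0) lies 28 bytes past the first element.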

I lied a bit above: PL/I cannot pass arrays by value, only by reference. The dope vectors and the coded type descriptions were passed by reference as well. Algol 68 does pass arrays by value, and the called routine learns their bounds. Strings are arrays of characters. Algol 68 also provides typed unions and a type-safe way (the conformity clause) to access them.
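A C sketch of an Algol 68-style union, say UNION(INT, STRING), shows what must become explicit on the wire. The names `struct u_value`, `u_get_int` and `u_marshal`, and the layout of one tag byte plus a four-byte little-endian word, are hypothetical. The accessor refuses the wrong member, loosely like a conformity clause, and the marshaled form carries both the tag and the string length, since neither can be inferred from the declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union of an integer and a string, loosely after
   Algol 68's UNION(INT, STRING). The tag records which member is present. */
enum u_tag { U_INT = 1, U_STRING = 2 };

struct u_value {
    enum u_tag tag;
    union {
        int32_t i;
        struct { const char *chars; uint32_t len; } s;  /* a string knows its length */
    } as;
};

/* Checked access, in the spirit of a conformity clause: yields the
   integer only if the union actually holds one. Returns 1 on success. */
int u_get_int(const struct u_value *v, int32_t *out) {
    if (v->tag != U_INT)
        return 0;
    *out = v->as.i;
    return 1;
}

/* Marshal v into buf. Assumed layout: one tag byte, then a 4-byte
   little-endian word (the integer, or the string length), then the
   string bytes. Returns bytes written, or 0 if cap is too small. */
size_t u_marshal(const struct u_value *v, unsigned char *buf, size_t cap) {
    assert(v->tag == U_INT || v->tag == U_STRING);
    uint32_t word = (v->tag == U_INT) ? (uint32_t)v->as.i : v->as.s.len;
    size_t need = 5 + (v->tag == U_STRING ? (size_t)v->as.s.len : 0);
    if (cap < need)
        return 0;
    buf[0] = (unsigned char)v->tag;
    for (int k = 0; k < 4; k++)
        buf[1 + k] = (unsigned char)(word >> (8 * k));
    if (v->tag == U_STRING && v->as.s.len > 0)
        memcpy(buf + 5, v->as.s.chars, v->as.s.len);
    return need;
}
```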

In all of these cases, calls by value put the passed data on the stack contiguously, just as the wire requires. (Often part of it is passed in registers, and some conventions still reserve stack space for that part.) The division between what is explicit (occupies space on the stack) and what is already known at the parameter access sites seems a good candidate model for marshaling data and accessing it.
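Applied to the wire, the model might look like the following C sketch. Both peers are assumed to share a hypothetical declaration, a two-dimensional array of 16-bit integers with unspecified bounds. Only what that declaration leaves open (the four bounds) and the element values are transmitted, much as a PL/I dope vector supplies only what the parameter declaration lacks. The function name and the little-endian layout are my own choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared declaration: a 2-dimensional array of 16-bit
   integers with unspecified bounds. Rank and element type are known
   to both peers and are never sent. */

/* Store v at p as 16-bit little-endian; return the next free byte. */
static unsigned char *put16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
    return p + 2;
}

/* Encode A(lb0:ub0, lb1:ub1), stored row-major in elems.
   Assumed layout: lb0, ub0, lb1, ub1, then the elements, all 16-bit
   little-endian. Returns bytes written, or 0 if cap is too small. */
size_t marshal_matrix(const int16_t *elems, int16_t lb0, int16_t ub0,
                      int16_t lb1, int16_t ub1, unsigned char *buf, size_t cap) {
    assert(lb0 <= ub0 && lb1 <= ub1);
    size_t count = (size_t)(ub0 - lb0 + 1) * (size_t)(ub1 - lb1 + 1);
    size_t need = 8 + 2 * count;
    if (cap < need)
        return 0;
    unsigned char *p = buf;
    p = put16(p, (uint16_t)lb0);   /* the explicit part: bounds only */
    p = put16(p, (uint16_t)ub0);
    p = put16(p, (uint16_t)lb1);
    p = put16(p, (uint16_t)ub1);
    for (size_t i = 0; i < count; i++)
        p = put16(p, (uint16_t)elems[i]);
    return need;
}
```

A 2 by 3 array thus costs 8 bytes of bounds and 12 bytes of elements; no type codes or rank travel with it.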

The new concern is malicious call sites. Compiled code never had a choice about trusting its caller; he is in your address space!
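On the receiving side, the difference from a compiler is that every explicit quantity must be checked before it is used. The following C sketch is an illustration, not a complete defense. It assumes a hypothetical wire format of four 16-bit little-endian bounds followed by the elements, and rejects misordered bounds, element counts beyond what the receiver is prepared to hold, and messages shorter or longer than the bounds imply. The `assert`s guard the trusted local caller; the status codes guard against the untrusted peer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed (hypothetical) wire format: lb0, ub0, lb1, ub1 as 16-bit
   little-endian, then (ub0-lb0+1)*(ub1-lb1+1) 16-bit elements. */
enum decode_status {
    DECODE_OK, DECODE_SHORT, DECODE_BAD_BOUNDS, DECODE_TOO_BIG, DECODE_TRAILING
};

static int16_t get16(const unsigned char *p) {
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Decode into bounds[4] and elems (room for max_elems elements).
   Nothing from the message is trusted until it has been checked. */
enum decode_status unmarshal_matrix(const unsigned char *buf, size_t len,
                                    int16_t bounds[4], int16_t *elems,
                                    size_t max_elems, size_t *count_out) {
    assert(bounds != NULL && count_out != NULL);
    if (len < 8)
        return DECODE_SHORT;
    for (int k = 0; k < 4; k++)
        bounds[k] = get16(buf + 2 * k);
    if (bounds[0] > bounds[1] || bounds[2] > bounds[3])
        return DECODE_BAD_BOUNDS;
    /* Each extent is at most 65536, so the product fits in 64 bits. */
    uint64_t count = (uint64_t)(bounds[1] - bounds[0] + 1)
                   * (uint64_t)(bounds[3] - bounds[2] + 1);
    if (count > max_elems)
        return DECODE_TOO_BIG;      /* refuse before touching elems */
    if ((len - 8) / 2 < count)
        return DECODE_SHORT;        /* fewer elements than the bounds claim */
    if (len != 8 + 2 * count)
        return DECODE_TRAILING;     /* extra bytes after the elements */
    for (size_t i = 0; i < count; i++)
        elems[i] = get16(buf + 8 + 2 * i);
    *count_out = (size_t)count;
    return DECODE_OK;
}
```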