Notes on OCaml

A smattering of OCaml

I am currently intrigued by OCaml. Their approach to typing is promising; they call it type inferencing. I have seen and understood fragmentary stories about how to use this to gain the combined benefits of static typing (efficiency) and latent typing (expressiveness). I have seen enough to be intrigued, but not yet convinced. (As of 2010 I am convinced that there is a significant application realm where OCaml has a substantial advantage over other languages that I know.)

First, a short comparison with Scheme. Scheme is a very much different language and the Scheme 5 report can be read by an intelligent person with scarcely any computer background. The OCaml definition requires many small concepts that are anything but elementary and are not explained in the official document. I think this is a property of the language and not the skill of the report authors. I think a more elementary description of OCaml is possible but that would serve a different audience. I would like to see how big it would be. I suspect that Haskel is worse on this spectrum. See this for more comparison.

OCaml gives up several venerable language features. Type coercion and polymorphic infix operators and functions are mostly gone. (= < <= etc. remain.) Addition requires different infix operators for int and float. “3 + 4.2” has been valid since FORTRAN I. All expression contexts in OCaml are weak, except at the read-eval-print loop. This may be a small price to pay for the benefits.

Whereas an informal intuition of Scheme’s dynamic types suffices for much Scheme programming, a more formal understanding of types and Caml’s language for expressing them is necessary for Caml programming. A conversation between the programmer and compiler is typically necessary before the compiler will accept a program and this conversation is mostly about types in the meta-language about types.

Language Definition

There is a considerable chasm between the many good OCaml tutorials on the web, and what I take to be the official language definition which uses much terminology that is described in Benjamin Pierce’s book “Types and Programming Languages”. These tutorials will allow many to write useful OCaml programs but many mysteries will remain. An implementer of a new OCaml compiler will not learn here what type inferences are to be supported. I think that the official language definition assumes concepts presented on a few advanced academic courses. This is OK except that there should be a reference to a source of such information. Some of my programs would not have been possible without having read Pierce’s book.

Two months have passed and I see more strengths and weaknesses of OCaml. I feel more certain now that I have not found anything like a complete language definition. (Indeed see this.) I think that a brief definition is possible of the meaning of accepted programs. My recent reading of Benjamin Pierce’s book “Types and Programming Languages” confirms my suspicion that there is a broad variety to the possible sets of acceptable programs. There is a gamut of type inferencing schemes. I don’t know yet which set Inria’s compiler accepts. If you have a compiler to interact with as you write your programs, this is OK for many purposes. It is not OK for some design processes where you plan a large body of code based on a misunderstanding of what the compiler will accept. One tactic is to interactively test the types you plan for your project. The OCaml REPL is good for this. The remaining uncertainty of inferencing rules can supposedly be accommodated by including explicit types when the compiler fails to infer. Pierce illustrates this with the marvelous example of the fix operator:
fix = λf. (λx. f (λy. x x y)) (λx. f (λy. x x y))
which provides for recursion but in a way that OCaml’s type inferencing cannot grok. Type inferencing is a sort of proof seeking which fails safely. It does not suffice to provide a correct program; OCaml requires one that it can understand. Here is a good description of simple type inferencing. fix works in Scheme. My division algebras code had to be rewritten to accommodate such limitations as well, but in that case it was easy and OCaml’s infix operators make the code look more nearly like the original 19th century notation. This more commonly useful program may be impossible in OCaml.

Another worry here is longevity of code. Will future compilers accept code that is accepted by today’s compilers? One possible solution to this conundrum is to provide trusting compilers, which either check dynamically or don’t check at all. But then you must fall back on dynamic typing which is slow and an entirely different technology. Another comfort is that today’s compilers are open source.

Scope

let f x y = x + y/2 in (f 3 4)*(f 4 6) is an expression that denotes the number 35. Two scopes appear briefly. f, x and y are the identifiers in question and those to the left of the = are the defining occurrences and to the right are the applied occurrences. The background colors match the scopes of these two identifier classes. I believe that the manual does not say this presumably because it is too obvious. I think that the manual should say even obvious things. When many “let ... in ...” patterns are nested in a tree scopes become particularly confusing. Scheme’s parentheses together with a simple editor that is parentheses savvy wins here in my experience. I have not found an OCaml savvy editor.

Performance

There are plausible rumors of good performance. Here is one point.

I get the feeling that OCaml may be suitable for large software; their concept of signature seems more principled and uniform than C’s prototypes and certainly the type system is stronger.

It would be good to have a tool that reports inferred types for a particular program. The read-eval-print loop reports on types, but that tool is not easily used to study a program packaged for production.

Syntax

OCaml’s syntax needs few parentheses, in contrast to Scheme, but this means that you must learn much more of the language before you can even grossly parse a program. Indeed you must parse grossly before you can consult the reference manual to find meaning. In Scheme a parenthesis savvy editor can get you to the equivalent of an abstract syntax tree—not so in OCaml. You can get back part way to the Scheme advantage by including parentheses in an OCaml program. It is still unclear which parentheses change the meaning of the program but at least there is some agreement between programmer and compiler on programs with extra parentheses. This is a parenthesized program as debugged and this is the same program with only necessary parentheses. In this case removing any of the necessary parentheses caused syntax errors rather than semantic changes.
I don’t know yet how to find statement boundaries; statements seem to be self terminating rather than terminated or separated by semicolons. (Update: There are a few keywords that inaugurate new ‘statements’ which they call ‘definitions’ here.)

I have a problem with both Scheme and OCaml. There are two textual styles of code. One binds several values in the top level and is thus easy to explore using a REPL. This results in excessive scope for identifiers however. This thru this are a progression of modification from the verbose to the concise. I understand that module systems are supposed to ameliorate this. I dislike this extra conceptual overhead but I have not seen a solution to this problem.

Unification seems fundamental to OCaml. I think it makes a large class of code more readable. The construct match pattern with alternatives is perhaps mere sugar, but type inferencing to deduce the signature of a function is not even syntax, let alone syntactic sugar. I think there is shared mechanism here.

Koans

module Xx = struct module = Z struct end end;;

# let a b = match b with (x, y) -> 2 | [p; q; r] -> 3;;
This pattern matches values of type 'a list
but is here used to match values of type 'b * 'c

# [3, 5; 6, 8];;
- : (int * int) list = [(3, 5); (6, 8)]
# type 'p btree = Empty | Node of 'p * 'p btree * 'p btree;;
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree

The “'p” in the type definition is a mere transient parameter and not part of the type any more than 'x' is part of the Scheme function
(lambda (x) (cons 3 x)).

# type xxx = Empty | Intg of int;;
type xxx = Empty | Intg of int
# Empty;;
- : xxx = Empty
# type 'p btree = Empty | Node of 'p * 'p btree * 'p btree;;
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree
# Empty;;
- : 'a btree = Empty

The identifier “Empty” is silently swept away by a new type definition. Maybe batch compilation has warnings. In practice such identifiers can be localized to OCaml modules which reduces symbol pollution.

module Q = struct
  type a = A | B
    let z = B
  type b = B | C
  end;; 
# Q.z;;
- : Q.a = Q.B
# Q.B;;
- : Q.b = Q.B

Minimal separate compilation:

File aa.ml	let p n = print_int n
File aa.mli	val p : int -> unit
File ab.ml	Aa.p 13
Shell command	ocamlc aa.mli aa.ml ab.ml
Shell command	./a.out

The code in file aa.ml is thus an honorary module called Aa.

To test two types, x and y, for compatibility, type fun ((a : x) : y) -> ();;

With the command line option -rectypes OCaml contemplates infinite types such as

type a = int * 'b as 'b;;

and functions taking and delivering such values:

# ((fun (i, r) -> r) : (a -> a));;
- : a -> a = <fun>

Such values may be created thus: let rec a = (1, (2, a)) in exp wherein the expression a=a is fatal.

type a = int * (int * (int * 'a)) as 'a
type b = int * (int * 'b) as 'b;;

a and b define the same type, or at least compatible types according to the compatibility test above.

let rec a = [a];;
val a : 'a list as 'a =
  [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[…]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

type a = A1 | A2;;
let y x = match x with A1 -> 1 | A2 -> 2;;
type b = A1 | B2;;
type c = A1 | A2;;
(A1 : b), (A1 : a), (A1 : c);; => - : b * a * c = (A1, A1, A1)
((A1 : b), (A1 : a), (A1 : c)) = (A1, A1, A1) => - : bool = true
y A1;; => - : int = 1

I cannot explain this. Type definitions generates new type constructors (if constructor is new), but not new types. Type constructors have their independent existence and types are mere sets thereof. When REPL wants to report the type of a constructor it just grabs the the most recent type that includes the constructor.

type a = A1 | A2;;
let y x = match x with A1 -> 1 | A2 -> 2;;
type b = A1 of int | B1;;
y (A1 4);; => Error: The constructor A1 expects 0 argument(s), but is applied here to 1 argument(s)

Explication

Here are some deductions that I have made about the OCaml reference manual which seem fundamental enough that they should be explicit, if indeed they are true.

Clearly functions have one argument and achieve their normal ends thru Currying. The syntax and tutorial are clever enough so that some users will never need to understand Currying and that ‘->’ is right associative. Parts of the manual, such as section 6.4, require understanding Currying. It is a bit brutal to learn the concept of Currying without at least a small warning.

A value cannot be created until its type is defined. It seems that the type definition must precede the value creation and the expression that creates the value must be in scope of the type definition. I don’t know whether an expression can have a value whose type is not in scope. (6 weeks later scopes seem conventional and obvious. One pitfall that confused me is that “let x = 5;;” at the top level of the read-eval-print loop, or in a module definition bind x in the following text. “let x = 5 in exp” binds it only in exp.) Algol 68, for all its documentation faults, made the scope of a definition abundantly clear.

In section “6.3 Names” I think the following should be said:

defining occurrence

applied occurrence

These are nits that languages differ on in subtle ways; indeed I am guessing. I have not found many scope rules. If scope is not the technical term the authors prefer, there should be a description of what terminology supplants such talk of scope. Section 6.2 is a good place to be clear for each of the 11 sorts of name space. For instance I have found nothing that explains the useful result

# let a = 42;;
val a : int = 42
# a;;
- : int = 42
# module Zz = struct
   let a = 3 and b = a
   end;;
module Zz : sig val a : int val b : int end
# Zz.b;;
- : int = 42

I presume that a typexpr of form “' ident” has no defining occurrence but that one should be imagined at the head of each typedef(?).

Is not “/” an infix-op?

Syntactic categories external-declaration, lowercase-ident and capitalized-ident need definitions.

The syntax given in section 6.11 produces “”.

I can’t find the semantics to go along with the syntax: let-binding ::= pattern = expr ∣ value-name { parameter } [: typexpr] = expr which is given here.

I miss the named let of Scheme. If I want ∑[j = 1 to 99]f(j) I write

(let sm ((s 0)(k 99)) (if (< k 1) s (sm (+ (f k) s) (- k 1))))

I have not found a similar OCaml construct. Google(OCaml “named let”) corroborates that OCaml indeed lacks this and that others miss it. It seems I must give the recursive function a name even if I need it only once, and that name pollutes too large a scope.
The best OCaml counterpart pattern I have found is:

let rec sm s k = if k<1 then s else sm ((f k) + s) (k - 1) in sm 0 99

Perhaps this is as general. These both need tail call optimization.

This refers to “compilation unit interfaces”. It would be good to have a link there to the definition of the format of such files. Perhaps this is the right link.

Quote:

unit-name

I cannot find how that name is derived. Experimenting leads me to guess it is by capitalizing the first letter of the file name.

This page should point to an index of technical terms.

Tail Call Optimization

After open Random;; the construct (bits ()) returns a new random number. If you want a function bl n that returns a list of n random numbers there are these ways of defining bl:

let rec bl n = if n = 0 then [] else (bits ())::(bl (n - 1));;: which I find the most natural definition, but it blows the stack for large n.
let bl n = let rec xx ls n = if n = 0 then ls else xx ((bits ())::ls) (n - 1) in xx [] n;;: which is a bit klutzier, but exploits tail call optimization and so avoids stack exhaustion.
let bl n = let xh = ref [] in for i = 1 to n do xh := (bits ())::!xh done; !xh;;: which succumbs to mutability.

We need head recursion as well as tail recursion!

See division algebras implemented in OCaml and some topology too. See too the Fast Fourier Transform.

typos

I have not found an explanation of Caml’s or OCaml’s type system. I think most who come to Caml come via ML or from an academic presentation of type systems. There are many academic papers on type systems but the official definition does not refer to any of them. The implementation from Inria does impressive type inferencing but it is customary for a language definition to require some degree of implementation functionality—in this case some class of programs for which the compiler’s inferring will suffice; but this is merely customary.

Caml’s rules are stricter than those of Scheme. I think that all of my Scheme programs (but perhaps this) could be done in Caml, and most would be improved.

Variants are another instance of the Caml rule that all contexts are weak which means that each expression must manifest its own type, which is to say that each expression context is weak. (Strong contexts allow coercion.) In the type type m = Int of int | Float of float the scope of Int is the same as the scope of m. When the compiler sees Int it knows that it is of type m and provides no coercion. (By contrast C knows which enum a tag is from but coerces it to any context requirement.) From:

type growth = Taller | Shorter | Same
type btype = LeftLean | RightLean | Balanced
and 'a tree = {key: 'a; tag: btype; left: 'a node; right: 'a node}
and 'a node = Tree of 'a tree | Empty;;
let w = {key  = 42; tag = 3; left = Empty; right = Empty};;

it knows that w is an 'a tree for some 'a. It has also confirmed that Empty in left = Empty is of type 'a node, etc.

;; normal factorial definition:
let rec fac n = if n= 0 then 1 else n * (fac (n - 1))
;; tail call definition:
let fac n = let rec fx a j = if j=0 then a else fx (a*j) (j-1) in fx 1 n
;; sum function: sum n f = (floating) sum of f(j) for 0<=j<n 
let sum n f = let rec s a j = if j=0 then a else s (a +. (f (j - 1))) (j - 1) in s 0. n

I cannot find how to produce from the expression syntax, expressions such as Node(Empty, x, Empty, 1) which I see in set.ml. By elimination it seems to be expr::=constr expr but that requires interpreting Node(Empty, x, Empty, 1) as an expr. As an expr that is supposed to denote a tuple but a constr will not take a tuple as a surrogate.

I don’t remember where I first found the print_int function. A few months later I remembered that there was a type specific print function but I could not remember how to spell it. Searching the Inria site under documentation does not mention print_float except in illustrations.

I was reading about whether science courses should include ‘hands-on’ labs instead of mere on-line simulations. I concluded that I am not empirically inclined and am not the sort that discovers new laws, although I like to invent them (as in Keykos) and I admire those who discover new laws. I like the systems that I use to come with a user manual with a well described theory of operation. Newton and later Einstein provided a good user’s manual. I feel that I must approach OCaml empirically, teasing out the subtleties.

Here are some nice comments on OCaml types.

It emerges from the formal syntax that all functor invocations are compile time actions. It would be pedagogically good to say this in the introduction to functors. More generally all types are compile time artifacts and functor invocations produce types.

There are design patterns here that might be critical to my Ram Plan which I have been unable to complete as of 2006.

See my transcription of a Scheme program.
Some simple graphics
OCaml abstraction failure

General OCaml Resource preliminary note on GADTs
Notes on “Types and Programming Languages” by Pierce.
Another root
Good note by Kragen Sittaker

Existential types with dot notation.
Some surprises
Notes on the manual
Synergy (mostly ocaml)

Some OCaml programs:
Boundary of Complex
Sort