Preliminary meta note: this ‘review’ is in some ways more about Keykos than Capsicum because important issues are raised and given names.
Some of these issues implicitly guided Keykos design and were perhaps undocumented.
For this I thank Watson for teasing out these issues, naming them and explaining them.
There is much good information here about recent systems of which I had heard little detail.
Watson is capability savvy so this information is probably relevant.
The references in this thesis are unusually useful.
I have adapted some terminology in this thesis to new and old descriptions of Keykos features a few times.
Beware that “Mac” means Macintosh while “MAC” means ‘Mandatory Access Control’ which means very roughly that A controls whether B can show something to C.
“MLS” is ‘multi level security’ and “Biba” is an integrity policy framework named after its inventor.
“VFS” is ‘Virtual File System’ which is kernel code that makes it possible for code outside the kernel to define the meaning of parts of the file system space.
System V IPC;
Capsicum proper: Groping the source: the kernel sources.
An e-mail list
Section 2: Here is a digression on concurrency vulnerabilities provoked by chapter 2.
I have now read or at least skimmed this section.
It appears to me that there are no insuperable obstacles to imposing a security policy such as those mentioned, except possibly for complexity.
It shows a heroic job of modularizing policy within a kernel such as the Unix kernel.
(Snide Remark) You don’t have to be nearly so smart to build a secure capability system as to tack security onto Unix.
Section 3.2: I will take this to be the typical ‘stackable file system’ until I find better.
Actually it seems to be a framework for such.
It is in the kernel and supports particular file systems that are implemented in kernel code.
Section 3.4 Quote:
An object-centric approach facilitates implementing mandatory access control policies that are concerned with the flow of information between subjects and objects (such as MLS and Biba).
All Keykos kernel calls address objects.
Indeed object invocation is the only kernel call.
Section 3.4.1 (Guiding principles):
Do not commit to a particular access control policy.
At this stage, however, a single system is presumed to make such a commitment.
I run ahead here—sorry.
Another possible goal, keep your TCB out of my TCB.
Just because we must share hardware, do not make my security vulnerable to bugs in your security policy code.
Support multiple simultaneous and independent policies.
We are getting there!
Section 3.5.1: (Framework Startup)
I believe that initialization of complex systems is necessarily ad-hoc.
Much grief arises from the C++ notion of just putting your initialization code in static blocks and all will automatically work just right.
As I read I realize that there are probably places in the legacy kernels, of which I am unaware, that carry information which may be subject to security policies.
In so far as security is merely tracking causal chains, we may not be concerned with the nature of these mysterious wheels and cogs.
I shall suppose that this is the case.
Capability discipline imposes itself on even those system components of which you are unaware!
Section 3.5.3: Quote:
This approach [legacy Unix kernel code following object precepts] is often a natural fit for the kernel architecture, which often (despite a lack of formal language support for object-oriented programming) takes on an object-oriented structure.
Without this fortuitous fact Capsicum would be infeasible.
MAC Framework entry point invocation is necessarily somewhat subjective, but generally revolves around balancing placing the checks deep enough to allow a single enforcement point for a particular level abstraction.
The same issues arise in capability systems, except there the onus on the subsystem builder is to depend on only the capabilities he needs thus explicitly manifesting the information flow.
The MAC Framework requires that the corresponding builder be honest about what information flows there are and thus to call the Framework appropriately.
In short the MAC Framework requires the subsystem builder to declare the flows whereas in the ‘micro-kernel’ approach the discipline is enforced.
Section 3.5.6 (Policy composition)
In the MAC Framework design, only policies limiting the rights granted to subjects relative to the base access control policy are supported, requiring a much simpler composition in which the set of rights granted is the intersection of rights granted across all registered policies.
This meta-policy is simple, deterministic, predictable by developers, and above all, useful.
This is a clear statement but illustrates a bias that pervades much computer security theory: “The purpose of computer security is to keep bad things from happening.”
Newly imposed policies change behavior of code, even security relevant code in ways adverse to security.
This is perhaps what gives security a bad name among programmers; it tends to produce collateral damage, all for some ‘greater good’.
A security framework such as capabilities can enhance allowable authority because it can make authority narrow.
The author makes this point elsewhere.
I deny that the meta-policy is ‘predictable by developers’.
The rule is easy to understand but the ramifications are hard to foresee.
The developer of the broken app referred to in the previous paragraph could not predict that his ambient authority would be different in some new situation.
The art of software platform design is to provide predictable behavior.
The new policy broke program logic not to keep some secret, but because of a transgression of some rule which in different somewhat similar situations would have betrayed a secret.
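The intersection meta-policy quoted in this section can be sketched as follows. This is only an illustrative model, not MAC Framework code: the names compose, base_policy and no_write_to_logs, the rights "read" and "write", and the paths are all hypothetical. It shows how registering one more policy silently shrinks what an unchanged program may do.

```python
from typing import Callable, FrozenSet, Iterable

Rights = FrozenSet[str]
# A policy maps (subject, object) to the set of rights it is willing to grant.
Policy = Callable[[str, str], Rights]


def compose(base: Policy, extra: Iterable[Policy]) -> Policy:
    """Grant only the rights that every registered policy grants."""
    policies = [base, *extra]

    def composed(subject: str, obj: str) -> Rights:
        granted = policies[0](subject, obj)
        for policy in policies[1:]:
            granted = granted & policy(subject, obj)
        return granted

    return composed


def base_policy(subject: str, obj: str) -> Rights:
    # Hypothetical base access control: everything is readable and writable.
    return frozenset({"read", "write"})


def no_write_to_logs(subject: str, obj: str) -> Rights:
    # Hypothetical add-on policy, registered after the application shipped.
    if obj.startswith("/var/log/"):
        return frozenset({"read"})
    return frozenset({"read", "write"})


before = compose(base_policy, [])
after = compose(base_policy, [no_write_to_logs])

print(sorted(before("app", "/var/log/app.log")))
print(sorted(after("app", "/var/log/app.log")))
```

The rule itself fits in a few lines, yet the application's author had no way to know in advance which of his writes a later policy would remove.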
Section 3.6.1: (The Biba integrity policy)
In general, high integrity subjects are allowed to write but not read lower integrity objects, and low integrity subjects are allowed to read but not write high integrity objects.
Using this terminology the quote appears to isolate rather than merely defend the high integrity modules.
In real systems with which I am familiar the higher integrity world must be able to depend on the low integrity world but only according to the logic of code in the high integrity world.
This is the crux of the modern bane of parsing data from an untrusted source.
There are effects from low integrity to high, but at the behest of the high integrity logic—the platform defends, but does not isolate the high from the low.
The MAC version is nearly the definition of the philosophical term epiphenomenon with the low integrity world being unaffected by the high.
Section 3.8 (Related work):
Unsatisfyingly, no single policy model has proven simple, flexible, and useful for all configurations.
The quote speaks here of policies suitable for the MAC Framework.
I have no experience in applying these models but this quote matches my expectations.
These models seem always to be oversimplifications of real security problems.
By their nature they represent add-ons to the application rather than security as an integral part of the application and its specification.
See Natural Security.
Section 4.3.9 (Performance optimizations)
In Mac OS X, however, the assumption is that sandboxing will apply only to specific high-risk processes, making it desirable to avoid the overhead of enforcement on other processes.
Section 4.4.3 (Complexity)
Quote (concerning the 240 MAC policy entry points that a policy must implement):
One answer to this question is that the number of MAC Framework interfaces corresponds to the product of the number of core kernel objects requiring controls, and the number of methods on those objects — i.e., that the complexity of the framework corresponds directly to core complexity in the kernel itself, and that reducing that exposed complexity would reduce necessary expressiveness.
It seems appropriate to note here that there are just two state-bearing object types in Keykos, the page and the node.
Further the policy is that if you have a capability to one of those you can do the operation that it specifies.
Section 5 (Capsicum: practical capabilities for UNIX) Quote:
The problems addressed by the MAC Framework and Capsicum are not strictly orthogonal, but the composition of the two approaches successfully captures notions of system-centric and application-centric controls.
I have often argued against melding fundamental security mechanisms as it makes understanding them difficult.
It makes the application designer’s job much more difficult to the point of impossible even for innocent applications.
The designer has the duty to make the right things happen.
That reminds me of the recurring joke that the main passion of the IT department is preventing the use of computers.
An access policy should be responsible for assuring legitimate access, not merely piling on ever more excuses to deny it.
I am reminded of an anecdote where a collection of people who had not collaborated before assembled to design a new crypto protocol.
There was much confusion over the meaning of “fail-safe” in the context of two computers failing to find mutually agreeable crypto parameters.
One group assumed that to transmit in the clear was the fail-safe answer, and the other that to forbid the connection was proper.
(It is clear that there is not one correct answer.)
Section 5.1 Quote:
Capsicum capabilities should not be confused with operating system privileges, occasionally referred to as capabilities in the OS literature.
Operating system privilege, on the other hand, refers to exemption from access control or integrity properties granted to processes (perhaps assigned via a role system), such as the right to override DAC permissions or load kernel modules.
A fine-grained privilege policy supplements, but does not replace, a capability system such as Capsicum.
All such authority in Keykos, outside that of the kernel by virtue of running in privileged mode, is via capabilities!
We found no need to override anything like DAC (there was none) or capabilities.
There was a closely held capability to peek at kernel memory but we never found a need to poke it.
I will be curious to learn what protection is needed in Capsicum not provided by capabilities.
We have modified several applications, including base FreeBSD utilities and Chromium, to use Capsicum primitives.
No special privilege is required, and code changes are minimal: the tcpdump utility, plagued with security vulnerabilities in the past, can be sandboxed with Capsicum in around ten lines of code, and Chromium can have OS-supported sandboxing in just 100 lines.
That is in line with what I would expect.
Section 5.2 Quote
It is a good idea in either case!
Capsicum requires application modification to exploit new security functionality, but this may be done gradually, rather than requiring a wholesale conversion to a pure capability model.
Keykos has a limited ability to run unmodified Unix programs, more limited than Capsicum, but there is no particular conceptual limit to how far this could be pushed.
This model requires a number of pragmatic design choices, not least the decision to eschew microkernel architecture and migration to pure message-passing.
While applications may adopt a message-passing approach, and indeed will need to do so to fully utilize the Capsicum architecture, we provide “fast paths” in the form of direct system call manipulation of kernel objects through delegated file descriptors.
I don’t know what ‘pure message-passing’ would be.
Later Keykos literature unified presentation by describing kernel object (primitive object) invocation as if by message.
In Keykos one need not distinguish between a kernel object and extended object in order to invoke it.
Indeed one cannot generally discover the difference.
We had always assumed that the same invocation style would apply to kernel objects but messages in Keykos are a bit of a figment as they are consumed in the same unit of operation as they are produced.
There are no messages in externally documented system states.
Messages are not marshaled except in obscure and rare cases.
Otherwise I know of no real or proposed architecture without kernel implemented objects.
Certainly Keykos has kernel objects.
There are certainly tradeoffs for particular sorts of objects on whether they are in the kernel.
Sometimes a design change that removes an object from the kernel results in even more hair in the kernel necessary to support the exiled object in its job.
The balance in Keykos was to minimize total kernel complexity, which meant new kernel objects as we seldom opted to limit the overall system functionality.
I tentatively imagine that the classic style of kernel invocation could not be re-conceived as a message send.
In a pure capability plan the invocation first selects the addressee, and then specifies what is to be done.
In Unix and most other systems, one first specifies the operation and then the operand.
(verb-noun vs. noun-verb)
I suspect there are ramifications.
More later on this! ...
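The verb-noun versus noun-verb distinction can be made concrete with a toy model. Everything here is hypothetical (Counter, invoke, VerbNounKernel, sys_increment); it is neither Keykos nor Unix code, only a sketch of the two calling shapes.

```python
class Counter:
    """Toy object; a Python reference to it stands in for a capability."""

    def __init__(self) -> None:
        self.value = 0

    def invoke(self, order: str) -> int:
        # Noun-verb: the addressee is already chosen, and it alone
        # decides what each order means.
        if order == "increment":
            self.value += 1
            return self.value
        if order == "read":
            return self.value
        raise ValueError(f"unknown order: {order}")


def invoke(capability, order: str) -> int:
    """Noun-verb: one entry point. Select the object, then say what to do."""
    return capability.invoke(order)


class VerbNounKernel:
    """Verb-noun: one entry point per operation; the operand is a table index."""

    def __init__(self) -> None:
        self.fd_table = {}

    def sys_increment(self, fd: int) -> int:
        obj = self.fd_table[fd]
        # The code for the verb must know which kinds of operand it accepts.
        if not isinstance(obj, Counter):
            raise TypeError("operation not supported on this descriptor")
        obj.value += 1
        return obj.value


counter = Counter()
noun_verb_result = invoke(counter, "increment")

kernel = VerbNounKernel()
kernel.fd_table[3] = counter
verb_noun_result = kernel.sys_increment(3)
```

In the noun-verb form a new kind of object needs no new entry point; in the verb-noun form each verb carries knowledge of its permissible operands.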
In section 5.2.1 they tell of removing the ambient authority of namespaces from processes in capability mode.
This seems like a good conceptual start.
They go on to say that devices in file directory /dev are also restricted but I thought that was already gone by virtue of removing the file name space.
I am confused.
Perhaps the denizens of /dev are unavailable even with access to /dev.
That reboot() is denied would imply that capability mode processes might otherwise be root mode, which I would find peculiar.
Section 5.2.2 Quote:
The cap_new() system call creates a new capability given an existing file descriptor and a mask of rights; if the original descriptor is a capability, the requested rights must be a subset of the original rights.
Perhaps most capability systems use this plan.
I am not sure that this is an important point but Keykos considered and rejected this plan.
The code in each sort of object in Keykos gets the raw data byte (8 bits) from the real key and can do with it as it pleases.
Page and Segment nodes share some of the layout of their respective data bytes.
This means that there is no generic key weakening operation.
Some objects interpret bits of the data byte in ways that would conflict with such a weakening.
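A small model may make the contrast clearer. The first half mimics the subset rule quoted for cap_new(); it is not the real system call, and the bit values READ, WRITE and SEEK are invented. The second half is a hypothetical Keykos-like key whose data byte is interpreted only by the object's own code. The READ_ONLY_BIT layout is an assumption chosen to show how a generic bit-clearing "weakening" could strengthen a key instead.

```python
READ, WRITE, SEEK = 0x1, 0x2, 0x4  # hypothetical right bits


class Capability:
    def __init__(self, target, rights: int) -> None:
        self.target = target
        self.rights = rights


def cap_new_model(source, rights: int) -> Capability:
    """Model of the quoted subset rule; not the real cap_new()."""
    if isinstance(source, Capability):
        if rights & ~source.rights:
            raise PermissionError("requested rights exceed the original's")
        return Capability(source.target, rights)
    # A plain descriptor: wrap it with the requested mask.
    return Capability(source, rights)


class Key:
    """Keykos-like key: only the object's code gives the data byte meaning."""

    def __init__(self, target, data_byte: int) -> None:
        if not 0 <= data_byte <= 0xFF:
            raise ValueError("data byte must fit in 8 bits")
        self.target = target
        self.data_byte = data_byte


READ_ONLY_BIT = 0x80  # hypothetical layout: a SET bit means LESS authority


def page_may_write(key: Key) -> bool:
    return not (key.data_byte & READ_ONLY_BIT)


full = cap_new_model("fd 4", READ | WRITE)
weaker = cap_new_model(full, READ)

read_only_page = Key("page 7", READ_ONLY_BIT)
# A generic "clear some bits" operation would hand back write access.
masked = Key(read_only_page.target, read_only_page.data_byte & 0x7F)
```

Under this assumed layout, clearing bits is not a weakening at all, which is one way to read the remark that there can be no generic key weakening operation.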
(Table 5.1) enumerates the namespaces!!
Some of these are outside the kernel’s purview.
I hope that it will be feasible somewhere to enumerate the authority of a Capsicum process in capability mode, instead of enumerating those denied Unix things.
It is possible to imagine less conservative solutions, such as preventing upward renames that could introduce exploitable cycles, or additional synchronization; these strike us as risky tactics, and we have selected the simplest solution, at some cost to flexibility.
When Keykos tries to provide Unix like environments we consider directories to be an object that holds capabilities to the things inside.
We do not generally disallow cyclic directory structures.
Section 5.2.3 Quote:
Even with Capsicum’s kernel primitives, creating sandboxes without leaking undesired resources via file descriptors, memory mappings, or memory contents is difficult.
libcapsicum therefore provides an API for starting scrubbed sandbox processes, and explicit delegation APIs to assign rights to sandboxes.
Unix and Linux need this!
Section 5.3.3; Quote:
After implementing Capsicum, we encountered a concurrency vulnerability exploiting non-atomicity in namei(): two threads can concurrently collude in manipulating the file system to escape their respective sandboxes.
I am glad that Keykos kernel calls are atomic.
Now I know even more reasons!
For example, whereas system call interposition suffered from fundamental races, the MAC Framework was crafted specifically to integrate security policies with the kernel’s synchronization approach.
Keykos never had need for the sort of interposition described in chapter 2 of this thesis.
Capsicum does not enforce the use of a specific Interface Description Language (IDL), as existing compartmentalized or privilege-separated applications have their own, often hand-coded, RPC marshaling already.
Here, our design choice differs from historic microkernel systems, which universally have selected a specific IDL, such as the Mach Interface Generator (MIG) on Mach.
Keykos never invested in an IDL.
How would one enforce such an IDL, except thru lack of documentation?
Keykos had general purpose key invocation facilities specific to each of the languages that were commonly used.
To call some particular object from some language, you read the object specs which are in terms of a data string and up to four keys, and read or already know how to name arbitrary byte strings and keys in the language that you are using.
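As a rough illustration of why no IDL was needed, here is a sketch of a generic invocation facility in one language. The names Message, call and Echo are hypothetical and this is not the Keykos binding for any real language. The limit of four keys comes from the paragraph above; the 4096-byte string limit is the figure given later in this review.

```python
from dataclasses import dataclass
from typing import Any, Tuple

MAX_KEYS = 4       # "up to four keys"
MAX_STRING = 4096  # the 4K byte string limit mentioned later in the review


@dataclass(frozen=True)
class Message:
    data: bytes = b""
    keys: Tuple[Any, ...] = ()

    def __post_init__(self) -> None:
        if len(self.data) > MAX_STRING:
            raise ValueError("data string too long")
        if len(self.keys) > MAX_KEYS:
            raise ValueError("too many keys")


class Echo:
    """Hypothetical object whose spec says: reply with the string upper-cased."""

    def invoke(self, message: Message) -> Message:
        return Message(data=message.data.upper(), keys=message.keys)


def call(key, data: bytes = b"", keys=()) -> Message:
    """Generic facility: works for any object, given only its written spec."""
    return key.invoke(Message(data, tuple(keys)))


reply = call(Echo(), b"hello")
```

The caller needs only the object's spec, stated in terms of bytes and keys, plus this one general-purpose routine; nothing is generated per interface.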
This section begins to raise questions in the reader’s mind that should have been answered before now.
What are the conditions for success of an openat(4, "foo") call?
Do permissions still play a role?
I hope not.
Non listable directories might solve some security problems but I think such problems are better solved in other ways.
Here is the tortuous semantics of the new kernel call renameat().
I would point out that rename is not an operation on a file but an operation on a directory, or perhaps two directories.
It should require a capability to those directories but it does not.
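The point that rename acts on directories rather than on the file can be sketched as below. This is a hypothetical model (Directory, rename), not renameat() and not Keykos code; a directory is treated, as described earlier for Keykos, as an object holding capabilities to the things inside it.

```python
class Directory:
    """A directory is just a table from names to capabilities."""

    def __init__(self) -> None:
        self.entries = {}


def rename(src: Directory, old_name: str, dst: Directory, new_name: str) -> None:
    """Requires references (capabilities) to both directories, none to the file."""
    if old_name not in src.entries:
        raise FileNotFoundError(old_name)
    dst.entries[new_name] = src.entries.pop(old_name)


the_file = object()  # stands in for a capability to some file
inbox, archive = Directory(), Directory()
inbox.entries["draft"] = the_file

rename(inbox, "draft", archive, "final")
```

The file object is never consulted; only the two directory tables change, and a caller lacking either directory cannot perform the operation.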
Section 5.4 Quote:
Adapting applications for use with sandboxing is a non-trivial task, regardless of the framework, as it requires analyzing programs to determine their resource dependencies, and adopting a distributed system programming style in which components must use message passing or explicit shared memory rather than relying on a common address space for communication.
I should add here that running legacy Unix (Linux) applications in Keykos is likewise problematic.
I have not seen any particular app behavior that could not be reproduced in a suitably contrived Keykos environment, running the legacy binaries.
Such contrivances do not arbitrarily compose however.
When we bring a legacy app into Keykos a problem of expectations arises.
Such expectations are often expressed in Unix concepts which are meaningless in a capability world.
A gradual transition is difficult in either technology.
Important cases are easy, however.
Keykos ran several compilers and X11 in their native binary form.
Section 5.4.3 (gzip) Quote:
Many gzip sessions can run independently for many different users, and there can be no assumption that placing them in the same sandbox provides the desired security properties.
I had not realized that there was to be just one sandbox per program.
That would put it into the same design category as Mach.
If we say that a stack frame represents the state of some invocation of some program, then we can say that Keykos protection domains are for one invocation.
This is an overstatement which needs to be corrected and made more precise as well!
Section 5.4.4 (Chromium) Quote:
The only significant Capsicum change to the FreeBSD port of Chromium was to switch from System V shared memory (permitted in Linux sandboxes) to POSIX shared memory code as used in the Mac OS X port, which is capability-oriented and hence permitted in capability mode.
Approximately 100 additional lines of code were required to introduce calls to lc_limitfd() to limit access to file descriptors passed to sandbox processes, such as Chromium data pak files, stdio, and /dev/random, font files, and to call cap_enter().
This compares favorably with the 4.3 million lines of code in the Chromium source tree, but would not have been possible without existing sandbox support in the design.
We believe it should be possible, without a significantly larger number of lines of code, to explore using the libcapsicum API directly.
This is good news for anyone who would constrain Chromium’s guests.
Section 5.5 (Comparison of sandboxing technologies)
This is a survey of sandbox technologies across Chromium ports.
I had not found such information when I sought it before.
Section 5.5.5 (Linux seccomp)
In order to allow other system calls, Chromium constructs a process in which one thread executes in seccomp mode, and another “trusted” thread sharing the same address space has normal system call access.
Chromium rewrites glibc system call vectors to forward system calls to the trusted thread, where they are filtered in order to prevent access to inappropriate shared memory objects, opening files for write, etc.
It would seem that the untrusted thread would have write access to the virtual memory of the trusted thread.
Section 6 (Related work)
However, unlike pure capability systems, Capsicum both retains global namespaces (outside of capability mode), and differentiates capabilities for resources offered by the kernel and userspace.
I would say that Capsicum provides capability enclaves within a conventional system.
Much like the primitive Unix enclaves that Keykos provides within a capability system.
Section 5.9 (Conclusion)
Capsicum’s relationship to existing access control and security techniques appears constructive: it usefully complements mandatory techniques in a manner not dissimilar to the link between capabilities and mandatory access control found in systems such as DTMach and LOCK.
There always exists a risk that the composition of security technologies leads to new security failures, which cannot be ignored — on the other hand, the relationship between mandatory access control and capabilities has been studied extensively in the research literature, giving us confidence that they can be used together successfully.
If you view the platform’s job as providing access to information and allowing programmed actions, then the 1960 computer might be taken as the ultimate platform—no protection.
If you take the platform’s job as preventing bad things from happening then the computer with no power is the ultimate.
Capabilities provide something like property rights which we, and our culture, have evolved to reason about.
Capsicum lends itself to adoption by blending immediate security improvements to current applications with the long-term prospects of a more capability-oriented future.
I think that this is plausible, I certainly hope so.
My research into the concurrency implications of software interposition for security do not apply just to system call wrappers, as explored in Chapter 2: interposition is a software design construct used throughout security as it separates enforcement and policy from the implementation of an underlying service.
“Interposition” is a somewhat new word.
As used in the above quote it should not apply to the Keykos wrapper pattern.
Perhaps the difference is merely that arguments are indeed copied upon invocation.
(Note that invoking a C function with prototype void fun(int) the argument is indeed copied—into a register!
I would call that a copied message.
Be careful of your blanket opprobrium.)
The appeal of system call wrappers lies in large part in their simplicity: they impose controls on a well-defined interface independently of the implementation of the underlying system.
As my research into concurrency vulnerabilities has shown, this appeal is deceptive — successful interposition involves not just sequential or nested invocations of software components, but instead requires a rich understanding of their semantic composition.
Interposition cannot simply be dismissed, however: it is a fundamental primitive of software composition, so we require a better understanding of how to apply it correctly.
It is clear here that Watson does not distinguish between interposition and wrapping.
With those terminology rules it would appear that message copying is the only way to go.
Note to myself: I must think longer about ‘semantic races’.
I remain unsatisfied with the integration of security event auditing with the MAC Framework; while the framework can usefully control audit operation, how best to allow policies to generate audit records, annotating existing events with extended access control information, is an unsolved problem.
See Horton’s Who Done It?.
Compartmentalized applications, when authored in the C language and executing in the UNIX process model, are fundamentally distributed applications: the programmer is reduced to using message passing, which is not only markedly slower than direct function invocation, but also introduces significant programming hurdles due to the loss of an assumption of a single address space.
Aforementioned blanket opprobrium.
There are throughout the thesis scattered comments about the inefficiency of message passing.
Keykos messages were short.
A 4K byte limit on strings was seldom encountered in application design.
Larger amounts of data could be passed via segment (think unnamed file) or sharing memory between caller and callee.
Also the 370 implementations put great effort into making the data copy instruction (MVCL) fast; it saturated the bus between cache and RAM with efficient transactions.
Hardware context switching was another matter.
The fastest 370 context switch was 90 instructions but that is misleading for some of those instructions invoked complex memory map hardware functions.
120 instructions was typical of realistic short switches.
The Motorola 88K (an early RISC CPU) context switch was 500 clock cycles.
To be answered:
- Can one define new objects without writing new kernel code?
- Can you write code to produce objects inside the capability paradigm?
- Is there Synergy?
- Which capabilities can be virtualized?
(none according to section 5.8)
- Can you confine?
- What plays the rôle of a Keykos node? (perhaps a process with only the task of holding capabilities)
A very high level comment about Capsicum is that it may do a good job at protecting old time residents against the guests, but not at protecting the residents against other residents.
I still need a place to run small code entrusted with a powerful private RSA key.
I would also like to run code with authority to spend significant amounts of my money.
For these purposes I must still trust the residents—all the code that I have found useful to install on my machine.
I think that Capsicum does not achieve this.
Perhaps such protection can be implemented upon Capsicum.
It may indeed guard the installed apps and their state against the many transient guests brought by the browser, and that is worth a lot.
Biba, concerned with integrity, allows only downward effects; unclassified is the epiphenomenon.
MLS, concerned with secrets, allows only upward effects; classified is the epiphenomenon.
Together they prevent all communication between the levels.
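The arithmetic behind that claim is short. In this sketch the function names are hypothetical and levels are plain integers, with the simplifying assumption, implicit in the text, that one ordering serves for both secrecy and integrity.

```python
def mls_allows_flow(src_level: int, dst_level: int) -> bool:
    # Secrecy: information may stay level or move upward only.
    return src_level <= dst_level


def biba_allows_flow(src_level: int, dst_level: int) -> bool:
    # Integrity: information may stay level or move downward only.
    return src_level >= dst_level


def both_allow_flow(src_level: int, dst_level: int) -> bool:
    return mls_allows_flow(src_level, dst_level) and biba_allows_flow(src_level, dst_level)


levels = range(3)
# Every (source, destination) pair that survives both policies.
allowed = [(s, d) for s in levels for d in levels if both_allow_flow(s, d)]
```

Only pairs with equal source and destination levels survive, so no information crosses between levels in either direction.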
Biba blinds the general lest he be confused by what he sees.
MLS muzzles the general, including most under his command, lest the world discover what his military decisions are.
Just what the enemy ordered.
Information flows, up and down, are of course what military command structure is all about.
Biba together with MLS would seem to remove computers from any rôle therein.
These points have been made many times before!
Abstraction is a powerful tool but abstracting secrecy and integrity from application logic seems like a fatal error—MLS and Biba do just that.
They are fatally against the grain of the real functional requirements of much of the software!