Anti Massive Breach

We recently heard of a massive data breach in which someone seems to have exfiltrated from Yahoo’s computers detailed information about 500,000,000 of its customers. There are presumably many Yahoo employees with legitimate access to that information, and many programs with routine access to it that stand between the data and those employees. Among so many employees we must fear some malfeasance. I suggest here a capability pattern to avoid such massive breaches even so.

By “breach” I mean failure of veiling. We need a better word; “steal” or “theft” suggests the rightful owner no longer has it. In this note “anti-breach” refers to steps taken to prevent more than a small fraction of the data leaving the system except as dictated by system design.

There are several sorts of ‘legitimate access’:

- Account agents employed by the data owner to deal with individual accounts. These may need unvetted software to do their job but are yet separated from the data by unvetted abstraction software responsible for:
  - limiting the data that that particular agent can see,
  - limiting database modifications that that particular agent can perform,
  - maintaining database invariants.
  Such agents may legitimately need access to dozens of accounts per day.
- Analysts who need to find patterns and to understand the nature of the customer base. Analysts need unvetted programs with unfettered RO (read-only) access to the data but with limited ability to return data.
- Routine unvetted decision support functions that rely on statistics produced by unvetted programs with unlimited RO access to the database.
- Execution of business decisions that require massive write access to the database.
- Most importantly, the software that initiates sessions in conjunction with checking passwords. The initiator needs a function to map a username to a pair (salted hashed password, opaque cap to client account). The opaque cap will be sent to a session server, which can open it.
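The initiator's lookup might be sketched as follows. This is only an illustration under stated assumptions: the names `Sealer`, `Initiator`, `enroll` and `start_session` are hypothetical, PBKDF2 stands in for whatever salted hash is really used, and a Python object is not a real capability boundary. The point is the shape: the initiator holds hashed passwords and opaque tokens, while only the session server side can turn a token into an account.

```python
import hashlib
import hmac
import os
import secrets


class Sealer:
    """Hypothetical sealer: only the session server side can unseal a token."""

    def __init__(self):
        self._boxes = {}  # opaque token -> account id

    def seal(self, account_id):
        token = secrets.token_hex(16)  # random, reveals nothing about the account
        self._boxes[token] = account_id
        return token

    def unseal(self, token):
        return self._boxes[token]


def hash_password(password, salt):
    # Assumption: PBKDF2-HMAC-SHA256 as the salted hash.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class Initiator:
    """Maps username -> (salt, hashed password, opaque cap). Holds no account data."""

    def __init__(self):
        self._table = {}

    def enroll(self, username, password, opaque_cap):
        salt = os.urandom(16)
        self._table[username] = (salt, hash_password(password, salt), opaque_cap)

    def start_session(self, username, password):
        """Return the opaque cap if the password checks out, else None."""
        entry = self._table.get(username)
        if entry is None:
            return None
        salt, stored, opaque_cap = entry
        if hmac.compare_digest(stored, hash_password(password, salt)):
            return opaque_cap
        return None
```

In this sketch the session server would call `unseal` on the token it receives; the initiator never does, and the session server never sees the hashed password.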

By ‘unvetted’, applied to code, I mean that the anti-breach logic does not rely on properties of that code.

Some seeming problems may be solved by building virtual massive databases with fictitious users, for debugging programs that need massive read access but must be confined when run on real data. I don’t think this is a real problem, however.

In the Yahoo case the new-session initiator is probably the heaviest user of the database.

These initiators are probably geographically distributed and concomitantly more vulnerable. They can steal the plain password of clients whose traffic passes thru them. They can guess user names and steal hashed passwords when they guess correctly, but they tend to reveal themselves if they guess wrong too often. This code presumably runs physically close to the code with access to the symmetric TLS session key. I guess that the questions are: “How good is transmission of opaque caps?”, “How secure is the platform of the initiator?” and “How simple is the code?”

Note that the initiator lacks access to the account and that the session server lacks access to even the hashed password. The session server has access only to those accounts allocated to it by the initiators. I think it is clear that there is one small critical component that must merely be done right and on which anti-breach logic does rely: the initiator, which includes password checking and may include TLS code. Continuity of security across the TLS and password checking is vital, and this component must also live on an extra secure platform. These are difficult anti-breach problems with the initiator that the ideas in this note do not bear on.

The only novel thing here is that in all directions from the massive database there is a membrane with counters. A simple Shannon-style “information theory” argument can be made which measures and limits the amount of info that leaves. Untrusted programs are allowed within that membrane but are limited as to how much they can exfiltrate. This all presumes an additional quantitative burden on whoever administers access to the database, and on the software they use.
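A counting membrane can be sketched in a few lines. The names `Membrane`, `run_confined`, `mean_age` and `dump_everything` are hypothetical, and Python by itself cannot confine a program; a real system would need a capability platform to guarantee that the membrane is the only way out. The sketch only shows the accounting: whatever an unvetted program computes, the bytes that cross are counted, so the information that leaves is bounded by the budget (ignoring covert channels such as timing).

```python
class BudgetExceeded(Exception):
    pass


class Membrane:
    """Counts bytes crossing outward and refuses to exceed a fixed budget."""

    def __init__(self, budget_bytes):
        self.remaining = budget_bytes

    def emit(self, data):
        if len(data) > self.remaining:
            raise BudgetExceeded(f"{len(data)} bytes requested, {self.remaining} left")
        self.remaining -= len(data)
        return data


def run_confined(program, database, membrane):
    """Give an unvetted program full read access; only its result passes the membrane."""
    result = program(database)
    return membrane.emit(result)


def mean_age(db):
    # A legitimate analyst query: a small statistic over the whole database.
    ages = [rec["age"] for rec in db.values()]
    return str(sum(ages) / len(ages)).encode()


def dump_everything(db):
    # A malicious query: tries to return the whole database.
    return repr(db).encode()
```

With a 16-byte budget and a two-record database, `run_confined(mean_age, db, membrane)` passes while `run_confined(dump_everything, db, membrane)` raises `BudgetExceeded`. Choosing the budget is the quantitative burden on the administrator mentioned above.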

This is merely a modification and extension of ideas designed long ago for Derwent.

The simple observation here is that this pattern was already common when those agents, and the consumers of their reports, were separated from the data by a 110 b/sec teletype. Such programmers, or their clients, were not in a position to exfiltrate the gigabytes of the database thru their terminals. Capability patterns can provide similar limitations today. Today’s programmers may be unable to imagine programming limited to a peak data rate of 110 bits/sec. It was easy and superior in some ways to today’s world.
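A little arithmetic makes the teletype bound concrete. The 110 bits/sec figure is from the text; the assumptions are that the line runs flat out around the clock, that every bit carries payload, and that a gigabyte is 10^9 bytes.

```python
TELETYPE_BITS_PER_SEC = 110  # rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(bits_per_sec=TELETYPE_BITS_PER_SEC):
    """Most bytes that can leave in one day at the given rate."""
    return bits_per_sec * SECONDS_PER_DAY // 8


def days_to_exfiltrate(num_bytes, bits_per_sec=TELETYPE_BITS_PER_SEC):
    """Days needed to move num_bytes through the channel at full rate."""
    return num_bytes * 8 / bits_per_sec / SECONDS_PER_DAY


print(bytes_per_day())                    # 1188000 bytes, a bit over a megabyte
print(round(days_to_exfiltrate(10**9)))   # 842 days for a single gigabyte
```

So one gigabyte takes well over two years of continuous transmission, which is why the terminal itself served as an anti-breach membrane.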

New Software

What is the shape of the software needed to bring this plan about? There is a significant element of allocation that must be a human activity, carried out by someone who is responsible for anti-breach and who knows the legitimate access needs. We name that person the ‘admin’ here. The fundamental tool of the admin is an object we will refer to here as “adm” for short. Adm has unfettered access to the massive database but uses that only as described here. At the current abstraction level we will say that adm has exclusive unfettered access by excluding the hashed salted passwords from the database. Admin has exclusive access to adm, whatever that means in an institution designed to survive loss of an individual.

There may be a partitioning of the database so that one client’s data is within one partition. This is a hack which might improve protection but might not be necessary.

There is an order on adm (a method, in normal language) to create and return a database ‘portal’ that returns n ‘account references’ per day. An account reference is a capability. Such a portal can be made rescindable, but that is someone else’s logic, not part of adm. An ‘account reference’ can examine and modify the properties within a customer account. For each account there are facets to its account reference that limit actions on that account. For each sort of reference there is an order on adm that returns a portal limited to such facets.
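The portal order might look something like the sketch below. Everything here is an assumption made for illustration: the names `Adm`, `Portal`, `AccountRef`, `make_portal` and `open`, the representation of a facet as sets of readable and writable fields, and the choice to reset the counter on a calendar day. The `today` parameter exists only so the clock can be replaced in a test.

```python
import datetime


class AccountRef:
    """Hypothetical facet on one account: exposes only the permitted fields."""

    def __init__(self, record, readable, writable):
        self._record = record
        self._readable = frozenset(readable)
        self._writable = frozenset(writable)

    def get(self, field):
        if field not in self._readable:
            raise PermissionError(f"facet does not allow reading {field!r}")
        return self._record[field]

    def set(self, field, value):
        if field not in self._writable:
            raise PermissionError(f"facet does not allow writing {field!r}")
        self._record[field] = value


class Portal:
    """Hands out at most per_day account references each day."""

    def __init__(self, database, per_day, readable, writable, today):
        self._database = database
        self._per_day = per_day
        self._readable = readable
        self._writable = writable
        self._today = today
        self._day = today()
        self._used = 0

    def open(self, account_id):
        day = self._today()
        if day != self._day:  # a new day: reset the counter
            self._day, self._used = day, 0
        if self._used >= self._per_day:
            raise PermissionError("daily quota of account references exhausted")
        record = self._database[account_id]
        self._used += 1
        return AccountRef(record, self._readable, self._writable)


class Adm:
    """Holds the only unfettered access; gives out metered portals."""

    def __init__(self, database, today=datetime.date.today):
        self._database = database
        self._today = today

    def make_portal(self, per_day, readable, writable=()):
        return Portal(self._database, per_day, readable, writable, self._today)
```

An account agent would be given a portal such as `adm.make_portal(per_day=50, readable=["email"])` and nothing else. Even entirely unvetted agent software can then reach no more than 50 accounts a day, and only the fields the facet exposes.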

We need at this point to consider that a company such as Yahoo provides data services between its clients. The state of an account will include capabilities of some sort to other accounts, perhaps in other partitions. We also need to consider the lifetime of these account capabilities. Perhaps the server object that services a session retains an account cap for only the duration of the session. This design needs better information on the service of the institution. (Loose end!!)

Another order on adm sends a sealed factory. That factory produces objects that are allowed unlimited access to RO account capabilities. This order requires discreet factories. There are probably other resource limits on the yields of the factory.

There are other tools for the admin that help remember past admin actions and provide tools to rescind those.