Rick van Rein

Mon 22 February 2016


Automating Legal Terms

Legal terms on the Internet are a nuisance. The term TL;DR ("Too Long, Didn't Read") is used to indicate how meaningless the wordy drivel has become, due to its sheer length. Interestingly, contracts have a lot in common with one another. In fact, we should be able to automate them.

Anyone who provides a service feels the responsibility to put up terms and conditions under which it is supplied; without them, the risk of users taking the provider to court is simply too great for most. Laws such as privacy laws additionally require explicit statements of how the service and its data are used.

There is nothing wrong with having clear terms whenever parties enter into an agreement. What is wrong, however, is the form in which this is done, namely lengthy prose that takes much more time to actually process than the relation may be worth. Having an automatically processable form of contract, together with a mapping to readable text, would help to automate acceptance of terms and make the legal agreement efficient, while still permitting careful scrutiny of the terms.

Everything is an Object

Contracts usually limit themselves to a few simple handles on reality, such as

  • Parties, such as a service provider and its user or group of users
  • Objects, such as services, software, elements of data
  • Rights, both permissions and requirements to parties
  • Actions, such as starting and ending a contract, payments
  • Events, such as bankruptcy of a party, or a breach of contract terms

In programming terms, we would refer to each of these as an object, and each object has its own identity. This should not be a human identity such as a word (or domain name), because those change (or expire). Numeric identities are more neutral and are therefore totally open to definition.

A numeric scheme that is especially of interest here is the so-called Object Identifier, commonly abbreviated to OID. Many of these have already been specified to represent, for instance, a bit of information. Two examples are


representing a Postal Address and a Telephone Number, respectively. When such an OID is used together with a bit of data, it is clear what this data represents, as well as what forms it may take.

Note that these valued OIDs function like form fields; they are not meant as placeholders for legal text, because that would defy the purpose of automation. Many OIDs will not even carry data, but merely express "the offering party" or "contract termination".
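As a minimal sketch of this distinction, the Python fragment below models a term as an OID that may or may not carry data. It assumes the X.520 attribute types postalAddress (2.5.4.16) and telephoneNumber (2.5.4.20) as the valued examples; the `Term` class and the OIDs under the 2.999 example arc are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from typing import Optional

# X.520 attribute types for a postal address and a telephone number.
POSTAL_ADDRESS = "2.5.4.16"
TELEPHONE_NUMBER = "2.5.4.20"

# Hypothetical OIDs under the 2.999 example arc; not registered anywhere.
OFFERING_PARTY = "2.999.1.1"
CONTRACT_TERMINATION = "2.999.3.1"


@dataclass(frozen=True)
class Term:
    """One entry in an agreement: an OID, optionally carrying data."""
    oid: str
    value: Optional[str] = None

    @property
    def is_valued(self) -> bool:
        # A valued OID behaves like a filled-in form field.
        return self.value is not None


terms = [
    Term(OFFERING_PARTY),                                   # bare notion
    Term(POSTAL_ADDRESS, "1 Example Street, Exampletown"),  # form field
    Term(TELEPHONE_NUMBER, "+31 20 555 0100"),              # form field
    Term(CONTRACT_TERMINATION),                             # bare notion
]

# Only the valued OIDs need data from a party.
form_fields = [t.oid for t in terms if t.is_valued]
```

The values are short pieces of data, not legal text; the meaning sits entirely in the OID.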

There are a few very nice properties to OIDs that make them highly suitable for extensible legal automation:

  • They are numeric. Unlike human identities such as words (or domain names) they do not change (or expire) but once defined, they are set in stone. The owner of an OID is usually advised to publish the meaning of such OIDs, once they are defined.

  • Their ownership can transfer at any of the dots between the numbers. The initial part of the OID notation is owned by standards bodies, but gradually lower entities are granted control over the OID. In fact, "lower" may also be in the service of peers.

  • Anyone can get hold of OIDs, so there is nothing exclusive about OIDs, merely about prefixes assigned to certain members. This means that if you really need to define a new object, you are not dependent on others. It is merely for the sake of arriving at a uniform (and easy to process) format that it is advisable to adopt existing definitions already made by others.

  • The numbers are extensible at the end, which is usually done to refine the OID that they extend. An OID could indeed specify what such refinements mean, for instance adding rights or adding obligations. This might be used to construct a notation for generalisation and specialisation of objects.

  • The numbers have canonical representations. In decimal, one could state that leading zeroes are not written down, and all of a sudden there are no variations in writing down these numbers. A similar thing applies when these numbers are written down in a binary format, such as DER. Having a canonical representation means that numbers can be compared very easily, either for equality or for one falling under another (as a dotted extension).

  • The numbers between the dots are unbounded. Not only in theory, but even in practice it is possible to write down numbers of arbitrary sizes; binary notations such as DER are not constrained to computation boundaries such as 32 or 64 bits, as our computers are. Comparison of numbers still works without needing to map OIDs to their numerical form.

These properties make the OID system highly dynamic and extensible. This should be a great value in composing a standard set of objects.
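The canonical form, the unbounded numbers and the "falls under" comparison can be sketched in a few lines of Python. The function names below are hypothetical, and the 2.999 arc is used only as an example prefix.

```python
from typing import Tuple


def parse_oid(text: str) -> Tuple[int, ...]:
    """Split a dotted OID into its numbers; Python integers are unbounded."""
    arcs = text.split(".")
    if len(arcs) < 2 or not all(a.isascii() and a.isdigit() for a in arcs):
        raise ValueError(f"not a dotted OID: {text!r}")
    return tuple(int(a) for a in arcs)


def canonical(text: str) -> str:
    """Write the OID in decimal without leading zeroes."""
    return ".".join(str(a) for a in parse_oid(text))


def falls_under(oid: str, prefix: str) -> bool:
    """True when oid equals prefix or is a dotted extension of it."""
    o, p = parse_oid(oid), parse_oid(prefix)
    return o[:len(p)] == p


# Variations in notation collapse to one canonical spelling.
same = canonical("2.05.4.016") == canonical("2.5.4.16")

# A number far beyond 64 bits is handled like any other.
huge = "2.999." + str(2 ** 128)
under_example_arc = falls_under(huge, "2.999")
```

Note that `falls_under` compares whole numbers, so 2.50 does not fall under 2.5, which a plain string-prefix test would get wrong.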

It is in the interest of all parties concerned to use a limited set of OIDs. It reduces the pressure on end users to accept new definitions and also helps with automatic processing. In particular, the number of possible combinations of new OIDs with all the ones that already exist can grow dramatically, and should be avoided as much as possible if the intention is to get away from legal agreements that in effect obscure a relation, rather than clarify it.

It is the immutable nature of the OID definitions that makes their terms readable; and technical constructs can be created to have third parties add notary signatures to what they have seen as definitions in another party's registry of OIDs.

Everything is a Relation

Things get interesting when objects are related. They start to form a logic. When (event) happens, then (action) will follow. During (service), the (party) may (right). And of course (party) resides at (postal address) and can be contacted at (phone number). There should be a fairly limited set of these relations. And each of them can be defined accurately... with yet another OID.
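A relation of this kind could be held as a small record: one OID naming the relation, plus the OIDs of the objects it connects. The sketch below is an assumption about how that might look; the `Relation` type, the templates and every OID under the 2.999 example arc are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Relation(NamedTuple):
    kind: str                  # OID that defines the relation itself
    operands: Tuple[str, ...]  # OIDs of the related objects, in order


# Hypothetical OIDs under the 2.999 example arc.
BANKRUPTCY = "2.999.2.1"
TERMINATION = "2.999.3.1"
WHEN_THEN = "2.999.10.1"

# Readable names for objects and sentence templates for relations.
NAMES: Dict[str, str] = {
    BANKRUPTCY: "bankruptcy of a party",
    TERMINATION: "contract termination",
}
TEMPLATES: Dict[str, str] = {
    WHEN_THEN: "When {0} happens, then {1} will follow.",
}


def render(rel: Relation) -> str:
    """Map a relation to readable text; unknown objects show as raw OIDs."""
    names = [NAMES.get(oid, oid) for oid in rel.operands]
    return TEMPLATES[rel.kind].format(*names)


sentence = render(Relation(WHEN_THEN, (BANKRUPTCY, TERMINATION)))
```

The record is what gets processed and compared; the sentence is only one possible presentation of it.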

But now we turn to the topic of grammar or, as data formats name it, syntax. The relations that are possible between any number of objects should be written down in a form that is clear enough for automated processing but, at the same time, be extensible.

Being open to automated processing means that the logic should be really simple; when processing complex logic expressions, computers can end up being heavily loaded. Most contracts take the form "a AND b AND c" anyway, possibly with a few "(a1 OR a2)" variations. Generally, the use of "NOT" should be avoided. To get even more technical, implications may not be ideal on grounds of soundness.
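To illustrate how cheap this restricted logic is to process, the sketch below treats a contract as an AND over clauses, where each clause is an OR over alternative terms. It assumes, purely for illustration, that a party holds a set of terms it is willing to accept; the function name and the single-letter term names are hypothetical stand-ins for OIDs.

```python
from typing import Iterable, Set


def acceptable(contract: Iterable[Iterable[str]], accepted: Set[str]) -> bool:
    """Every clause (AND) needs at least one accepted alternative (OR)."""
    return all(
        any(term in accepted for term in clause)
        for clause in contract
    )


# "(a1 OR a2) AND b AND c"
contract = [["a1", "a2"], ["b"], ["c"]]

ok = acceptable(contract, {"a2", "b", "c"})   # every clause is satisfied
not_ok = acceptable(contract, {"a1", "b"})    # nothing accepted for "c"
```

The check visits each term once. Without "NOT", accepting an extra term can never turn an acceptable contract into an unacceptable one, which keeps the outcome easy to predict.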

So, what languages are there to write down such logical things? Plenty, I would say. But it is very important for security to not use a programming language but a data format, and that limits our choices. In addition, we want it to be supportive of future extensions and map to various forms. XML comes to mind for its easy mapping to readable web documents, but binary formats are much more compact and are easier to get into a canonical form suitable for digital signing. (XML can be signed with XML DSIG, but that is incompatible with the notational flexibility of XML, which leads to many problems in practice.)

The extremes can meet in the ASN.1 language. This is a data format specification language that can be mapped to various equivalent representations, including XML, JSON and canonical binary formats such as DER. So it would be possible to pass a binary form between parties, with digital signatures applied to it, and present it to the user through web interactions.

Having a general syntax that can be signed in one representation and delivered in another representation for actual reading is a useful and unparalleled property of ASN.1. When it comes to this level of flexibility, which can greatly aid users by presenting terms in a style that they like to review, there is no alternative, really.
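As a small illustration of a canonical binary form, the sketch below encodes a single OID in DER by hand. The encoding rules are those of ASN.1 (tag 0x06, a length, the first two numbers combined as 40*X+Y, and every number written in base 128 with a continuation bit); the function names are hypothetical, and a real system would use an ASN.1 library rather than this fragment.

```python
def encode_number(n: int) -> bytes:
    """Base-128 digits, high bit set on every byte except the last."""
    chunks = [n & 0x7F]
    n >>= 7
    while n:
        chunks.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(chunks))


def encode_length(n: int) -> bytes:
    """DER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_encode_oid(dotted: str) -> bytes:
    parts = dotted.split(".")
    if len(parts) < 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a dotted OID: {dotted!r}")
    arcs = [int(p) for p in parts]
    if arcs[0] > 2 or (arcs[0] < 2 and arcs[1] > 39):
        raise ValueError("invalid first or second number")
    # The first two numbers share one base-128 value.
    body = encode_number(40 * arcs[0] + arcs[1])
    body += b"".join(encode_number(a) for a in arcs[2:])
    return b"\x06" + encode_length(len(body)) + body


postal = der_encode_oid("2.5.4.16").hex()      # "0603550410"
same = der_encode_oid("2.5.4.016").hex()       # identical bytes
```

Two spellings of the same OID yield the same bytes, which is what makes the binary form suitable for comparison and for digital signing.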

So, what should be done?

If we want to go this way with InternetWide, we are going to need legal help. We will need authors who

  • Standardise objects as OIDs
  • Standardise relations as OIDs
  • Standardise an ASN.1 legal notation

and who can make a start describing the process of writing legal terms under these constraints, and helping people understand the importance of shared legal notions and a world-wide limitation of variation.

In addition, there will be a need for software to

  • Translate legal terms to the old, wordy form
  • Store and edit legal constructs deemed acceptable by a party
  • Fill out a party's details (using LDAP, for instance)
  • Mark unrecognised parts
  • Automate acceptance when nothing stands out as non-standard
  • Establish third-party timestamping of OID registries
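A few of these tasks can be sketched together: translating OIDs to readable text, marking unrecognised parts, and accepting automatically when nothing stands out. Everything in the fragment below is an assumption made for illustration, including the `review` function, the marker strings and the OIDs under the 2.999 example arc.

```python
from typing import Dict, Iterable, List, Set, Tuple


def review(
    terms: Iterable[str],
    texts: Dict[str, str],
    accepted: Set[str],
) -> Tuple[bool, List[str]]:
    """Return (auto_accept, readable lines) for the OIDs in an agreement."""
    lines: List[str] = []
    auto_accept = True
    for oid in terms:
        text = texts.get(oid)
        if text is None:
            # No published definition known: show the raw OID, flagged.
            lines.append(f"[UNRECOGNISED] {oid}")
            auto_accept = False
        elif oid not in accepted:
            # Known meaning, but not yet approved by this party.
            lines.append(f"[REVIEW] {text}")
            auto_accept = False
        else:
            lines.append(text)
    return auto_accept, lines


# Hypothetical definitions and personal policy.
texts = {
    "2.999.3.1": "The contract can be terminated.",
    "2.999.4.1": "Payments are due monthly.",
}
accepted = {"2.999.3.1"}

decision, report = review(["2.999.3.1", "2.999.4.1", "2.999.7.7"], texts, accepted)
```

Here `decision` is false, because one term needs review and one is unrecognised; only those two lines in `report` would need the user's attention.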

Who knows, in some remote future we might end up negotiating terms, and end up with truly professional relationship management. But for now, the goal of automating the boring acceptance of agreements under a personal set of values is going to be much more useful than merely trying to get away with "TL;DR".

Want a demo?

Then take a look at the communication filter example.
