Rick van Rein
Published

Fri 17 April 2020

Identity 15: Entering Security Codes

Strong security starts with big numbers, in the range of 38 digits (128 bits) for the current level, with an imminent upgrade to 77 digits (256 bits) to thwart Quantum Computers. We can make such security codes a little less dreadful to use.

The best passwords have a lot of entropy, meaning surprising bits. English prose has a density of about 1 bit per character because of its structure. A decimal number has about 3.3 bits per digit, hexadecimal comes in at precisely 4, and BASE32 and BASE64 bring 5 and 6 bits into the game. The lack of structure in these formats is terrible for human users.
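
The densities above follow from the alphabet size: a symbol drawn uniformly from N choices carries log2(N) bits. As a minimal sketch (the function names are made up for this illustration), this is how many symbols each format needs for a given number of bits:

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy of one uniformly random symbol from the alphabet."""
    return math.log2(alphabet_size)

def symbols_for(bits: int, alphabet_size: int) -> float:
    """Number of symbols needed to carry the given number of bits."""
    return bits / bits_per_symbol(alphabet_size)

formats = [("decimal", 10), ("hexadecimal", 16), ("BASE32", 32), ("BASE64", 64)]
for name, size in formats:
    print(f"{name:12} {bits_per_symbol(size):.2f} bits/symbol, "
          f"{symbols_for(128, size):.1f} symbols for 128 bits, "
          f"{symbols_for(256, size):.1f} symbols for 256 bits")
```

For decimal digits this gives roughly 38.5 and 77.1 symbols, which is where the 38 and 77 digits come from.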

This is a somewhat fundamental article.

Helm Keys are Long Codes

As explained before, we need fallback codes to bootstrap our online identity. A very simple system for that is the HAAN Service, for which both user name and password are represented as long codes. Normally, you would not enter those manually, but let's say you did, and registered a fallback identity with a typo. Whoops!

As an example, these are fallback credentials that a HAAN Service could provide:

Haan Key: 5XICN3JA3FJ7IILMWNNGHQLH3FZIXYPU@unicorn.demo.arpa2.org
Password: 6P5ED4TDH4S7ZQHKM3FUDF4SZDNXROAU

(I can simply post these actual credentials here; they have no value because they will not be registered anywhere.)

It is easy to make a mistake when you enter these by hand. In the web world, the "solution" might be to let you enter the same address twice, or to authenticate just to be sure. We prefer to squash that need because it involves typing long codes, and because the passwords are reserved for fallback/recovery uses.

The solution is then to make those long codes, which you only enter in exceptional cases, just a little longer (as has already been done in the above). This gives a security level of more bits than strictly required, and if we find a way to ensure that the security does not drop below the intended level, we may allow corrections.

Counting Entropy

When we speak of 128 bits or 256 bits of information, we speak of the entropy, or the amount of pure information, without any structure, so completely surprising with every bit. Contrast that with somewhat predictable codes that are known as redundant. Practical codes are a mixture; English prose has some surprises (otherwise there would be no use reading it) but not too many (allowing us to recognise structure).

The term entropy dates back to Shannon's information theory, which considers noisy channels and the amount of entropy they could transport reliably. If 30% of the channel's content gets distorted by noise or clicks, then only 70% of the bits sent will arrive properly. If you add redundant bits you might detect and perhaps even correct errors. (You might even like to read up on this 1948 technology.)

This is pretty much the approach that we have in mind; we need to match a registered identity, but may remove some of its entropy to correct for mistakes. To do so, we allow for some operations on the text and count the number of bits to encode them. Subtracting all the necessary changes should not bring the security level below the desired level of 128 or 256 bits.
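
The rule in the last sentence can be sketched as a simple budget check. This is only an illustration; the function names are made up, and the 160 supplied bits are an assumption that matches the 32-character BASE32 example codes above (32 characters of 5 bits each):

```python
def remaining_entropy(supplied_bits: int, lost_bits: int) -> int:
    """Entropy left after paying for the corrections that were needed."""
    return supplied_bits - lost_bits

def correction_allowed(supplied_bits: int, lost_bits: int,
                       required_bits: int = 128) -> bool:
    """Accept a corrected code only if the desired security level survives."""
    return remaining_entropy(supplied_bits, lost_bits) >= required_bits

print(correction_allowed(160, 18))        # 142 bits remain
print(correction_allowed(160, 40))        # only 120 bits remain
print(correction_allowed(320, 40, 256))   # 280 bits remain
```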

Entropy Lost due to Typos

People make a few common mistakes when they type information:

  • Add a character
  • Change a character
  • Drop a character
  • Swap two adjacent characters

Again, we can go back to very old computer science theory to find algorithms doing this. The combination of these four aspects is resolved in the Damerau-Levenshtein algorithm. Normally, these algorithms count the number of editing operations, but a weight may be assigned to each of them. The weight we choose is the number of bits needed to encode each operation:

  • Exact matches: 0 bits lost
  • Switches between upper/lower case: 0 bits lost
  • Switches between I and 1 or between O and 0 in BASE32 codes: 0 bits lost
  • Adding a character: 11 bits lost
  • Changing a character: 11 bits lost
  • Changing a character to a keyboard neighbour: 7 bits lost
  • Dropping a character: 7 bits lost
  • Swapping two characters: 7 bits lost
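
To illustrate the character-level weights, here is a minimal sketch of the cost of replacing one character by another. It rests on assumptions that are not in the list above: a QWERTY layout, and "keyboard neighbour" meaning the key directly left or right in the same row. All names are made up for this example.

```python
# Assumption: QWERTY rows; only direct left/right neighbours within a row count.
ROWS = ["1234567890", "QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]

# BASE32 has no digits 0 and 1, so they can only be meant as O and I.
EQUIVALENT = {"1": "I", "0": "O"}

def canonical(ch: str) -> str:
    """Fold case and the I/1 and O/0 look-alikes into one representative."""
    ch = ch.upper()
    return EQUIVALENT.get(ch, ch)

def keyboard_neighbours(a: str, b: str) -> bool:
    """True when the two keys sit directly next to each other in one row."""
    a, b = a.upper(), b.upper()
    for row in ROWS:
        i, j = row.find(a), row.find(b)
        if i >= 0 and j >= 0:
            return abs(i - j) == 1
    return False

def substitution_cost(expected: str, typed: str) -> int:
    """Bits lost when `typed` was entered where `expected` belongs."""
    if canonical(expected) == canonical(typed):
        return 0            # exact match, case switch, or I/1, O/0
    if keyboard_neighbours(expected, typed):
        return 7            # slipped to the key on the left or right
    return 11               # arbitrary change

print(substitution_cost("I", "1"), substitution_cost("A", "S"),
      substitution_cost("A", "B"))
```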

Now you wonder how we got to these bit losses, of course. It is really straightforward; we devised an encoding:

  • Equivalent changes consume no entropy
  • Operation choices consume 2 or 3 bits of entropy
  • Positions consume an estimated 4 bits (there will be ~4 of them in up to 64 characters)
  • Character data consumes 5 bits of entropy (for BASE32)

The operations are then encoded as follows:

  • Adding a character: 00<pos><data>
  • Changing a character: 01<pos><data>
  • Changing to keyboard left: 100<pos>
  • Changing to keyboard right: 101<pos>
  • Drop a character: 110<pos>
  • Swap adjacent characters: 111<pos>
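
The encoding can be sketched as bit strings to show where the 11 and 7 bits come from. This is only an illustration: the names are made up, the position is written as a plain 4-bit number (the estimate above, not a defined format), and the character data uses the standard BASE32 alphabet.

```python
# Operation codes as listed above; no code is a prefix of another.
OPCODES = {"add": "00", "change": "01", "left": "100",
           "right": "101", "drop": "110", "swap": "111"}

BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def encode_op(op: str, pos: int, char: str = "", pos_bits: int = 4) -> str:
    """Write one editing operation as a string of '0' and '1'."""
    if not 0 <= pos < 2 ** pos_bits:
        raise ValueError("position does not fit in the position field")
    bits = OPCODES[op] + format(pos, f"0{pos_bits}b")
    if op in ("add", "change"):
        # Only these two operations carry 5 bits of character data.
        bits += format(BASE32.index(char), "05b")
    return bits

for args in [("add", 3, "B"), ("change", 3, "B"), ("left", 3),
             ("right", 3), ("drop", 3), ("swap", 3)]:
    code = encode_op(*args)
    print(f"{args[0]:7} {code:12} {len(code)} bits")
```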

Note that this encoding is parseable; the operation codes do not overlap. The codes would be completely surprising, so they take away entropy. If we start with 160 or 320 bits of entropy, we can lose 32 or 64 bits of entropy and still have 128 or 256 bits left.

We also don't need to do anything special with these codes; we can just generate more random bits to include; this is effectively the entropy from which we shall subtract.
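
Generating such a longer code could look as follows. This is a sketch under assumptions, not how a HAAN Service is known to do it; the function name is made up, and it simply draws 160 or 320 random bits and writes them in BASE32:

```python
import base64
import secrets

def generate_code(total_bits: int = 160) -> str:
    """Return a random BASE32 code carrying `total_bits` bits of entropy."""
    if total_bits % 40 != 0:
        # 40 bits are exactly 5 bytes and 8 BASE32 characters, so no padding.
        raise ValueError("use a multiple of 40 bits to avoid BASE32 padding")
    raw = secrets.token_bytes(total_bits // 8)
    return base64.b32encode(raw).decode("ascii")

print(generate_code(160))   # 32 characters, for the 128-bit security level
print(generate_code(320))   # 64 characters, for the 256-bit security level
```

The 32-character codes have the same length as the example credentials shown earlier.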

Examples

We shall look at a few small codes and see how much changes in them. The input code is vertical and the output is horizontal, and each matrix node shows the entropy lost up to that point in the two codes. The total entropy lost is in the bottom right position.

Going from ABBA to ABBA, we find no cost because we simply run along the diagonal:

 0/base 11/add  22/add  33/base
 7/del   0/base 11/base 22/add
14/del   7/base  0/base 11/add
21/base 14/del   7/del   0/base

The code base is a choice for plain copy or substitution; add and del and swap mean what you think they mean.

Now let's swap the last two characters and go from ABBA to ABAB:

 0/base 11/add  22/base 33/add
 7/del   0/base 11/add  22/base
14/del   7/base 11/base 11/base
21/base 14/del   7/base  7/swap

Finally, let's see what happens when adding characters from ABBA to ABBBA:

 0/base 11/add  22/add  33/add  44/base
 7/del   0/base 11/base 22/base 33/add
14/del   7/base  0/base 11/base 22/add
21/base 14/del   7/del  11/base 11/base
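
The three examples can be reproduced with a weighted edit distance. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein, which may differ in detail from the implementation behind the matrices above. Its matrix has one extra row and column for the empty prefix, and its default substitution cost ignores the keyboard-neighbour and equivalence rules for brevity; a function implementing those can be passed in as `subst`. The names are made up for this illustration.

```python
ADD, CHANGE, DROP, SWAP = 11, 11, 7, 7   # bits lost per operation

def lost_bits(source: str, target: str, subst=None) -> int:
    """Bits of entropy lost when editing `source` into `target`."""
    if subst is None:
        subst = lambda a, b: 0 if a == b else CHANGE
    n, m = len(source), len(target)
    # cost[i][j]: bits lost turning source[:i] into target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * DROP
    for j in range(1, m + 1):
        cost[0][j] = j * ADD
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                cost[i - 1][j - 1] + subst(source[i - 1], target[j - 1]),
                cost[i - 1][j] + DROP,    # source character left out
                cost[i][j - 1] + ADD,     # extra character in target
            )
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                best = min(best, cost[i - 2][j - 2] + SWAP)
            cost[i][j] = best
    return cost[n][m]

print(lost_bits("ABBA", "ABBA"))    # 0
print(lost_bits("ABBA", "ABAB"))    # 7
print(lost_bits("ABBA", "ABBBA"))   # 11
```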

As you can see, you can make about 3-4 mistakes when 160 bits are supplied for the 128 bit security level and 6-8 when 320 bits are supplied for the 256 bit security level. The redundancy really pays off.

We believe it is useful to have a way to allow gradual decay of security levels. This allows control over the achieved level, and makes it measurable.

Extensions to SASL

This mechanism can be implemented in SASL, when it goes from an authenticated/logged-in username to an authorisation/access-control username. Many of the mechanisms, including even the trivial PLAIN mechanism, support such changes. Any implementation of SASL therefore needs code to see if such changes are permitted.

It would be typical of a HAAN Service (Helm Arbitrary Access Node) to facilitate these corrective measures and help people to help themselves.

It should be clear from this that the 128-bit and 256-bit forms must have distinct computations; one must not be a prefix or otherwise modified form of the other. A 256-bit identity must not be usable to allow (a lot) more editing on the way to a 128-bit identity!

Extensions to ARPA2 Helm

The ARPA2 Helm can use SASL with Realm Crossover. When a user wants to log in, they can choose to relay to a HAAN Service, which responds with the authorisation identifier. In normal circumstances this would match the identity stored in the ARPA2 Helm, which may look it up without distinguishing upper/lower case. If a typo was made in the registered identity, the HAAN Service must be asked to derive and count the editing operations, to see if sufficient entropy remains. If so, the authorisation identifier that passes can be the mistyped one known at the ARPA2 Helm.

To the ARPA2 Helm, this means that hints must be given when a login fails. It too could use this comparison to see if a correctly entered identifier matches one that was registered, with sufficient entropy left; if so, it might present that as an option. Note that this is not an information leak; it assumes that a user already logged in with their correct identity, which just happens not to match the one registered in the ARPA2 Helm.

There will be a wish to correct wrongly registered identities, of course. To the ARPA2 Helm, that kind of action is within normal operational parameters.
