Rick van Rein

Tuesday 22 December 2020


Access 2: Shaped like a Matrix

There are two ways of looking at Access Control. One is easy, with a direct relation to the resources being managed. The other is advanced, but like putty in the hands of administrators; moreover, it is highly efficient. Efficiency matters; it allows us to enforce access control everywhere, without noticeable discomfort. We derive the efficient model from the one that is easy to use.

This article is part of a series on access control and is related to another series on identity.

This article explains a few examples of Access Control in the InternetWide Architecture. They will become part of the ARPA2 common support library that we are developing alongside it.

Access Control is structured like a Matrix

[Diagram: Access Control in Rows and Columns]

In the above diagram, we have drawn yellow hexagonal cells, each expressing some access control information.

  • %R is the Access Right for reading, %RW for reading and writing, and %CRWD adds the rights to create and delete.
  • =aadmin is a variable definition supplied along with the Access Rights; it assigns the value admin to variable a. What variables mean depends on the Access Type (application area). Here, it might mean that the rights only apply when an admin alias is used.
  • ^log triggers an action in the evaluating software. It too is defined by the Access Type, and it may or may not be meaningful to the application evaluating access rights.

Variables and triggers are rather luxurious features to add, but they are almost zero-cost and can greatly increase expressiveness, so we added them. The heart of the matter lies in the %rights indications, of course.
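To make the three kinds of words in a cell concrete, here is a minimal Python sketch that splits one cell into rights, variables and triggers. The names Cell and parse_cell are hypothetical and not the ARPA2 library's API, and reading =aadmin as "variable a gets the value admin" is an assumption based on the 26 single-letter variables.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    rights: set = field(default_factory=set)       # e.g. {"R", "W"}
    variables: dict = field(default_factory=dict)  # e.g. {"a": "admin"}
    triggers: list = field(default_factory=list)   # e.g. ["log"]


def parse_cell(words: str) -> Cell:
    """Split the contents of one matrix cell into rights, variables and triggers.

    Hypothetical sketch; the sigils follow the article: % rights, = variables,
    ^ triggers, # disables a word.
    """
    cell = Cell()
    for word in words.split():
        sigil, body = word[0], word[1:]
        if sigil == "%":
            # Every letter after % is one Access Right: %CRWD -> C, R, W, D
            cell.rights.update(body)
        elif sigil == "=" and body:
            # Assumption: the first letter names the variable, the rest is its value
            cell.variables[body[0]] = body[1:]
        elif sigil == "^":
            cell.triggers.append(body)
        elif sigil == "#":
            continue  # a commented-out (disabled) word
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return cell


cell = parse_cell("=aadmin ^log %RW")
print(sorted(cell.rights), cell.variables, cell.triggers)
# ['R', 'W'] {'a': 'admin'} ['log']
```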

Each cell expresses the rights that apply where its column, marked by a blue box on top, meets its row, marked by a green trapezium on the left.

The blue boxes on the top are what we call the Access Name; in general terms these are UTF-8 Strings with generic handling rules. The actual string is specific to the Access Type being used; for document access the string holds a volume/path format; for communication it holds the userid of the local user.

The green trapeziums on the left are Remote Selectors; at the base, they start as a full-blown Remote Identity, but gradual abstraction steps turn them into a pattern that could match if the more specific forms did not. Access Control will start at the base and iterate upwards to try to gain access.
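The upward iteration can be sketched as a function that lists Remote Selectors from most concrete to most abstract. The abstraction rules used here (drop +alias parts, then the user name, then leading domain labels, ending with the catch-all @.) are a simplified assumption for illustration, not necessarily the exact ARPA2 iteration, and abstractions is a hypothetical name.

```python
def abstractions(selector: str) -> list[str]:
    """Return Remote Selectors from most concrete to most abstract.

    Hypothetical, simplified rules (not the exact ARPA2 algorithm):
      1. drop +alias parts of the user name one at a time,
      2. drop the user name, keeping the full domain,
      3. widen to ever shorter parent domains, written as @.parent,
      4. end with @. which matches anyone.
    """
    user, _, domain = selector.rpartition("@")
    result = []
    if user:
        parts = user.split("+")
        for n in range(len(parts), 0, -1):
            result.append("+".join(parts[:n]) + "@" + domain)
    result.append("@" + domain)
    labels = domain.split(".")
    for i in range(1, len(labels)):
        result.append("@." + ".".join(labels[i:]))
    result.append("@.")
    return result


for s in abstractions("john+admin@sub.example.com"):
    print(s)
# john+admin@sub.example.com
# john@sub.example.com
# @sub.example.com
# @.example.com
# @.com
# @.
```

Each string in this list corresponds to one row of the matrix; the first row that holds a cell for the requested column decides.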

The fish is the last to discover water, because it is all around it. Similarly, this entire image is considered to apply to a local domain being accessed (the Access Domain), and to an Access Type (such as document access).

Resource-Centric view on the ACL

Access Control makes most sense in relation to resources, even if this does not yield the most scalable implementation. For any object that we want to protect, we can list who may access it and who may not. In the diagram above, this amounts to looking at one column at a time.

Let's write the hdd/photo column in a computer-sciency ACL notation.

dn: ...,accessDomain=example.org,...
objectClass: labeledURIObject
objectClass: accessControl
accessType: 3c2291f6-fc11-3d83-9908-f79b2d2f4ced
accessName: hdd/photo
accessRule: ^log %R ~@example.com %CRWD ~john@example.com
cn: Collection of Photos on Bugs, Beetles and Beatles
labeledURI: https://photo.example.com/john/bugs

This line defines two Access Rules, each consisting of a sequence of code words. We use % before rights, ~ before Remote Selectors, ^ before triggers and, not shown here, = to set any of 26 variables and # to comment out (or disable) a word.
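A small Python sketch shows how such a line could be split into its Access Rules, one (Remote Selector, words) pair per ~selector. The grouping behaviour is our assumption: words before a ~selector apply to it, consecutive selectors share those words, and the first word after a selector starts a fresh group. The function name parse_rule is hypothetical.

```python
def parse_rule(rule: str) -> list[tuple[str, str]]:
    """Split an accessRule line into (Remote Selector, words) pairs.

    Assumed grouping: the words preceding a ~selector apply to it,
    consecutive selectors share those words, and the first word after
    a selector starts a fresh group. Words starting with # are skipped.
    """
    pairs, words, after_selector = [], [], False
    for word in rule.split():
        if word.startswith("#"):
            continue  # disabled word
        if word.startswith("~"):
            pairs.append((word[1:], " ".join(words)))
            after_selector = True
        else:
            if after_selector:
                words, after_selector = [], False
            words.append(word)
    return pairs


print(parse_rule("^log %R ~@example.com %CRWD ~john@example.com"))
# [('@example.com', '^log %R'), ('john@example.com', '%CRWD')]
```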

This example notation defines a single object with all the information we need for Access Control:

  • Access Domain from the dn: line
  • Access Type from the accessType line
  • Access Name from the accessName line
  • Access Rights and a few extra facilities in the accessRule line.
  • Remote Selectors at various levels in the accessRule line.

All this information is packed tightly together, and can be requested with a single query. This is the common approach for LDAP, of which this is an example. It works quite well, as long as not too many users are listed for any single object.

A problem with this model is that all the information is out in the open, making it easy to enumerate the data. This can be a problem for security and privacy, and an interesting target for data theft. This makes the form less suitable for sharing with service providers, especially when these are third parties and certainly when these are large-scale operations.

Example Service using the ACL

Imagine a service like an Apache web server, configured with HTTP-SASL authentication and entry-level Access Control:

<VirtualHost ...>
   DocumentRoot ...
   ServerName photo.example.com
   <Location /john/bugs>
      AuthType SASL
      Require valid-user
      AccessRule ^log %R ~@example.com %CRWD ~john@example.com
   </Location>
</VirtualHost>

Note how simple this is. The Access Domain, Access Type and Access Name are implicit in the webserver configuration, but they are there. They are not really used, though; Apache can simply evaluate the AccessRule directive against the authenticated %REMOTE_USER variable.

This is lovely for personally run servers. But it also has a few problems that would block its adoption by massive domain hosting providers:

  • Very long AccessRule lists would slow down the web server, so this model scales poorly;
  • Editing configuration files is unworkable for hosting providers, except perhaps for static entries.

So, we shift our focus to a model with an external database that can be queried more efficiently, and that does not need to search linearly through the AccessRule. Using the key derivation diagram of the previous post, replicated below, we can collapse quite a bit of information into one Type Key, and this could include a Database Secret which makes the entire derivation irreversible:

[Diagram: Key Schedule for ARPA2 Access Control]

<VirtualHost ...>
   DocumentRoot ...
   ServerName photo.example.com
   <Location /john/bugs>
      AuthType SASL
      Require valid-user
      AccessTypeKey ebf036285570769e0359b24cdceb7dd56c1558436ec7ceffa9f251e8916ae6bf
      AccessName "hdd/photo"
   </Location>
</VirtualHost>

This setup is static for any given Access Domain, Access Type and of course the database secret. The only variable part that remains is the Access Name, which happens to be static in this particular example, but which may in general vary.
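The split between static and dynamic parts can be sketched in Python. Everything in this sketch is an assumption for illustration: the chaining order (secret, then domain, then type), the length prefixes and the use of SHA-256 are our choices, and the names type_key and name_key are hypothetical. The real schedule is the one in the diagram of the previous post, so this code will not reproduce the AccessTypeKey value in the configuration above.

```python
import hashlib
import uuid


def _h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so ("ab", "c") differs from ("a", "bc")."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def type_key(db_secret: bytes, access_domain: str, access_type: str) -> bytes:
    """Static per (secret, domain, type): compute once, e.g. when writing the config."""
    domain_key = _h(db_secret, access_domain.encode("utf-8"))
    return _h(domain_key, uuid.UUID(access_type).bytes)


def name_key(tkey: bytes, access_name: str) -> bytes:
    """Per-request step: mix in the (possibly dynamic) Access Name."""
    return _h(tkey, access_name.encode("utf-8"))


tkey = type_key(b"database secret", "example.org",
                "3c2291f6-fc11-3d83-9908-f79b2d2f4ced")
print(tkey.hex())  # 64 hex digits, the same size as an AccessTypeKey value
print(name_key(tkey, "hdd/photo").hex())
```

Because the secret enters the first hash, nobody without it can recompute or reverse the Type Key, which is the property the Database Secret is meant to provide.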

The Access Name is a formatted string, and when set up as a regular expression it may extract information from the current query, including its path and query parameters. Not explicitly visible, but still dynamic, is the authenticated value in %REMOTE_USER that is incorporated into the ACL process.
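As a hypothetical illustration of a dynamic Access Name, the sketch below matches a regular expression against the request path and fills a template with its named groups. PATH_PATTERN, NAME_TEMPLATE and access_name are invented for this example; they are not directives of any existing Apache module. Query parameters could be captured in the same way after parsing them.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical configuration: a pattern over the request path and a
# template that turns its named groups into an Access Name.
PATH_PATTERN = re.compile(r"^/(?P<user>[^/]+)/(?P<album>[^/]+)/?$")
NAME_TEMPLATE = "hdd/{user}/{album}"


def access_name(request_target: str) -> Optional[str]:
    """Derive an Access Name from the request target, or None if nothing matches."""
    path = urlsplit(request_target).path  # drops the ?query part
    match = PATH_PATTERN.match(path)
    if match is None:
        return None
    return NAME_TEMPLATE.format(**match.groupdict())


print(access_name("/john/bugs?size=large"))  # hdd/john/bugs
```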

We have now arrived at the other perspective on the ACL; we shift our focus from columns to rows of the matrix of cells!

Remote-Centric view on the ACL

Consider once more the diagram at the top of this posting. The Remote Selector goes through an upward iteration process, and each string it produces lands us on a different row of the matrix, each row with its own selection of cells.

The information to select the right column is also present, so we have the actual cell coordinate in the matrix diagram. Now all we need to do is to look for it efficiently (not linearly). We can look in a simple key-value database for each iteration output string. The most concrete one wins, so we start at the Remote Selector at the bottom of the left column and work our way up until we find a hit. The cases for @. therefore serve as default settings.

This is how the second Apache webserver configuration above works. It uses the AccessTypeKey with the AccessName to derive most of the lookup key, and then extends it one by one with the Remote Selector obtained through iteration.
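The lookup loop can be sketched in Python with an ordinary dict standing in for the key-value database. The hashing scheme (SHA-256 over the previous key plus a length-prefixed string) and the names store and lookup are assumptions for illustration; the list of selectors is passed in already ordered from most concrete to most abstract.

```python
import hashlib
from typing import Optional


def _h(prefix: bytes, part: str) -> bytes:
    """Extend a key with one more string (hypothetical SHA-256 chaining)."""
    data = part.encode("utf-8")
    return hashlib.sha256(prefix + len(data).to_bytes(4, "big") + data).digest()


def store(db: dict, type_key: bytes, access_name: str, selector: str, words: str) -> None:
    """Insert one cell of the matrix under its derived lookup key."""
    db[_h(_h(type_key, access_name), selector)] = words


def lookup(db: dict, type_key: bytes, access_name: str,
           selectors: list[str]) -> Optional[tuple[str, str]]:
    """Try selectors from most concrete to most abstract; the first hit wins."""
    prefix = _h(type_key, access_name)  # computed once per request
    for selector in selectors:
        words = db.get(_h(prefix, selector))  # one key-value probe per level
        if words is not None:
            return selector, words
    return None  # no rule, not even a default under @.


db: dict = {}
tkey = bytes(32)  # stand-in for the configured AccessTypeKey
store(db, tkey, "hdd/photo", "@example.com", "^log %R")
store(db, tkey, "hdd/photo", "john@example.com", "%CRWD")
print(lookup(db, tkey, "hdd/photo",
             ["mary@example.com", "@example.com", "@.com", "@."]))
# ('@example.com', '^log %R')
```

In this sketch the database only holds SHA-256 outputs, so listing its keys does not directly reveal Access Names or Remote Selectors, which addresses the enumeration concern raised for the LDAP form.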

The contents of the cells are simpler than the accessRule attribute in LDAP:

accessRule: ^log %R ~@example.com %CRWD ~john@example.com

The model for the key-value database turns this inside-out and delivers separate key-value mappings:

( ebf0...bf, "hdd/photo",     "@example.com" ) --> "^log %R"
( ebf0...bf, "hdd/photo", "john@example.com" ) --> "%CRWD"

If we start looking for john@example.com, we hit the rule for %CRWD immediately. When mary@example.com takes a look, she finds no match at first, but retrying with @example.com finds ^log %R, granting read access that is logged. She might run into some beautiful Coccinellidae magnifica images, but she won't be able to squash them (the images, of course).
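The inside-out transformation and the walk through John's and Mary's requests can be reproduced with plain tuples as keys, mirroring the two mappings above. This is a sketch under stated assumptions: words apply to the ~selector(s) that follow them, the selector iteration is a simplified hypothetical one, and invert, abstractions and lookup are invented names.

```python
from typing import Optional

TYPE_KEY = "ebf0...bf"  # the AccessTypeKey, abbreviated as in the text


def invert(access_name: str, rule: str) -> dict:
    """Turn one accessRule line inside-out into separate key-value mappings.

    Assumption: words apply to the ~selector(s) that follow them, and the
    first word after a selector starts a fresh group.
    """
    table, words, after_selector = {}, [], False
    for word in rule.split():
        if word.startswith("#"):
            continue  # disabled word
        if word.startswith("~"):
            table[(TYPE_KEY, access_name, word[1:])] = " ".join(words)
            after_selector = True
        else:
            if after_selector:
                words, after_selector = [], False
            words.append(word)
    return table


def abstractions(remote: str) -> list[str]:
    """Simplified, hypothetical iteration: user@domain, @domain, @.parent..., @."""
    user, _, domain = remote.rpartition("@")
    result = [remote] if user else []
    result.append("@" + domain)
    labels = domain.split(".")
    result += ["@." + ".".join(labels[i:]) for i in range(1, len(labels))]
    return result + ["@."]


def lookup(table: dict, access_name: str, remote: str) -> Optional[str]:
    for selector in abstractions(remote):  # most concrete first
        words = table.get((TYPE_KEY, access_name, selector))
        if words is not None:
            return words
    return None


table = invert("hdd/photo", "^log %R ~@example.com %CRWD ~john@example.com")
print(lookup(table, "hdd/photo", "john@example.com"))  # %CRWD
print(lookup(table, "hdd/photo", "mary@example.com"))  # ^log %R
print(lookup(table, "hdd/photo", "eve@example.net"))   # None
```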

Distributing a long list over separate mappings simplifies each Access Rule, and each mapping can be found in log(N) steps. There are a few nitty-gritty optimisations to get even more out of this scheme, but these may be a topic for a separate post.
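One way to get the log(N) behaviour is a sorted index searched by bisection, sketched below with Python's bisect module; SortedStore is a hypothetical name. A database B-tree index gives the same logarithmic behaviour, while a hash table would give expected constant-time lookups instead.

```python
import bisect
from typing import Optional


class SortedStore:
    """Read-only key-value store backed by a sorted key list.

    Each lookup is a binary search, so it needs about log2(N) comparisons
    for N mappings, however long the original accessRule lines were.
    """

    def __init__(self, mappings: dict):
        self._keys = sorted(mappings)
        self._values = [mappings[k] for k in self._keys]

    def get(self, key) -> Optional[str]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


store = SortedStore({
    ("ebf0...bf", "hdd/photo", "@example.com"): "^log %R",
    ("ebf0...bf", "hdd/photo", "john@example.com"): "%CRWD",
})
print(store.get(("ebf0...bf", "hdd/photo", "john@example.com")))  # %CRWD
```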
