There are two ways of looking at Access Control. One is easy, with a
direct relation to the resources being managed. The other is advanced,
but like putty in the hands of administrators; moreover, it is highly
efficient. Efficiency matters; it allows us to enforce access control
everywhere, without noticeable discomfort. We derive the efficient
model from the one that is easy to use.
This article is part of a series on access control and is related to another series on identity.
In this article, a few examples of Access Control in the InternetWide Architecture will be explained. They will be part of the ARPA2 common support library that we develop alongside.
Access Control in Rows and Columns
In the above diagram, we have drawn yellow hexagonal cells, each expressing some access control information.
- %R is the Access Right of reading, %RW for reading and writing, and %CRWD adds creation and deletion rights.
- =aadmin is a variable definition supplied along with the Access Rights. What variables mean depends on the Access Type (application area). Here, it might mean only when using an admin alias.
- ^log triggers an action in the evaluating software. It too is defined by the Access Type, and it may or may not be meaningful to the application evaluating access rights.
Variables and triggers are rather luxurious features to add, but they are
almost zero-cost and can greatly increase expressiveness, so we added them.
The heart of the matter lies in the %rights indications, of course.
The cells express the rights where the coordinates in the blue boxes on top and the green trapeziums on the left meet.
The blue boxes on the top are what we call the Access Name; in general
terms these are UTF-8 Strings with generic handling rules. The actual string
is specific to the Access Type being used; for document access the string
holds a volume/path format; for communication it holds the userid of the
local user.
The green trapeziums on the left are Remote Selectors; at the base, they start as a full-blown Remote Identity, but gradual abstraction steps turn them into patterns that could match if the more specific forms did not. Access Control will start at the base and iterate upwards to try to gain access.
The fish is the last to discover water, because it is all around him. Similarly, this entire image is considered to apply to a local domain being accessed (the Access Domain), and to an Access Type (such as document access).
Resource-Centric view on the ACL
Access Control makes most sense in relation to resources, even if this does not yield the most scalable implementation. Under any object that we want to protect, we can iterate who may access it and who may not. In the diagram above, this is looking at one column at a time.
Let's write the hdd/photo column in a computer-sciency ACL notation.
dn: ...,accessDomain=example.org,...
object: labeledURIObject
object: accessControl
accessType: 3c2291f6-fc11-3d83-9908-f79b2d2f4ced
accessName: hdd/photo
accessRule: ^log %R ~@example.com %CRWD ~john@example.com
cn: Collection of Photos on Bugs, Beetles and Beatles
labeledURI: https://photo.example.com/john/bugs
...
This defines two Access Rules, each consisting of a sequence of code
words. We use % before rights, ^ before triggers and, not shown here,
= to set any of 26 variables and # to comment out (or disable) a word.
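To make the word syntax concrete, here is a minimal Python sketch that splits an accessRule value into separate rules. It is not the ARPA2 library's parser. The names Rule and parse_access_rule are hypothetical, and three points are assumptions read off the examples in this article: a word starting with ~ names a Remote Selector and closes a rule, the settings collected so far reset after each selector, and the first letter after = names the variable (so =aadmin sets variable a to admin).

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    selector: str                                   # Remote Selector without the leading "~"
    rights: str = ""                                # e.g. "R" or "CRWD"
    triggers: list = field(default_factory=list)    # e.g. ["log"]
    variables: dict = field(default_factory=dict)   # e.g. {"a": "admin"}


def parse_access_rule(text):
    """Split an accessRule value into one Rule per ~selector word."""
    rules = []
    rights, triggers, variables = "", [], {}
    for word in text.split():
        marker, body = word[0], word[1:]
        if marker == "#":
            continue                                # commented-out (disabled) word
        elif marker == "%":
            rights = body
        elif marker == "^":
            triggers.append(body)
        elif marker == "=":
            if not body:
                raise ValueError("variable word without a name")
            variables[body[0]] = body[1:]           # assumed: one-letter variable name
        elif marker == "~":
            rules.append(Rule(body, rights, triggers, variables))
            rights, triggers, variables = "", [], {}   # assumed: reset per selector
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return rules


for rule in parse_access_rule("^log %R ~@example.com %CRWD ~john@example.com"):
    print(rule)
```

Run on the accessRule line from the example, this yields one rule for @example.com (rights R, trigger log) and one for john@example.com (rights CRWD).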
This example notation defines a single object with all the information we need for Access Control:
- Access Domain from the dn: line
- Access Type from the accessType line
- Access Name from the accessName line
- Access Rights and a few extra facilities in the accessRule line
- Remote Selectors at various levels in the accessRule line
All this information is packed tightly together, and can be requested with a single query. This is the common approach for LDAP, of which this is an example. It works quite well, as long as not too many users approach any single object.
A problem with this model is that all the information is out in the open, making it easy to iterate the data. This can be a problem for security and privacy, and it makes the data an interesting target for theft. This makes the form less suitable for sharing with service providers, especially when these are third parties and certainly when these are large-scale operations.
Example Service using the ACL
Imagine a service like an Apache web server, configured with HTTP-SASL authentication and entry-level Access Control:
<VirtualHost ...>
DocumentRoot ...
ServerName photo.example.com
<Location /john/bugs>
AuthType SASL
Require valid-user
...
AccessRule ^log %R ~@example.com %CRWD ~john@example.com
...
</Location>
...
</VirtualHost>
Note how simple this is. The Access Domain, Access Type and Access Name
are implicit in the webserver configuration. They are there, but they are
not really used; Apache can just evaluate the AccessRule directive against
the authenticated %REMOTE_USER variable.
This is lovely for personally run servers. But it also has a few problems that would block its adoption with massive domain hosting providers:
- Very long AccessRule lists would slow down the web server, so this model scales poorly;
- Editing configuration files is unworkable for hosting providers, except perhaps for static entries.
So, we shift our focus to a model with an external database that can be queried more efficiently, and that does not need to search linearly through the AccessRule. Using the key derivation diagram of the previous post, replicated below, we can collapse quite a bit of information into one Type Key, and this could include a Database Secret which makes the entire action irreversible:
<VirtualHost ...>
DocumentRoot ...
ServerName photo.example.com
<Location /john/bugs>
AuthType SASL
Require valid-user
...
AccessTypeKey ebf036285570769e0359b24cdceb7dd56c1558436ec7ceffa9f251e8916ae6bf
AccessName "hdd/photo"
...
</Location>
...
</VirtualHost>
This setup is static for any given Access Domain, Access Type and of course the database secret. The only variable part that remains is the Access Name, which happens to be static in this particular example, but which may in general vary.
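The idea of collapsing static inputs into one opaque key can be illustrated with a keyed hash. This Python sketch is not the key derivation from the previous post, which is not reproduced here. It swaps in plain HMAC-SHA256 to show why a secret makes the result irreversible for anyone who does not hold it. The function name derive_type_key, the input encoding and the secret value are all hypothetical.

```python
import hashlib
import hmac


def derive_type_key(database_secret: bytes, access_domain: str, access_type: str) -> str:
    """Collapse domain, type and a secret into one opaque hex key (illustrative only)."""
    # Assumed encoding: domain and type joined by a newline.
    message = f"{access_domain}\n{access_type}".encode("utf-8")
    return hmac.new(database_secret, message, hashlib.sha256).hexdigest()


type_key = derive_type_key(
    b"hypothetical-database-secret",            # placeholder, not a real secret
    "example.org",                              # Access Domain from the LDAP example
    "3c2291f6-fc11-3d83-9908-f79b2d2f4ced",     # Access Type from the LDAP example
)
print(type_key)   # 64 hexadecimal digits, the same length as the AccessTypeKey above
```

Because the inputs are static, such a key can be computed once and placed in the configuration, as the AccessTypeKey directive does.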
The Access Name is a formatted string, and when set up as a regular expression
it may extract information from the current request, including its path and query
parameters. Not explicitly visible but still dynamic is the authenticated
value in %REMOTE_USER that would be incorporated into the ACL process.
We have now arrived at the other perspective on the ACL; we shift our focus from columns to rows of the matrix of cells!
Remote-Centric view on the ACL
Consider once more the diagram at the top of this posting. The remote selector goes through an upward iteration process, and each string it produces lands us on a different row of the matrix, each with a different selection of cells.
The information to select the right column is also present, so we have the
actual cell coordinate in the matrix diagram. Now all we need to do is to
look for it efficiently (not linearly). We can look in a simple key-value
database for each iteration output string.
The most concrete one wins, so we start at the Remote Selector at the
bottom of the left column and work our way up until we find a hit. The cases
for @. therefore serve as default settings.
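The upward iteration can be sketched as a Python generator that yields selectors from most concrete to most abstract. This is a simplification and not the ARPA2 iteration. Only the forms john@example.com, @example.com and @. appear in this article. The intermediate parent-domain forms such as @.com, and the function name iterate_selectors, are assumptions made for illustration.

```python
def iterate_selectors(remote_id: str):
    """Yield Remote Selectors from most concrete to most abstract (simplified)."""
    if "@" not in remote_id:
        raise ValueError("expected a user@domain style Remote Identity")
    user, _, domain = remote_id.partition("@")
    if user:
        yield f"{user}@{domain}"                # the full Remote Identity
    yield f"@{domain}"                          # any user under exactly this domain
    labels = domain.split(".")
    for i in range(1, len(labels)):
        yield "@." + ".".join(labels[i:])       # assumed form: anything under a parent domain
    yield "@."                                  # catch-all, the default setting


print(list(iterate_selectors("john@example.com")))
# ['john@example.com', '@example.com', '@.com', '@.']
```

Each yielded string is one candidate row in the matrix; the first one that has an entry decides the outcome.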
This is how the second Apache webserver configuration above works. It uses the AccessTypeKey with the AccessName to derive most of the lookup key, and then extends it one by one with the Remote Selector obtained through iteration.
The contents of the cells are simpler than those in LDAP, where a single attribute held the whole rule:
accessRule: ^log %R ~@example.com %CRWD ~john@example.com
The model for the key-value database turns this inside-out and delivers separate key-value mappings:
( ebf0...bf, "hdd/photo", "@example.com" ) --> "^log %R"
( ebf0...bf, "hdd/photo", "john@example.com" ) --> "%CRWD"
If we start looking for john@example.com we would hit the rule
for %CRWD immediately. When mary@example.com takes a look, her first
lookup would fail, but retrying with @example.com she would find
^log %R for read access while being logged. She might run into some
beautiful Coccinellidae magnifica images, but she won't be able to
squash them (the images, of course).
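The two lookups just described can be replayed in a few lines of Python. This is a sketch: a plain dict stands in for the key-value database, the function name lookup is hypothetical, the type key is left abbreviated exactly as in the mappings above, and the selector lists are written out by hand rather than derived.

```python
# The two key-value mappings shown above; the key is abbreviated as in the article.
ACL_DB = {
    ("ebf0...bf", "hdd/photo", "@example.com"): "^log %R",
    ("ebf0...bf", "hdd/photo", "john@example.com"): "%CRWD",
}


def lookup(db, type_key, access_name, selectors):
    """Return (selector, rule) for the first, most concrete selector that hits."""
    for selector in selectors:
        rule = db.get((type_key, access_name, selector))
        if rule is not None:
            return selector, rule
    return None                                  # no entry at all: nothing is granted


john = lookup(ACL_DB, "ebf0...bf", "hdd/photo",
              ["john@example.com", "@example.com", "@."])
mary = lookup(ACL_DB, "ebf0...bf", "hdd/photo",
              ["mary@example.com", "@example.com", "@."])
print(john)   # ('john@example.com', '%CRWD')
print(mary)   # ('@example.com', '^log %R')
```

Every probe is a single keyed fetch, so the cost depends on the number of iteration steps and not on how many remotes have entries for the object.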
Distributing a long list over separate database entries simplifies each Access Rule, and any entry can be found in log(N) steps. There are a few nitty-gritty optimisations to get even more out of this scheme, but these may be a topic for a separate post.