We weigh in heavily on the idea of Bring Your Own IDentity.
How does this relate to URLs that we see in so many protocols?
And especially the HTTP and AMQP URLs?
In the InternetWide Architecture, we have many forms of identity with tight relations to service end users' control over their online presence. After tending to privacy during realm crossover, we want you to Bring Your Own IDentity (BYOID) so that you need not set up local user names and passwords with every service you might want to use.
A long time ago, we accessed "secure" portions of websites with a user name and password, as in
These habits have died out. Passwords in URLs are considered a bad idea, but the use of accounts has been dropped along with them, making HTTP one of the rare protocols with non-standard notations in the path of a URL to encode users:
http://www.example.com/~john/his/path
http://www.example.com/#!/john/his/path
http://www.example.com/his/path?user=john
http://www.example.com/.well-known/webfinger?resource=acct%3Ajohn%40example.com
These being local conventions of the visited domain, they are not as good as the proper use of the user name in the authority part of the URL.
So, is this last form such a bad idea, or is it just the use of passwords? It seems to be just the passwords: the non-standard naming mechanisms that replaced the user name cannot possibly be integrated with authentication in the browser, which leads us to enter passwords into websites, surrendering them to the application layer where they are tightly knit with (adverse) advertisements.
What do the standards say exactly?
Clarity from RFC 3986
The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource. The user information, if present, is followed by a commercial at-sign ("@") that delimits it from the host.
In other words, the user name is meant to guide the visitor to the information about the user; it is authoritative in that it determines the origin of the information being presented. The common feeling that a user name would be related to a login name of a website is not what the specification says!
Many URI schemes include a hierarchical element for a naming authority so that governance of the name space defined by the remainder of the URI is delegated to that authority (which may, in turn, delegate it further). The generic syntax provides a common means for distinguishing an authority based on a registered name or server address, along with optional port and user information.
Just like the host name, which is part of the authority and dictates where the information comes from, the optional user name is part of the authority too. This is just what those non-standard representations of user names want to do! They are not about logging in that user, but about scoping the data presented. The login process is an independent concern.
An important implication of this is that the user name must never be stripped from the URI. When it is treated as a login, this might be done, but that would remove part of the authority, and thus represent potentially different data. We do see vastly different reactions in browsers, indeed, and most of them are plain wrong.
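To make the scoping argument concrete, here is a minimal sketch that treats the authority of a URI as a tuple that includes the user name, so that stripping the user name visibly yields a different scope. The function name `authority_scope` and the example URIs are illustrative assumptions, not part of any browser or library; it only relies on Python's standard `urllib.parse.urlsplit`.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def authority_scope(uri: str) -> Tuple[str, Optional[str], Optional[str], Optional[int]]:
    """Return the parts of a URI that make up its naming authority.

    The user name is kept as part of the scope; any password after a
    colon is deliberately left out, since it says nothing about the data.
    """
    parts = urlsplit(uri)
    return (parts.scheme, parts.username, parts.hostname, parts.port)


# Illustrative URIs: the same host and path, with and without a user name.
with_user = authority_scope("http://john@www.example.com/his/path")
without_user = authority_scope("http://www.example.com/his/path")
print(with_user)                  # ('http', 'john', 'www.example.com', None)
print(without_user)               # ('http', None, 'www.example.com', None)
print(with_user == without_user)  # False: two different authorities
```

A cache, a history list or a bookmark keyed on such a tuple would keep the two views apart, which is exactly what is lost when a browser drops the user name.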
Even when user names ought to be rendered, the advice is to do this in a way that distinguishes them from a domain name, for instance through a separate colour, or in RFC terms:
Applications that render a URI for the sake of user feedback, such as in graphical hypertext browsing, should render userinfo in a way that is distinguished from the rest of a URI, when feasible. Such rendering will assist the user in cases where the userinfo has been misleadingly crafted to look like a trusted domain name (Section 7.6).
It is absolutely no problem to talk to a server with a user name but without ever logging in. This actually serves privacy, as long as the user name has not been guessed: it allows you to run a server without showing all its contents to anyone, robot or human. Leave the parts of a website without a user name to robots, but do not invite them into the parts with a user name.
A problem for HTTP is how to deliver a user name. This is not trivial, but it can be done with a simple use of Basic authentication: sending the user name without a password can only be interpreted as a signal that no password is provided (the authentication content would be the user name, a colon, and a gaping hole where the password would otherwise be). HTTP servers can simply take in this user name and require no password or other form of authentication until the user happens to hit upon a protected part of the data; at that time, the usual 401/407 exchange would be started, but even then the Basic header could be provided.
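The following is a minimal sketch of that idea, assuming the standard Basic encoding of base64 over `user:password` with an empty password. The helper names and the server policy in `decide`, including the protected prefix `/private/`, are hypothetical and only illustrate the flow described above; password verification is deliberately left out.

```python
import base64
from typing import Optional, Tuple


def basic_user_only(user: str) -> str:
    """Build an Authorization header value that carries only a user name.

    The credentials are "user:" with nothing after the colon, which
    signals that no password is being provided.
    """
    token = base64.b64encode(f"{user}:".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic(header: str) -> Tuple[str, Optional[str]]:
    """Split a Basic header into (user, password); password is None if empty."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    user, colon, password = decoded.partition(":")
    if not colon:
        raise ValueError("Basic credentials lack the user:password colon")
    return user, (password or None)


def decide(path: str, header: Optional[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical server policy: the user name only scopes the view;
    credentials are demanded for the (assumed) protected prefix /private/.
    Actual password verification is left out of this sketch."""
    user, password = parse_basic(header) if header else (None, None)
    if path.startswith("/private/") and password is None:
        return 401, user  # start the usual challenge; the user name is kept
    return 200, user


header = basic_user_only("john")
print(header)                            # Basic am9objo=
print(decide("/his/path", header))       # (200, 'john')
print(decide("/private/notes", header))  # (401, 'john')
```

Note that the user name survives the 401 response, so the scope of the request does not change when authentication begins.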
Stuffit, those Password Requirements
Verifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets. Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically). However, verifiers SHALL force a change if there is evidence of compromise of the authenticator.
Leave it to standards authors to make up their own terminology for passwords; but otherwise this should be crystal clear advice. It comes from NIST, by the way, and so this is the official security advice for the American government.
Why, you wonder? It's simple. If you must use a password, then use a generated one. This will use characters from a limited set, but with a lot of entropy, which is the bottom line of a good password. Entropy, explained informally, is a measure of surprise. Do not make up important passwords yourself and hope to remember them; use a password tool or your browser's built-in facilities, even if that means it has to scrape fields from HTML content when the passwords are entered in layers above HTTP.
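As a small illustration of "a limited set, but with a lot of entropy", the sketch below generates a password with Python's `secrets` module and computes its entropy. The alphabet of letters and digits and the length of 20 are assumptions chosen for the example, not a recommendation from NIST.

```python
import math
import secrets
import string

# Assumed alphabet: letters and digits, 62 symbols in total.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 20) -> str:
    """Pick every character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)


password = generate_password()
print(len(password), round(entropy_bits(len(password), len(ALPHABET)), 1))  # 20 119.1
```

The formula only holds because each character is drawn uniformly at random; a self-invented password of the same length and character set has far less surprise in it.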
Strong crypto is still better; the exchange will be different each time, which prevents the replay of captured traffic. TLS is only a limited protection in this respect, for a number of reasons.
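Why a changing exchange defeats replay can be shown with a deliberately simplified challenge-response built on HMAC-SHA256. This is not SCRAM or any particular SASL mechanism; the shared key and function names are hypothetical, and the point is only that a captured response is useless against a fresh challenge.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared key; in this scheme it never crosses the wire.
KEY = secrets.token_bytes(32)


def new_challenge() -> bytes:
    """The server picks a fresh random challenge for every exchange."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes) -> bytes:
    """The client proves knowledge of the key by MACing the challenge."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Constant-time comparison against the expected response."""
    return hmac.compare_digest(respond(key, challenge), response)


first = new_challenge()
captured = respond(KEY, first)
print(verify(KEY, first, captured))            # True
print(verify(KEY, new_challenge(), captured))  # False: replay fails
```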
Passwords and Colons in URIs
So what about URIs that contain a user name and password separated by a colon? This form is firmly abolished in the same RFC:
Use of the format "user:password" in the userinfo field is deprecated. Applications should not render as clear text any data after the first colon (":") character found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password). Applications may choose to ignore or reject such data when it is received as part of a reference and should reject the storage of such data in unencrypted form. The passing of authentication information in clear text has proven to be a security risk in almost every case where it has been used.
In other words, passwords in URIs are a bad idea; anything following a colon in the userinfo is considered to be one, and must be handled carefully.
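The two rendering rules quoted from RFC 3986 can be combined in a small display routine: hide whatever follows the first colon in the userinfo (unless it is empty), and set the userinfo apart from the host. This is a minimal sketch; the square brackets are an assumed stand-in for the separate colour a graphical browser would use, and the output is meant for display only, not as a valid URI.

```python
from urllib.parse import urlsplit


def render_for_display(uri: str) -> str:
    """Render a URI for user feedback with its userinfo distinguished
    and any non-empty password masked."""
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not at:
        return uri  # no userinfo: nothing to hide or highlight
    user, colon, password = userinfo.partition(":")
    # An empty password ("john:") may be shown; a non-empty one is masked.
    shown = user + (":***" if password else colon)
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    # Brackets stand in for a separate colour in a graphical browser.
    return f"{parts.scheme}://[{shown}]@{hostport}{rest}"


print(render_for_display("http://john:secret@www.example.com/his/path"))
# http://[john:***]@www.example.com/his/path
print(render_for_display("http://www.trusted.example:x@evil.example/"))
# http://[www.trusted.example:***]@evil.example/
```

The second example shows the misleading case the RFC warns about: without the distinction, the userinfo could pass for a trusted domain name.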
If we did use the part after the colon, we would indeed use it for an authorisation identity, which means stepping down in the inheritance hierarchy but otherwise remaining the same person. But we will probably try to do this on the client, in order to control the visibility of identities; that is better done before realm crossover than after.
Users in URIs for AMQP
AMQP 1.0 is a messaging protocol. It is a bit like email, but geared towards automation and high volumes of data that need to be delivered reliably.
AMQP URIs follow the general form, and can include a user name as well. See how they are worked out for RabbitMQ to get an idea.
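A minimal sketch of pulling such a URI apart, based on our reading of RabbitMQ's URI conventions: default ports 5672 for `amqp` and 5671 for `amqps`, and the virtual host taken as the percent-decoded path after the leading slash. The function name and the returned dictionary layout are assumptions for illustration, not RabbitMQ's API.

```python
from urllib.parse import unquote, urlsplit

DEFAULT_PORTS = {"amqp": 5672, "amqps": 5671}


def parse_amqp_uri(uri: str) -> dict:
    """Split an AMQP URI into user, host, port and virtual host."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an AMQP URI: {uri}")
    # No path at all means "use the broker's default virtual host" (None here);
    # otherwise the vhost is the percent-decoded path after the leading slash.
    vhost = unquote(parts.path[1:]) if parts.path else None
    return {
        "user": unquote(parts.username) if parts.username else None,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "vhost": vhost,
    }


print(parse_amqp_uri("amqp://john@broker.example.com/%2f"))
# {'user': 'john', 'host': 'broker.example.com', 'port': 5672, 'vhost': '/'}
```

The user name comes out of the authority, next to the host, which is where the following discussion places it.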
Again, the user name is in the authoritative part of the URI, so it determines what is visible in the path. Interestingly, URIs can be used at both end points, so we can have two users, each at their own domain, talk to each other. This is very much like the pattern used in email. Compare this to HTTP, where the only URI is on the server side. It is no wonder that the idea of the client account got woven into the one URI that existed, but the user name should be read as part of the authority section. This is not incompatible with the uses of today, but it feels different. Still, access to user information is best offered in a standard manner!
Given the two end point URLs in AMQP, we can actually make two parties talk openly. So what is the relation between that form and authentication and authorisation?
Authentication in AMQP is the customary combination of TLS and SASL. This is a powerful combination for client-server protocols, where the server is authenticated through TLS before the client authenticates through SASL. AMQP 1.0 is a peer-to-peer protocol, but individual connections may be considered client-to-server. The trick is to realise that it is the remote party's URI that needs to be validated, not the local one. We generally consider an authenticated remote domain name to be sufficient grounds to trust all user names under that domain name too; but we could ask for SASL authentication.
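The trust rule above might be expressed as the following check on the remote party's URI. This is a hypothetical sketch: `tls_domain` is assumed to be the domain name that TLS has already validated for the peer, and certificate handling itself is outside its scope. An optional SASL user, when present, must match the user named in the URI.

```python
from typing import Optional
from urllib.parse import urlsplit


def remote_user_acceptable(remote_uri: str,
                           tls_domain: str,
                           sasl_user: Optional[str] = None) -> bool:
    """Hypothetical policy check for the remote end of an AMQP connection."""
    parts = urlsplit(remote_uri)
    host = parts.hostname
    # The remote URI's domain must be the one TLS authenticated.
    if host is None or host != tls_domain.lower().rstrip("."):
        return False
    # If SASL was also run, its user must match the user named in the URI.
    if sasl_user is not None:
        return sasl_user == parts.username
    # Otherwise an authenticated domain vouches for all user names under it.
    return True


print(remote_user_acceptable("amqp://mary@example.com/", "example.com"))  # True
print(remote_user_acceptable("amqp://mary@example.com/", "example.org"))  # False
```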
Note that this is what BYOID boils down to: the client indicates its email@example.com identity as represented locally, and authenticates it to the remote system. This is why realm crossover is an intrinsic part of BYOID.
The client approaches another firstname.lastname@example.org identity on the server, and this is the authority part: it determines what view on the server's resource we would like to have. In other words, for AMQP it indicates where we would like to deliver our message. That may include virtual hosts and queues but also, importantly, the targeted user.
We have designed our ACL system with two kinds of access: resources and communication. The latter applies to AMQP, where two users, each in their own realm, want to communicate. Whether the remote user is welcomed is decided by the ACL, for which we were able to find a surprisingly efficient implementation based on just a few lookups in a key-value database.
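To give a feel for how a communication ACL can get away with a few lookups, here is a hypothetical sketch; it is not the InternetWide implementation. A plain dict stands in for the key-value database, and the key format `local|pattern` and the values `accept`/`reject` are assumptions. At most three keys are tried, from the most to the least specific.

```python
from typing import Dict


def communication_allowed(acl: Dict[str, str], local_user: str, remote: str) -> bool:
    """Decide whether remote may communicate with local_user.

    Tries at most three keys, from most to least specific:
    the exact remote address, the remote domain, and a catch-all.
    The first key that is present decides; absence of all means reject.
    """
    _, _, domain = remote.rpartition("@")
    for pattern in (remote, "@" + domain, "@"):
        decision = acl.get(f"{local_user}|{pattern}")
        if decision is not None:
            return decision == "accept"
    return False


# Illustrative rules for one local user, stored in a plain dict.
acl = {
    "john@example.org|mary@example.com": "accept",
    "john@example.org|@example.net": "accept",
    "john@example.org|@": "reject",
}
print(communication_allowed(acl, "john@example.org", "mary@example.com"))  # True
print(communication_allowed(acl, "john@example.org", "bob@example.net"))   # True
print(communication_allowed(acl, "john@example.org", "eve@example.com"))   # False
```

Because every key is an exact string, each step is a single key-value lookup rather than a scan over rules.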
URIs for IRC
Internet Relay Chat predates HTTP, as well as the idea of URIs. IRC is still popular and under active development because of its culture, in which people are not assumed to respond immediately. This makes it a more relaxed alternative to present-day instant messaging.
Connections to IRC are made to a host, not a user. On the host, one chooses an identity (or "nick") and joins discussion groups. There is a trend towards user authentication, so that a nick consistently reflects the same user. IRC used to rely on simple fixed passwords, but proper SASL exchanges are a useful addition to present-day IRC. They include EXTERNAL (to reference TLS-based login), and some attention is being paid to GSSAPI (to enable Kerberos5); these two are powerful cryptographic means. For use with passwords, SCRAM is an excellent suggestion.
So what would it mean to have user names in IRC URIs? As defined for URIs in general, such a user name is part of the authority section, and defines a name space. So the suggested use is to constrain the discussions in IRC that would be visible. Indeed, IRC URIs tend to look like
and when presented like
they would not look up the email@example.com -- meaning that user john could present his own selection of acceptable topic names. These might overlap with the ones for the host example.com or not -- as he sees fit.
Yes, we are talking about a personal chat server, or at least a personally decorated chat service!
john might even be in for chat only over this
But a much more promising situation now arises. The URI indicates what user is being addressed, and this information is available before using the IRC command to initiate TLS. Could it be possible to set up the TLS exchange directly with this user, in a peer-to-peer fashion? This would make IRC into a secure protocol for peer-to-peer chat, useful for such things as exchanging passwords -- though nobody would use it for that, if even an aged protocol like IRC can recognise that passwords are a relic of the last century.
Why is HTTP different from AMQP and IRC?
HTTP is different from AMQP and IRC in one vital way: AMQP and IRC are communication protocols and HTTP is a resource access protocol. This means that the end points in AMQP are really different users, while in HTTP there is a user who wants to access resources that may or may not be available to him. There is no notion of a user on the HTTP server, except perhaps to mirror the requesting user.
Even group resources, such as a shared object store or web-based groupware, are all modelled as resources in HTTP, not as a user or group as they would be in AMQP.
So, it is not HTTP that is so different; it is the different notion of access control involved here that has caused the misinterpretation of the user part in these URIs. It's probably time we got our act together, and took it for what it really is: a part of the authority section of the HTTP URI.
In the InternetWide Architecture, we are quite strict about our use of identity. In the case of HTTP, the identity of the user involved in the session is a property attached to the client, not the server. This is why a URI with the user name john in its authority part is a good way of scoping information (presumably about that user) but is not in any way related to authorisation or access control. Access control must be a matter of client identity.