Identity 12: User Names in URLs


We weigh in heavily on the idea of Bring Your Own IDentity. How does this relate to URLs that we see in so many protocols? And especially the HTTP and AMQP URLs?

In the InternetWide Architecture, we have many forms of identity with tight relations to service end users' control over their online presence. After tending to privacy during realm crossover, we want you to Bring Your Own IDentity (BYOID) so that you need not set up local user names and passwords with every service you might want to use.

A long time ago, we accessed "secure" portions of websites with a user name and password, as in

http://john:sekreet@www.example.com

These habits have died out. Passwords in URLs are considered a bad idea, but the use of accounts has been dropped along with them, making HTTP a rare protocol that relies on non-standard notations elsewhere in the URL to encode users,

http://www.example.com/~john/his/path
http://www.example.com/#!/john/his/path
http://www.example.com/his/path?user=john
http://www.example.com/.well-known/webfinger?resource=acct%3Ajohn%40example.com

These being local standards for the visited domain, they are not as good as the proper use

http://john@example.com/his/path

So, is this last form such a bad idea, or is it just the use of passwords? It seems to be the latter, as non-standard naming mechanisms cannot possibly be integrated with authentication in the browser, leading us to enter passwords into websites and to surrender them to the application layer, where they are tightly knit with (adverse) advertisements.

What do the standards say exactly?

Clarity from RFC 3986

The formal definition of the URI syntax is RFC 3986 and it is quite clear about user names which it considers part of the authority section of the URI:

The userinfo subcomponent may consist of a user name and, optionally,
scheme-specific information about how to gain authorization to access
the resource.  The user information, if present, is followed by a
commercial at-sign ("@") that delimits it from the host.

In other words, the user name is meant to guide the visitor to the information about the user; it is authoritative in that it determines the origin of the information being presented. The common feeling that a user name would be related to a login name of a website is not what the specification says!

Many URI schemes include a hierarchical element for a naming
authority so that governance of the name space defined by the
remainder of the URI is delegated to that authority (which may, in
turn, delegate it further).  The generic syntax provides a common
means for distinguishing an authority based on a registered name or
server address, along with optional port and user information.

Just like the host name is part of the authority, dictating where the information comes from, the optional user name is such a thing too. This is just what those non-standard representations of user names want to do! They are not about login of that user, but about the scoping of the data presented. The login process is an independent concern.
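
As a small illustration, here is a sketch in Python that uses the standard library's urllib.parse to show that the user name travels with the host as part of the authority. The helper name authority_parts is ours and purely illustrative.

```python
from urllib.parse import urlsplit

def authority_parts(uri):
    """Return (user, host, port) from the authority section of a URI."""
    parts = urlsplit(uri)
    return parts.username, parts.hostname, parts.port

with_user = urlsplit("http://john@example.com/his/path")
without_user = urlsplit("http://example.com/his/path")

# The authority (netloc) differs, so the two URIs name different scopes,
# even though host and path are the same.
print(with_user.netloc)
print(without_user.netloc)
print(authority_parts("http://john@example.com/his/path"))
```

Nothing in this parsing step involves a login; the user name is just one more part of what is being addressed.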

An important implication of this is that the user name must never be stripped from the URI. When it is treated as a login, this might be done, but that would remove part of the authority, and thus represent potentially different data. We do see vastly different reactions in browsers, indeed, and most of them are plain wrong.

Even when user names ought to be rendered, the advice is to do this in a way that distinguishes them from a domain name through something like a separate colour, or in RFC terms:

Applications that render a URI for the sake of user feedback, such as
in graphical hypertext browsing, should render userinfo in a way that
is distinguished from the rest of a URI, when feasible.  Such
rendering will assist the user in cases where the userinfo has been
misleadingly crafted to look like a trusted domain name
(Section 7.6).

It is absolutely no problem to talk to a server with a user name but without ever logging in. This is actually a service to privacy, as long as the user name has not been guessed. It allows you to have a server up without showing all its contents to anyone, robot or human. Leave the part of a website without user name to robots, but do not invite them into the parts with a user name.

A problem for HTTP is how to deliver a user name. This is not trivial, but it can be done with a simple use of Basic authentication: simply sending it without a password can only be interpreted as a signal that no password is provided (the authentication content would be username, colon, and a gaping hole where the password would otherwise be). HTTP servers can simply take in this user name and require no password or other form of authentication, until the user happens to hit upon a protected part of the data; at that time, the usual 401/407 exchange would be started, but even then the Basic header could be provided.
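
To make this concrete, the following Python sketch builds such a header: the Basic credentials are the user name, a colon, and nothing after it. The function name is hypothetical, and whether a given server accepts an empty password as "no login, just a user name" is the assumption made in the text above, not something the Basic scheme promises.

```python
import base64

def basic_header_without_password(user):
    """Build an Authorization header value carrying only a user name."""
    if ":" in user:
        # The Basic scheme cannot carry a colon inside the user name.
        raise ValueError("user name must not contain a colon")
    credentials = f"{user}:".encode("utf-8")  # user name, colon, empty password
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_header_without_password("john")
print("Authorization:", header)
```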

In general, authentication should take place in a place away from the JavaScript application, to improve security as well as to empower the browser to take control and help with automation. Authentication should be part of the HTTP protocol or even run below it, in TLS. No amount of reasoning about JavaScript code guiding the user can make up for what it consistently guides them towards: the use of passwords. And, as a result of this crummy mechanism, needing to enter them over and over again. And forgetting them. Doing badly at picking them. Keeping them static for longer than is wise.

Stuffit, those Password Requirements

One reason that JavaScript "needs" to interact with the password is to "help" the user choose a "strong" one. You know, add a digit, mix capitals and lowercase, that sort of wannabe security. The official viewpoint is that this sort of oppression is a bad idea, even from a security standpoint. One more reason for access to the password from the JavaScript/advertising space gone.

Verifiers SHOULD NOT impose other composition rules (e.g.,
requiring mixtures of different character types or prohibiting
consecutively repeated characters) for memorized secrets.
Verifiers SHOULD NOT require memorized secrets to be changed
arbitrarily (e.g., periodically). However, verifiers SHALL
force a change if there is evidence of compromise of the
authenticator.

Leave it to standards authors to make up their own terminology for passwords; but otherwise this should be crystal clear advice. It comes from NIST, by the way, and so this is the official security advice for the American government.

Why, you wonder? It's simple. If you must use a password, then use a generated one. This will use characters from a limited set, but with a lot of entropy, which is the bottom line of a good password. Entropy, explained informally, is a measure of surprise. Do not make up important passwords yourself and hope to remember them; use a password tool or your browser's built-in facilities, even if that means it has to scrape fields from HTML content when the passwords are entered into layers higher than HTTP.
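
A rough sketch of what "generated, from a limited set, with a lot of entropy" can mean, using Python's secrets module. The alphabet and the length of 20 are arbitrary choices for illustration; the entropy estimate only holds because every character is drawn uniformly and independently.

```python
import math
import secrets
import string

# A deliberately limited character set: lowercase letters and digits.
ALPHABET = string.ascii_lowercase + string.digits

def generate_password(length=20):
    """Draw each character uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of a uniformly generated password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

print(generate_password())
print(f"about {entropy_bits(20):.0f} bits of entropy")
```

No composition rules are needed: the surprise comes from the random choice, not from forcing in a capital or a digit.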

Strong crypto is still better; the exchange will be different each time, thus disabling the replay of captured traffic. TLS is only a limited protection in this respect, for a number of reasons.

Passwords and Colons in URIs

So what about URIs that contain a user name and password separated by a colon? This form is firmly deprecated in the same RFC:

Use of the format "user:password" in the userinfo field is
deprecated.  Applications should not render as clear text any data
after the first colon (":") character found within a userinfo
subcomponent unless the data after the colon is the empty string
(indicating no password).  Applications may choose to ignore or
reject such data when it is received as part of a reference and
should reject the storage of such data in unencrypted form.  The
passing of authentication information in clear text has proven to be
a security risk in almost every case where it has been used.

In other words, passwords in URIs are a bad idea; anything following a colon is considered to be one, and is to be handled carefully.
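
The rendering advice from the RFC could be followed along these lines. This is a minimal Python sketch; the function name render_uri and the "****" mask are our own choices, and an application might just as well drop the data after the colon altogether.

```python
from urllib.parse import urlsplit, urlunsplit

def render_uri(uri):
    """Return a displayable form of uri that hides any data after the
    first colon in the userinfo, unless that data is the empty string."""
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if at:
        user, colon, secret = userinfo.partition(":")
        if secret:
            userinfo = user + ":****"  # never show the secret as clear text
    netloc = userinfo + at + hostport
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(render_uri("http://john:sekreet@www.example.com"))
print(render_uri("http://john@example.com/his/path"))
```

Note that the user name itself stays in place; only what follows the colon is treated as sensitive.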

If we did anything here, we would indeed use it for an authorisation identity, which means stepping down in the inheritance hierarchy but otherwise remaining the same person. But we will probably try to do this on the client, in order to control the visibility of identities -- better done before realm crossover than after.

Users in URIs for AMQP

AMQP 1.0 is a messaging protocol. It is a bit like email, but geared towards automation and high volumes of data that need to be delivered reliably.

AMQP URIs follow the general form, and can include a user name as well. See how they are worked out for RabbitMQ to get an idea.

Again, the user name is in the authority part of the URI, so it determines what is visible in the path. Interestingly, URIs can be used at both end points, so we can have two users, each at their own domain, talk to each other. This is very much like the pattern used in email. Compare this to HTTP, where the only URI is on the server side; it is no wonder that the idea of the client account got woven into the one URI that existed, but it should be read as an authoritative section; this is not incompatible with the uses of today, but it feels different. Giving access to user information in a standard manner is a good thing, though!

Given the two end point URLs in AMQP, we can actually make two parties talk openly. So what is the relation between that form and authentication and authorisation?

Authentication in AMQP is the customary combination of TLS and SASL. This is a powerful combination for client-server protocols, where the server is authenticated through TLS before the client authenticates through SASL. AMQP 1.0 is a peer-to-peer protocol, but individual connections may be considered client-to-server. The trick is to realise that it is the remote party's URI that needs to be validated, not the local one. We generally consider an authenticated remote domain name to be sufficient grounds to trust all user names under that domain name too; but we could ask for SASL authentication.

Note that this is what BYOID boils down to: the client indicates its user@domain.name identity as represented locally, and authenticates it to the remote system. This is why realm crossover is an intrinsic part of BYOID.

The client approaches another user@domain.name identity on the server, and this is the authority part: it determines what view on the server's resource we would like to have. In other words, for AMQP it indicates where we would like to deliver our message. That may include virtual hosts, queues but also, importantly, the targeted user.

We have designed our ACL system with two kinds of access: resources and communication. The latter applies to AMQP, where two users, each in their own realm, want to communicate. Whether the remote user is welcomed is decided by the ACL, for which we were able to find a surprisingly efficient implementation based on just a few lookups in a key-value database.

URIs for IRC

Internet Relay Chat predates HTTP, and the idea of URIs was also developed later. IRC is still popular and under active development because of its culture, where it is not assumed that people will respond immediately. This makes it the more relaxed alternative to present-day use of instant messaging.

Connections to IRC are made to a host, not a user. On the host, one chooses an identity (or "nick") and joins in discussion groups. There is a trend towards user authentication so the nick consistently reflects the same user. IRC used to be based on simple fixed passwords, but proper SASL exchanges are a useful addition in present-day IRC, including EXTERNAL (to reference TLS-based login) and with some attention for GSSAPI (to enable Kerberos5), which define two powerful cryptographic means; for use with passwords, SCRAM is an excellent suggestion.

So what would it mean to have user names in IRC URIs? As defined for URIs in general, such a user name is part of the authority section, and defines a name space. So the suggested use is to constrain the discussions in IRC that would be visible. Indeed, IRC URIs tend to look like

irc://example.com/topic

and when presented like

irc://john@example.com/topic

they would not look up the topic under example.com but under john@example.com -- meaning that user john could present his own selection of acceptable topic names. These might overlap with the ones for the host example.com or not -- as he sees fit. Yes, we are talking about a personal chat server, or at least a personally decorated chat service! User john might even be in for chat only over this URI.
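
The scoping idea can be sketched as a lookup keyed by the full authority rather than by the host alone. The topic lists below are made up for illustration, and no existing IRC server is claimed to work this way.

```python
from urllib.parse import urlsplit

# Hypothetical topic lists, one per authority (user name included).
TOPICS = {
    "example.com": {"topic", "support"},
    "john@example.com": {"topic", "gardening"},
}

def topic_exists(uri):
    """Look up the topic in the name space selected by the URI's authority."""
    parts = urlsplit(uri)
    topic = parts.path.lstrip("/")
    return topic in TOPICS.get(parts.netloc, set())

print(topic_exists("irc://example.com/topic"))
print(topic_exists("irc://john@example.com/gardening"))
print(topic_exists("irc://example.com/gardening"))
```

The same topic name may exist in both name spaces or in just one of them, as the owner of each authority sees fit.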

But a much more promising situation now arises. The URI indicates what user is being addressed, and this information is available before using the IRC command to initiate TLS. Could it be possible to set up the TLS exchange directly with this user, in a peer-to-peer fashion? This would make IRC into a secure protocol for peer-to-peer chat, useful for such things as exchanging passwords -- though nobody would use that, if even an aged protocol like IRC can recognise that passwords are a relic of last century.

Why is HTTP different from AMQP and IRC?

HTTP is different from AMQP and IRC in one vital way: AMQP and IRC are communication protocols and HTTP is a resource access protocol. This means that the end points in AMQP are really different users, while in HTTP there is a user who wants to access resources that may or may not be available to him. There is no notion of a user on the HTTP server, except perhaps to mirror the requesting user.

Even group resources, such as a shared object store or web-based groupware, are all modelled as a resource in HTTP, not as a user or group as they would be in AMQP.

So, it is not HTTP that is so different; it is the different notion of access control involved here that has caused the misinterpretation of the user part in these URIs. It's probably time we got our act together, and took it for what it really is: a part of the authority section of the HTTP URI.

In the InternetWide Architecture, we are quite strict about our use of identity. In the case of HTTP, the identity of the user involved in the session is a property attached to the client, not the server. This is why a URI http://john@example.com/path is a good way of scoping information (presumably about a user named john) but not in any way related to authorisation or access control. Access control must be a matter of client identity.

We have proposed an upgrade to HTTP that adds general SASL mechanisms: by defining HTTP SASL, authentication schemes can mature for all protocols at the same time, automatically adding their value to HTTP. This would abolish the "guidance" of password entry in JavaScript, and instead update HTTP to a level that IRC has already reached, along with a plethora of other protocols.


Rick van Rein