Rick van Rein
Published

Mon 14 August 2017

Comparing the privacy of HTTP and LDAP

In several places of the InternetWide Architecture, we use LDAP as our data protocol — because it is the most refined standard protocol for digging around in data. What we haven't yet discussed is how its privacy compares to, say, HTTP.

Most development these days focusses on HTTP. We argued before that HTTP has deliberately poor semantics, which makes it perfectly suitable for passing around documents as a whole, but at the same time less suitable for the data fragments of an average interactive website. The ARPA2 projects assume that domains publish credentials such as public keys in the dedicated data protocol LDAP, which is much better suited to the detailed inquiries for which it was designed and standardised. We now add a new reason, and that is one of privacy.

When we mention privacy in this article, we are talking about the client's privacy, the aspect that is so obviously trodden on by many HTTP applications. We shall see that LDAP raises no concerns in this respect. (And, just as a quick note, LDAP allows the server full control over the privacy of its data.)

Ingredients of Privacy Assaults

To be able to attack someone's privacy, a profile must be made, consisting of minute actions. Given this profile, all those seemingly unrelated bits of data can be related; if not on their own then by comparing with others to learn what sorts of interests cluster together. This is done with statistics, which means that the overlap between peers does not have to be perfect, but trends can be detected. Then, based on what your profile is missing relative to comparable ones, you can be adv(ert)ised to look at something new.
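
The comparison between profiles can be sketched in a few lines. This is a minimal illustration under assumptions of our own: profiles are plain sets of interests, similarity is the Jaccard overlap, and the threshold of 0.5 is arbitrary. The function names and the sample data are made up; real profiling systems are far more elaborate.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two interest sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(profile: set, peers: list, threshold: float = 0.5) -> set:
    """Collect interests of sufficiently similar peers that this profile lacks."""
    suggestions = set()
    for peer in peers:
        # The overlap need not be perfect; a trend is enough.
        if jaccard(profile, peer) >= threshold:
            suggestions |= peer - profile
    return suggestions


# Entirely made-up profiles, for illustration only.
me = {"cycling", "ldap", "coffee"}
peers = [
    {"cycling", "ldap", "coffee", "rain gear"},  # overlap 3/4, similar enough
    {"gardening", "coffee"},                     # overlap 1/4, ignored
]
result = suggest(me, peers)
```

Here `result` holds the one interest that a comparable profile has and ours is missing, which is exactly what an advertiser would put in front of us.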

This is an automatic process. No human eyeballs are involved, or even needed. And, as a result, you can expect downright blunt associations to be made for you. We all know the story of the girl who was presented with baby-related ads in a store before she even knew she was pregnant, much to the horror of her mother who joined her. A very embarrassing situation. And we may not all shop in such shouty stores that display advertisements based on your profile, but we all share our browser environments with friends who want to "quickly search for something". Imagine the embarrassment of them seeing a pattern in your ads that reveals an interest that you don't care to share.

Advertisements are not alone. Police states have a habit of confiscating the information and using it to find a culprit. In doing so, they require access to masses of information, including that of the innocent majority. It is available, so hey, why not use it? And if citizens complain, there is always the marketing approach of pointing at an offence that nobody approves of (child abuse is a common one) and then continuing to use these techniques for many other, often far less agreeable purposes. As an example of what we are heading for: a photo of you with a glass of beer, not necessarily taken by you, matched with car tracking records that show you driving later at night. That's easier too, so why abandon that option if we also allow it to find heavier perpetrators?

What is Privacy, exactly?

Most people consider privacy a black-or-white matter. In fact, there are at least three flavours of how you want your data dealt with:

  • Pertinent Privacy is where nobody knows a thing about you.
  • Need-to-know is where information is shared cautiously, and only when the need for the knowledge is understood.
  • Total Transparency is where everyone knows everything about you.

Most people see Pertinent Privacy as an impossible model on today's Internet, and accept the exact opposite approach of Total Transparency. Anyone who complains about privacy is considered to be aiming for Pertinent Privacy, and is seen as an out-of-this-world idealist. The reality is that spreading information on a Need-to-know basis is what we can all aim for, and achieve. In fact, privacy laws are written that way; they balance between what information must not change hands and the exceptions that are legally permitted. Usually, the exceptions include anything done under consent.

Given the sketch of Privacy Assaults in the previous section, it should be clear that the market is pulling to one extreme, namely that of Total Transparency, while users who are given a choice would opt for Pertinent Privacy. It is vital to realise in any privacy debate that this is like a tug of war, and that the only viable answer lies in the middle, with Need-to-know. This is why you are asked for your consent in matters of privacy.

It is good to realise that these are actual choices you can consider. You do not need to agree to things you don't like, because there are always others offering the same, or other sites announcing the same information, but under better conditions. You get to choose, and this is precisely how the market mechanism works: vendors choose what they offer and under what conditions, and as a customer you choose from the many offerings. When you make the way you are being dealt with part of your online selection behaviour, you are helping the Internet mature, simply by selecting those vendors that you agree with.

A few simple starters that may help are browser plugins; the next section gives you the reasons for wanting them:

  • Privacy Plugins will remove content from pages that tracks your behaviour
  • Ad Blockers stop advertisements from being downloaded and shown
  • Terms Summaries can help you to quickly skim through the conditions for a site; you can use this and maybe add an occasional site to the database
  • Browsers made by large-scale ad-sellers raise questions

How HTTP can Attack your Privacy

To build a profile, one needs two ingredients, namely bits and pieces of data (or terms used to search for them) and some way to identify you in a (more or less) unique manner. The pieces of data cannot be avoided, because that is usually what the entire exchange is about. But the identity is the place where you have some control.

Such identities can sit in many places:

  • URLs may have resource paths that include long codes, often including your identity.
  • Cookies may be set by a site, and your browser spoons them out when you return.
  • An account under which you logged in. This is why most Facebook content is invisible to outsiders (and closing networks is the clearest sign of having only the server side of interests at heart).
  • Browsers send a lot of information on their version, platform and so on; taken together, this identifies most browsers uniquely. (Yes, that one surprised me too.)
  • JavaScript is free to do anything, and may use localStorage where cookies are banned.
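
The browser fingerprint in the list above may be the least intuitive item, so here is a minimal sketch of the idea. It assumes, purely for illustration, that a server hashes a handful of request headers; the selection of headers, the function name and the sample values are our own, and real fingerprinting draws on many more signals than these.

```python
import hashlib

# Headers chosen for illustration; this selection is an assumption.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def fingerprint(headers: dict) -> str:
    """Reduce a fixed selection of request headers to a short, stable identifier."""
    material = "\n".join(
        f"{name}: {headers.get(name, '')}" for name in FINGERPRINT_HEADERS
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


# Made-up header values for two visitors.
visitor_a = {
    "User-Agent": "ExampleBrowser/1.2 (ExampleOS 10.4)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "nl-NL,nl;q=0.8,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
visitor_b = {**visitor_a, "Accept-Language": "en-GB,en;q=0.7"}

same_visitor = fingerprint(visitor_a) == fingerprint(dict(visitor_a))
different_visitor = fingerprint(visitor_a) != fingerprint(visitor_b)
```

No cookie is set and nothing is stored in the browser; the identifier is recomputed from what the browser volunteers on every request.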

The power of the web is that it combines resources from all over the place; the privacy downside of this model is that it can spill your identity all over the place as well. Advertising and tracing networks tend to serve many sites at the same time. They offer behavioural tracing to the sites that are lured to their services ("321 unique users online"), but some of these networks can also combine your behaviour across the many sites that take out their services. That is quite a powerful mechanism for collecting profiles.

Ways of stripping these unwanted identities that trace you include:

  • Use a Privacy Plugin and Ad Blocker in your browser to strip known tracking facilities. Enjoy being part of their network and consider what you can do for them in return.
  • Use your browser's privacy mode.
  • Have multiple browser profiles or separate accounts for separate uses.
  • Leave sites with popups that make the content unreadable. They are not just a nuisance, they mostly also have a motive of tracking you. Is it an important site to you, or just another search result? If you don't like it, get out. Losing visitors will hopefully teach them that excessive profiling is a nuisance.
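
One of the things a privacy plugin may do is strip the identifying codes from URLs that were mentioned earlier. The sketch below assumes that the codes travel as query parameters; the short list of parameter names is a small sample of commonly seen ones, and the function name is our own. Codes embedded in the resource path itself are not caught by this approach.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# A small, incomplete sample of tracking parameters; an assumption for this sketch.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL without query parameters that are known to carry tracking codes."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(TRACKING_PREFIXES) and name not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking("https://example.org/article?id=7&utm_source=mail&fbclid=abc123")
```

The parameter that the site needs to find the article (`id=7`) survives, while the codes that only serve to recognise the visitor are dropped.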

Some sites complain that they can't live without advertisements. One of the ideas behind InternetWide is that the cost of publishing information and even services online is so incredibly low that this should not be a true concern to anyone. We should however move to a model where services can earn money for what they offer. And in that respect we see many, many clever starters that do it in a lovely way: a free entry-level service, paid for by those who require a higher standard of service and are willing to pay for it. In terms of individual online presence, we believe that owning your own domain name offers you your vital square on the block, and the ability to publish information comes along with it for free.

How LDAP does not Attack your Privacy

Compared to HTTP, the LDAP protocol is much more the work of a technician who just wanted a good representation of data queries across networks. It is also much tighter in its specifications.

You should not ask LDAP for a large document, but you can ask it for the data describing a person, or any other resource, with given attributes that are known. It will then search the database and come up with the matching objects.
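
Such a query is expressed as a search filter. The sketch below builds the string form of a filter as defined in RFC 4515, including the escaping that the RFC prescribes for reserved characters. The helper names are our own, and the attribute values are made up; sending the filter to a server is left to an LDAP library of your choice.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 reserves inside a filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def and_filter(**attributes: str) -> str:
    """Build an AND filter matching objects that carry all given attribute values."""
    terms = "".join(
        f"({name}={escape_filter_value(value)})" for name, value in attributes.items()
    )
    return f"(&{terms})"


# Ask for a person object with a known common name (made-up value).
query = and_filter(objectClass="inetOrgPerson", cn="John (Jack) Doe")
```

The filter names the attributes and values that are known, and nothing else; there is no room in it for anything about the party that asks.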

The representation of an LDAP client is very, very boring; it does not state what make, brand or version it is, let alone the platform on which it runs. It just states what version of the generic protocol it uses, which has been 3 for over ten years now, and probably will be for many years to come.
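
To see how little there is, here is a hand-encoded anonymous simple bind, following the BindRequest definition of RFC 4511. It is a sketch rather than a usable encoder: the helper names are our own, only short-form BER lengths are handled, the message ID must fit in one byte, and optional controls are left out.

```python
def tlv(tag: int, content: bytes) -> bytes:
    """Encode one BER element with a short-form length (content under 128 bytes)."""
    if len(content) >= 128:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([tag, len(content)]) + content


def anonymous_bind_request(message_id: int = 1) -> bytes:
    """Encode an LDAPMessage that holds an anonymous simple BindRequest."""
    if not 0 <= message_id <= 127:
        raise ValueError("this sketch only handles one-byte message IDs")
    version = tlv(0x02, bytes([3]))   # INTEGER: protocol version 3
    name = tlv(0x04, b"")             # OCTET STRING: empty DN, so anonymous
    simple = tlv(0x80, b"")           # [0] simple authentication, empty password
    bind = tlv(0x60, version + name + simple)  # [APPLICATION 0] BindRequest
    return tlv(0x30, tlv(0x02, bytes([message_id])) + bind)  # SEQUENCE LDAPMessage


request = anonymous_bind_request()
```

The result is 14 bytes, `300c020101600702010304008000` in hexadecimal: a message number, the protocol version, an empty name and an empty password. Compare that with the headers a browser sends along with every single request.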

Attributes have values and even a type that may be varied; could we put an identity in there? Well, the values are usually needed for processing, so that would look odd. And their types come from an extensible base, but applications will only process the ones they recognise. Neither of these stands a chance of being reflected to the server, where they could help to build a profile of the user.

Locations of data are another matter. In many cases, these are considered to be readable to the user, but as in HTTP, that is no longer the case. This means that references may be traceable. What is not possible, however, is to start a trace with an identity set up in a prior phase, so the only form of tracing possible is "will an application follow a link from A to B or leave it alone?" That is pretty much the kind of usage pattern that may be of benefit to operators of the data store, but it will be too fragmented to be of use for profiling users. There certainly is no way of linking the various sessions that a user has with different sites, again with the one exception of references between them.

Access to LDAP can be subjected to authentication. And indeed, when this is done, the user's behaviour can be traced. This is in fact true for any protocol: authentication connects a query to an account, and with that comes the potential of tracing behaviour. That is in general unstoppable.

This problem worsens when the same identity is used in many places, because then all of a sudden it is possible for these places to get together, insofar as permitted by law, and combine information from various sources. This would have been a weak point for our Bring Your Own IDentity policy, where the client determines an identity to be used on any service they approach; but this is why we also introduced pseudonyms and aliases and the automatic change of identities when crossing over realm boundaries.
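
Why separate identities per realm help can be shown with a small sketch. Note that this is not the ARPA2 mechanism, which is not described here; it is a generic illustration that assumes a keyed hash (HMAC) over the identity and the remote realm. The function name, the key and the addresses are all hypothetical.

```python
import hashlib
import hmac


def realm_pseudonym(secret: bytes, identity: str, realm: str) -> str:
    """Derive a stable pseudonym for one identity as seen by one remote realm."""
    message = f"{identity}\n{realm}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:12]


secret = b"kept-private-by-the-home-realm"  # hypothetical key, never shared
shop = realm_pseudonym(secret, "john@example.org", "shop.example.com")
news = realm_pseudonym(secret, "john@example.org", "news.example.net")
```

Each realm sees the same pseudonym on every visit, so it can still serve a returning user. Two realms that compare notes, however, find two unrelated names and have nothing to join their records on.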

Since profiles need to tie data to some form of user identity, it seems that LDAP is the winner from this perspective as well as from that of its much richer semantics; delving into LDAP for data prevents this data from being lined up in a profile.

This can still be broken when HTTP references LDAP for data; it could generate paths to objects in LDAP that contain a unique identifier. At the risk of repeating ourselves, this is only possible when downloading complete applications in the form of JavaScript or an App. That is required when applications go beyond open protocols, as is the case with virtually all "web wrappers" for protocols that have independent stand-alone implementations that can do their specific task more aptly.

Conclusion

Once again, we learn that HTTP is a very potent tool due to its general nature, but that this also leads to problems. It is invariably better to rely on specialised open protocols such as, in this case, LDAP as the only protocol for refined access to data stored remotely. A bit to our own surprise, it turns out that this is also true where privacy is a concern (which it ought to be in every situation where we cross over our realm boundaries).
