Rick van Rein
Published

Sun 16 February 2020

Reservoir: Your Data reaching out to You

Many of us rely on "cloud" storage services. They enable us to access the same files from everywhere, but they are fairly dumb and leave all the thinking to people. ARPA2 Reservoir is different, in that it supports metadata, automation and integration in your tools.

This is a story to explain the work that we are putting into ARPA2 Reservoir, a system for bulk data management that develops under the freedom-provoking mindset of the InternetWide Architecture.

Human Habits

On your computer, you have files that you can open in the right applications with some help of your operating system. You sort these files into folders, using a structure that makes sense to you. Unfortunately, the computer has more difficulty in making sense of these human habits, and you often end up wading through your folders, looking for that one file... you are so lacking in metadata that even a brute-force word search is helpful.

On the web, you can upload and download pictures and music and enjoy them within a browser environment. But not everything works as you like it to. Web environments were designed to access data resources, but they rarely do just that; they come with code that controls what you can do with the data, be it as JavaScript or in a mobile App. The general result of this is one-sided automation; it is completely automatic for the server side of things, but you may have to do a lot more manual work. You get to do it when you want, but that's about as far as client-friendliness goes.

Agile Automation

Reservoir can be used over a web interface, but it does not force you to do so, and it offers additional channels for controlling your data that facilitate much better automation. These extra channels are based on standard protocols; their more refined semantics make them more specific to an application, but also more informative to tools, and so friendlier to automation. The web is nice if you want to browse around, but it quickly grows into a nuisance if you have to.

A major part of data storage is its protection; not everyone should be able to access all data, but there is a lot of value in sharing, for instance within a group. Groups may include a set of colleagues, a family or your choir. All this is expressed in our identity model.

Minimal Mechanics

At the mechanistic level, ARPA2 Reservoir is really simple. It orders data in three levels: (user at) domain, folder and perhaps a file. Or, as we call it: authority, collection and perhaps a resource:

//domain/collection/
//domain/collection/resource

This looks a bit like a web location, but it also looks like other locations. This is a bare form for URIs, the Internet's standard way of describing data/resources.

  • The domain is something like orvelte.nep to represent the data owner.
  • Collection is like a folder; it is a container for bits of data.
  • Resource is like a file; it is something you can upload or download.

The structure of this path is, as with any other URI, a gradual delegation of authority, or of the right to publish. The domain is the starting point of authority for Internet users, and is designed by a system that gradually delegates to the domain owner. The Collection then adds a delegation to a group of objects, and down to a specific Resource. Just like you should never look for your bank's name on a web search engine and click the first one to come along, you should not directly address a resource. Because yes, your bank would show up, but no, you are not certain that only your bank shows up in the search results; anyone could be found when they merely mention the bank's name!

Now, Collections and Resources look awful in this format, each being a UUID, so of the form 15dbe45d-d636-41a9-94a5-0d16a8dbe09b -- they are not meant for human consumption. They are useful, however: the codes are long enough to avoid clashes and the form is nice and consistent, ideal for use in computer programs. Simplicity is key in all this. They provide relatively short links, yet these would be unique, so the benefits outweigh the ugly form -- for technical uses.
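To make the three levels concrete, here is a minimal Python sketch that splits the bare form into authority, collection and an optional resource. The names ReservoirRef and parse_reservoir_ref are made up for this illustration and are not part of ARPA2 Reservoir; the sketch only assumes the UUID form described above.

```python
import uuid
from typing import NamedTuple, Optional


class ReservoirRef(NamedTuple):
    """Illustrative container for the three levels of a bare reference."""
    authority: str                  # the domain, e.g. "orvelte.nep"
    collection: uuid.UUID           # always present
    resource: Optional[uuid.UUID]   # None when the reference ends in "/"


def parse_reservoir_ref(ref: str) -> ReservoirRef:
    """Split //authority/collection/[resource] into its levels."""
    if not ref.startswith("//"):
        raise ValueError("expected a reference starting with //")
    parts = ref[2:].split("/")
    if len(parts) != 3 or not parts[0] or not parts[1]:
        raise ValueError(
            "expected //authority/collection/ or //authority/collection/resource"
        )
    authority, collection, resource = parts
    # uuid.UUID raises ValueError when a level is not a well-formed UUID.
    return ReservoirRef(
        authority,
        uuid.UUID(collection),
        uuid.UUID(resource) if resource else None,
    )


ref = parse_reservoir_ref("//orvelte.nep/15dbe45d-d636-41a9-94a5-0d16a8dbe09b/")
print(ref.authority, ref.collection, ref.resource)
```

A reference to a Collection alone keeps its trailing slash, which is what lets the sketch tell it apart from a reference to a Resource.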

Lovely Looks

As a user, you will want to access your resources differently. You may want to have folders for companies and people named after domain names and email addresses, for instance. And below that you may want to split things into projects, or into separate years. But the URI format of Reservoir does not support arbitrary names, or even just two layers of Collection above a Resource!

The trick is that a Collection doubles as an Index, holding named references to other Collections. So when you want to proceed to photo/holiday/2014 from some initial Collection, the name photo is looked up in the index of that initial Collection, which yields another Collection. In that, you look for holiday to find a third; and in that you look up 2014 to find the Collection you were after.

You have now located the Collection of Resources that you were looking for. Depending on your tooling, it may have replaced your path with the standard format above; that form is useful to share with others, to whom you might not want to show your path, especially if you are about to revise its structure.
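The name-by-name lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the INDEXES table is a hypothetical in-memory stand-in for the real Index storage, and the short labels such as coll-home stand in for the UUIDs that real Collections carry.

```python
from typing import Dict

# Hypothetical stand-in: each Collection identifier maps to its Index,
# and each Index maps a human-friendly name to another Collection.
INDEXES: Dict[str, Dict[str, str]] = {
    "coll-home": {"photo": "coll-photo"},
    "coll-photo": {"holiday": "coll-holiday"},
    "coll-holiday": {"2014": "coll-2014"},
    "coll-2014": {},
}


def resolve(indexes: Dict[str, Dict[str, str]], start: str, path: str) -> str:
    """Walk a path like photo/holiday/2014, one Index lookup per name."""
    current = start
    for name in (part for part in path.split("/") if part):
        index = indexes.get(current, {})
        if name not in index:
            raise LookupError(f"no entry {name!r} in the index of {current}")
        current = index[name]
    return current


print(resolve(INDEXES, "coll-home", "photo/holiday/2014"))
```

The result is a single Collection identifier, which is all that the canonical //domain/collection/ form needs; the names used along the way are not part of it.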

Hospitable Homes

What you don't have yet, is the initial Collection for your search. You could bookmark it, but a nice starting point would be very useful.

For this reason, a Home Index is assigned to every domain and to every domain user. So, if the authority section is orvelte.nep you will access the home index with shared resources for the village's domain; the authority section can also take the form bakker@orvelte.nep to locate the home index for the bakker user at the orvelte.nep domain.

Others might use this too; they might send information to an address like this, using a protocol like AMQP or, for small things like photos, email attachments. You would still receive the textual part of an email, but attachments might be stripped off and uploaded to the ARPA2 Reservoir for better integration with your browser. The email would replace the attachment with a link to the web location.

Apprehensive Applications

Useful as it may be to have a Home Index, it gets cluttered if all kinds of information are piled up in it. You probably want to keep your music and photos as separate "views" on your data, and although it is important that a backup tool uploads its regular batches, you are less likely to want to see them between your letters.

This is where an apphint, short for application hint, comes into play. Various types of application can use this to indicate a preference for a separate storage area. There are plenty of general-purpose names to use: music, photo, movie, book, article, contact, agenda, shopping, backup, ...

Such an apphint is sent by applications when they access your Home Index. That lands them in another Collection than the default/unnamed one. You can link in this other one as part of your Home Index, or anywhere underneath, or only use it through tools.

Note how it is meaningful to have apphint words for kinds of data, rather than for applications per se. There is no reason why multiple applications could not access the same music set, if they address different sides to it -- playing, tagging, musical analysis, track editing, and so on.
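As a rough sketch of the idea, the selection of a starting Collection could be modelled as a lookup keyed by authority and apphint. The table HOME_INDEXES, the function home_collection and the coll-... labels are hypothetical and only serve this illustration; how Reservoir stores this mapping is not described here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (authority, apphint) -> Collection identifier.
# An apphint of None stands for the default/unnamed Home Index.
HOME_INDEXES: Dict[Tuple[str, Optional[str]], str] = {
    ("orvelte.nep", None): "coll-domain-home",
    ("bakker@orvelte.nep", None): "coll-bakker-home",
    ("bakker@orvelte.nep", "music"): "coll-bakker-music",
    ("bakker@orvelte.nep", "backup"): "coll-bakker-backup",
}


def home_collection(authority: str, apphint: Optional[str] = None) -> str:
    """Pick the starting Collection for an authority and optional apphint."""
    # Raises KeyError when no Collection is known for this combination.
    return HOME_INDEXES[(authority, apphint)]


print(home_collection("bakker@orvelte.nep"))
print(home_collection("bakker@orvelte.nep", "music"))
```

A music player and a tagging tool that both send the music apphint would land in the same Collection, while a backup tool sending backup stays out of sight.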

Luring Links

Thanks to the apphint mechanism, it is possible to create interfaces that focus on a part of your data. Access control may limit access to the data so these interfaces do not gain access to anything else; they simply get their own private view on your online presence and need not have any relation with what others see or what you do.

Given that services are available to you, how do you know where they are? The trick is to define those links clearly. We do this for a given domain authority, like orvelte.nep, with links in their top descriptive node. This covers just the //authority part, not /collection/ and certainly not /collection/resource, so the definition is for all to enjoy.

A good example might be https://music.orvelte.nep/superblast/player?apphint=music which would be a basic link to address a music server. If so desired, the /collection/ or /collection/resource could be added (in this URI it would go before the ? for technical reasons) and the server, recognising the host name music.orvelte.nep, would know to address the Reservoir for orvelte.nep and its local setup makes it recognise any added Collection and Resource references.

In this example, the ?apphint=music is likely interpreted by this service to mean that this application references your collection with the apphint set to music -- but that is something considered while this link was being added, and not a concern to its users because it is merely a suggestive local convention of the service. For using it, just reproduce and optionally insert a targeted Collection and perhaps a Resource and open it in any suitable application -- click and go!
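Inserting a Collection and perhaps a Resource before the ? is easy to automate. The following Python sketch does so with the standard urllib.parse module; the function name target_link is made up for this illustration, and appending the references directly to the existing path is an assumption about how a service would like to receive them.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

BASE_LINK = "https://music.orvelte.nep/superblast/player?apphint=music"


def target_link(base_link: str, collection: str,
                resource: Optional[str] = None) -> str:
    """Add /collection/ or /collection/resource before the query part."""
    parts = urlsplit(base_link)
    path = parts.path.rstrip("/") + "/" + collection + "/"
    if resource is not None:
        path += resource
    # The query (?apphint=music) is kept as it was and stays at the end.
    return urlunsplit(parts._replace(path=path))


print(target_link(BASE_LINK, "15dbe45d-d636-41a9-94a5-0d16a8dbe09b"))
```

The resulting link can be opened in any suitable application, just like the basic link it was derived from.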

Smart Security

Although Reservoir is about sharing data, it is also reluctant to do this with just anyone. Collections have an Access Control List defined on them, so only designated parties may access the data. This may include you, a group you are a member of, your friends or customers in other domains, and even applications that send an apphint that was set up for them.

Users who first start using an apphint will silently clone the access control setup from their domain, which presumably was put in place by their administrator. This ensures that the apphint mechanism automatically restricts access to the intended service.

Individual Resources do not have their own access rights; that would be confusing to most of us, and tedious to the rest. The concept is really simple, that you can create a separate Collection to group different access patterns.

Since access is determined for the //domain/colluuid/ prefix form alone, the path that leads to it is not shared as part of the canonical URI, nor is the user name. This is helpful by not providing information that is not strictly required. Imagine Reservoir as a mechanism for exchanging sales information; the other party does not see your username, and so cannot bother you over email.
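The rule that access is decided per Collection can be illustrated with a deliberately simplified Python sketch. The ACL table and the function may_access are hypothetical; the real Access Control Lists also cover groups and applications, which this sketch does not model.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical ACL table: (domain, collection) -> identities with access.
Acl = Dict[Tuple[str, str], Set[str]]

ACL: Acl = {
    ("orvelte.nep", "coll-sales"): {"bakker@orvelte.nep", "buyer@example.com"},
}


def may_access(acl: Acl, identity: str, domain: str, collection: str,
               resource: Optional[str] = None) -> bool:
    """Decide access from the //domain/collection/ prefix alone."""
    # The resource is deliberately ignored: Resources carry no rights of
    # their own, so every Resource in a Collection gets the same answer.
    return identity in acl.get((domain, collection), set())


print(may_access(ACL, "buyer@example.com", "orvelte.nep", "coll-sales", "res-1"))
print(may_access(ACL, "stranger@example.com", "orvelte.nep", "coll-sales"))
```

Data that needs a different access pattern goes into a separate Collection with its own entry, rather than getting per-Resource exceptions.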

Magical Metadata

All this, and we did not even explain the vital improvement of metadata. The data itself is boring to store on any disk, but metadata describing it can be very useful in many applications, for instance to present titles, sort out older information, add a lock item if someone has placed it under one, and so on.

Metadata is stored in LDAP, a directory that can be queried through a standard protocol. If you want manual labour, you can stick to HTTP, which we happily supply with a wrapper for metadata. But if you want true automation, where you can run an agent to do things for you, you should use LDAP instead. It assigns so much more meaning to the literal bytes exchanged that it can be automatically processed with a rather deep understanding of what the various words and fields mean; no need for a user's gazing eyeballs and manual mousing.

You no longer have to plough through your files and open any that looks like it might be the one you misplaced. Instead, you can search for titles, descriptions, references, cross-links with other documents, document identifiers and so on.
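To give an impression of such a search, the Python sketch below builds an LDAP search filter as text, escaping special characters the way the LDAP filter syntax requires. The attribute names cn and description are standard LDAP attributes picked purely as an assumption; the schema that Reservoir actually uses for its metadata is not described here, and the helper names are made up for this illustration.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def substring_filter(attribute: str, fragment: str) -> str:
    """Match entries whose attribute contains the given fragment."""
    return f"({attribute}=*{escape_filter_value(fragment)}*)"


def all_of(*filters: str) -> str:
    """Combine filters so that every one of them must match."""
    return "(&" + "".join(filters) + ")"


query = all_of(
    substring_filter("cn", "invoice"),             # assumed title attribute
    substring_filter("description", "holiday (2014)"),
)
print(query)
```

A filter like this would be handed to an LDAP client library as the search expression; an agent can build and send it without anyone looking at a screen.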

Pungent Protocols

HTTP and LDAP need not be the only protocols involved in ARPA2 Reservoir, and indeed they are not.

Attachments may be uploaded over LMTP, for instance. And between domains operated by different parties, we use AMQP 1.0, which is very well suited for distributing automation-friendly formats. It is a bit like email, except that it bypasses the human eyeballs and instead is subjected to access control, possibly to spam/virus scanning and then on to automated processing.

And there might be possible uses for XMPP chat, SIP telephony, BEEP peering, SFTP for file transfers, RMT for remote "tape" backups, and so on. There are so many protocols that can serve you and make your life more automated than mere HTTP that it could leave you dizzy. ARPA2 Reservoir is such a different approach to data storage that it can make them all flourish.

You should expect protocol-specific URIs to look similar to the //authority/collection/resource format, and possibly support the human-friendly paths to get to data as well. The format is like a URI for good reason; often, you only need to stick a protocol and a colon in front.

The idea is that all protocols are named in the home index for the domain. A few notes about that:

  • Links with descriptions (URI-space-text format) are meant to be presented to users; the addition of /collection/ or /collection/resource can be easily automated.
  • Links without descriptions (URI only) are used for automation purposes. In those, the scheme (such as https: or ldap:) will be a useful selection criterion.
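The two kinds of links can be told apart mechanically. The Python sketch below parses lines in the URI-space-text format and selects the description-less links for a given scheme; the example lines, the host name ldap.orvelte.nep and the names Link, parse_link and automation_links are assumptions made for this illustration.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import urlsplit


class Link(NamedTuple):
    uri: str
    description: Optional[str]   # None for links meant for automation


def parse_link(line: str) -> Link:
    """Split a 'URI text...' line; a line with only a URI has no text."""
    uri, _, text = line.strip().partition(" ")
    return Link(uri, text.strip() or None)


def automation_links(links: List[Link], scheme: str) -> List[str]:
    """Select description-less links whose URI uses the given scheme."""
    return [
        link.uri
        for link in links
        if link.description is None and urlsplit(link.uri).scheme == scheme
    ]


# Hypothetical lines as they might appear in a domain's home index.
lines = [
    "https://music.orvelte.nep/superblast/player?apphint=music Music player",
    "https://music.orvelte.nep/superblast/player?apphint=music",
    "ldap://ldap.orvelte.nep/",
]
links = [parse_link(line) for line in lines]
print(automation_links(links, "ldap"))
```

A tool that speaks only one protocol can thus pick its own entry point, while the links that carry a description remain available for showing to a person.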

Protocols may also be mentioned in DNS, using SRV records. This has not yet materialised.

Ushering Users

Most of these wonderful protocols have a notion of users, but not all of them do. For HTTP, there is no standard manner of identifying users. We specified a User header to do just that. This is not related to authentication, as that is about client-side users; it is an indication of the user we want to reach on the server, so in this case, the home index of the Reservoir.
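As an illustration only, this is how a client might attach such a User header with Python's standard urllib. The request is built but not sent; the URL layout and the exact form of the header value (here just the user name) are assumptions, and the function name home_index_request is made up for this sketch.

```python
from urllib.request import Request


def home_index_request(domain: str, user: str) -> Request:
    """Build (but do not send) a request for a user's home index."""
    # The header names the user to reach on the server; it says nothing
    # about who the client is, so it is no substitute for authentication.
    return Request(f"https://{domain}/", headers={"User": user})


req = home_index_request("orvelte.nep", "bakker")
print(req.full_url, req.get_header("User"))
```

Passing the request to urllib.request.urlopen would send it; whether a given server honours the header depends on its support for this extension.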

Not all tools support this User extension to HTTP yet. Until they do, you can work around it by using the ugly alternative format that represents the user's UUID in a longer string. Did we already mention that HTTP is not optimal in saving you work? Or, your administrator may set up links to your Collection in the home index for the domain, and you need to click once more than under better circumstances (because this is one of those non-standard things that require your eyeballs and manual intervention).

Similar things apply to the Collection and Resource values; protocols that cannot express the hierarchy of Reservoir would be receiving additional headers. Think of SMTP and SIP as examples of this; both have a URI format that can easily be extended with headers. We have not tried to define those yet.
