Rick van Rein
Published

Sun 05 October 2014

Eye on SNMP 1: The Good, the Bad and the Ugly

No design for hosting infrastructure is complete without attention to monitoring. There are many monitoring systems, but only one of them is standardised, namely SNMP.

This document is part of an article series on Monitoring.

As our understanding of our perceived hosting platform matures, we need to look into monitoring. Most existing solutions focus, naturally, on internal network monitoring. Since we aim to support even these facilities with cross-over between platforms, we need to look into a standard. And even though some solutions are commonplace (like Nagios), they generally do what the standard solution does, but in proprietary ways. So we had to look into SNMP and analyse its properties.

The Good

SNMP is a standard. That is probably the most attractive property it has; RFCs have been written and packets standardised. This means that SNMP is the one product that can be plugged into any network, or made to cross over between networks. It is worth noting, however, that monitoring generally follows the same model as SNMP, so the translation is not necessarily a difficult one. It is also common practice; routers and switches tend to use SNMP (and RMON) because their producers want to deliver their devices with monitoring-agnostic firmware.

SNMP is typed, which makes it highly informative. For every piece of information there is a specification that describes its interpretation. This is a rather big advantage over the more popular monitoring approaches, especially when sending monitoring information across realm boundaries. Unlike scripted plugins such as those of Nagios, which provide textual information along with a good/bad judgement, SNMP offers the option of much more refined monitoring.

An intriguing facility in SNMP is that it also permits setting values over the protocol, and even creating instances in monitoring tables, which could have all sorts of side-effects. Imagine adding a hostname in your monitoring solution, and seeing it added to your DNS, webserver and so on. It is possible with SNMP, but mostly in theory. This is due to minimal attention for authentication. Still, if it were wrapped in a more secure protocol this might be a very interesting opportunity.

SNMP monitoring is an extensible system; it uses ASN.1 Object Identifiers to uniquely identify the so-called objects that can be monitored, and it is possible to define indexes; for instance, there is an OID for network interfaces, and a number can be added to iterate over multiple of them. Indeed, there is a facility called conceptual tables which presents information in such a format that it can be printed as tables, where rows reflect instances and columns reflect attributes, just like in the SQL database model.
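To make the idea of conceptual tables a little more concrete, here is a minimal sketch in Python. It assumes that an instance OID under a table entry has the shape entry.column.index, and it pivots a flat set of OID/value pairs into rows. The OIDs for ifDescr and ifMtu are taken from the standard IF-MIB; the sample values and the helper names parse_oid and pivot_table are made up for illustration.

```python
from collections import defaultdict

# ifEntry from the standard IF-MIB; the columns below sit directly under it.
IF_ENTRY = (1, 3, 6, 1, 2, 1, 2, 2, 1)
COLUMNS = {2: "ifDescr", 4: "ifMtu"}


def parse_oid(text):
    """Turn '1.3.6.1' into (1, 3, 6, 1)."""
    return tuple(int(part) for part in text.strip(".").split("."))


def pivot_table(entry, columns, varbinds):
    """Group entry.column.index instances into rows keyed by their index."""
    rows = defaultdict(dict)
    for oid_text, value in varbinds.items():
        oid = parse_oid(oid_text)
        # Skip anything outside the table or without an index part.
        if oid[:len(entry)] != entry or len(oid) <= len(entry) + 1:
            continue
        column, index = oid[len(entry)], oid[len(entry) + 1:]
        if column in columns:
            rows[index][columns[column]] = value
    return dict(sorted(rows.items()))


# Invented sample data, as it might come back from an agent.
SAMPLE = {
    "1.3.6.1.2.1.2.2.1.2.1": "lo",
    "1.3.6.1.2.1.2.2.1.2.2": "eth0",
    "1.3.6.1.2.1.2.2.1.4.1": 65536,
    "1.3.6.1.2.1.2.2.1.4.2": 1500,
    "1.3.6.1.2.1.1.5.0": "host.example",  # sysName.0, not part of the table
}

for index, row in pivot_table(IF_ENTRY, COLUMNS, SAMPLE).items():
    print(index, row)
```

Each row is keyed by its index and each column by its attribute name, which is exactly the rows-and-columns view described above.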

The IETF had a craze over SNMP for a while, but it stopped at some point; the aspects covered in RFCs roughly come down to networking and systems monitoring. There is a specification dealing with applications, but very little of the content of applications (mail queues, DNS zones, virtual web hosts, and so on) is supported by standard RFCs. This gap is often filled by vendors who have created their own extensions.

In terms of SNMP, extensions define their structures in a MIB, which is written in an ASN.1 dialect that basically pins down OIDs and their use. The OIDs are the vital choice that makes SNMP extensible through these MIBs. IANA maintains a registry from which each private organisation can request an OID for their own use; for example, our ARPA2 project has been allocated 1.3.6.1.4.1.44469 so that any extension after that with dot-number pairs is our prerogative.
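The delegation can be checked mechanically. The following is a small sketch, not part of any SNMP library: it treats OIDs as tuples of integers and tests whether an OID falls under the enterprises arc 1.3.6.1.4.1 and, if so, under which enterprise number. The sub-identifiers below 44469 in the usage lines are invented for the example.

```python
ENTERPRISES = (1, 3, 6, 1, 4, 1)   # iso.org.dod.internet.private.enterprises
ARPA2 = ENTERPRISES + (44469,)      # the arc allocated to the ARPA2 project


def parse_oid(text):
    """Turn '1.3.6.1' into (1, 3, 6, 1)."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_under(oid, root):
    """True when oid equals root or lies somewhere below it."""
    return oid[:len(root)] == root


def enterprise_number(oid):
    """Return the enterprise number owning this OID, or None."""
    if is_under(oid, ENTERPRISES) and len(oid) > len(ENTERPRISES):
        return oid[len(ENTERPRISES)]
    return None


# The ".1.2" suffix is hypothetical, not an actual ARPA2 assignment.
print(enterprise_number(parse_oid("1.3.6.1.4.1.44469.1.2")))
print(is_under(parse_oid("1.3.6.1.4.1.444690"), ARPA2))
```

Comparing tuples rather than strings avoids a classic mistake: as text, 1.3.6.1.4.1.444690 starts with 1.3.6.1.4.1.44469, but it belongs to a different enterprise number.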

The tables mentioned above are a special feature of SNMP, as they enable something very interesting, namely discovery of things to be monitored. First, if SNMP servers (the so-called agents) are run on their standard port, it is possible to scan the network for them. Second, within an SNMP agent it is possible to walk through the tree structure defined by the OIDs for objects that can be monitored. Or, based on the MIB for things to be monitored, it is possible to iterate over a table of things, such as a table of network interfaces. A smart monitoring system can do all of this automatically, when it is only provided with the MIBs to monitor.
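Walking the tree relies on the fact that OIDs have a lexicographic order and that an agent can be asked for the first object after a given OID. The sketch below imitates this with an in-memory stand-in for an agent; ToyAgent, get_next and walk are names invented here, and no network traffic is involved. It assumes OIDs are tuples of integers, whose built-in comparison in Python happens to match OID ordering.

```python
import bisect


class ToyAgent:
    """In-memory stand-in for an agent that answers 'what comes after this OID?'."""

    def __init__(self, objects):
        self._oids = sorted(objects)      # lexicographic OID order
        self._values = dict(objects)

    def get_next(self, oid):
        """Return (oid, value) of the first object strictly after oid, or None."""
        pos = bisect.bisect_right(self._oids, oid)
        if pos == len(self._oids):
            return None                   # end of the MIB view
        nxt = self._oids[pos]
        return nxt, self._values[nxt]


def walk(agent, root):
    """Yield every object below root by repeating get_next."""
    current = root
    while True:
        hit = agent.get_next(current)
        if hit is None or hit[0][:len(root)] != root:
            return                        # walked off the subtree
        yield hit
        current = hit[0]


# Invented sample contents; the OIDs are sysName.0, ifDescr and ifMtu.
OBJECTS = {
    (1, 3, 6, 1, 2, 1, 1, 5, 0): "host.example",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 1): "lo",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 2): "eth0",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 4, 1): 65536,
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 4, 2): 1500,
}

IF_DESCR = (1, 3, 6, 1, 2, 1, 2, 2, 1, 2)
for oid, value in walk(ToyAgent(OBJECTS), IF_DESCR):
    print(".".join(map(str, oid)), "=", value)
```

The walker needs no prior knowledge of how many interfaces exist; it simply keeps asking for the next object until the answer falls outside the subtree it started from.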

Agents themselves follow a rather intelligent design; there is a master agent which communicates with remote peers, and which is the expert on the SNMP protocol; then there is a standard socket interface known as AgentX, to which local sub-agents can register to add their own detailed knowledge about applications.
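The division of labour between master and sub-agents can be pictured as a routing table keyed by OID subtree. The sketch below is only a model of that idea and does not speak the AgentX protocol; real AgentX registrations also carry priorities, ranges and contexts, none of which is modelled here. The class name, the handlers and the sub-identifiers below 44469 are all hypothetical.

```python
class ToyMasterAgent:
    """Routes a request to the sub-agent that registered the longest matching subtree."""

    def __init__(self):
        self._regions = {}  # subtree (tuple of ints) -> handler callable

    def register(self, subtree, handler):
        """Called by a sub-agent to claim responsibility for a subtree."""
        self._regions[tuple(subtree)] = handler

    def get(self, oid):
        """Dispatch a GET to the most specific registered sub-agent."""
        oid = tuple(oid)
        matches = [s for s in self._regions if oid[:len(s)] == s]
        if not matches:
            return None  # a real agent would answer noSuchObject
        return self._regions[max(matches, key=len)](oid)


# Hypothetical sub-agents; the arcs below 44469 are invented for this example.
ARPA2 = (1, 3, 6, 1, 4, 1, 44469)
master = ToyMasterAgent()
master.register(ARPA2, lambda oid: "generic ARPA2 handler")
master.register(ARPA2 + (1,), lambda oid: "mail queue handler")

print(master.get(ARPA2 + (1, 5, 0)))
print(master.get(ARPA2 + (2, 0)))
```

The sub-agents only need to know their own objects; everything related to the wire protocol and remote peers stays in the master.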

Finally, a choice in the design of SNMP is that it polls with lossy packets. This means that an overloaded network can simply drop SNMP packets rather than suffer under even more load. That is especially useful because monitoring may be more active on stressed networks, and would otherwise worsen the problem. Running SNMP over UDP therefore seems like the prudent choice for internal network monitoring.

The Bad

SNMP is particularly bad at authentication. Originally, it sent a word in plaintext (the community string) by way of authentication; SNMPv3 improves on this but has not been generally adopted because of the complications it brings. When monitoring on an internal network, this may not be a problem, and the situation is further improved by firewalls, VLANs and VPNs. Still, it is an oversight in the original design that needs extra work to be remedied. In the ARPA2 project, we intend to experiment by wrapping SNMP frames into GSS-API frames, using the Kerberos5 mechanism to sustain encryption and mutual authentication.

Once authenticated, access control is a possibility. Lacking a practice of authentication, this facility is barely usable. Firewalls are probably the best bet that is available -- and probably the most efficient one too.

The lack of authentication and access control has another disadvantage; it means that the potential of using SNMP for remote control is underexploited. It has to be, since permitting control to anyone endangers network stability.

Although many predefined MIBs exist for SNMP, both standardised and from vendors, there is virtually nothing for application state monitoring. This is the realm of system administrators, and it is not surprising that the functionality has been implemented in scripts, accessed over protocols such as OpenSSH. This lack of MIBs at the application level is something that ARPA2 is likely to work on. When applications can summarise their internal state and push it out, or register functions to pull it out, they should have a valuable extension to their interface with their context. The Net-SNMP agent establishes such an API, and provides example code that guides the way.

The lossy transport of SNMP is a disadvantage when transmitting data across operational realms, that is, over the Internet. On this front, ARPA2 intends to experiment with SCTP as a transmission channel. SCTP is a protocol that can deliver frames out of order, but with reliable delivery.

Perhaps as a result of our interest in a reliable transport, we have been looking for mechanisms that limit traffic, by not polling over and over but instead using a subscription mechanism. Typically, this would require a few extra SNMP operations, which the master SNMP agent could handle on behalf of its sub-agents, who may only need to process GET and GETBULK requests. Similarly, but not quite the same, we imagine that it would be useful to support a generic form of pro-active submission of data from an SNMP agent to a monitoring station; this is possible with the so-called traps of SNMP, but it has not been defined in a generic manner yet.

As part of our interest in subscription over reliable transports, we would appreciate a facility for a table column type "valid through" that gives the timestamp until which no further polling is necessary. This may be used in pro-active or passive responses to indicate that the monitoring system can relax for some time. Applications include DNS-related timing certainties: "rest assured that we will be serving this zone until time T at which the current SOA specifies removal of the zone".
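How a monitoring station might use such a column can be sketched as follows. This is purely illustrative: the "valid through" column is a proposal and not an existing SNMP type, and the function names, zone names and timestamps below are invented. The sketch assumes the station has stored, for each table row, the timestamp it last received in that column.

```python
def rows_to_poll(valid_through, now):
    """Rows whose 'valid through' timestamp has passed and that need polling again."""
    return sorted(row for row, deadline in valid_through.items() if now > deadline)


def next_wakeup(valid_through, now):
    """Earliest moment at which some row may need polling; None for an empty table."""
    if not valid_through:
        return None
    return max(now, min(valid_through.values()))


# Hypothetical table: zone name -> 'valid through' timestamp in seconds.
ZONES = {
    "example.org": 1000,
    "example.net": 5000,
    "example.com": 900,
}

print(rows_to_poll(ZONES, now=1200))
print(next_wakeup({"example.net": 5000}, now=1200))
```

With this information, the station can sleep until the earliest deadline instead of polling every row at a fixed interval.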

Note that these potential extensions are specifically of interest to tables, and especially large ones; they enable a mode where only changes are transmitted, and nothing more.
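What "only changes" could mean for a table can be illustrated with a small sketch that compares two snapshots of the same table and reports the difference. This is an assumption about what such a mode would have to carry, not an existing SNMP operation; table_delta and the sample rows are invented for the example.

```python
def table_delta(old, new):
    """Compare two snapshots of a table (row index -> row value)."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    changed = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of a zone table: zone name -> SOA serial number.
BEFORE = {"example.org": 2014100501, "example.net": 2014100501, "example.com": 2014090101}
AFTER = {"example.org": 2014100502, "example.net": 2014100501, "example.info": 2014100501}

print(table_delta(BEFORE, AFTER))
```

For a table with thousands of rows of which only a handful change between observations, the delta is a small fraction of what a full walk would transfer.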

The Ugly

SNMP is a binary format. This adds to its compactness, making messages less than 100 bytes on the wire, but it also means that we need non-trivial tools such as Net-SNMP and Wireshark to decode them. (It might be argued that the hex dumps are quite readable, but that is not for everyone.)
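To give an impression of what those hex dumps contain, the sketch below encodes an OID the way BER does: tag 0x06, a length byte, the first two arcs folded into one value as 40*first+second, and every further arc in base 128 with the high bit set on all but its last byte. It is a minimal illustration that handles only short-form lengths; the function names are our own.

```python
def encode_base128(number):
    """Base-128 encoding with the high bit set on every byte except the last."""
    chunks = [number & 0x7F]
    number >>= 7
    while number:
        chunks.append((number & 0x7F) | 0x80)
        number >>= 7
    return bytes(reversed(chunks))


def encode_oid(arcs):
    """BER-encode an OBJECT IDENTIFIER given as a sequence of integers."""
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    body = encode_base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        body += encode_base128(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body


ARPA2 = (1, 3, 6, 1, 4, 1, 44469)
print(encode_oid(ARPA2).hex(" "))  # prints: 06 08 2b 06 01 04 01 82 db 35
```

The seven arcs of the ARPA2 OID fit in ten bytes, which shows where the compactness comes from, and also why the result is not something one reads at a glance.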

Object Identifiers are ugly identifiers. The strings of dots and numbers are virtually meaningless to human users, even if they have the great advantage of gradual delegation of control. But most human users are not able to memorise them.

The formalisms that define the format of monitored objects are the MIBs. These are written in a macro-language defined on top of ASN.1. Where ASN.1 specifications for data formats are quite readable, this property is not inherited by MIBs. They tend to be difficult to read, making it a skill to actually use them in any other way than through tools.

All these aspects are generally covered in software, but in spite of the S in SNMP meaning Simple, it cannot be said that this software is simple at all. Yes, a switch may parse the few formats that it understands and respond directly; an AgentX implementation may be similar, but the entire system consisting of master and sub-agents, as well as monitoring and the potential of control, results in rather complex interactions. This is best embedded into an application.

Conclusion

We have learnt that SNMP is a powerful structure for monitoring, and it is fully understandable that other monitoring solutions mimic it. The binary format of its messages is less appealing to administrators, but it does make expansions of applications with SNMP simpler; in a sense, applications could have built-in support for monitoring, which is not currently the case. It is this lack of built-in monitoring that makes us script around textual output from command-line interactions with a program or its file system. And it is this need for scripting which has made script-based solutions such as Nagios more practical.

We are going to experiment with built-in monitoring for applications. This is the model that is also successful for switches and routers, and we believe it may be worthwhile using the modular format of many applications to create an SNMP plugin.

Specifically, we will be adding SNMP to our SteamWorks Shaft component. In addition, we may explore getting some useful data directly out of OpenDNSSEC and perhaps SMARTd -- we find Shaft interesting because it helps to monitor infrastructure that spans across a network; we think OpenDNSSEC is an interesting target because it distributes zone management over two components whose state should ideally end up in one table; we think SMARTd is the typical target because it does not need separate setup and testing when it plugs into a standardised monitoring infrastructure. The latter two examples may use SNMP as an optional extension; only when they can open the libraries and register with the master agent would they actually employ SNMP.
