Low Priority for SNMP Handling
Devices such as routers often give low priority to the SNMP agent process, which handles SNMP requests, compared to other tasks that are deemed more critical, such as packet forwarding or running routing protocols. Therefore, in times of high processor load, SNMP queries may be dropped or time out. This is unfortunate, because it is often precisely in such extreme situations that one needs monitoring data to understand how devices perform. On the other hand, giving management traffic lower priority seems like the right thing to do: the main purpose of routers is to forward traffic, and keeping routing protocols such as OSPF, IS-IS, and BGP up and running is a prerequisite for forwarding packets correctly.
Countermeasures
Retransmission and timeouts on the manager side
Remember that the "manager" is the program that sends SNMP requests to the "agent" on the managed device. In typical usage, an SNMP request such as get, get-next, get-bulk, or set is sent as a single UDP packet and expects a single UDP packet in return. The manager sends a request and waits for a response. If the manager doesn't receive a response within some amount of time (the timeout), it will usually send the request again, and so on, until it gives up.
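The request/timeout/retransmit cycle described above can be sketched in Python with a plain UDP socket. This is a minimal sketch, not an SNMP implementation: the payload is treated as opaque bytes (no encoding of a real SNMP PDU), and the function name udp_request and its parameters are hypothetical. A real manager would also check the request-id of each response, since a late answer to an earlier attempt can arrive after a retransmission has been sent.

```python
import socket
from typing import Optional, Tuple


def udp_request(payload: bytes, addr: Tuple[str, int], timeout: float, retries: int) -> Optional[bytes]:
    """Send `payload` to `addr`, retransmitting up to `retries` times.

    Returns the first datagram received, or None once every attempt has timed out.
    Sketch only: a real SNMP manager would decode the response and match its
    request-id, so that a late reply to an earlier attempt is recognized.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # how long to wait for each response
        for _ in range(retries + 1):  # initial attempt plus retransmissions
            sock.sendto(payload, addr)
            try:
                response, _sender = sock.recvfrom(65535)
                return response
            except socket.timeout:
                continue  # no answer within the timeout: send the request again
    return None
```

With retries=2, a device that never answers receives three copies of the request before the function gives up and returns None.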
Configuring the timeout/retransmission parameters can be a difficult trade-off: If the timeouts are set too short and/or too few retransmissions are attempted, then some retrievals may fail completely. On the other hand, if the timeouts are long and/or many retransmissions are attempted, a dead device can result in a very long overall sequence of attempts to retrieve data, especially if several different values are to be measured.
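To see how quickly the worst case adds up, the following sketch computes how long a manager waits on a completely dead device before giving up. The function name and parameters are assumptions for illustration: a timeout that is either fixed or grows by a backoff factor after each attempt, a retry count, and a number of requests issued one after another rather than batched or in parallel.

```python
def worst_case_seconds(timeout: float, retries: int, backoff: float = 1.0, n_requests: int = 1) -> float:
    """Time spent waiting on an unresponsive agent before giving up.

    timeout    -- wait before the first retransmission, in seconds
    retries    -- number of retransmissions after the initial attempt
    backoff    -- factor applied to the timeout after each attempt (1.0 = fixed timeout)
    n_requests -- requests issued one after another to the same dead device
    """
    # Each request waits timeout, timeout*backoff, timeout*backoff**2, ...
    per_request = sum(timeout * backoff ** attempt for attempt in range(retries + 1))
    return per_request * n_requests
```

For example, with a 2-second timeout and 3 retries, 20 sequential requests to a dead device keep the manager waiting for worst_case_seconds(2.0, 3, n_requests=20), or 160 seconds.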
Intelligent batching of requests, get-bulk for table traversal
SNMP allows many variables to be retrieved in a single request. In general, a manager should "batch" retrievals of different variables as much as possible, in order to reduce management traffic, the load on the agent, and the risk of lost requests or responses. When "walking" the rows of SNMP's conceptual tables, the manager should use the more efficient get-bulk operation in preference to get-next. Note that get-bulk is only available in SNMPv2 and later. Batching requests can complicate manager code, which must take more care with situations where packets (especially responses) become too big, but easing the work of the agent is often worth it.
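The round-trip savings of get-bulk over get-next can be illustrated with a simulated agent. In this sketch, FakeAgent, walk, and the example OIDs are hypothetical; the agent is an in-memory sorted list of variable bindings, and the get-bulk model ignores the non-repeaters field and the response-size limits a real agent applies. OIDs are represented as integer tuples, whose lexicographic ordering matches SNMP's OID ordering.

```python
from bisect import bisect_right
from typing import Any, List, Optional, Tuple

OID = Tuple[int, ...]
VarBind = Tuple[OID, Any]


class FakeAgent:
    """Hypothetical in-memory agent holding a sorted list of variable bindings."""

    def __init__(self, bindings: List[VarBind]) -> None:
        self.bindings = sorted(bindings, key=lambda vb: vb[0])
        self.oids = [oid for oid, _ in self.bindings]
        self.requests = 0  # number of request/response round trips

    def get_next(self, oid: OID) -> List[VarBind]:
        self.requests += 1
        i = bisect_right(self.oids, oid)  # first OID strictly greater than oid
        return self.bindings[i:i + 1]

    def get_bulk(self, oid: OID, max_repetitions: int) -> List[VarBind]:
        self.requests += 1
        i = bisect_right(self.oids, oid)
        return self.bindings[i:i + max_repetitions]  # up to max_repetitions successors


def walk(agent: FakeAgent, column: OID, max_repetitions: Optional[int] = None) -> List[VarBind]:
    """Walk one table column with get-next, or with get-bulk if max_repetitions is set."""
    rows: List[VarBind] = []
    current = column
    while True:
        if max_repetitions is None:
            batch = agent.get_next(current)
        else:
            batch = agent.get_bulk(current, max_repetitions)
        if not batch:
            return rows  # end of the agent's MIB view
        for oid, value in batch:
            if oid[:len(column)] != column:
                return rows  # walked past the end of the column
            rows.append((oid, value))
        current = batch[-1][0]  # continue after the last OID received
```

For a 100-row column, the get-next walk needs 101 round trips (the last one returns the first variable beyond the column), while a get-bulk walk with max-repetitions set to 25 needs only 5.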
Process scheduler on the managed device
One reason for delayed SNMP processing is that other tasks of higher or equal priority occupy the managed device's processing resources for extended periods of time. Depending on the system, this can be addressed by shortening the scheduling interval, or by changing the relative priorities of the SNMP agent and the other processes it competes with. The latter can be dangerous, because the SNMP agent could start to "starve" other, more vital activities if it has to process complex requests.
On Cisco IOS routers, the scheduling intervals can be changed using the scheduler allocate command, and the priority of the SNMP agent process can be set to either low or high using the hidden command snmp-server priority.
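The starvation risk described above can be illustrated with a toy strict-priority scheduler. This is a hypothetical model, not how the Cisco IOS scheduler actually works: each task adds a fixed amount of work per tick, and the CPU serves a fixed capacity per tick, always to the highest-priority task with pending work first. The function name, task names, priorities, and numbers are made up for illustration.

```python
from typing import Dict, List, Tuple

Task = Tuple[str, int, int]  # (name, priority, work units added per tick)


def simulate_strict_priority(tasks: List[Task], capacity: int, ticks: int) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (work served, work still pending) per task after `ticks` ticks."""
    backlog = {name: 0 for name, _, _ in tasks}
    served = {name: 0 for name, _, _ in tasks}
    # Highest priority first; a lower-priority task only gets leftover capacity.
    order = sorted(tasks, key=lambda task: task[1], reverse=True)
    for _ in range(ticks):
        for name, _, demand in tasks:
            backlog[name] += demand  # new work arrives
        budget = capacity
        for name, _, _ in order:
            take = min(budget, backlog[name])
            backlog[name] -= take
            served[name] += take
            budget -= take
    return served, backlog
```

Under overload (here routing alone demands the full capacity), a low-priority SNMP task gets nothing; raising its priority lets it through, but only by taking that capacity away from routing, whose backlog then grows instead.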
Prioritizing network management traffic
Network management packets can also be dropped independently of SNMP processing, either in transit because of traffic congestion, or on the input queue of the processing router if the router cannot handle all traffic sent to (rather than through) it quickly enough. Either issue can be addressed using quality-of-service mechanisms, such as priority queueing or Cisco's "selective packet discard" (SPD).
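A minimal sketch of priority queueing for management traffic follows, assuming a classifier that treats packets addressed to UDP port 161 (the standard SNMP agent port) as management traffic. The names classify and PriorityQueueing, the queue limits, and the two-class split are illustrative assumptions; this is plain strict-priority queueing, not a model of Cisco's SPD.

```python
from collections import deque
from typing import Deque, Dict, Optional

SNMP_AGENT_PORT = 161  # standard UDP port for SNMP requests to an agent


def classify(udp_dst_port: int) -> str:
    """Hypothetical classifier: SNMP requests are management traffic."""
    return "mgmt" if udp_dst_port == SNMP_AGENT_PORT else "default"


class PriorityQueueing:
    """Two classes, each with its own tail-drop limit; 'mgmt' is always served first."""

    def __init__(self, mgmt_limit: int, default_limit: int) -> None:
        self.queues: Dict[str, Deque[str]] = {"mgmt": deque(), "default": deque()}
        self.limits = {"mgmt": mgmt_limit, "default": default_limit}
        self.dropped = {"mgmt": 0, "default": 0}

    def enqueue(self, cls: str, packet: str) -> bool:
        queue = self.queues[cls]
        if len(queue) >= self.limits[cls]:
            self.dropped[cls] += 1  # this class's queue is full: tail drop
            return False
        queue.append(packet)
        return True

    def dequeue(self) -> Optional[str]:
        for cls in ("mgmt", "default"):  # strict priority: drain mgmt first
            if self.queues[cls]:
                return self.queues[cls].popleft()
        return None
```

Even when best-effort traffic fills its own queue and is tail-dropped, SNMP requests are still accepted and are dequeued ahead of everything else.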
-- SimonLeinen - 09 Aug 2009