This is a new category of article that falls under the "RARE software architecture" special blog series. As its name implies, it deals with topics related to RARE/freeRouter software, in this case monitoring.
Requirement
- Basic Linux/Unix knowledge
- Service provider networking knowledge
Overview
In Greek mythology, Prometheus is a Titan credited with the creation of mankind and with stealing fire from the gods to give it to humans. In the RARE context, Prometheus is the software from the prometheus.io project. It became very popular in the IT industry because it is very simple to implement and configure while providing a great number of metrics without impacting application performance. It is heavily used in microservices environments such as Docker and Kubernetes. The mythological reference gives us an indication of how Prometheus operates. At a constant rate, the Prometheus metric collector, or server, steals metrics from Prometheus agents. All the stolen metrics are then consolidated in a time series database, ready to be handed over to a query system for proper visualization.
Before going further, allow me a brief digression to share a small anecdote that led to this ongoing work on network monitoring for RARE. As mentioned previously, our focus is to give the RARE/freeRouter solution the ability to be monitored in an operational environment. In that context, we started with the implementation of a lightweight SNMP stack that provided relevant results via SNMP tools like LibreNMS. This is great for organisations that do not want to invest time in anything but SNMP.
However, we felt a lack of flexibility due to the inherent structure of SNMP, and we needed more versatile and instant monitoring capabilities. More importantly, the need arose to export an open-ended set of metric types from the control plane in a more flexible way. How could metrics such as the number of IPv4/IPv6 routes, IPv4 BGP prefixes, IPv6 BGP prefixes, platform JVM memory, etc. be shared without too much hassle?
After some internal discussion, I just said: "I’m not a monitoring expert but we have tools like ELK, PROMETHEUS and GRAFANA in the NMaaS catalog … Shouldn’t we consider using them?"
The answer was: "Let’s give it a try and fire up a Prometheus and Grafana instance from the NMaaS platform!"
Some hacking at the control plane code level was initiated, and after a few hours the freeRouter lead developer came up with a solution and said: "Let me introduce you to the freeRouter prometheus agent."
Thanks to the great support of the NMaaS team, in a few minutes and some points and clicks (it took longer than expected as I’m not good with GUIs) we were able to test this agent.
Why is it important, you might say? Simply because, with the simplicity and low resource overhead of Prometheus, we have full visibility of control plane metrics!
As a side note, INT/TELEMETRY/NETFLOW/IPFIX provide different types of data that are not at the same scale…
People working with INT/TELEMETRY/NETFLOW/IPFIX talk about a (disclaimer: buzzword) data lake, which is correct.
In our case, we are just focusing on exposing CONTROL PLANE METRICS (so it is definitely not a lake), and we don’t need a "lake of data" in order to monitor a router and ensure its operation.
Again, kudos to the NMaaS team that made this happen so that we can test this on the P4 LAB with ZERO effort.
Article objective
In this article, we will present the freeRouter and Prometheus integration and, as an example, we will implement one of the 22 Grafana dashboards that we developed and published here. In the rest of the article we will assume that you are running one or more freeRouter nodes.
Diagram
[ #001 ] - Cookbook
Configure a Prometheus server
The first step is to deploy a Prometheus server. Using NMaaS this is nearly instantaneous. However, if you plan to deploy Prometheus on another platform, just follow the installation guide here.
Once deployed you can push the following prometheus.yaml config:
global:
  scrape_interval: 15s
  evaluation_interval: 30s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
rule_files:
scrape_configs:
  - job_name: 'router'
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ['192.168.0.1:9001','192.168.0.2:9001']
        labels:
In this configuration we assume that we have 2 freeRouters reachable at 192.168.0.1:9001 and 192.168.0.2:9001. In the Prometheus world these are called targets:
- each target is interrogated, or "scraped", every "scrape_interval", which is 15s here
- the job is named "router"
- metrics_path is "/metrics", so the scraped URL is "http://192.168.0.1:9001/metrics"
Note that this has to be deployed only once for all of your routers. However, each time you'd like to add a new router, you have to add a new target to the "targets" YAML list.
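To make the relationship between the targets, metrics_path and the scraped URL more concrete, here is a minimal Python sketch. It is only an illustration: the helper names scrape_urls and add_target are hypothetical, the job is written as a plain dictionary mirroring the YAML above rather than read from the file, and the address 192.168.0.3:9001 is an invented third router.

```python
"""Illustrative only: derive scrape URLs from a job definition that mirrors
the 'router' job of the Prometheus configuration shown above."""
import copy

# Plain-dict mirror of the scrape job (assumption: not parsed from the YAML file).
ROUTER_JOB = {
    "job_name": "router",
    "metrics_path": "/metrics",
    "scrape_interval": "15s",
    "static_configs": [
        {"targets": ["192.168.0.1:9001", "192.168.0.2:9001"]},
    ],
}


def scrape_urls(job, scheme="http"):
    """Return the URL that would be scraped for every target of the job."""
    path = job.get("metrics_path", "/metrics")
    urls = []
    for static_config in job["static_configs"]:
        for target in static_config["targets"]:
            urls.append(f"{scheme}://{target}{path}")
    return urls


def add_target(job, target):
    """Return a copy of the job with one more router (host:port) in the
    first static_configs entry; duplicates are ignored."""
    updated = copy.deepcopy(job)
    targets = updated["static_configs"][0]["targets"]
    if target not in targets:
        targets.append(target)
    return updated


if __name__ == "__main__":
    for url in scrape_urls(ROUTER_JOB):
        print(url)
    # Hypothetical third router, used purely as an example address.
    bigger_job = add_target(ROUTER_JOB, "192.168.0.3:9001")
    print(scrape_urls(bigger_job)[-1])
```

Adding a router is therefore a one-line change in the targets list; nothing else in the job needs to be touched.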
Configure Prometheus FreeRouter control plane
In this example let's focus on interface metrics. Please note that this configuration should be deployed on each freeRouter, and connectivity should be available between all targets and the Prometheus server.
The objective is to tell the freeRouter control plane to expose hardware and software interface counter metrics. In order to do this, just copy/paste the stanza below via the freeRouter CLI:
!
server prometheus <PROMETHEUS_SERVER_NAME>
metric inthw command sho inter hwsumm
metric inthw prepend iface_hw_byte_
metric inthw name 0 ifc=
metric inthw replace \. _
metric inthw column 1 name st
metric inthw column 1 replace admin -1
metric inthw column 1 replace down 0
metric inthw column 1 replace up 1
metric inthw column 2 name tx
metric inthw column 3 name rx
metric inthw column 4 name dr
metric intsw command sho inter summ
metric intsw prepend iface_sw_byte_
metric intsw name 0 ifc=
metric intsw replace \. _
metric intsw column 1 name st
metric intsw column 1 replace admin -1
metric intsw column 1 replace down 0
metric intsw column 1 replace up 1
metric intsw column 2 name tx
metric intsw column 3 name rx
metric intsw column 4 name dr
vrf <VRF_NAME>
exit
!
So this basically means:
- From freeRouter CLI, issue the following command:
sho inter hwsumm
interface   state  tx          rx          drop
hairpin41   up     67404       0           0
hairpin42   up     153134      0           0
sdn1        up     412319805   1057514903  1152305
sdn2        up     1038840147  407307558   202
sdn3        admin  0           0           0
sdn4        admin  0           0           0
sdn5        admin  0           0           0
sdn6        admin  0           0           0
sdn998      up     9154        0           0
sdn999      up     199178      262939      0
tunnel1965  up     0           9122896     0
- prepend to the metric name: "iface_hw_byte_"
- column 0 will have prometheus label ifc=
- replace all dots "." with "_" (so interface bundle1.123 becomes bundle1_123)
- column 1 defines a metric name "iface_hw_byte_" concatenated to "st" => "iface_hw_byte_st" which is essentially interface status
- if column 1 "state" value is admin/down/up we associate value -1/0/1
- column 2 defines a metric name "iface_hw_byte_" concatenated to "tx" => "iface_hw_byte_tx" which is essentially interface bytes transmitted counter
- column 3 defines a metric name "iface_hw_byte_" concatenated to "rx" => "iface_hw_byte_rx" which is essentially interface bytes received counter
- column 4 defines a metric name "iface_hw_byte_" concatenated to "dr" => "iface_hw_byte_dr" which is essentially interface bytes dropped counter
If you followed this correctly, you will have noticed that the same lines are repeated for the software interface counter metrics (intsw).
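The mapping described in the list above can be mimicked with a few lines of Python. This is only a sketch of the rules, not freeRouter's implementation: the function name to_prometheus_lines is hypothetical, the rendering as name{ifc="..."} value follows the usual Prometheus text format and is an assumption about what the agent exposes, and the bundle1.123 row is an invented sample added to show the dot replacement.

```python
"""Illustrative only: apply the 'metric inthw' rules to a sample of the
'sho inter hwsumm' output. This is not freeRouter's implementation."""

# Shortened sample; the bundle1.123 row is hypothetical.
SAMPLE_OUTPUT = """\
interface state tx rx drop
sdn1 up 412319805 1057514903 1152305
sdn3 admin 0 0 0
bundle1.123 down 10 20 0
"""

PREPEND = "iface_hw_byte_"                           # metric inthw prepend
COLUMN_NAMES = {1: "st", 2: "tx", 3: "rx", 4: "dr"}  # metric inthw column N name
STATE_VALUES = {"admin": "-1", "down": "0", "up": "1"}  # column 1 replace rules


def to_prometheus_lines(cli_output):
    """Turn the CLI table into one metric line per (interface, column)."""
    lines = []
    # Assumption: the header row is not turned into metrics, so skip it.
    rows = cli_output.strip().splitlines()[1:]
    for row in rows:
        cells = row.split()
        # Column 0 becomes the 'ifc' label, with dots replaced by underscores.
        label = cells[0].replace(".", "_")
        for index, suffix in COLUMN_NAMES.items():
            value = cells[index]
            if index == 1:
                # Textual state is mapped to a number: admin/down/up -> -1/0/1.
                value = STATE_VALUES.get(value, value)
            lines.append(f'{PREPEND}{suffix}{{ifc="{label}"}} {value}')
    return lines


if __name__ == "__main__":
    print("\n".join(to_prometheus_lines(SAMPLE_OUTPUT)))
```

Each interface row yields four samples (st, tx, rx, dr), which is why a single CLI command is enough to feed a whole dashboard.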
To verify, open the Prometheus web interface and check the "Targets" entry in the drop-down menu: each freeRouter should be listed there.
From that point you should be able to use the PromQL query field to check that you can retrieve the metrics we defined above.
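The same check can be scripted against the Prometheus HTTP API, whose instant query endpoint is /api/v1/query. The sketch below is a minimal illustration using only the Python standard library; the server address http://localhost:9090 and the helper names are assumptions to adapt to your own deployment.

```python
"""Illustrative only: fetch one of the metrics defined above through the
Prometheus HTTP API (instant query endpoint /api/v1/query)."""
import json
import urllib.parse
import urllib.request

# Assumption: address of your Prometheus server; replace with your own.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{query}"


def parse_instant_vector(payload):
    """Map (instance, ifc) to the sample value of an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    samples = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        _timestamp, value = series["value"]  # the value is returned as a string
        samples[(labels.get("instance"), labels.get("ifc"))] = float(value)
    return samples


def query(base_url, promql, timeout=5):
    """Run the query against a live server and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

Against a live server, calling query(PROMETHEUS_URL, "iface_hw_byte_tx") should return one entry per router and interface; an empty dictionary suggests that the targets are not being scraped yet.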
Grafana configuration
For metric visualisation, we will use Grafana. Therefore:
- install Grafana from the official web site.
- Once installed, configure Prometheus as a Grafana data source:
- fill in all the Prometheus server information
- check that the data source is defined correctly by clicking the "Save & test" button
At that point your Grafana and Prometheus are correctly bound.
- now you need to import the "RARE/freeRouter interface bytes" dashboard
- download the freeRouter interface bytes dashboard here
- import the dashboard via its ID, or simply download the JSON, or use the JSON panel
And voilà!
To see the graph immediately, zoom in to a 5m period with a refresh of 5s and you should automagically see the interface bytes TX/RX on all interfaces for each target.
Discussion
This example related to interface metrics is universal, as the metrics at freeRouter level are yielded through a generic CLI command:
- "show interface hwsummary"
- or "show interface swsummary".
There will be times when some metrics are tied to the specifics of your network. Let me give you a couple of examples:
The metrics below assume that you have deployed a link state IGP called "isis 1"; in your case you could have arbitrarily deployed "isis 2200" (2200 is the RENATER AS number).
metric lsigp4int command sho ipv4 isis 1 interface
metric lsigp4int prepend lsigp4_iface_
metric lsigp4int name 0 proto="isis1",ifc=
metric lsigp4int replace \. _
metric lsigp4int column 1 name neighbors
metric lsigp4peer command sho ipv4 isis 1 topology 2
metric lsigp4peer prepend lsigp4_peers_
metric lsigp4peer name 0 proto="isis1",node=
metric lsigp4peer replace \. _
metric lsigp4peer column 1 name reachable
metric lsigp4peer column 1 replace false 0
metric lsigp4peer column 1 replace true 1
metric lsigp4peer column 6 name neighbors
metric lsigp4perf command sho ipv4 isis 1 spf 2 | inc reachable|fill|calc|run
metric lsigp4perf prepend lsigp4_perf_
metric lsigp4perf labels proto="isis1"
metric lsigp4perf skip 0
metric lsigp4perf column 1 name val
metric lsigp6int command sho ipv6 isis 1 interface
metric lsigp6int prepend lsigp6_iface_
metric lsigp6int name 0 proto="isis1",ifc=
metric lsigp6int replace \. _
metric lsigp6int column 1 name neighbors
metric lsigp6peer command sho ipv6 isis 1 topology 2
metric lsigp6peer name 0 proto="isis1",node=
metric lsigp6peer prepend lsigp6_peers_
metric lsigp6peer replace \. _
metric lsigp6peer column 1 name reachable
metric lsigp6peer column 1 replace false 0
metric lsigp6peer column 1 replace true 1
metric lsigp6peer column 6 name neighbors
metric lsigp6perf command sho ipv6 isis 1 spf 2 | inc reachable|fill|calc|run
metric lsigp6perf prepend lsigp6_perf_
metric lsigp6perf labels proto="isis1"
metric lsigp6perf skip 0
metric lsigp6perf column 1 name val
In the metrics below the variable is the BGP AS number "65535":
metric bgp4peer command sho ipv4 bgp 65535 summ
metric bgp4peer prepend bgp4_peer_
metric bgp4peer name 4 peer=
metric bgp4peer replace \. _
metric bgp4peer column 1 name learn
metric bgp4peer column 2 name advert
metric bgp4peer column 3 name state
metric bgp4peer column 3 replace false 0
metric bgp4peer column 3 replace true 1
metric bgp4perf command sho ipv4 bgp 65535 best | exc last
metric bgp4perf prepend bgp4_perf_
metric bgp4perf replace \s _
metric bgp4perf column 1 name val
metric bgp6peer command sho ipv6 bgp 65535 summ
metric bgp6peer prepend bgp6_peer_
metric bgp6peer name 4 peer=
metric bgp6peer replace \: _
metric bgp6peer column 1 name learn
metric bgp6peer column 2 name advert
metric bgp6peer column 3 name state
metric bgp6peer column 3 replace false 0
metric bgp6peer column 3 replace true 1
metric bgp6perf command sho ipv6 bgp 65535 best | exc last
metric bgp6perf prepend bgp6_perf_
metric bgp6perf replace \s _
metric bgp6perf column 1 name val
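Since only the AS number changes from one network to another, the peer stanzas above lend themselves to templating. The following Python sketch generates the bgp4peer or bgp6peer lines for a given AS number; the function name bgp_peer_metric is hypothetical and the output simply reproduces the lines shown above, so treat it as a convenience under those assumptions rather than an official tool.

```python
"""Illustrative only: render the 'bgpNpeer' metric lines shown above for an
arbitrary AS number, so the stanza can be adapted to your own network."""


def bgp_peer_metric(afi, asn):
    """Return the metric lines for address family 4 or 6 and the given AS."""
    if afi not in (4, 6):
        raise ValueError("afi must be 4 or 6")
    name = f"bgp{afi}peer"
    # As in the stanzas above, dots are replaced for IPv4 and colons for IPv6.
    separator = r"\." if afi == 4 else r"\:"
    return [
        f"metric {name} command sho ipv{afi} bgp {asn} summ",
        f"metric {name} prepend bgp{afi}_peer_",
        f"metric {name} name 4 peer=",
        f"metric {name} replace {separator} _",
        f"metric {name} column 1 name learn",
        f"metric {name} column 2 name advert",
        f"metric {name} column 3 name state",
        f"metric {name} column 3 replace false 0",
        f"metric {name} column 3 replace true 1",
    ]


if __name__ == "__main__":
    # 2200 is used here only as an example AS number.
    print("\n".join(bgp_peer_metric(4, 2200) + bgp_peer_metric(6, 2200)))
```

The generated lines still have to be pasted under the "server prometheus" stanza of each router, exactly like the handwritten ones.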
Last example with "LDP null" metrics, in this particular case the variable object is the VRF name: "inet"
metric ldp4nul command sho ipv4 ldp inet nulled-summary
metric ldp4nul prepend ldp4null_
metric ldp4nul name 3 ip=
metric ldp4nul skip 2
metric ldp4nul replace \. _
metric ldp4nul column 0 name prefix_learn
metric ldp4nul column 1 name prefix_advert
metric ldp4nul column 2 name prefix_nulled
metric ldp6nul command sho ipv6 ldp inet nulled-summary
metric ldp6nul prepend ldp6null_
metric ldp6nul name 3 ip=
metric ldp6nul skip 2
metric ldp6nul replace \: _
metric ldp6nul column 0 name prefix_learn
metric ldp6nul column 1 name prefix_advert
metric ldp6nul column 2 name prefix_nulled
Conclusion
In this first article, you were presented with:
- The freeRouter/Prometheus integration
- How to add a new router to the list of Prometheus targets
- How to integrate a RARE/freeRouter Grafana dashboard. (Feel free to adapt the queries of the other available dashboards to your context!)