1.1. Credentials and Access
1.1.1. SSH Access
- Please use your GEANT AD account (GEANT\dante.xxx or GEANT\Firstname.Lastname) to get SSH access and sudo rights on the required servers.
1.1.2. Other Access
- Required credentials are in the "GÉANT Dashboard v3" LastPass folder
1.2. Problems
1.2.1. New alarms are not appearing in the GUI
1.2.1.1. Possible Cause: Traps not being processed by RabbitMQ
1.2.1.1.1. Analysis
- Open one or more of the following RabbitMQ management consoles. (Credentials are in the "GÉANT Dashboard v3" LastPass folder)
- Scroll down to the "Nodes" section
- There should be 3 rows in the table, and all status icons should be green. (Currently a red bar is also shown for a deprecated node; it will be removed when possible.) The expected node names are:
- rabbit@prod-noc-alarms01
- rabbit@prod-noc-alarms02
- rabbit@prod-noc-alarms03
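The same membership check can be scripted against the RabbitMQ management HTTP API. Its `GET /api/nodes` endpoint returns one object per cluster node, including `name` and `running` fields. The following is a minimal sketch that assumes `curl` and `jq` are installed and that the API is served under the console's base URL. The console URL and the `RABBIT_USER`/`RABBIT_PASS` variables are hypothetical placeholders; the actual credentials are in LastPass.

```shell
#!/usr/bin/env bash
# Expected cluster members, as listed in this runbook.
EXPECTED_NODES=(
  rabbit@prod-noc-alarms01
  rabbit@prod-noc-alarms02
  rabbit@prod-noc-alarms03
)

# Print the names of nodes that the management API at $1 reports as running.
# RABBIT_USER / RABBIT_PASS are assumed (hypothetical) environment variables.
running_nodes() {
  curl -fsS -u "${RABBIT_USER}:${RABBIT_PASS}" "$1/api/nodes" \
    | jq -r '.[] | select(.running) | .name'
}

# Read running node names on stdin and print each expected node that is absent.
# Returns non-zero if any expected node is missing.
missing_nodes() {
  local running node status=0
  running=$(cat)
  for node in "${EXPECTED_NODES[@]}"; do
    if ! grep -qxF -- "$node" <<<"$running"; then
      echo "$node"
      status=1
    fi
  done
  return "$status"
}
```

Example: `running_nodes "$CONSOLE_URL" | missing_nodes || echo "cluster degraded"`. Here, `$CONSOLE_URL` is a placeholder for one of the management console addresses.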
1.2.1.1.2. Solution
- If one of the 3 nodes is failing or missing from the list, log in to the failing server via SSH and restart the RabbitMQ service (dashboard-docker.service):
    sudo systemctl restart dashboard-docker.service
- After a minute or two the management consoles should show the cluster is restored.
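Instead of refreshing the console by hand, you can poll until the cluster recovers. The following is a minimal sketch: `wait_until` is a generic retry helper, and `three_nodes_running` is a hypothetical check that assumes `curl`, `jq`, and the same placeholder credentials as above.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts=$1 interval=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the last failed attempt.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Hypothetical health check: succeeds when the management API at $1 reports
# exactly 3 running nodes. RABBIT_USER / RABBIT_PASS are assumed env variables.
three_nodes_running() {
  local count
  count=$(curl -fsS -u "${RABBIT_USER}:${RABBIT_PASS}" "$1/api/nodes" \
    | jq '[.[] | select(.running)] | length') || return 1
  [ "${count:-0}" -eq 3 ]
}
```

For example, `wait_until 12 10 three_nodes_running "$CONSOLE_URL"` keeps checking for roughly two minutes before giving up.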
1.2.1.1.3. Solution #2
- If all 3 nodes appear in the list but the nodes report different states when you log in to their respective management GUIs, follow these instructions to restart/re-bootstrap the cluster.
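To spot this "split view" condition, compare the set of running nodes reported by each node's own console. The following is a minimal sketch that assumes `curl` and `jq` and passes each console URL as an argument; the URLs and credential variables are hypothetical. The sketch only detects disagreement and does not perform the restart/re-bootstrap.

```shell
#!/usr/bin/env bash
# Print the sorted names of running nodes as seen by the console at $1.
# RABBIT_USER / RABBIT_PASS are assumed (hypothetical) environment variables.
running_nodes() {
  curl -fsS -u "${RABBIT_USER}:${RABBIT_PASS}" "$1/api/nodes" \
    | jq -r '.[] | select(.running) | .name' | sort
}

# Succeed only if every console URL reports the same set of running nodes.
# Prints any view that differs from the first console's view.
views_consistent() {
  local first=$1 reference view url status=0
  reference=$(running_nodes "$first")
  shift
  for url in "$@"; do
    view=$(running_nodes "$url")
    if [ "$view" != "$reference" ]; then
      printf 'View from %s differs from %s:\n%s\n' "$url" "$first" "$view"
      status=1
    fi
  done
  return "$status"
}
```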
1.2.2. Collectors have stopped working
1.2.2.1. Analysis
- Open the Correlation status dashboard
- Scroll down to the "Collectors" panel
- Check that the graph shows a non-zero rate of traps being processed.
1.2.2.2. Solution
- On each of the following servers:
- net-alarms01.geant.org
- net-alarms02.geant.org
- net-alarms03.geant.org
- Log in via SSH and run the following command:
    sudo systemctl restart trap_collector
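If you can SSH from your workstation, you can script the restart across the three servers. The following is a minimal sketch that assumes your AD account can log in to each server directly and run `sudo systemctl` there. The `restart_everywhere` function and the `SSH` override (for example `SSH=echo` for a dry run) are hypothetical helpers and are not part of the deployment.

```shell
#!/usr/bin/env bash
# Alarm servers listed in this runbook.
ALARM_SERVERS=(net-alarms01.geant.org net-alarms02.geant.org net-alarms03.geant.org)

# SSH command to use; override (e.g. SSH=echo) for a dry run.
SSH=${SSH:-ssh}

# Restart a systemd unit on every alarm server, one at a time.
# -t allocates a terminal so sudo can prompt for a password if needed.
# Usage: restart_everywhere UNIT
restart_everywhere() {
  local unit=$1 host status=0
  for host in "${ALARM_SERVERS[@]}"; do
    echo "Restarting $unit on $host"
    "$SSH" -t "$host" sudo systemctl restart "$unit" || status=1
  done
  return "$status"
}
```

Example: `restart_everywhere trap_collector`.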
1.2.3. Possible Cause: Correlators have stopped working
1.2.3.1. Analysis
- Open the Correlation status dashboard
- Scroll down to the "Collectors" panel
- Check that the graph shows the leader collector processing a non-zero rate of traps. The current leader can be identified by the FORWARDER with state 2 in the "Raft States" panel.
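This runbook only states that state 2 marks the leader. The helpers below assume that the panel follows the common Raft numbering used by HashiCorp's raft library (0 Follower, 1 Candidate, 2 Leader, 3 Shutdown), so treat the names other than Leader as a guide only. The "NAME STATE" input format for `find_leader` is also hypothetical, for example values copied from the panel.

```shell
#!/usr/bin/env bash
# Decode a numeric Raft state. Only 2 = Leader is confirmed by this runbook;
# the other values assume HashiCorp raft's numbering.
raft_state_name() {
  case "$1" in
    0) echo Follower ;;
    1) echo Candidate ;;
    2) echo Leader ;;
    3) echo Shutdown ;;
    *) echo Unknown; return 1 ;;
  esac
}

# Read "NAME STATE" lines on stdin and print the name(s) in state 2 (leader).
find_leader() {
  awk '$2 == 2 { print $1 }'
}
```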
1.2.3.2. Solution
- On each of the following servers:
- net-alarms01.geant.org
- net-alarms02.geant.org
- net-alarms03.geant.org
- Log in via SSH and run the following command:
    sudo systemctl restart trap_correlator
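After a restart, you can confirm that the trap services came back on all three servers using `systemctl is-active`. This prints the unit's state and exits with 0 only when the unit is active, and it does not require sudo. The following is a minimal sketch in which the `check_units` function and the `SSH` override are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Alarm servers listed in this runbook.
ALARM_SERVERS=(net-alarms01.geant.org net-alarms02.geant.org net-alarms03.geant.org)

# SSH command to use; override for testing or dry runs.
SSH=${SSH:-ssh}

# Print the state of each given unit on every alarm server.
# Returns non-zero if any unit is not active or a host cannot be reached.
check_units() {
  local host unit state status=0
  for host in "${ALARM_SERVERS[@]}"; do
    for unit in "$@"; do
      state=$("$SSH" "$host" systemctl is-active "$unit") || status=1
      printf '%-26s %-16s %s\n' "$host" "$unit" "${state:-unreachable}"
    done
  done
  return "$status"
}
```

Example: `check_units trap_collector trap_correlator`.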