
Credentials and Access

SSH Access

  • Please use your GEANT AD account (GEANT\dante.xxx or GEANT\Firstname.Lastname) to get SSH access and sudo rights on the required servers.

Other Access

  • Required credentials are in the "GÉANT Dashboard v3" LastPass folder

Problems

...

New alarms are not appearing in the GUI

Possible Cause: Traps not being processed by RabbitMQ

Analysis
  1. Open one or more of the following RabbitMQ management consoles. (Credentials are in the "GÉANT Dashboard v3" LastPass folder.)

...

...

...

Check that the Nodes section shows no errors

Solution
  • Restart RabbitMQ on the failing node
  1. Scroll down to the "Nodes" section
  2. There should be 3 rows in the table, and all status icons should be green. (Currently a red bar shows a deprecated node; it will be removed when possible.) The expected node names are:
    • rabbit@prod-noc-alarms01
    • rabbit@prod-noc-alarms02
    • rabbit@prod-noc-alarms03
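The same check can be sketched from a shell instead of the web console. This is a minimal sketch, not an official tool: it assumes the RabbitMQ management HTTP API listens on its default port 15672 over plain HTTP, that curl and jq are installed, and that MGMT_HOST, RABBIT_USER and RABBIT_PASS are hypothetical variables you fill in yourself (host from the console list above, credentials from LastPass). The functions running_nodes and missing_nodes are also hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch only: compare the nodes RabbitMQ reports as running with the
# three expected cluster members.
# Assumptions (not from the source): management API on port 15672 over HTTP,
# jq installed, MGMT_HOST / RABBIT_USER / RABBIT_PASS set by you.

EXPECTED_NODES=(rabbit@prod-noc-alarms01 rabbit@prod-noc-alarms02 rabbit@prod-noc-alarms03)

# Print the name of every node the management API reports as running, one per line.
running_nodes() {
  curl -fsS -u "$RABBIT_USER:$RABBIT_PASS" "http://$MGMT_HOST:15672/api/nodes" \
    | jq -r '.[] | select(.running) | .name'
}

# Read running node names on stdin and print each expected node that is absent.
# Nodes not in EXPECTED_NODES (e.g. the deprecated one) are ignored.
missing_nodes() {
  local running node
  running="$(cat)"
  for node in "${EXPECTED_NODES[@]}"; do
    grep -qxF -- "$node" <<<"$running" || echo "$node"
  done
}
```

Usage: `running_nodes | missing_nodes`. Empty output means all three expected nodes are running; any name printed is a node to restart. If the console is served over HTTPS or on another port, adjust the URL accordingly.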
Solution
  1. If one of the 3 nodes is failing or missing from the list, log in to the failing server via SSH and restart the RabbitMQ service:
    • sudo systemctl restart dashboard-docker.service
  2. After a minute or two the management consoles should show the cluster is restored.
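The restart step can be sketched as a small helper that goes from a failing node name to the remote restart. This relies on the fact that RabbitMQ node names have the form name@hostname; it assumes (unverified) that the hostname part, e.g. prod-noc-alarms02, is reachable by SSH as-is from your workstation. The command is run with sudo because systemctl restart normally needs root. The function names node_host and restart_rabbit_node and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: restart the Docker-based RabbitMQ stack on a failing node.
# Assumption: the host part of the RabbitMQ node name is an SSH-reachable hostname.

# rabbit@prod-noc-alarms02 -> prod-noc-alarms02
node_host() {
  echo "${1#*@}"
}

# Restart dashboard-docker.service on the host behind the given node name.
# With DRY_RUN=1 the ssh command is only printed, not executed.
restart_rabbit_node() {
  local host
  host="$(node_host "$1")"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "ssh -t $host sudo systemctl restart dashboard-docker.service"
  else
    # -t allocates a terminal so sudo can prompt for a password if required.
    ssh -t "$host" sudo systemctl restart dashboard-docker.service
  fi
}
```

Usage: `DRY_RUN=1 restart_rabbit_node rabbit@prod-noc-alarms02` to preview, then run it without DRY_RUN and re-check the Nodes section after a minute or two.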
Solution #2
  1. If all 3 nodes appear in the list but their states differ when you log in to their respective administration GUIs

...

Collectors have stopped working

Analysis

  1. Open

...

Check that the following graph is not at 0

  • Collectors
Solution
  • Restart collectors

Possible Cause

  1. Open the Correlation status dashboard
  2. Scroll down to the "Collectors" panel
  3. Check that the graph shows a nonzero rate of traps being processed

Solution

  1. On each of the following servers:
    • net-alarms01.geant.org
    • net-alarms02.geant.org
    • net-alarms03.geant.org
  2. Log in via SSH and run the following command:
    • sudo systemctl restart trap_collector
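Restarting the collectors on all three servers can be sketched as a single loop run from your workstation. This is a sketch under assumptions: your AD account can SSH directly to each host, and sudo may prompt for a password on each one. The function name restart_collectors and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: restart trap_collector on every alarms server in turn.
# Assumption: SSH with your AD account works for each host listed below.
SERVERS=(net-alarms01.geant.org net-alarms02.geant.org net-alarms03.geant.org)

restart_collectors() {
  local host
  for host in "${SERVERS[@]}"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      # Preview mode: print the command that would be run.
      echo "ssh -t $host sudo systemctl restart trap_collector"
    else
      # -t allocates a terminal so sudo can prompt for a password if required.
      ssh -t "$host" sudo systemctl restart trap_collector
    fi
  done
}
```

Usage: `DRY_RUN=1 restart_collectors` to preview the three commands, then `restart_collectors`. Afterwards, confirm on the Collectors panel that the trap rate is nonzero again.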

Possible Cause: Correlators have stopped working

Analysis

  1. Open

...

Check that the following graphs are not at 0

  • Correlators - received
  • Correlators - handled
Solution
  1. Open the Correlation status dashboard
  2. Scroll down to the "Collectors" panel
  3. Check that the graph shows the leader collector processing a non-zero rate of traps.  The current leader can be identified by the FORWARDER with state 2 in the "Raft States" panel.

Solution

  1. On each of the following servers:
    • net-alarms01.geant.org
    • net-alarms02.geant.org
    • net-alarms03.geant.org
  2. Log in via SSH and run the following command:
    • sudo systemctl restart trap_correlator
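The correlator restart can be sketched the same way, with a check after each host that systemd reports the unit as active. This assumes SSH with your AD account works for each host; run_remote, restart_correlators and DRY_RUN are hypothetical names. Note that systemctl is-active only reflects the unit state right after the restart; the dashboard graphs remain the real confirmation.

```shell
#!/usr/bin/env bash
# Sketch only: restart trap_correlator on each alarms server and stop at the
# first host where the unit is not active afterwards.
SERVERS=(net-alarms01.geant.org net-alarms02.geant.org net-alarms03.geant.org)

# Run a command on a remote host; with DRY_RUN=1 just print it.
run_remote() {
  local host="$1"; shift
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "ssh -t $host $*"
  else
    ssh -t "$host" "$@"
  fi
}

restart_correlators() {
  local host
  for host in "${SERVERS[@]}"; do
    run_remote "$host" sudo systemctl restart trap_correlator || return 1
    # is-active exits non-zero if the unit did not come back up.
    run_remote "$host" systemctl is-active trap_correlator || return 1
  done
}
```

Usage: `DRY_RUN=1 restart_correlators` to preview, then `restart_correlators`. Stopping at the first failure avoids restarting the remaining correlators while one is already down.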

...


...