Requirements
Service view
- GEANT proxy
  - Will use test instance to monitor
  - Will develop full chain test against this proxy
- InAcademia
  - Would like multiple endpoints reported on
- eduroam
  - Please re-use data from eduroam technical monitor site
- eduGAIN
  - Report on eduGAIN metadata and supporting sites
  - DO NOT query eduGAIN metadata itself
- eduTEAMS
  - Per T&I service dashboard required
  - No endpoints discussed yet
Dashboard
(MUST) Separate status reports on a per-service basis
- (MUST) Publicly available part: per-service public status pages (also for multiple monitors) with a simple overview dashboard. (Open question: do we need to show more detailed information to support resolution or attribute blame, or just a simple, constrained overview?)
- (MUST) Login for protected parts
- (MUST) Allow further links to specific service components, if listed
- (SHOULD) Allow messages to be presented at a public page
- (SHOULD?) An overall overview for a specific stakeholder (EC) may still have value, even if it is not useful for individual services.
- (COULD?) Consider a representation for complex flows, e.g. ones that include login.
Data (Sources)
- (MUST) Checks: HTTP(S) return status, keyword(s) in returned pages, ping, port
- (SHOULD) API access (to data collected by the platform, or to data to be collected by the platform from existing tools, or both?)
- (COULD) Optional: check specific headers
- (COULD) Authenticated access to pages (HTTP basic/digest) (is this just for data collection or also for links in the dashboard?)
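As an illustration of the MUST-level checks listed above, here is a minimal sketch in Python using only the standard library. The function names (`evaluate_http`, `check_http`, `check_port`), the default timeouts and the expected status of 200 are assumptions made for this example and do not describe any particular vendor's product. Ping (ICMP) is left out because it usually needs raw-socket privileges.

```python
import socket
import urllib.error
import urllib.request


def evaluate_http(status, body, expected_status=200, keywords=()):
    """Return (ok, reason) for an already-fetched HTTP response."""
    if status != expected_status:
        return False, f"unexpected status {status}"
    # Every configured keyword must appear in the returned page.
    missing = [k for k in keywords if k not in body]
    if missing:
        return False, "missing keyword(s): " + ", ".join(missing)
    return True, "ok"


def check_http(url, expected_status=200, keywords=(), timeout=10.0):
    """Fetch url, then apply the status and keyword checks (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code and a body.
        status = exc.code
        body = exc.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        return False, f"request failed: {exc}"
    return evaluate_http(status, body, expected_status, keywords)


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A monitor definition would then be a URL plus an expected status and a keyword list, or a host and port pair. Checking specific headers (the COULD item) would be a small extension of `evaluate_http`.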
Access and visibility
- (MUST) Specifically branded to the service in terms of URL and use of logos and other branding elements (e.g. colour schemes?); CNAME possible per status page
- (MUST) Publicly available top-level landing for a service-specific status page, redirecting from a generic URL to the specific page of the vendor (either a feature of the service to be procured or perhaps a simple DNS-based redirect), e.g. status.edugain.org and status.eduroam.org.
- "Sub" services, e.g. the eduGAIN access check, might be visible as part of status.edugain.org, but not as separate pages?
- (SHOULD) Multiple users
- (SHOULD) Separation of duties (multiple users for separate monitors)
- (COULD) 2FA access to the management portal
To be discussed
- Compound states based on several sources and rules?
- Filters? (Example: results can be noisy. One test site used during the assessment phase was working without error, yet a number of the globally distributed testing servers reported a problem with it. No pattern was found for this behaviour.)
- Charts? Interactive charts?
- Long term history and trends?
- Collection of service usage/load stats?
- Alerts (email, DM, webhooks, SMS)?
- Checks from multiple locations?
- Downtime filtering?
- Maintenance downtimes?
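To make some of the open questions above more concrete (filters for noisy multi-location checks, maintenance downtimes, compound states), here is one possible sketch in Python. All names, the 50% failure threshold and the "up"/"down"/"degraded" vocabulary are assumptions chosen for illustration. None of them are agreed requirements or features of a specific tool.

```python
from datetime import datetime, timezone


def filter_by_quorum(location_results, min_failing_fraction=0.5):
    """Collapse per-location results (True = ok) into 'up', 'down' or 'unknown'.

    A service counts as 'down' only if at least min_failing_fraction of the
    probing locations report a failure, so isolated false alarms are ignored.
    """
    if not location_results:
        return "unknown"
    failures = sum(1 for ok in location_results.values() if not ok)
    if failures / len(location_results) >= min_failing_fraction:
        return "down"
    return "up"


def in_maintenance(now, windows):
    """Return True if now falls inside any (start, end) maintenance window."""
    return any(start <= now < end for start, end in windows)


def compound_state(component_states):
    """Combine component states ('up' or 'down') into one service state."""
    states = list(component_states.values())
    if not states:
        return "unknown"
    if all(s == "up" for s in states):
        return "up"
    if all(s == "down" for s in states):
        return "down"
    return "degraded"


# Usage example with made-up probe locations and component names.
probes = {"ams": True, "fra": True, "nyc": False, "sgp": True}
print(filter_by_quorum(probes))
print(compound_state({"metadata": "up", "access-check": "down"}))
```

With a rule like this, a single failing location out of four would not change the public status, and failures recorded during a declared maintenance window could be left out of uptime figures.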
Features
Compare features of
- Pingdom https://www.pingdom.com/
- UptimeRobot https://uptimerobot.com/
Some checks from https://docs.google.com/presentation/d/1aeP1R_RK9wNipguSYbcDbyurAkFQD7BV31QcR99f6YQ/edit#slide=id.g7183f71b4a_0_48 onward