-- draft --
Scope of Incident Management Process
An incident is an unplanned interruption or reduction in the quality of the Seamless Access service, or a failure of one of the Seamless Access service components that has not yet impacted service. The purpose of incident management is to restore normal service operation as quickly as possible and to minimize the adverse impact on business operations, thus ensuring that agreed levels of service quality are maintained.
Two types of incidents are defined:
- Generic incidents
- TIER2 infrastructure incidents
Handling of security incidents and resolution times are defined in the Seamless Access Operational Level Agreements (OLA). This process describes them in more detail.
Generic Incidents
Start points:
- S1: Incident detected by a monitoring tool and reported as an event to SUNET NOC
- S2: Incident reported by users
- Open question: what is the process for user-reported incidents?
Tasks:
- IM 01: SUNET NOC will create an incident in status.seamlessaccess.org when it receives an alarm from Nagios XI, the SA Pingdom account, or another source.
- IM 02: The incident in status.seamlessaccess.org should be updated at least once a day.
- IM 03: SUNET NOC will determine whether the incident is critical or minor.
- IM 04: If it is a critical incident, the Service Operations Manager, Product Manager and Technical Lead will be notified, as described in 6.2. Incident handling.
- IM 05: If it is a minor incident, SUNET NOC needs to check whether it is related to the software running in the Docker images or is an operational incident that the NOC can handle itself. A software-related incident will be escalated to the Technical Lead/layer 3 support.
- IM 06: For an operational incident, whoever is in charge of SUNET NOC will try to resolve it using common knowledge or the SA documentation at https://wiki.sunet.se/display/sunetops/SeamlessAccess.
- IM 07: See if the incident is resolved.
- IM 08: If it is resolved, the incident in status.seamlessaccess.org should be updated and closed.
- IM 09: SUNET NOC may suggest to the operation team of SA (SUNET Engineering) how this problem can be avoided in the future.
- IM 10: If the problem is not resolved, it should be escalated to the operation team of SA (SUNET Engineering), which will create a Jira ticket in the Seamlessaccess project in jira.sunet.se.
- IM 11: The SA operation team will try to resolve this incident.
- IM 12: See if the incident is resolved.
- IM 13: If it is resolved, the incident in status.seamlessaccess.org should be updated and closed.
- IM 14: The operation team will note how this problem can be avoided in the future and implement that solution later in a separate case.
- IM 15: If the operation team is unable to resolve the issue, it will be escalated to the Technical Lead/layer 3 support.
- IM 16: The layer 3 support may suggest how this problem can be avoided in the future and implement that solution later in a separate case with the help of the operation team (SUNET Engineering).
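The branching in tasks IM 01 to IM 16 can be easier to follow as a small decision function. The following is only an illustrative sketch, not part of the process definition: the names Severity, Kind, Incident, handling_path and the two boolean flags are hypothetical and were introduced for this example. IM 02 (the daily status update) is a recurring duty and is not modeled, and the handling of critical incidents after notification follows 6.2. Incident handling and is also not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Severity(Enum):
    CRITICAL = "critical"
    MINOR = "minor"


class Kind(Enum):
    SOFTWARE = "software"        # related to software running in the Docker images
    OPERATIONAL = "operational"  # can be handled by the NOC itself


@dataclass
class Incident:
    severity: Severity
    kind: Kind
    noc_can_resolve: bool = False  # hypothetical flag: result of the IM 07 check
    ops_can_resolve: bool = False  # hypothetical flag: result of the IM 12 check


def handling_path(incident: Incident) -> Tuple[List[str], str]:
    """Return the task IDs visited and the party that ends up with the incident."""
    # IM 02 (daily update of the status page) is recurring and not modeled here.
    path = ["IM 01", "IM 03"]

    if incident.severity is Severity.CRITICAL:
        # Further handling follows "6.2. Incident handling" and is not modeled.
        path.append("IM 04")
        return path, "Service Operations Manager, Product Manager, Technical Lead"

    path.append("IM 05")
    if incident.kind is Kind.SOFTWARE:
        # Software-related minor incidents go straight to layer 3.
        return path, "Technical Lead/layer 3 support"

    path += ["IM 06", "IM 07"]
    if incident.noc_can_resolve:
        path += ["IM 08", "IM 09"]
        return path, "SUNET NOC"

    path += ["IM 10", "IM 11", "IM 12"]
    if incident.ops_can_resolve:
        path += ["IM 13", "IM 14"]
        return path, "SA operation team (SUNET Engineering)"

    path += ["IM 15", "IM 16"]
    return path, "Technical Lead/layer 3 support"


if __name__ == "__main__":
    # A minor operational incident that the NOC cannot resolve but the SA operation team can.
    steps, owner = handling_path(
        Incident(Severity.MINOR, Kind.OPERATIONAL, noc_can_resolve=False, ops_can_resolve=True)
    )
    print(" -> ".join(steps), "|", owner)
```

Writing the flow this way makes the three possible end points for a minor incident explicit: resolved by SUNET NOC, resolved by the SA operation team, or escalated to the Technical Lead/layer 3 support.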