Correlation Engine

At the event collection and consolidation level, the Umbrella Fault Management system provides the foundation for further automation of operational processes.

Communication service providers achieve the most valuable results from a Fault Management system at the next level: automated management of network and service events and faults.

When a network fault arises, a NOC engineer usually has to perform the following fault management actions:

• Select, among all events, those that indicate an impact on services;
• Prioritize them;
• Identify the root cause:
- On their own;
- By creating and dispatching a trouble ticket for another engineer to resolve the fault;
• Resolve the problem:
- On their own;
- By creating and dispatching a trouble ticket for another engineer to resolve the fault;
• Perform other operations (such as informing the on-site engineer about events; see the Site Access solution);
• Prepare reports.

In the telecommunication network of a nationwide operator there are hundreds of thousands of active events. Manual event filtering and correlation inevitably leads to missed service-affecting faults. A typical situation is a fault that was not critical at the beginning but, because it was not resolved in time, led to a severe service impact. Network operations engineers end up mostly “firefighting” instead of doing “fire prevention”.

In our customers' experience with Fault Management systems, automating routine, day-to-day activities has the greatest effect.

Using the Correlation Engine (see picture below) significantly simplifies creating, maintaining and changing event processing rules (cases). One of our customers uses more than 500 cases to automate its fault management process.

Events from the Fault Management System are selected according to the filters defined in the cases, checked to see whether their fields meet the condition criteria, and then processed by the specified modules.

Each module implements a specific algorithm. The following modules are available: Reprioritization, Correlation, Maintenance, Enrichment, Notification, Runbook (Outside) and Trouble Ticketing automation. The output of one module can be used as the input for another.
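The flow of a case (filter, condition check, then a chain of modules) can be illustrated with a minimal Python sketch. It is not the Correlation Engine's actual rule format: the Case class, the two example modules, the field names and the site value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]          # hypothetical event shape: field name -> value
Module = Callable[[Event], Event]  # a module takes an event and returns the updated event


@dataclass
class Case:
    """Hypothetical case: a filter, a condition and an ordered list of modules."""
    name: str
    event_filter: Callable[[Event], bool]  # which events the case selects
    condition: Callable[[Event], bool]     # criteria the event fields must meet
    modules: List[Module] = field(default_factory=list)

    def process(self, event: Event) -> Event:
        # Events that are not selected or fail the condition pass through untouched.
        if not (self.event_filter(event) and self.condition(event)):
            return event
        # The output of each module is the input of the next one.
        for module in self.modules:
            event = module(event)
        return event


def enrich_with_site(event: Event) -> Event:
    # Stand-in for an Enrichment module; the site value is made up.
    return {**event, "Site": "SITE-001"}


def raise_severity_for_site(event: Event) -> Event:
    # Stand-in for a Reprioritization module that relies on the enrichment result.
    if event.get("Site"):
        return {**event, "Severity": 5}
    return event


case = Case(
    name="link-down-on-core",
    event_filter=lambda e: e.get("AlertGroup") == "LinkDown",
    condition=lambda e: e.get("Severity", 0) >= 3,
    modules=[enrich_with_site, raise_severity_for_site],
)

matched = case.process({"Node": "core-rtr-1", "AlertGroup": "LinkDown", "Severity": 4})
skipped = case.process({"Node": "core-rtr-1", "AlertGroup": "FanFail", "Severity": 4})
```

In this sketch the second module only raises the severity because the first one added a site, which is the kind of chaining the module list makes possible.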

Reprioritization changes an event’s attributes in the Netcool ObjectServer according to a defined rule, or creates new synthetic (service) events.

Correlation ties an event to other events (a one-to-many relationship). With Netcool EventList or WebGUI it is possible to navigate between correlated events in both directions.
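One way to picture a one-to-many link that can be followed in both directions is a pair of lookup tables. The sketch below is only an illustration under that assumption; CorrelationIndex and the event ids are hypothetical and say nothing about how the links are actually stored in the ObjectServer.

```python
from collections import defaultdict


class CorrelationIndex:
    """Hypothetical in-memory index of parent -> children correlation links."""

    def __init__(self):
        self._children = defaultdict(set)  # parent event id -> ids of correlated events
        self._parent = {}                  # correlated event id -> parent event id

    def link(self, parent_id, child_id):
        # A child belongs to one parent at a time, so drop any previous link first.
        previous = self._parent.get(child_id)
        if previous is not None:
            self._children[previous].discard(child_id)
        self._parent[child_id] = parent_id
        self._children[parent_id].add(child_id)

    def children_of(self, parent_id):
        # Navigate from the parent event down to its correlated events.
        return sorted(self._children.get(parent_id, ()))

    def parent_of(self, child_id):
        # Navigate from a correlated event back up to its parent.
        return self._parent.get(child_id)


index = CorrelationIndex()
index.link(100, 101)  # event 100 plays the role of the parent (for example a root cause)
index.link(100, 102)

children = index.children_of(100)
parent = index.parent_of(102)
```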

Maintenance marks events in the Netcool ObjectServer that come from equipment with planned works on it. This information is usually taken from external systems or sources.
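The marking logic can be sketched as a check of each event against a list of planned-work windows. The window structure, the "time" key and the InMaintenance flag are assumptions for illustration; in practice the windows would come from an external system.

```python
from datetime import datetime


def in_maintenance(event, windows):
    """Return True if the event's node has planned works at the event's time."""
    for window in windows:
        same_node = window["node"] == event["Node"]
        if same_node and window["start"] <= event["time"] < window["end"]:
            return True
    return False


def mark_maintenance(event, windows):
    # Return a copy of the event carrying a hypothetical InMaintenance flag.
    return {**event, "InMaintenance": in_maintenance(event, windows)}


# Made-up planned-work window for one node.
windows = [
    {
        "node": "bts-042",
        "start": datetime(2024, 1, 10, 1, 0),
        "end": datetime(2024, 1, 10, 5, 0),
    }
]

during = mark_maintenance({"Node": "bts-042", "time": datetime(2024, 1, 10, 2, 30)}, windows)
after = mark_maintenance({"Node": "bts-042", "time": datetime(2024, 1, 10, 6, 0)}, windows)
```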

Enrichment adds information to an event from external data sources such as Network Inventory, Work Ordering, CRM, etc. It is the most used module. It can select data from one data source, then use the results to query a second data source and update the corresponding data in the first data source.
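The chained lookup described above can be sketched with plain dictionaries standing in for the two data sources. The inventory and CRM contents, the key names and the enrich function are hypothetical; a real deployment would query the external systems instead.

```python
def enrich(event, inventory, crm):
    """Look up the node in the first source, use the result to query the second,
    then write the outcome back to the first source and onto the event."""
    record = inventory.get(event["Node"])
    if record is None:
        return event  # nothing known about this node, leave the event as is

    customer = crm.get(record["customer_id"], {})
    customer_name = customer.get("name")

    # Update the corresponding data in the first data source.
    record["last_alarm_customer"] = customer_name

    # Return a copy of the event with the enrichment fields added.
    return {**event, "Site": record["site"], "Customer": customer_name}


# Made-up stand-ins for a network inventory and a CRM.
inventory = {"core-rtr-1": {"site": "SITE-001", "customer_id": "C-17"}}
crm = {"C-17": {"name": "Example Bank"}}

enriched = enrich({"Node": "core-rtr-1", "Severity": 4}, inventory, crm)
unknown = enrich({"Node": "no-such-node", "Severity": 2}, inventory, crm)
```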

Notification sends a notification about a specific event, with an appropriate subject and body, to a list of addresses.
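As a rough illustration, assuming e-mail is the notification channel, a message could be assembled from event fields with Python's standard email.message module. The subject and body templates, the addresses and the event fields are made up, and the sketch only builds the message without sending it.

```python
from email.message import EmailMessage


def build_notification(event, addresses):
    """Build (but do not send) an e-mail describing the event."""
    message = EmailMessage()
    message["Subject"] = f"[Severity {event['Severity']}] {event['AlertGroup']} on {event['Node']}"
    message["To"] = ", ".join(addresses)
    message.set_content(
        f"Node: {event['Node']}\n"
        f"Alert group: {event['AlertGroup']}\n"
        f"Summary: {event['Summary']}\n"
    )
    return message


event = {
    "Node": "core-rtr-1",
    "AlertGroup": "LinkDown",
    "Severity": 5,
    "Summary": "Link down on an uplink interface",
}
message = build_notification(event, ["noc@example.com", "oncall@example.com"])
```

Sending the message, for example through an SMTP relay, is left out because it depends on the operator's mail infrastructure.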

Runbook (Outside) runs external scripts to enrich events with data from NMS/EMS or to execute corrective commands on equipment.

TT automation provides two-way integration with trouble ticketing and work ordering systems.

Our experience of implementing the Correlation Engine at different mobile and fixed operators shows that most of an engineer’s routine tasks can be automated with this set of modules.

As a result of Fault Management Automation:
1. All routine, repetitive network monitoring and fault management tasks are performed automatically.
2. NOC engineers can focus on network anomaly analysis and preventive activities.