Centralized Network Operations Center & State-of-the-Art SolarWinds Monitoring for Credit Union

San Diego County Credit Union is San Diego's largest locally-owned financial institution.

“With over $8.3 billion in assets, SDCCU is a not-for-profit credit union wholly owned and operated by its members. Unlike most other financial institutions, like big banks, San Diego County Credit Union does not issue stock or pay dividends to outside stockholders. Instead, earnings are returned to members in the form of lower loan rates, higher dividends on deposits or lower fees.

“SDCCU serves over 400,000 members and continues to thrive as one of San Diego's premier financial institutions. With 43 branch locations in San Diego, Riverside and Orange County, SDCCU is conveniently located in our communities serving consumers through superior product and service offerings. SDCCU is proud to have been voted BEST Credit Union 19 years straight as determined by the San Diego Union-Tribune Readers Poll.”

For the Information Technology department, supporting three data centers (two active and one passive), 200-300 servers and over 1,000 users across 43 branch offices required a lot of effort. A variety of monitoring tools was in place, but the coverage they provided was inaccurate and incomplete. There were frequent outages that weren’t detected or reported, and many “false positive” outages were reported, resulting in support resources chasing down problems that didn’t actually exist.

Configured alerts often didn’t provide useful troubleshooting information, and each incident triggered a large number of alerts. Discovery and cleanup were manual, resource-intensive processes. In addition, no monitoring was in place to provide an accurate measure of the user experience.

The Vice President of Information Technology had a vision of a centralized Network Operations Center (NOC) with a staff focused on monitoring the network, proactively identifying potential failures and managing network and server issues. System discovery would identify changes to the network, and an accurate device dependency configuration would curb excessive alerts by automatically linking alerts from different components on the same device. Finally, real-time dashboards with accurate and well-defined information would enable rapid discovery and troubleshooting of issues.

Solution

Implement a Network Operations Center (NOC) to provide a “single pane of glass” view into the health of the network, servers and other back-end infrastructure equipment.

From the three incumbent monitoring solutions, SDCCU selected SolarWinds to provide the tools for monitoring and reporting on the network status. In addition, the data collected may provide indicators of impending failures and enable network administrators to take proactive action to avoid outages or slowdowns. However, the existing SolarWinds implementation was not highly available and had many single points of failure.

Aspire Live’s consultants worked with the alerting and scripting tools in SolarWinds to identify, clean up and eliminate stale or invalid items in the SolarWinds database that had been generating bogus alerts. Additional scripts were created to detect and report on critical issues, and, in some cases, automatically fix them. Consultants designed node dependency criteria based on information external to SolarWinds and used group stats rollup to create dependencies for alerting and reporting. Custom alerts were implemented that followed rollup status based on NOC incident response policies.
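The idea behind dependency-based alerting can be illustrated with a minimal sketch. It is not the SolarWinds configuration or the scripts used in the project; the node names, the `parent_of` map and the `root_causes` helper are all hypothetical. It assumes each node has at most one upstream parent and reports only the down nodes whose outage is not already explained by a down ancestor.

```python
from typing import Dict, Optional, Set


def root_causes(down: Set[str], parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Return the down nodes that have no down ancestor (assumed single parent)."""
    roots: Set[str] = set()
    for node in down:
        ancestor = parent_of.get(node)
        seen = {node}  # guards against accidental cycles in the map
        explained = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                explained = True  # an upstream device is already down
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not explained:
            roots.add(node)
    return roots


# Hypothetical topology: child -> upstream parent
parent_of = {
    "core-router": None,
    "branch-switch": "core-router",
    "branch-atm": "branch-switch",
    "branch-pc": "branch-switch",
}

down = {"branch-switch", "branch-atm", "branch-pc"}
print(sorted(root_causes(down, parent_of)))  # ['branch-switch']
```

In this example three devices are unreachable, but only the switch would raise an alert, because the ATM and the PC sit behind it.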

Network monitoring based on Cisco’s NetFlow technology was implemented to provide insight into network resource consumption. Analyzing the flow data produced a picture of network traffic flow and volume, enabling administrators to see where traffic was coming from, where it was going and how much of it was being generated.

The SolarWinds NetFlow Traffic Analyzer is a software-based data collector that collects NetFlow traffic data, correlates it into a usable format, and presents it to NOC administrators in a web-based interface. By using the NetFlow Traffic Analyzer, administrators are able to:

1. Monitor network bandwidth & traffic patterns down to the interface level.

2. Identify which users, applications & protocols are consuming the most bandwidth.

3. Highlight the IP addresses of top talkers.

4. Gain real-time insights into who is on the network, what they are doing and how their actions are impacting other users.
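As a rough illustration of what "top talkers" means, the sketch below totals bytes per source IP address and per protocol from a handful of flow records. It does not reflect how the NetFlow Traffic Analyzer works internally; the `Flow` record, its field names and the sample addresses are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Simplified, hypothetical flow record (real NetFlow records carry more fields)."""
    src_ip: str
    dst_ip: str
    protocol: str
    byte_count: int


def top_talkers(flows: Iterable[Flow], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n source addresses that sent the most bytes."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


def bytes_by_protocol(flows: Iterable[Flow]) -> Counter:
    """Total bytes seen per protocol."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.protocol] += flow.byte_count
    return totals


# Made-up sample data
flows = [
    Flow("10.0.0.5", "10.0.1.9", "tcp", 1200),
    Flow("10.0.0.7", "10.0.1.9", "udp", 300),
    Flow("10.0.0.5", "10.0.2.4", "tcp", 800),
    Flow("10.0.0.9", "10.0.1.9", "tcp", 500),
]

print(top_talkers(flows, n=2))          # [('10.0.0.5', 2000), ('10.0.0.9', 500)]
print(dict(bytes_by_protocol(flows)))   # {'tcp': 2500, 'udp': 300}
```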

Aspire Live consultants worked with SDCCU resources to implement a large number of new alerts, NOC screens, dashboards, automation items and all required cleanup and configuration tasks. As part of the project, Aspire Live provided SDCCU with documentation on how the new design works and on how to leverage the new NOC framework to continue adding in future systems. Aspire Live was able to complete the project on schedule and required fewer overall consulting hours than originally planned.

Benefits

Network Operations Center (NOC) monitoring and reporting provides key IT resources with useful information about the issues detected, possible ways to resolve them and details about who might be affected by the issue.

On a given day, the NOC monitors events from over 2,500 devices, averaging over 10,000 events per hour, and detects and reports on numerous issues across the organization, including:

• Resource issues (CPU, RAM, Disk)

• Hardware issues (temperature, power)

• Network issues (interfaces, tunnels, routers, switches)

• ATM issues

• Storage system issues

• High-availability issues for core business systems and applications

• Web performance issues.

The NOC gives administrators new screens and dashboards with an increased level of functionality. For example, the new dashboards show the temperature and power status of critical devices at the various locations.

In addition, the new dashboards enable direct access to new troubleshooting tools, reducing the time between detection and action. The newly implemented alerts provide direct links to the associated NOC screens to enable quick drill-down troubleshooting of issues. Alerts also create tickets automatically to maintain visibility into the number of issues and the status of their remediation.
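The alert-to-ticket flow described above might look something like the following sketch. Everything in it is hypothetical: the `TicketLog` class, the base URL and the link format are placeholders, not a SolarWinds or ticketing-system API. It also assumes that repeated alerts for the same node and issue should update one open ticket rather than create duplicates, which the case study does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical base URL for NOC screens; not a real endpoint.
NOC_BASE_URL = "https://noc.example.invalid"


@dataclass
class Ticket:
    node: str
    issue: str
    link: str              # drill-down link to the related NOC screen
    status: str = "open"
    alert_count: int = 1


class TicketLog:
    """In-memory stand-in for a ticketing system."""

    def __init__(self) -> None:
        self._tickets: Dict[Tuple[str, str], Ticket] = {}

    def on_alert(self, node: str, issue: str) -> Ticket:
        """Create a ticket for an alert, or count it against an open one."""
        key = (node, issue)
        ticket = self._tickets.get(key)
        if ticket is not None and ticket.status == "open":
            ticket.alert_count += 1
            return ticket
        ticket = Ticket(node, issue, f"{NOC_BASE_URL}/node/{node}")
        self._tickets[key] = ticket
        return ticket

    def resolve(self, node: str, issue: str) -> None:
        self._tickets[(node, issue)].status = "resolved"

    def open_tickets(self) -> List[Ticket]:
        return [t for t in self._tickets.values() if t.status == "open"]


log = TicketLog()
log.on_alert("branch-12-switch", "high temperature")
log.on_alert("branch-12-switch", "high temperature")  # same issue, same ticket
log.on_alert("dc1-san-01", "disk space low")
log.resolve("dc1-san-01", "disk space low")

for t in log.open_tickets():
    print(t.node, t.issue, t.alert_count, t.link)
```

Counting open tickets and their status is one simple way to track both the quantity of issues and the progress of remediation.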

The credit union wanted to improve detection and management of network and infrastructure issues.

Industry

Finance

Services
Data Center & Virtualization
Infrastructure Implementation