The University of Arizona

Friday, April 2, 2010 - 4:04pm
One weekend afternoon, Jeff Bishop was sitting in a restaurant when he received an email warning that the D2L database server was running low on disk space. Jeff, an analyst and developer for UITS Enterprise Applications, pulled out his cell phone and called the UITS Operations Center to look into it. Shortly afterward, a database administrator notified Operations that he had already resolved the issue.

If the database server had run out of space, D2L would not have been able to back up 24 hours' worth of student data, a tremendous amount of campus-wide work to put at risk. Instead, three different IT staff members received an early warning, and the issue was fixed before it became a problem.

UITS manages thousands of devices: servers, switches, routers, uninterruptible power supplies, remote environmental sensors, application load balancers, firewalls, and more. How can staff monitor the health and performance of so many devices?

Meet EM7, a performance monitoring and fault management system from ScienceLogic. EM7 "polls" every piece of equipment it's asked to monitor at regular intervals. It then compares the response it gets back with what it expects, and alerts UITS staff if the response is unexpected.

Is the charge on an uninterruptible power supply running low? Is a Web page hosted on a UITS server taking too long to load? Is a hard drive reaching capacity? Did a response not come back at all?
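For readers curious about the mechanics, the poll, compare, and alert cycle can be sketched in a few lines of Python. This is a minimal illustration only, not how ScienceLogic's EM7 is actually built; the device names, metrics, thresholds, and the `Check`/`evaluate` helpers are all hypothetical, and the poll results are simulated rather than collected from real equipment.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    """One monitored value on one device (names and limits are hypothetical)."""
    device: str
    metric: str
    is_expected: Callable[[float], bool]  # returns True if the reading looks normal


def evaluate(check: Check, reading: Optional[float]) -> Optional[str]:
    """Compare a poll response with what was expected; return an alert message or None."""
    if reading is None:
        return f"ALERT {check.device}: no response to {check.metric} poll"
    if not check.is_expected(reading):
        return f"ALERT {check.device}: unexpected {check.metric} = {reading}"
    return None


# Hypothetical checks mirroring the questions above.
checks = [
    Check("ups-01", "battery_charge_pct", lambda v: v >= 30),
    Check("web-01", "page_load_seconds", lambda v: v <= 5),
    Check("db-01", "disk_used_pct", lambda v: v < 90),
    Check("switch-07", "uptime_seconds", lambda v: v > 0),
]

# Simulated poll results; None means the device did not answer at all.
readings = {"ups-01": 22.0, "web-01": 1.8, "db-01": 93.5, "switch-07": None}

alerts = [a for c in checks if (a := evaluate(c, readings[c.device])) is not None]
for alert in alerts:
    print(alert)
```

Here the low UPS charge, the nearly full disk, and the silent switch each produce an alert, while the web server's normal load time does not.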

UITS staff stay logged in to EM7 Web pages that display alerts when something goes wrong, and they can also have alerts emailed to them. Because anomalies are pinpointed and flagged so quickly, staff are often able to handle issues before they escalate into serious problems.


Photo by Natasha Kolosowsky, UITS
Chris Pierce, in UITS Network Operations, manages the EM7 program. Color-coded warnings in status windows keep him and the other analysts apprised of systems that need attention.

Polls go out at set intervals (every five minutes, every 15 minutes, or daily), depending on how critical a system is and how likely a problem is to develop quickly. EM7 runs 24 hours a day, seven days a week, 365 days a year, and so do the UITS staff monitoring it.
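The idea of polling critical systems more often can be illustrated with a small scheduler. This sketch assumes three hypothetical tiers mapped to the intervals mentioned above (five minutes, 15 minutes, daily); the tier names, device names, and `schedule` function are illustrative only, and EM7's actual scheduling and configuration are not described in this article.

```python
import heapq

# Hypothetical tiers: more critical systems are polled more often.
INTERVALS = {
    "critical": 5 * 60,        # every five minutes
    "important": 15 * 60,      # every 15 minutes
    "routine": 24 * 60 * 60,   # daily
}


def schedule(devices: dict[str, str], horizon: int) -> list[tuple[int, str]]:
    """Return (time_in_seconds, device) poll events from t=0 up to, but not including, horizon."""
    heap = [(0, name) for name in devices]  # every device is polled once at the start
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] < horizon:
        t, name = heapq.heappop(heap)       # earliest pending poll
        events.append((t, name))
        # Re-queue the device for its next poll according to its tier.
        heapq.heappush(heap, (t + INTERVALS[devices[name]], name))
    return events


devices = {"db-01": "critical", "web-01": "important", "ups-01": "routine"}
events = schedule(devices, horizon=60 * 60)  # one hour
counts = {name: sum(1 for _, n in events if n == name) for name in devices}
print(counts)
```

Over one simulated hour, the critical database server is polled 12 times, the important Web server 4 times, and the routine UPS once. A priority queue keeps the next-due poll at the front, so the scheduler never has to scan every device to decide what to check next.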