The attached child pages contain information about IT monitoring at Nu Skin: the current state of monitoring, some research on best practices, and a strategy for where we should take monitoring. We want to move toward a model where we manage to service levels rather than only responding to outage events.
Large enterprises should consider a multitier event management hierarchy. At the bottom tier, some event processing and correlation is pushed out to the managed IT element to reduce the flood of unnecessary events. At the middle tier, specialized event management tools provide additional depth in specific IT domains. At the top tier, a general-purpose manager of managers (MoM) product provides a single, integrated view of events from a wide range of IT infrastructure elements.
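As a rough illustration of the three tiers described above, here is a minimal Python sketch. Every name in it (`Event`, `element_filter`, `domain_manager`, `manager_of_managers`, the severity levels, and the sample hosts) is hypothetical and chosen only for illustration; it does not represent any tool we currently run.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity scale, used only for this illustration.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Event:
    source: str    # managed element, e.g. a host name
    domain: str    # e.g. "database" or "network"
    check: str     # what was checked
    severity: str  # one of the SEVERITY keys


def element_filter(events):
    """Bottom tier: drop repeated identical events at the managed element."""
    seen = set()
    unique = []
    for event in events:
        if event not in seen:
            seen.add(event)
            unique.append(event)
    return unique


def domain_manager(events, min_severity="warning"):
    """Middle tier: the domain tool escalates only events at or above a threshold."""
    threshold = SEVERITY[min_severity]
    return [e for e in events if SEVERITY[e.severity] >= threshold]


def manager_of_managers(escalated):
    """Top tier: one integrated view of escalated events, grouped by domain."""
    view = defaultdict(list)
    for event in escalated:
        view[event.domain].append(event)
    return dict(view)


raw = [
    Event("db01", "database", "tablespace", "critical"),
    Event("db01", "database", "tablespace", "critical"),  # duplicate
    Event("db01", "database", "backup", "info"),          # stays in the domain tool
    Event("sw03", "network", "link", "warning"),
]
view = manager_of_managers(domain_manager(element_filter(raw)))
```

In this sketch the duplicate is dropped at the element, the informational event stays with the database group's own tool, and only two events reach the top-tier view.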
Past projects have focused on tool selection, not on tool implementation. We have purchased many different monitoring tools at Nu Skin, often with the desire to combine all of our monitoring into one giant tool. This has not been successful. In fact, we have started several projects to select the perfect tool, bought the tool, and then implemented only a few of our alerts in it, without retiring any of our existing monitoring tools. We are good at selecting tools but not good at implementing them all the way.
We have elaborate, decentralized, disorganized, and inefficient but effective monitoring on our systems. Because we lack a monitoring strategy, each group has implemented its own monitoring to ensure the health of the systems it is responsible for. In total, we monitor IT systems in 17 different ways (including the 7 tools that Carl S. was using).
Our IT systems groups take a very high degree of ownership for keeping their respective systems up. What this really means is that 99% of the time events are detected and fixed within the individual groups before they ever cause a critical problem.
Promote the continued use of low-level event managers by the DBA, System Admin, and Network groups, but add a general-purpose manager at the top tier of the architecture to achieve a single, integrated view of events.
Specifically, replace
with Nagios XI, the commercial version of Nagios.
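One way the groups' low-level event managers could keep running while still reporting critical events upward is through Nagios passive service checks, which accept results through the `PROCESS_SERVICE_CHECK_RESULT` external command. The sketch below only builds the command line; the helper name `passive_check_line` and the sample host, service, and message are hypothetical, and whether our tools would be integrated this way is an assumption, not a decision recorded on this page.

```python
import time

# Standard Nagios plugin return codes.
RETURN_CODES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def passive_check_line(host, service, state, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    The host and service must already be defined in Nagios with passive
    checks enabled; this helper only formats the text of the command.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Each external command occupies a single line, so flatten newlines.
    clean_output = output.replace("\n", " ")
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{RETURN_CODES[state]};{clean_output}\n"
    )


# Hypothetical example: a DBA tool reporting a tablespace problem.
line = passive_check_line(
    "db01", "tablespace", "critical",
    "USERS tablespace 97% full", timestamp=1262304000,
)
```

The resulting line would be written to the Nagios external command file, whose location depends on the installation.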
In the 2010 budget we have broken down as follows:
This would lower the annual support costs for tools, reduce the number of tools we use for monitoring, and provide an integrated view of critical events.
The money spent here is mostly for implementation, not for licenses or support.