How to Effectively & Efficiently Monitor Your Cloud Computing Infrastructure
Vikas Aggarwal
Tuesday, February 2, 2010
Software applications and the underlying computing infrastructure are essential enablers of business processes in today’s technology-dependent enterprise environment. Almost every business activity - from taking an order from a customer to delivering purchased products - depends directly on the effective performance of Information Technology (IT).

Organizations have used a variety of tools and approaches to monitor their networks and ensure optimum performance of their enterprise IT infrastructure. It is clear though that traditional network management tools have been stretched to the limit, given the widespread adoption of multi-tier applications, distributed computing, and Internet technologies in recent years. The arrival of cloud computing, the latest evolution in IT, creates a new set of challenges that require innovative monitoring tools to help businesses leverage its cost and efficiency benefits, while mitigating the risks of underperforming infrastructure. This article will discuss the limitations of traditional tools, and review the capabilities of the next generation tools that are able to address the increased management demands of modern IT infrastructure.

In simple terms, cloud computing uses virtualization, grid architectures, and software-as-a-service (SaaS) application delivery, both inside and outside the boundaries of the enterprise network. This computing approach or ‘technology’ promises significant cost savings and business agility compared with traditional methods of computing. In a cloud environment, a business application, for example, may leverage a combination of in-house vanilla virtual machines, pre-built storefront virtual machines from an external cloud vendor (e.g. Yahoo), and an external application service (e.g. NetSuite). Furthermore, a variety of network elements and links make up the required infrastructure mix to ensure the effective functioning of the overall application. This type of complicated environment requires a new approach to network management and monitoring.

Traditional network management and monitoring systems have focused on measuring and monitoring technical metrics and trends of individual elements and components in the infrastructure. Although these systems enable the IT operations team to identify problem areas from a technical point of view for a given piece of infrastructure, significant gaps exist in determining the business impact of a specific problem. If a router and an external application service fail at the same time, legacy monitoring systems offer no way for the Network Operations Center (NOC) operator to determine which of these is more critical or which business services have been impacted by these failures. When an isolated issue occurs in the complex web of new technologies - one that may impact one or more user-facing tasks in a business process - the current monitoring approaches are incapable of determining its impact on the business.

In order to ensure smooth running of business operations in a cloud environment, network management must move away from point monitoring of IT infrastructure to monitoring business service availability and performance. It must go beyond just looking at the performance of individual nodes or components to include a holistic service-oriented view.

To ensure greater reliability of essential processes and systems in a virtual environment, Business Service Management (BSM) systems can help enterprises connect business processes with IT operations to achieve a more holistic perspective. By connecting the worlds of IT and business, BSM solutions are able to identify the affected business processes or services when problems occur in the complex, distributed, and virtual IT infrastructure. BSM solutions enable preemptive and rapid identification of business issues, accurate identification of root causes and quick resolution of problems.

Although the case for BSM is clear, the path forward has not been an easy one, given the solution options that have been available in the marketplace. Older generation network monitoring products are not able to unify fault or event, performance management, and BSM within one system, and thus force businesses to deploy and integrate multiple systems to get an end-to-end view. This cumbersome approach involves linking multiple disparate applications across different layers and domains of infrastructure and business services. These solutions contain a confusing array of complicated features, require specialized application-specific expertise to install, integrate and manage, and involve execution of complex projects to complete an implementation. All of these add up to a significant investment in the initial deployment and ongoing administrative support, resulting in extremely high total cost of ownership.

Fortunately, innovative solutions have emerged to deliver advanced BSM capabilities required by the enterprise, pre-integrated with the necessary underlying fault or event and performance management capabilities. These next generation BSM systems leverage two key technical advantages that enable them to effectively support cloud computing environments.

The first is open and extensible APIs or data-capture plug-ins for integrating with external systems. These allow custom monitors to be added easily, so that availability and performance data can be captured from any element within the cloud computing infrastructure, whether it’s a new external Web service or a virtual machine.
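To make the plug-in idea concrete, here is a minimal sketch in Python of how custom monitors might be registered and polled. It is not taken from any particular BSM product: the names Sample, register_monitor, and collect, the element names, and the fixed readings are all hypothetical, chosen only to show the pattern of adding a new monitor without changing the collector.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Sample:
    """One availability/performance reading for an infrastructure element."""
    element: str
    available: bool
    response_ms: float


# Hypothetical registry: monitor name -> function returning one Sample.
MONITORS: Dict[str, Callable[[], Sample]] = {}


def register_monitor(name: str):
    """Decorator that adds a custom monitor to the registry."""
    def decorator(fn: Callable[[], Sample]) -> Callable[[], Sample]:
        MONITORS[name] = fn
        return fn
    return decorator


@register_monitor("external-web-service")
def check_web_service() -> Sample:
    # Placeholder reading; a real monitor would call the service and time it.
    return Sample("external-web-service", True, 120.0)


@register_monitor("vm-01")
def check_vm() -> Sample:
    # Placeholder reading for a virtual machine that is currently down.
    return Sample("vm-01", False, 0.0)


def collect() -> Dict[str, Sample]:
    """Run every registered monitor once and return the readings by name."""
    return {name: fn() for name, fn in MONITORS.items()}


if __name__ == "__main__":
    for name, sample in collect().items():
        state = "up" if sample.available else "down"
        print(name, state, sample.response_ms)
```

The point of the registry is that supporting a new element, such as another external Web service, means writing one more decorated function; the collection loop stays untouched.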

The second, sometimes referred to as creating ‘service containers’, involves grouping an organization’s IT infrastructure into logical, business-oriented views of the overall physical and virtualized computing network. Linking applications and the cloud computing infrastructure with business services in this way allows enterprise network administrators to monitor multiple elements of the infrastructure together, generate reports on service containers, get uptime information and realtime status for services, and receive alerts if services fail or exceed defined thresholds.
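A service container can be pictured as a named group of infrastructure elements with a threshold attached. The following Python sketch is illustrative only; ElementStatus, ServiceContainer, the 500 ms threshold, and the element names are assumptions made for the example, not the data model of any specific product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementStatus:
    """Latest known state of one element (router, VM, external service)."""
    name: str
    up: bool
    response_ms: float


@dataclass
class ServiceContainer:
    """Hypothetical business-oriented grouping of infrastructure elements."""
    name: str
    max_response_ms: float
    elements: List[ElementStatus] = field(default_factory=list)

    def availability(self) -> float:
        """Fraction of member elements that are currently up."""
        if not self.elements:
            return 1.0
        return sum(e.up for e in self.elements) / len(self.elements)

    def alerts(self) -> List[str]:
        """One message per element that is down or over the threshold."""
        messages = []
        for e in self.elements:
            if not e.up:
                messages.append(f"{self.name}: {e.name} is down")
            elif e.response_ms > self.max_response_ms:
                messages.append(
                    f"{self.name}: {e.name} response time above threshold"
                )
        return messages


# Example: an order-entry service built from in-house and external pieces.
order_entry = ServiceContainer(
    name="order-entry",
    max_response_ms=500.0,
    elements=[
        ElementStatus("web-vm", True, 120.0),
        ElementStatus("netsuite-api", True, 900.0),
        ElementStatus("edge-router", False, 0.0),
    ],
)

for message in order_entry.alerts():
    print(message)
```

Because alerts carry the container name, the NOC operator sees which business service is affected rather than only which device failed.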

Next generation BSM systems provide the ability to define container severities to support varying business needs and objectives. Users can specify rules to indicate when a container is identified as being in an undesirable state. For example, if there are two redundant network paths between two end points, this can be specified in a business container. If there is a virtual server farm behind a load balancer and an outage of some of the virtual servers does not affect the supported business service, this can also be specified in a business container. Similarly, if there is a single SaaS application that supports the same business service, the business container can be defined to indicate the status of the business service as being ‘critical’ if the synthetic test transaction with the SaaS application fails.
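The three cases above can be expressed with one simple rule: a container is ‘critical’ only when fewer members are up than the service needs. The Python sketch below assumes this rule; the function name container_severity, the min_up parameter, and the intermediate ‘warning’ level for reduced redundancy are hypothetical additions for illustration, and real products may define severities differently.

```python
from typing import Dict


def container_severity(members: Dict[str, bool], min_up: int) -> str:
    """Return 'ok', 'warning', or 'critical' for a business container.

    members maps an element name to True if that element is up.
    min_up is the number of members the business service needs to function.
    The 'warning' level (service works, redundancy reduced) is an assumption.
    """
    up = sum(members.values())
    if up < min_up:
        return "critical"
    if up < len(members):
        return "warning"
    return "ok"


# Two redundant network paths: the service survives the loss of one path.
paths = {"path-a": True, "path-b": False}
print(container_severity(paths, min_up=1))  # warning

# Virtual server farm behind a load balancer: two of four servers suffice.
farm = {"vm-1": True, "vm-2": True, "vm-3": False, "vm-4": True}
print(container_severity(farm, min_up=2))  # warning

# Single SaaS application: a failed synthetic transaction is critical.
saas = {"saas-synthetic-transaction": False}
print(container_severity(saas, min_up=1))  # critical
```

Keeping the rule in the container definition, rather than on each element, is what lets the same router or server outage be reported differently depending on the business service it supports.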

As mentioned earlier, business processes are increasingly dependent on a complex mix of IT infrastructure and applications that extend beyond the boundaries of the enterprise network. A new set of IT management challenges has emerged in light of the rapid adoption of cloud computing technologies, such as virtualization, grid architectures and SaaS. To ensure smooth business operations, organizations need to deploy advanced BSM solutions that overcome the limitations of legacy network management tools by providing realtime visibility into the availability and performance of business services.

The author is Founder and CEO, Zyrion Inc