
Tap In and Ganglia Integration

What is Ganglia?

Ganglia is an open source, distributed, scalable monitoring system that is primarily used in grid computing. It relies on a design of hierarchical clusters. A low overhead module which gathers monitoring metrics is installed on each node in the cluster. The node forwards these metrics to the cluster level, which aggregates the metrics. Cluster level metrics are then aggregated to provide grid level metrics.
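The node, cluster, and grid roll-up described above can be pictured with a small sketch. This is an illustration only, not Ganglia's internal code: the cluster names, host names, sample values, and function names are all hypothetical, and a sum, host count, and mean are used as the summary statistics by assumption.

```python
from statistics import mean

# Hypothetical sample data: cluster -> node -> one metric value (e.g. 1-minute load).
GRID = {
    "web": {"web01": 0.5, "web02": 1.5},
    "db": {"db01": 2.0, "db02": 4.0},
}

def summarize(values):
    """Reduce the per-node values of one cluster to a small summary."""
    values = list(values)
    return {"hosts": len(values), "sum": sum(values), "mean": mean(values)}

def cluster_summaries(grid):
    """Cluster level: aggregate the metrics forwarded by each node."""
    return {name: summarize(nodes.values()) for name, nodes in grid.items()}

def grid_summary(cluster_stats):
    """Grid level: aggregate the cluster summaries, not the raw node data."""
    hosts = sum(c["hosts"] for c in cluster_stats.values())
    total = sum(c["sum"] for c in cluster_stats.values())
    return {"hosts": hosts, "sum": total, "mean": total / hosts}

clusters = cluster_summaries(GRID)
grid_totals = grid_summary(clusters)
```

Because the grid level only needs each cluster's sum and host count, it never has to revisit the individual nodes, which is what lets the hierarchy scale.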

The results are a set of performance graphs and metrics which are built using RRDTool. Reports showing the aggregated metrics are presented at the grid level, and the user can drill down to a cluster or individual node.

A live demo of Ganglia reports showing UC Berkeley’s Grid can be accessed at http://monitor.millennium.berkeley.edu.

What are the benefits of Ganglia?

The distributed module installed on each node is designed for high performance, so the overhead associated with Ganglia monitoring is very low. Metrics can be sampled at a higher rate (several times a minute) than with other monitoring tools, with little impact on the monitored system.

Ganglia has been ported to many different operating systems, so implementation is fairly straightforward. It has been deployed at many sites, so the system has proven to be robust.

Because of its high capacity, Ganglia can support management of grids containing thousands of nodes.

What are the disadvantages of Ganglia?

Because the metric measurements are in the compiled code, Ganglia monitoring is more rigid than other monitoring systems. It is more difficult to implement new monitoring metrics than in other monitoring systems which are more script-oriented, like Nagios. This is the trade-off for getting the high performance.

Ganglia is easiest to deploy if all you need is to monitor operating system level metrics that are provided in the standard Ganglia implementation. If you need more custom metrics, you will need to create your own metric gathering process and integrate it into Ganglia. This requires some programming skill.

This is also true for Ganglia reports. The standard reports work well, but if you want to create a custom report based on Ganglia data, you may need to create your own web application to use the RRD data.

Ganglia is good for providing performance graphs, but does not provide any alerting or automated actions. As such, Ganglia may be a good tool for systems administrators who perform diagnostics and performance tuning, but not for operations personnel who need to view exception conditions.

Ganglia does not provide active service level monitoring external to the node, for example, executing http checks against a web server. From this perspective, it is only aware of metrics available from its internal monitoring module.

From a cloud perspective, it is unaware of cloud or virtualized meta-data. Reports are based on host name, which may be dynamic in a cloud environment. Data from instances that have been stopped may be lost since the host names are transient.

What benefits does the Tap In–Ganglia Integration service provide?

The Tap In Systems Cloud Management Service can integrate with Ganglia to provide the following benefits.

Alerts can be generated by monitoring any Ganglia metric. These alerts allow operators to easily see nodes that may have service-affecting problems. A glance at Tap In’s alert console can provide visibility across an entire Ganglia grid. Notifications and other automated actions may also be triggered from these alerts.

Additional metrics can be derived from the Ganglia metrics and monitored. For example, a “swap percent free” metric can be computed from the Ganglia “swap total” and “swap free” metrics and then monitored. Graph reports can be generated from these derived metrics.
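As a rough sketch of the “swap percent free” example, the derived value is just the ratio of the two Ganglia readings. The function name is hypothetical, and the code assumes both inputs are in the same unit (kilobytes here); it does not show how Tap In itself defines derived metrics.

```python
def swap_percent_free(swap_total_kb, swap_free_kb):
    """Return free swap as a percentage of total swap.

    Returns None when the host reports no swap, since the ratio is
    undefined in that case (an assumption made for this sketch).
    """
    if swap_total_kb <= 0:
        return None
    return 100.0 * swap_free_kb / swap_total_kb

# Hypothetical readings: 2,048,000 KB total and 512,000 KB free.
example = swap_percent_free(2048000, 512000)
```

A threshold such as "alert when below 10" can then be applied to the derived value in the same way as to any native Ganglia metric.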

The Tap In monitoring service is quickly and easily implemented. Because Tap In’s service is deployed in the cloud, no hardware or software installation is required. No additional overhead is incurred on the monitored nodes, since the service reads the metrics that Ganglia already collects.

Tap In’s service can include cloud or virtualized meta-data if the associated Tap In cloud agents are installed. This allows Tap In reports to incorporate cloud meta-data in alerts or reports.

How does this integration solution work?

The Tap In Cloud Management Service is deployed in the Amazon EC2 cloud. This management server can receive events from Tap In agents and can perform active monitoring checks. Any Nagios plugin can be deployed on the agent or server. Tap In’s client console application provides a view of monitoring events received by the server. Web reports provide historical views of this event data.

A special Ganglia monitoring module is deployed with the management server. When the service is started, the Ganglia gmetad ports are monitored to auto-discover the grid environment, and metric data is gathered. The operator specifies the alert thresholds, which are applied at the grid, cluster, or host level.
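The discovery and threshold steps above might look roughly like the following sketch. It is not Tap In's module. It assumes gmetad's XML dump is read from its XML port (8651 by default), uses a trimmed, hypothetical sample of the GRID / CLUSTER / HOST / METRIC layout, ignores nested grids, and assumes that the most specific threshold (host, then cluster, then grid) wins. All names and values are illustrative.

```python
import socket
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample; real gmetad output carries many more attributes.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
<GRID NAME="demo-grid">
<CLUSTER NAME="web">
<HOST NAME="web01"><METRIC NAME="load_one" VAL="0.40"/></HOST>
<HOST NAME="web02"><METRIC NAME="load_one" VAL="6.20"/></HOST>
</CLUSTER>
<CLUSTER NAME="db">
<HOST NAME="db01"><METRIC NAME="load_one" VAL="3.10"/></HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""

def fetch_gmetad_xml(host, port=8651, timeout=5.0):
    """Read the XML dump that gmetad writes to a client on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def discover(xml_text):
    """Yield (grid, cluster, host, metric, value) for each numeric metric."""
    root = ET.fromstring(xml_text)
    for grid in root.findall("GRID"):
        for cluster in grid.findall("CLUSTER"):
            for host in cluster.findall("HOST"):
                for metric in host.findall("METRIC"):
                    try:
                        value = float(metric.get("VAL"))
                    except (TypeError, ValueError):
                        continue  # skip string-valued metrics
                    yield (grid.get("NAME"), cluster.get("NAME"),
                           host.get("NAME"), metric.get("NAME"), value)

# Hypothetical operator settings for load_one, keyed by (level, name).
THRESHOLDS = {("grid", "demo-grid"): 5.0, ("cluster", "db"): 3.0,
              ("host", "web02"): 8.0}

def threshold_for(grid, cluster, host, thresholds):
    """Pick the most specific threshold: host, then cluster, then grid."""
    for key in (("host", host), ("cluster", cluster), ("grid", grid)):
        if key in thresholds:
            return thresholds[key]
    return None

def alerts(xml_text, metric_name, thresholds):
    """Return (host, value, limit) for every metric above its threshold."""
    found = []
    for grid, cluster, host, name, value in discover(xml_text):
        limit = threshold_for(grid, cluster, host, thresholds)
        if name == metric_name and limit is not None and value > limit:
            found.append((host, value, limit))
    return found

load_alerts = alerts(SAMPLE_XML, "load_one", THRESHOLDS)
```

In this sample only db01 raises an alert: web02 has the highest load, but its host-level threshold overrides the grid default. Against a live grid, the sample string would be replaced by the bytes returned from `fetch_gmetad_xml`.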

Since no software needs to be installed, the Tap In service can be deployed in minutes. To customize the Ganglia monitoring module, the administrator just completes a few web pages. No coding is required.

For additional information, please contact us at info@tapinsystems.com.

 