JOURNAL ARTICLE

REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems

Abstract

To observe, analyze, and control large-scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of distributed system and application states. This results in application state monitoring tasks that require fine-grained attribute information to be collected efficiently from relevant nodes. Existing approaches either treat multiple application state monitoring tasks independently and build ad hoc monitoring trees for each task, or construct a single static monitoring tree for multiple tasks. We argue that careful planning of multiple application state monitoring tasks, jointly considering multi-task optimization and node-level resource constraints, can provide significant gains in performance and scalability. In this paper, we present REMO, a REsource-aware application state MOnitoring system. REMO produces a forest of optimized monitoring trees through iterations of two phases: one explores cost-sharing opportunities via estimation, and the other refines the monitoring plan through resource-sensitive tree construction. Our experimental results include those gathered by deploying REMO on a BlueGene/P rack running System S, IBM's large-scale distributed streaming system. Running over 200 monitoring tasks with REMO for an application deployed across 200 nodes results in a 35%-45% decrease in the percentage error of collected attributes compared to existing schemes.
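
As a rough illustration of why sharing collection across tasks can pay off, the sketch below compares two extremes under an assumed cost model: every message carries a fixed overhead plus a cost per attribute value. The cost model, the function names `per_task_cost` and `shared_cost`, the example tasks, and the one-hop (no relaying) topology are all assumptions made for this example; they are not taken from the paper, and the sketch ignores the node-level resource constraints that the abstract says REMO accounts for.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A task is (attributes to collect, nodes to collect them from).
# Hypothetical representation, chosen only for this illustration.
Task = Tuple[FrozenSet[str], FrozenSet[str]]


def per_task_cost(tasks: List[Task], overhead: float, per_value: float) -> float:
    """Cost when every task collects independently.

    Each node sends one message per task it belongs to, so a node that
    serves two tasks pays the fixed per-message overhead twice.
    """
    messages = sum(len(nodes) for _, nodes in tasks)
    values = sum(len(attrs) * len(nodes) for attrs, nodes in tasks)
    return overhead * messages + per_value * values


def shared_cost(tasks: List[Task], overhead: float, per_value: float) -> float:
    """Cost when all tasks share one collection structure.

    Each node sends a single message carrying the union of the attributes
    requested from it, so duplicate requests are collected only once.
    """
    wanted: Dict[str, Set[str]] = {}
    for attrs, nodes in tasks:
        for node in nodes:
            wanted.setdefault(node, set()).update(attrs)
    messages = len(wanted)
    values = sum(len(attrs) for attrs in wanted.values())
    return overhead * messages + per_value * values


# Two overlapping tasks: both want "cpu" from nodes n2 and n3.
TASKS: List[Task] = [
    (frozenset({"cpu", "mem"}), frozenset({"n1", "n2", "n3"})),
    (frozenset({"cpu"}), frozenset({"n2", "n3", "n4"})),
]

if __name__ == "__main__":
    print(per_task_cost(TASKS, overhead=10.0, per_value=1.0))  # 69.0
    print(shared_cost(TASKS, overhead=10.0, per_value=1.0))    # 47.0
```

In this toy setting full sharing is always at least as cheap, because nothing limits how much a node can send or receive. Once nodes have limited capacity, a single shared structure can overload some of them, which is the tension between cost sharing and resource-sensitive tree construction that the abstract describes.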

Keywords:
Computer science; Scalability; Distributed computing; Task (project management); Resource (disambiguation); Tree (set theory); State (computer science); IBM; Node (physics); Scale (ratio); System monitoring; Real-time computing; Database; Computer network; Systems engineering

Metrics

Cited By: 27
FWCI (Field Weighted Citation Impact): 9.79
Refs: 29
Citation Normalized Percentile: 0.98

Topics

Cloud Computing and Resource Management (Physical Sciences → Computer Science → Information Systems)
Data Stream Mining Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Peer-to-Peer Network Technologies (Physical Sciences → Computer Science → Computer Networks and Communications)

Related Documents

JOURNAL ARTICLE

Resource-Aware Application State Monitoring

Shicong Meng, Srinivas Kashyap, Chitra Venkatramani, Ling Liu

Journal: IEEE Transactions on Parallel and Distributed Systems, Year: 2012, Vol: 23 (12), Pages: 2315-2329

JOURNAL ARTICLE

DRACO: Distributed Resource-aware Admission Control for large-scale, multi-tier systems

Domenico Cotroneo, Roberto Natella, Stefano Rosiello

Journal: Journal of Parallel and Distributed Computing, Year: 2024, Vol: 192, Pages: 104935-104935

JOURNAL ARTICLE

Resource-Aware Distributed Scheduling Strategies for Large-Scale Computational Cluster/Grid Systems

Siva Viswanathan, Bharadwaj Veeravalli, Thomas G. Robertazzi

Journal: IEEE Transactions on Parallel and Distributed Systems, Year: 2007, Vol: 18 (10), Pages: 1450-1461