Publications

Title The Diagnosis-Effective Sampling of Application Traces
Authors Arnak Poghosyan, Ashot Harutyunyan, Edgar Davtyan, Karen Petrosyan, Nelson Baloian
Publication date July 2024
Abstract Distributed tracing is a cutting-edge technology used for
monitoring, managing, and troubleshooting cloud-native applications. It
offers more comprehensive and continuous observability, surpassing
traditional logging methods, and is indispensable for navigating modern
complex software architectures. However, the sheer volume of generated
traces is staggering in distributed applications, and the direct storage and
utilization of every trace is impractical due to associated operational
costs. This necessitates a sampling strategy to select which traces warrant
storage and analysis. Historically, sampling methods have included a
rate-based approach, often relying heavily on manual configuration. There
is a need for a more intelligent approach, and we propose a hierarchical
sampling methodology to address multiple requirements concurrently. Initial
rate-based sampling mitigates the overwhelming volume of traces, as no
further analysis can be performed at this level. In the next stage, more
nuanced analysis is facilitated based on the previous foundation,
incorporating information regarding trace properties and ensuring the
preservation of vital process details even under extreme conditions. This
comprehensive approach not only aids in the visualization and
conceptualization of applications but also enables more targeted analysis in
later stages. As we delve deeper into the sampling hierarchy, the technique
becomes tailored to specific purposes, such as the simplification of
application troubleshooting. In this context, the sampling strategy
prioritizes the retention of erroneous traces from dominant processes, thus
facilitating the identification and resolution of underlying issues. The
focus of this paper is to reveal the impact of sampling on troubleshooting
efficiency. Leveraging intelligent and explainable artificial intelligence
solutions enables the detection of malfunctioning microservices and provides
transparent insights into root causes. We advocate for using rule-induction
systems, which offer explainability and efficacy in decision-making
processes. By integrating advanced sampling techniques with
machine-learning-driven intelligence, we empower organizations to navigate
the complexities of large-scale distributed cloud environments
effectively.
Pages article 5779
Volume 14
Journal name Applied Sciences
Publisher Molecular Diversity Preservation International (Basel, Switzerland)