A Realistic and Reliable Approach for Real-Time Monitoring of Infrastructure
Abstract
Rapid advancements in site reliability engineering over the past few decades have made it an indispensable part of how scalable and highly reliable software systems are built. A reliable site infrastructure monitoring system brings substantial advantages in monitoring the availability, latency, performance, and efficiency of services. The main objective of this work is to design and implement an intuitive Real-Time Infrastructure Monitoring System (RIMS) that acts as a one-stop shop for keeping track of overall system health, services, downtimes, and availability. RIMS extracts a wide variety of machine log data and transforms it into a consistent and presentable format for end users. Using RIMS, data-driven business decisions were reached, and an availability of 98.46% was measured for the site systems over all-time data. RIMS is intended to support progressively higher service level objectives (SLOs) the longer it operates.