Understanding observability and building an observability pipeline
With the complexity of systems growing, observability is becoming more important. To understand why, let us take the example of an e-commerce company built on microservices. Core functions like user management, product showcase, order processing, and payment run as separate services. With services spread across environments, identifying issues like slowdowns, failures, or errors can be challenging. Companies need a way to quickly detect, diagnose, and fix these problems. This is where observability helps.
In this article, we will understand the basics of observability and how we can build an observability pipeline using Fluent Bit.
What is observability?
Observability helps us understand how a system works by looking at its outputs. In control theory, it is a measure of how well we can infer the internal state of a system from its external outputs. It helps engineers improve systems based on the data those systems produce.
Some key components in observability are:
Logs - These are records of events that have occurred in the system. Logs provide details of what happened, along with timestamps, errors, and other system events.
Metrics - Metrics measure various aspects of system performance and health over time. They include data such as CPU usage, memory consumption, etc.
Traces - Traces track the lifecycle of a request as it moves through the various services and components of a system. They help in understanding the behavior of distributed systems.
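To make these signals concrete, here is a rough, illustrative sketch of what each might look like (the values below are invented for this example):

# A structured (JSON) application log event
{"time": "2024-05-01T10:15:32Z", "level": "error", "service": "payment", "message": "payment gateway timeout"}

# A metric sample in the Prometheus text format
http_requests_total{service="orders", status="500"} 42

# A trace is a tree of spans; each span carries a trace ID, a span ID, and a duration
checkout (trace_id=abc123) -> orders (span 2, 35 ms) -> payment (span 3, 120 ms)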
To achieve observability, we need to build a pipeline, and in this article we will see how to do that using Fluent Bit.
What is Fluent Bit?
Fluent Bit is a fast and lightweight telemetry agent for logs, metrics, and traces on Linux, macOS, Windows, and the BSD family of operating systems. It allows us to collect log events or metrics from different sources, process them, and deliver them to different backends such as Fluentd, Elasticsearch, Splunk, DataDog, Kafka, New Relic, Azure services, AWS services, Google services, NATS, InfluxDB, or any custom HTTP endpoint.
Some benefits of using Fluent Bit are:
It can read from local files and network devices, and can scrape metrics in the Prometheus format from the server.
It has built-in reliability, which means that if we hit a network or server outage, we will be able to resume from where we left off without data loss.
It can send data to a multitude of locations, including popular destinations like Splunk, Elasticsearch, OpenSearch, Kafka, and more.
Understanding what an observability pipeline is.
Let us go back to our e-commerce example. On days such as an end-of-season sale or Flipkart's Big Billion Days, there is a huge spike in traffic, which can lead to performance issues. To tackle this, an observability pipeline is built. It is a system that collects, processes, and analyzes data from various sources, including logs, metrics, and traces, to provide insights into the performance and behavior of a distributed system. It helps the company monitor its applications in real time, detect anomalies, and troubleshoot issues faster.
Building an observability pipeline.
Now that we know why an observability pipeline is needed and how it helps, let us try to build one using Fluent Bit.
The first step is to identify what data to collect and determine output targets.
Data collection:
Logs: System logs, application logs, error logs.
Metrics: CPU usage, memory usage, request count, error rate, response times.
Output Targets:
Monitoring Tools: Prometheus (for metrics).
Analytics Platforms: Elasticsearch (for logs).
The next step is to install Fluent Bit. Use the command below to install it on Debian- or Ubuntu-based Linux systems (this assumes the Fluent Bit package repository has already been added):
sudo apt-get install td-agent-bit
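If the package installs a systemd unit (the td-agent-bit package typically does, though the unit name may differ on your distribution), the agent can be started and verified like this:

sudo systemctl start td-agent-bit
sudo systemctl status td-agent-bit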
There are multiple ways to install Fluent Bit; learn more here.
The next step is to configure Fluent Bit for data collection. To collect logs from the system and applications, we need to configure inputs in the Fluent Bit configuration file (fluent-bit.conf) as shown below:

[INPUT]
    Name   tail
    Path   /var/log/syslog
    Parser syslog

[INPUT]
    Name   tail
    Path   /var/log/myapp.log
    Parser json
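Before wiring up outputs, it can be handy to run the agent in the foreground against this file to confirm the configuration parses. This assumes the binary is on the PATH (it may be named td-agent-bit or fluent-bit depending on how it was installed) and the file above is saved as fluent-bit.conf in the current directory:

fluent-bit -c fluent-bit.conf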
Fluent Bit can also be used to collect metrics. Alternatively, we could run a tool like Node Exporter, which Prometheus can scrape directly.
Here’s how we configure Fluent Bit to collect metrics:
[INPUT]
    Name dummy
    Tag  dummy.metrics
    Rate 1
Note: For demo purposes, we have used dummy metrics.
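If we wanted real host metrics instead of dummy ones, newer Fluent Bit releases ship a node_exporter_metrics input and a prometheus_exporter output. A minimal sketch, assuming those plugins are available in your Fluent Bit version, might look like this:

[INPUT]
    Name            node_exporter_metrics
    Tag             node_metrics
    Scrape_interval 2

[OUTPUT]
    Name  prometheus_exporter
    Match node_metrics
    Host  0.0.0.0
    Port  2021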
The next step is to configure parsers and filters.
Parsers: Parsers help in interpreting log formats. For system logs, we might use the built-in syslog parser, and for JSON logs, the json parser. This helps in structuring the data properly for further processing.

[PARSER]
    Name   syslog
    Format regex
    Regex  ^\<(?<pri>[0-9]+)\>(?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^:]+): (?<message>.+)$

[PARSER]
    Name   json
    Format json
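Note that Fluent Bit normally loads [PARSER] definitions from a separate parsers file rather than the main configuration. Assuming the definitions above are saved as parsers.conf alongside fluent-bit.conf, they can be referenced from the [SERVICE] section like this:

[SERVICE]
    Parsers_File parsers.conf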
Filters: Filters enrich or transform data. They can add additional context to the logs, making them more useful for analysis.
For example, here's how we can add hostname or modify message content:
[FILTER]
    Name  modify
    Match *
    Add   hostname ${HOSTNAME}
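Filters can also drop noisy records. For example, a grep filter could exclude debug-level entries (this assumes the parsed records carry a field named log; adjust the key to match your parser output):

[FILTER]
    Name    grep
    Match   *
    Exclude log DEBUG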
Now that the inputs are set up, the next step is to integrate the outputs. We will see how to do it with Elasticsearch and Prometheus.
Installing Elasticsearch:
Elasticsearch can be installed on a local machine or consumed through a managed service like Amazon Elasticsearch Service.
To install locally, use the command:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.1-amd64.deb
sudo dpkg -i elasticsearch-7.10.1-amd64.deb
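After installing the package, the Elasticsearch service usually needs to be started explicitly (elasticsearch is the default unit name for the Debian package), and a quick request to port 9200 confirms it is up:

sudo systemctl start elasticsearch
curl http://localhost:9200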
Elasticsearch setup:
[OUTPUT]
    Name            es
    Match           *
    Host            localhost
    Port            9200
    Index           fluentbit
    Type            _doc
    Logstash_Format On
Installing Prometheus:
Prometheus can be installed using precompiled binaries or Docker images.
Prometheus setup:
Fluent Bit does not natively support pushing metrics to Prometheus, but it can expose metrics that Prometheus can scrape:
[SERVICE]
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port   2020
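With the built-in HTTP server enabled, Fluent Bit exposes its internal metrics in Prometheus format. A quick way to check the endpoint (the path below is the one documented for Fluent Bit's monitoring API; verify it against your version) is:

curl http://localhost:2020/api/v1/metrics/prometheus

Prometheus can then be pointed at this endpoint via a scrape job in its own configuration.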
Once the above steps are done, the final part is to test the pipeline. We can do that by injecting test logs and metrics into our inputs and ensuring they appear correctly in Elasticsearch and Prometheus. We can check the outputs in Kibana (for Elasticsearch) and the Prometheus dashboard to ensure data integrity and proper indexing.
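A minimal smoke test, assuming the tail input on /var/log/myapp.log from earlier and a local Elasticsearch instance, could look like this:

# Inject a test log event that the json parser can understand
echo '{"level": "info", "message": "pipeline smoke test"}' | sudo tee -a /var/log/myapp.log

# After a few seconds, confirm that an index has been created and is receiving documents
curl 'http://localhost:9200/_cat/indices?v'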
Optimizing the pipeline.
Now that we have learnt how to build an observability pipeline, it is important to know how to optimize and secure it in order to maintain efficiency and integrity.
To improve performance, we can optimize input and output plugin configurations, streamline data parsing, and manage memory and CPU resources through appropriate buffering and batching. To improve security, we can implement encryption for data in transit, and use secure authentication methods.
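As a rough illustration of these ideas, the sketch below caps the memory used by an input, switches its buffering to the filesystem so data survives outages, and turns on TLS for the Elasticsearch output (the option names are from the Fluent Bit documentation; tune the values and paths for your workload):

[SERVICE]
    storage.path /var/lib/fluent-bit/buffer/

[INPUT]
    Name          tail
    Path          /var/log/myapp.log
    Parser        json
    Mem_Buf_Limit 5MB
    storage.type  filesystem

[OUTPUT]
    Name       es
    Match      *
    Host       localhost
    Port       9200
    tls        On
    tls.verify On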
Conclusion.
In this article, we have learnt the basics of observability, walked through a real-life example and use case, and built an observability pipeline with Fluent Bit.
Here are some resources to learn more about observability: