
Overview of Prometheus


Modern production environments include multiple servers and containers that run application services, databases, and supporting infrastructure. As traffic flows in from many users, finding the root cause of an issue—whether it's an application error, resource bottleneck, or latency spike—can become challenging. To understand what's happening across your stack, you need a unified view of application-level errors, system resource usage, response times, and other metrics.

Let's see how Prometheus can help.

What is Prometheus?

In simple terms, Prometheus is an event monitoring and alerting system written in Go. It pulls metrics from monitored systems (called targets) at regular intervals and stores them as time-series data: each sample carries a timestamp, and each series is identified by a metric name plus optional key-value pairs known as labels.
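For example, a single scraped sample in Prometheus's text format might look like this (the metric name and labels are purely illustrative):

api_requests_total{method="GET",status="200"} 1027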

Prometheus has various components and features:

  • Prometheus server — the main server that scrapes metrics over HTTP and stores them in a time-series database.

  • Client libraries — enable your code to emit Prometheus metrics directly, which can then be scraped.

  • Exporters — help instrument third-party systems that you cannot directly monitor with Prometheus.

  • Push gateway — an intermediary for short-lived jobs that may exit before Prometheus scrapes them; such jobs push their metrics to the gateway before exiting, and Prometheus collects them from there.

  • Alert manager — processes alerts from the Prometheus server and performs various operations, including sending them to configured receivers, such as email, Slack, and others.

  • Service discovery — helps Prometheus find and scrape endpoints that expose metrics. This is particularly useful for dynamic environments, like Kubernetes and AWS EC2 instances, where targets scale up and down.

  • PromQL — Prometheus's query language for time-series data. You can use it to aggregate, filter, and compute rates. It includes expressive functions for analyzing metrics (e.g., calculating 95th percentiles, sums, averages, or detecting spikes).

  • Storage — Prometheus uses a local time-series database that stores data in compressed two-hour blocks. For long-term or scalable storage, you can also send collected metrics to external systems such as Thanos via remote write.

Here's a diagram showing various components of Prometheus:

Components of Prometheus

The main Prometheus server scrapes data from statically configured targets, the push gateway, and targets found through service discovery, storing everything as time-series data. You can query or visualize this data through Grafana or other API clients. If you've configured alerting, Prometheus sends alerts to Alertmanager when specific thresholds are reached; Alertmanager then groups, silences, or forwards these alerts to your team via Slack, PagerDuty, or email.

Installing Prometheus

To get Prometheus up and running, download the compressed archive for your platform, extract it, and run the prometheus binary (prometheus.exe on Windows). To verify, you can run the following commands:

./prometheus --help # to view all available options
# output
usage: prometheus [<flags>]

The Prometheus monitoring server
# ...

./prometheus --version
# output
prometheus, version 3.4.1 (branch: HEAD, revision: ...)
  build user:       root@16f976c24db1
  build date:       20250531-10:44:38
  go version:       go1.24.3
  platform:         linux/amd64
  tags:             netgo,builtinassets,stringlabels

You must run these commands in the directory where you extracted the file. To run Prometheus from any location, add it to the system PATH. If you have Docker installed, you can use these commands instead:

# persistent volume
docker volume create prometheus-data

# run Prometheus Docker container
docker run \
    -p 9090:9090 \
    -v prometheus-data:/prometheus \
    prom/prometheus

First, we create a persistent volume to store Prometheus data. Then, we run a command that pulls the latest version of Prometheus from Docker Hub and runs the container.
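If you want the container to use your own configuration file, you can mount it over the default path the official image reads. Here's a sketch, assuming your file lives at /path/to/prometheus.yml on the host:

# run Prometheus with a custom configuration mounted from the host
docker run -d \
    -p 9090:9090 \
    -v prometheus-data:/prometheus \
    -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus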

You need to install additional components separately from the main Prometheus server. These include Alertmanager, the push gateway, exporters, and others. All of them ship as pre-compiled binaries that you can run directly. Visit the downloads page to see all available components.

Configuring Prometheus

We've seen that Prometheus pulls metrics from targets and can send alerts. But how does it know which targets to scrape or where to send alerts? How long should it wait between pulls? You configure these and other options in a YAML file, such as the prometheus.yml file included in the archive you extracted earlier.

Let's examine some options you may configure in this file:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 30s

These are global defaults that apply to every scrape job. Here, we specify how often Prometheus pulls metrics (scrape_interval), how long it waits before a scrape request times out (scrape_timeout), and how often it evaluates recording and alerting rules (evaluation_interval). Some options, like the scrape interval, can be overridden at the job level, as sketched below.
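For example, a job that should be scraped less often than the default could override the global interval like this (the job name and target are made up for illustration):

scrape_configs:
  - job_name: 'slow_exporter'            # illustrative job name
    scrape_interval: 60s                 # overrides the global 15s for this job only
    static_configs:
      - targets: ['slow-exporter:9117']  # hypothetical target

The global section can also carry identifying labels for everything this instance exports: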

global:
  external_labels:
    monitor: 'docker-monitor'

This instructs Prometheus to attach a fixed, "external" label to all metrics and alerts that this Prometheus instance sends out. This helps you identify which instance sent which metrics/alerts when you view them in Alertmanager or other Prometheus instances.

rule_files:
  - 'alert.rules.yml'

rule_files directs Prometheus to one or more files containing monitoring and alerting rules. Prometheus re-evaluates these rules every 30 seconds (based on the evaluation_interval we set). In this case, Prometheus will load rules defined in alert.rules.yml. In that file, you might find:

groups:
  - name: hyper
    rules:
      - alert: HighCPUUsage
        expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{job="node_exporter", mode="idle"}[5m]))) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is >90% on {{ $labels.instance }}"

This block defines a single alert named HighCPUUsage. The expression takes the per-second rate of idle CPU time over the last 5 minutes, averages it per instance, and subtracts it from 1 to get CPU utilization. The alert fires if that value stays above 0.9 for 5 continuous minutes. When triggered, it tags the alert with severity=warning and adds a human-readable summary "CPU usage is >90% on <instance>" (where <instance> is replaced by the actual host label).

Next, let's examine an alert configuration:

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

This configuration tells Prometheus where to send triggered alerts. alertmanagers lists alert manager instances. Prometheus sends any triggered alerts (based on your rules) to these endpoints. With static_configs, we statically list alert targets. In this case, we specify an alert manager instance running at alertmanager:9093.

When an alert defined in alert.rules.yml fires (e.g., CPU > 90% for 5 minutes), Prometheus sends the alert as JSON to the Alertmanager API at http://alertmanager:9093/api/v2/alerts. Alertmanager takes it from there based on how you configured it.
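Alertmanager's own behavior lives in a separate configuration file (commonly alertmanager.yml). As a rough sketch, a minimal setup that forwards every alert to a single webhook receiver might look like this (the receiver name and URL are placeholders):

route:
  receiver: 'default'                 # all alerts go to this receiver unless overridden
  group_by: ['alertname', 'instance'] # bundle related alerts into one notification
receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://example.internal/hooks/alerts'  # placeholder endpoint

In practice, you would add routes and receivers for Slack, email, PagerDuty, and so on.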

Now, let's look at targets. You configure these using the scrape_configs key:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node-exporter:9100']

Each - job_name: ... block defines one "scrape job." A scrape job consists of targets (endpoints) that expose metrics in Prometheus's exposition format. Prometheus periodically (every scrape_interval) makes an HTTP request to each target's /metrics endpoint (by default) and collects the available metrics. Here, we are scraping Prometheus itself and Node Exporter, which exposes hardware and OS metrics from a Unix host.
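You can see exactly what such a scrape returns by requesting a target's metrics endpoint yourself, for example the Prometheus server's own endpoint:

# peek at the first few metrics the Prometheus server exposes about itself
curl -s http://localhost:9090/metrics | head -n 10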

In summary, this configuration tells Prometheus to:

  • Scrape itself and Node Exporter every 15 seconds.

  • Use alert rules from alert.rules.yml and evaluate them every 30 seconds.

  • Send triggered alerts to Alertmanager at alertmanager:9093.

  • Tag every metric and alert with monitor: 'docker-monitor' for identification.

Other important configurations include:

  • Storage — retention periods and storage paths, which are mainly set with command-line flags such as --storage.tsdb.retention.time and --storage.tsdb.path.

  • Service discovery — settings for service discovery mechanisms (like Docker or Kubernetes service discovery). These settings vary by mechanism; a minimal Kubernetes example is sketched after this list.
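For instance, assuming Prometheus runs inside a Kubernetes cluster with permission to list nodes, a minimal service discovery sketch could look like this:

scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node   # discover every node in the cluster as a scrape target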

There are many more options you can define in the configuration file. Check the documentation to view all options or for reference.

A closer look at metrics

Prometheus metrics are divided into four primary types: counter, gauge, histogram, and summary. Each type serves a specific monitoring purpose. Let's examine them in detail.

A counter is a cumulative metric that only increases or resets to zero on restart. It's ideal for tracking totals like requests served, tasks completed, or errors encountered. You might see this metric:

http_requests_total{method="POST", handler="/api"}

This metric tracks the total number of HTTP POST requests to the /api endpoint. Using PromQL, you can analyze the rate of increase over time:

rate(http_requests_total[5m])

This query calculates the per-second rate of HTTP requests over the last 5 minutes. The next one, a gauge, represents a single value that can increase or decrease. It works best for measuring current values like CPU or memory usage, and for counting variable quantities like concurrent requests. Here's an example:

memory_usage_bytes{instance="hyper-server"}

This metric shows the current memory usage in bytes for hyper-server. To find the maximum memory usage over a period, you can use PromQL:

max_over_time(memory_usage_bytes[1h])

This query returns the maximum memory usage recorded in the last hour. To understand your system's latency, a histogram can help. It categorizes observed values into predefined, cumulative buckets, enabling analysis of value distributions over time. You get a sum of all values and a total observation count.

http_request_duration_seconds_bucket{le="0.5", method="GET"}

This metric counts the number of HTTP GET requests that took less than or equal to 0.5 seconds. To calculate the 95th percentile latency:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

This query estimates the 95th percentile of request durations over the last 5 minutes.

The final type is the summary metric. It calculates quantiles (like median or 90th percentile) on the client side. Like a histogram, it tracks both the total count and the sum of observations. It's useful for monitoring metrics where precise quantile estimation is necessary. Here's a metric that represents the 99th percentile of RPC durations:

rpc_duration_seconds{quantile="0.99"}

Since quantiles are calculated on the client side, you cannot directly aggregate across multiple instances. However, you can still analyze the sum and count:

sum(rate(rpc_duration_seconds_sum[5m])) / sum(rate(rpc_duration_seconds_count[5m]))

This query computes the average RPC duration over the last 5 minutes.
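To see how these four types look in application code, here's a minimal sketch using the Python client library (the metric names are made up for the example):

from prometheus_client import Counter, Gauge, Histogram, Summary

# counter: only goes up (or resets to zero on restart)
REQUESTS = Counter("app_requests_total", "Total requests handled")

# gauge: can go up and down
IN_PROGRESS = Gauge("app_requests_in_progress", "Requests currently in flight")

# histogram: observations sorted into configurable buckets
LATENCY = Histogram(
    "app_request_duration_seconds", "Request latency in seconds",
    buckets=(0.1, 0.5, 1.0, 2.5)
)

# summary: tracks a count and a sum of observations on the client side
PAYLOAD = Summary("app_payload_bytes", "Size of request payloads")

REQUESTS.inc()         # increment the counter by 1
IN_PROGRESS.set(3)     # set the gauge to the current value
LATENCY.observe(0.42)  # record one latency observation
PAYLOAD.observe(1024)  # record one payload size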

Using Prometheus

Now that we understand Prometheus's architecture, configuration, and metric types, let's explore how to use it in your application. Prometheus collects metrics by exposing them over HTTP and scraping them at regular intervals.

The first step is to instrument your application. If you have access to the source code, you can use client libraries (available for Go, Java, Python, and other languages) to define metrics like counters, gauges, histograms, and summaries. The application then serves these metrics at the /metrics endpoint. Here's an example for a Flask application:

from flask import Flask, request, Response
from prometheus_client import Counter, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

# request counter for total HTTP requests, labeled by method and endpoint
REQUEST_COUNTER = Counter(
    "app_http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint"]
)

@app.after_request
def record_request(response):
    # increment counter for each request
    REQUEST_COUNTER.labels(method=request.method, endpoint=request.path).inc()
    return response

@app.route("/hello")
def hello():
    return "Hello, Prometheus!"

@app.route("/metrics")
def metrics():
    # expose all registered metrics to Prometheus
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

REQUEST_COUNTER tracks the total number of HTTP requests, labeled by method and endpoint. When any route returns a response, after_request increments the counter. The /metrics endpoint returns all metrics in Prometheus format.
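If you hit /hello a few times and then open /metrics, the relevant part of the output will look roughly like this (exact formatting varies slightly between client library versions):

# HELP app_http_requests_total Total HTTP requests
# TYPE app_http_requests_total counter
app_http_requests_total{endpoint="/hello",method="GET"} 3.0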

For services you cannot modify, you should deploy an exporter (e.g., node_exporter for host metrics, mysqld_exporter for MySQL, or any of the others, depending on your use case) that converts stats into Prometheus format.

The next step is to add your app or exporter to your scrape_config in prometheus.yml. Here's how to do it for the Flask app:

scrape_configs:
  - job_name: "flask_app"
    static_configs:
      - targets: ["localhost:8000"]

Prometheus will pull metrics from http://localhost:8000/metrics every scrape_interval (e.g., 15s). You can view the metrics through the built-in expression browser, or, if you use Grafana, simply add Prometheus as a data source (e.g., http://localhost:9090). Then, use PromQL (e.g., rate(app_http_requests_total[5m])) in panels to visualize request rates, latency distributions, CPU/memory, and more.

Keep in mind that Prometheus might not meet all your monitoring needs. For instance, if you need to follow individual user requests across services, you'll want dedicated tracing and observability tools. However, Prometheus excels at handling time-series metrics (CPU, memory, request rates, custom counters) and querying those metrics.

Conclusion

Prometheus streamlines monitoring for modern distributed applications by pulling time-series metrics from configured targets at regular intervals. Its flexible data model combines counters, gauges, histograms, and summaries to capture everything from HTTP request rates and error counts to CPU utilization and memory consumption.

With PromQL, you can answer crucial questions like "which instance has the highest 95th-percentile latency?" or "how many errors occurred in our payment services in the last 15 minutes?" Computing rates, percentiles, and aggregates in real time helps you detect anomalies and trigger alerts quickly.
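For example, the second question could be answered with a query roughly like this (the job and label names are assumptions about how the services are instrumented):

# 5xx responses from the payments job over the last 15 minutes, per service
sum by (service) (increase(http_requests_total{job="payments", status=~"5.."}[15m]))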

Key features include:

  • Service discovery — automatically discover targets in dynamic environments (e.g., Kubernetes, AWS EC2, Docker Swarm).

  • Exporters — integrate with third-party systems (e.g., MySQL, Redis, Linux hosts) to expose metrics in Prometheus format.

  • Push gateway — a place for short-lived jobs to send their metrics.

  • Client libraries — instrument your own code to expose various custom metrics.

  • Alertmanager — receives alerts pushed by Prometheus and routes notifications to Slack, email, and other receivers.

To get started, add scrape_configs (a list of targets such as your instrumented application, exporters, and the push gateway) to your prometheus.yml file. You can then define threshold-based alerting rules, route the resulting alerts through Alertmanager, and use Grafana to visualize the metrics.

While Prometheus isn't a replacement for distributed tracing or centralized log aggregation, it excels at collecting and querying multi-dimensional metrics in real time, enabling proactive troubleshooting and continuous optimization.
