Setting Up Prometheus + Grafana to Monitor Your Scraping Fleet

The Scraper

Last updated on May 5, 2026

Setup Guides

You added alerting. You check the logs. But you have no idea whether your scrapers are healthy until a downstream team files a ticket. By then, you've been serving stale data for six hours.

This guide builds a Prometheus + Grafana monitoring stack for a scraping fleet, one that catches soft blocks, data quality degradation, and proxy failures before they reach production.


Prerequisites

  • Docker and Docker Compose

  • Python scraper(s) exposing a metrics endpoint (or adaptable to do so)

  • 30 minutes


Architecture


[Scraper Workers] --expose /metrics--> [Prometheus] --> [Grafana dashboards + alerts]
                                            |
                                            v
                                     [Alertmanager]
                                            |
                                            v
                                    [PagerDuty / Slack]


Prometheus scrapes your scraper workers' /metrics endpoints on a configurable interval and evaluates the alerting rules; Alertmanager routes firing alerts to Slack or PagerDuty. Grafana queries Prometheus to render the dashboards.


Step 1: Instrument Your Scrapers

Add metrics to your scraper workers using the prometheus_client library:


# metrics.py
from prometheus_client import (
    Counter, Histogram, Gauge, Summary,
    start_http_server,
)

# Request metrics
requests_total = Counter(
    'scraper_requests_total',
    'Total HTTP requests made',
    ['target', 'status_code']
)

request_duration = Histogram(
    'scraper_request_duration_seconds',
    'HTTP request duration',
    ['target'],
    buckets=[0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
)

# Data quality metrics
records_extracted = Counter(
    'scraper_records_extracted_total',
    'Records successfully extracted',
    ['target']
)

schema_failures = Counter(
    'scraper_schema_failures_total',
    'Schema validation failures',
    ['target', 'field']
)

price_anomalies = Counter(
    'scraper_price_anomalies_total',
    'Price range guard violations',
    ['target', 'category']
)

null_field_rate = Gauge(
    'scraper_null_field_rate',
    'Proportion of records with null for a given field (soft block indicator)',
    ['target', 'field']
)

# Data freshness
change_rate = Gauge(
    'scraper_data_change_rate',
    'Proportion of records that changed vs. last run',
    ['target']
)

# Proxy metrics
proxy_success_rate = Gauge(
    'scraper_proxy_success_rate',
    'Success rate for current proxy pool',
    ['target', 'proxy_pool']
)

active_sessions = Gauge(
    'scraper_active_sessions',
    'Number of active browser/HTTP sessions',
    ['target']
)

# Job metrics
job_duration = Summary(
    'scraper_job_duration_seconds',
    'Total job duration',
    ['target', 'job_type']
)

last_successful_run = Gauge(
    'scraper_last_successful_run_timestamp',
    'Unix timestamp of last successful run',
    ['target']
)

def start_metrics_server(port: int = 8000):
    """Start Prometheus metrics HTTP server."""
    start_http_server(port)
    print(f"Metrics server started on :{port}/metrics")


Integrate into your scraper:


# In your scraper worker; fetch_page() and parse_product_page() are your own helpers
import time

import httpx

from metrics import (
    requests_total, request_duration, records_extracted,
    schema_failures, start_metrics_server
)

start_metrics_server(8000)

async def fetch_and_extract(url: str, target: str):
    start = time.time()
    try:
        # fetch_page() is assumed to raise httpx.HTTPStatusError on non-2xx responses
        html = await fetch_page(url)
        requests_total.labels(target=target, status_code='200').inc()
    except httpx.HTTPStatusError as e:
        requests_total.labels(target=target, status_code=str(e.response.status_code)).inc()
        raise
    finally:
        request_duration.labels(target=target).observe(time.time() - start)

    record = parse_product_page(url, html)
    if record:
        records_extracted.labels(target=target).inc()
    else:
        schema_failures.labels(target=target, field='parse_failure').inc()

    return record
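
The per-request metrics above update inline, but the gauges from metrics.py (null_field_rate, last_successful_run) are job-level and are best set once per run. A minimal sketch, where REQUIRED_FIELDS and the records list are hypothetical stand-ins for your own pipeline:


# Job-level quality reporting (sketch): REQUIRED_FIELDS and `records` are hypothetical
import time

from metrics import last_successful_run, null_field_rate

REQUIRED_FIELDS = ['title', 'price']  # fields your schema treats as mandatory

def report_run_quality(target: str, records: list[dict]) -> None:
    if not records:
        return
    for field in REQUIRED_FIELDS:
        nulls = sum(1 for r in records if r.get(field) is None)
        null_field_rate.labels(target=target, field=field).set(nulls / len(records))
    # Set only on success so the ScraperStaleData alert fires when runs keep failing
    last_successful_run.labels(target=target).set(time.time())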



Step 2: Docker Compose Setup


# docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alerts.yml:/etc/prometheus/alerts.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your_password
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources
    depends_on:
      - prometheus

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

volumes:
  prometheus_data:
  grafana_data:
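
Bring the stack up with docker compose up -d, then confirm every worker shows as UP at http://localhost:9090/targets before building dashboards.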



Step 3: Prometheus Configuration


# prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - 'alerts.yml'

scrape_configs:
  # Scraper worker metrics
  - job_name: 'scraper_workers'
    static_configs:
      - targets:
          - 'scraper-worker-1:8000'
          - 'scraper-worker-2:8000'
          - 'scraper-worker-3:8000'
    # Or use service discovery:
    # file_sd_configs:
    #   - files: ['scraper_targets.json']
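
If workers come and go, the commented file_sd_configs option beats editing prometheus.yml on every change: Prometheus watches the file and picks up target changes without a restart. A minimal sketch of scraper_targets.json, with placeholder hostnames:


[
  {
    "targets": ["scraper-worker-1:8000", "scraper-worker-2:8000"],
    "labels": {"fleet": "products"}
  }
]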



Step 4: Alerting Rules


# alerts.yml
groups:
  - name: scraper_alerts
    rules:

      # Stale data: scraper hasn't completed a successful run in 2 hours
      - alert: ScraperStaleData
        expr: (time() - scraper_last_successful_run_timestamp) > 7200
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Scraper {{ $labels.target }} hasn't run successfully in 2h"
          description: "Last successful run was {{ $value | humanizeDuration }} ago"

      # High null rate: likely soft block or schema change
      - alert: HighNullFieldRate
        expr: scraper_null_field_rate > 0.15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High null rate on {{ $labels.target }}.{{ $labels.field }}"
          description: "{{ $value | humanizePercentage }} of records have null {{ $labels.field }} — possible soft block"

      # Price anomaly spike
      - alert: PriceAnomalySpike
        expr: rate(scraper_price_anomalies_total[10m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Price anomaly rate spike on {{ $labels.target }}"

      # Proxy success rate degradation
      - alert: ProxySuccessRateLow
        expr: scraper_proxy_success_rate < 0.7
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Proxy pool success rate below 70% for {{ $labels.target }}"
          description: "Current rate: {{ $value | humanizePercentage }}"

      # Schema failures sudden jump suggests layout change
      - alert: SchemaFailureSpike
        expr: rate(scraper_schema_failures_total[5m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Schema failure rate elevated on {{ $labels.target }}"



Step 5: Alertmanager Configuration


# alertmanager.yml
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

route:
  group_by: ['alertname', 'target']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-scraper-alerts'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-oncall'

receivers:
  - name: 'slack-scraper-alerts'
    slack_configs:
      - channel: '#scraper-alerts'
        title: 'Scraper Alert: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

  - name: 'pagerduty-oncall'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_KEY'
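
The Alertmanager image bundles amtool, which can sanity-check this config the same way promtool checks the rules:


docker compose exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml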



Step 6: Grafana Dashboard

Provision a dashboard automatically by adding a JSON file to ./grafana/dashboards/; a datasource provisioning sketch follows the panel list. Key panels to include:

Panel 1: Request Success Rate by Target


sum by (target) (rate(scraper_requests_total{status_code="200"}[5m]))
/
sum by (target) (rate(scraper_requests_total[5m]))


Panel 2: Null Field Rate Heatmap


scraper_null_field_rate


Panel 3: Data Change Rate (Freshness)


scraper_data_change_rate


Panel 4: Proxy Pool Success Rate


scraper_proxy_success_rate


Panel 5: Records Extracted per Minute


rate(scraper_records_extracted_total[5m]) * 60
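
These queries only resolve once Grafana knows about Prometheus, so provision the datasource alongside the dashboards. A minimal sketch for ./grafana/datasources/prometheus.yml:


# grafana/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true


Dashboard JSONs load only if a provider file sits in the same provisioning tree; a matching sketch for ./grafana/dashboards/provider.yml:


# grafana/dashboards/provider.yml
apiVersion: 1

providers:
  - name: 'scraper-dashboards'
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards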



Screenshot: Grafana dashboard with 5-panel layout — success rate, null rates, change rate, proxy health, throughput


Common Pitfalls

Pitfall 1: Scraping where you should be pushing. For short-lived jobs (one-shot scripts, not long-running services), push metrics to a Pushgateway rather than expecting Prometheus to scrape a /metrics endpoint that may be gone before the next scrape interval.
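
A minimal sketch of that pattern with prometheus_client, assuming a pushgateway service reachable at pushgateway:9091 (not part of the compose file above; add one if you adopt this):


# push_metrics.py: one-shot job pattern; the gateway address is an assumption
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
last_run = Gauge(
    'scraper_last_successful_run_timestamp',
    'Unix timestamp of last successful run',
    ['target'],
    registry=registry,
)
last_run.labels(target='example-target').set(time.time())

# Push once when the job finishes; Prometheus scrapes the gateway, not the job
push_to_gateway('pushgateway:9091', job='oneshot_scraper', registry=registry)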

Pitfall 2: High-cardinality labels. Don't use the URL as a label value: each unique URL becomes its own time series, which can mean millions of them. Keep label cardinality bounded: target, status_code, field, proxy_pool.
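
For example, derive a bounded target label from the URL instead of labelling with the URL itself (a hypothetical helper):


from urllib.parse import urlsplit

def target_label(url: str) -> str:
    # One series per site, not per URL: 'https://shop.example.com/p/123' -> 'shop.example.com'
    return urlsplit(url).hostname or 'unknown'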

Pitfall 3: Alert fatigue. Start with only the critical alerts (stale data, high null rate). Add more only after the critical ones have proven useful. Too many alerts = ignored alerts.


Conclusion

A working Prometheus + Grafana stack for scraping takes a few hours to set up and prevents the class of failures where everything looks fine until it isn't. The null rate alert alone is worth the setup time; it's the metric that catches soft blocks before your downstream consumers do.

Pair this with clean proxy infrastructure from Evomi and you'll see the proxy success rate metrics staying consistently high, a baseline that makes the anomalies meaningful when they do appear. You can test Evomi's proxies with a completely free trial.


Author

The Scraper

Engineer and Webscraping Specialist

About Author

The Scraper is a software engineer and web scraping specialist, focused on building production-grade data extraction systems. His work centers on large-scale crawling, anti-bot evasion, proxy infrastructure, and browser automation. He writes about real-world scraping failures, silent data corruption, and systems that operate at scale.
