Log Collection Guide
This guide covers log configuration, formats, and querying. For the complete observability setup including OpenTelemetry, metrics, and Grafana dashboards, see Monitoring Guide.
Quick Start
Enable OTLP Export (Recommended)
# Point to OpenTelemetry Collector (local or Grafana Cloud)
export OTLP_ENDPOINT=http://localhost:4317
wealth run
Logs, metrics, and traces are exported via OTLP gRPC. For complete OpenTelemetry setup, see Monitoring Guide - OpenTelemetry Metrics.
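A quick way to confirm the pipeline end to end, assuming the default local ports used in this guide (4317 for the collector, 3100 for Loki):
# Confirm the OTLP gRPC port is listening
nc -zv localhost 4317
# Confirm the bot's logs are arriving in Loki
curl -s http://localhost:3100/loki/api/v1/label/service/values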
Local-Only Mode
If OTLP_ENDPOINT is not set, logs are written to stdout/stderr:
# Optional: Enable file logging
export WEALTH__OBSERVABILITY__LOG_FILE=/tmp/wealth.log
wealth run
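In local-only mode, a plain tail with grep is usually enough for quick debugging; a minimal sketch using the log path from the example above:
# Follow the log file and highlight warnings and errors
tail -f /tmp/wealth.log | grep --line-buffered -E "WARN|ERROR"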
Log Query Examples
Basic Queries
# All logs from wealth bot
{job="wealth-bot"}
# Filter by log level
{job="wealth-bot"} |= "ERROR"
{job="wealth-bot"} |= "WARN"
{job="wealth-bot"} |= "INFO"
# Filter by level label (if parsed)
{job="wealth-bot", level="ERROR"}
{job="wealth-bot", level="INFO"}
# Filter by module
{job="wealth-bot"} |= "wealth::strategy"
{job="wealth-bot"} |= "wealth::execution"
{job="wealth-bot"} |= "wealth::market_data"
# Search for specific text
{job="wealth-bot"} |= "WebSocket"
{job="wealth-bot"} |= "arbitrage"
{job="wealth-bot"} |= "position"
Advanced Queries
# Count of errors over the last hour
count_over_time({job="wealth-bot", level="ERROR"}[1h])
# Error rate per second (5m window)
rate({job="wealth-bot", level="ERROR"}[5m])
# Count logs by level
sum by (level) (count_over_time({job="wealth-bot"}[1h]))
# Logs containing correlation_id
{job="wealth-bot"} |~ "correlation_id=\\w+"
# WebSocket connection issues
{job="wealth-bot"} |~ "WebSocket.*error|WebSocket.*failed|WebSocket.*timeout"
# Order execution logs
{job="wealth-bot"} |= "Executing arbitrage" or |= "Order placed"
# Strategy-related logs
{job="wealth-bot", module=~"wealth::strategy.*"}
Event-Based Queries (Recommended)
All key log messages include a structured event field for precise filtering:
# Parse JSON and filter by event type
{service="wealth-bot"} | json | event="opportunity_detected"
{service="wealth-bot"} | json | event="arbitrage_executed"
{service="wealth-bot"} | json | event="position_close_succeeded"
# Trade skipped events (all use _skipped suffix)
{service="wealth-bot"} | json | event="quantity_validation_failed_skipped"
{service="wealth-bot"} | json | event="precision_mismatch_skipped"
{service="wealth-bot"} | json | event="unhedged_positions_skipped"
{service="wealth-bot"} | json | event="insufficient_balance_skipped"
# All skipped events
{service="wealth-bot"} | json | event=~".*_skipped"
# All position lifecycle events
{service="wealth-bot"} | json | event=~"position_.*"
# WebSocket and connection events
{service="wealth-bot"} | json | event=~"websocket_.*"
# Circuit breaker activity
{service="wealth-bot"} | json | event=~"circuit_breaker_.*"
# Error events requiring attention
{service="wealth-bot"} | json | event="unhedged_position_detected"
{service="wealth-bot"} | json | event="size_discrepancy_detected"
# Count opportunities vs executions
sum by (event) (count_over_time({service="wealth-bot"} | json | event=~"opportunity_detected|arbitrage_executed" [1h]))
See Loki JSON Parsing Guide for the complete list of 150+ event types.
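The same event filters work from a terminal with Grafana's logcli, which is convenient for scripted checks; a sketch assuming logcli is installed and Loki is on the default local port:
# Count recent executions without opening Grafana
logcli query --addr=http://localhost:3100 --since=1h --limit=50 \
  '{service="wealth-bot"} | json | event="arbitrage_executed"'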
Pattern Matching
# Extract values from logs using regex
{job="wealth-bot"}
| regexp "correlation_id=(?P<cid>\\w+)"
| line_format "{{.cid}}: {{.message}}"
# Parse structured data
{job="wealth-bot"}
| pattern `<_> <level> <module>: <message>`
| level = "ERROR"
Creating a Logs Dashboard
1. Log Volume Panel
Query:
sum(rate({job="wealth-bot"}[1m])) by (level)
Visualization: Time series graph showing log rate by level
2. Error Rate Panel
Query:
sum(rate({job="wealth-bot", level="ERROR"}[5m]))
Visualization: Stat panel with alert threshold at > 0
3. Recent Errors Table
Query:
{job="wealth-bot", level="ERROR"}
Visualization: Logs panel (table view)
Options: Show time, level, and message columns
4. Log Level Distribution
Query:
sum by (level) (count_over_time({job="wealth-bot"}[1h]))
Visualization: Pie chart
5. Module Activity
Query:
sum by (module) (count_over_time({job="wealth-bot"}[1h]))
Visualization: Bar chart
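Panel queries can be sanity-checked against Loki's instant-query API before being added to a dashboard; a sketch assuming Loki on the default local port:
# Evaluate the log-level distribution query directly
curl -sG http://localhost:3100/loki/api/v1/query \
  --data-urlencode 'query=sum by (level) (count_over_time({job="wealth-bot"}[1h]))'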
Log File Rotation
To prevent log files from growing too large:
Using logrotate (Linux)
Create /etc/logrotate.d/wealth:
/tmp/wealth*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0640 thiras thiras
    postrotate
        # Send SIGHUP to reload (if bot supports it)
        # pkill -HUP -f wealth || true
    endscript
}
Test the configuration (dry run), then force a rotation:
sudo logrotate -d /etc/logrotate.d/wealth
sudo logrotate -f /etc/logrotate.d/wealth
Using truncate
Simple script to truncate logs periodically:
#!/bin/bash
# truncate-logs.sh
LOG_FILE="/tmp/wealth.log"
MAX_SIZE_MB=100
if [ -f "$LOG_FILE" ]; then
SIZE=$(du -m "$LOG_FILE" | cut -f1)
if [ "$SIZE" -gt "$MAX_SIZE_MB" ]; then
echo "Truncating $LOG_FILE (${SIZE}MB > ${MAX_SIZE_MB}MB)"
> "$LOG_FILE"
fi
fi
Add to crontab:
# Run every hour
0 * * * * /path/to/truncate-logs.sh
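The cron entry assumes the script is executable; one way to set that up and confirm the installation (the path is the same placeholder used above):
chmod +x /path/to/truncate-logs.sh
# Confirm the entry is present
crontab -l | grep truncate-logs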
Alerting on Logs
Create Alert Rule in Grafana
- Go to Alerting → Alert rules → New alert rule
- Set query:
sum(rate({job="wealth-bot", level="ERROR"}[5m])) > 0 - Set evaluation interval: 1m
- Set condition: Alert when query result > 0
- Add notification channel (email, Slack, etc.)
Common Alert Rules
High Error Rate:
sum(rate({job="wealth-bot", level="ERROR"}[5m])) > 0.1
WebSocket Connection Failures:
sum(count_over_time({job="wealth-bot"} |= "WebSocket" |= "failed" [5m])) > 3
No Logs Received (Bot Down):
absent_over_time({job="wealth-bot"}[5m]) == 1
Troubleshooting
Pull Model (Promtail)
No logs appearing in Loki
- Check Promtail is running:
  docker compose ps promtail
  docker compose logs promtail
- Verify log file exists and is readable:
  ls -lah /tmp/wealth.log
  tail -f /tmp/wealth.log
- Check Promtail positions file:
  docker compose exec promtail cat /tmp/positions.yaml
- Test Loki directly (a raw query example follows this list):
  curl -s http://localhost:3100/loki/api/v1/label/job/values
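If the job label shows up but Grafana Explore still looks empty, querying the HTTP API directly helps rule out a data-source misconfiguration; a sketch assuming Loki on the default local port (query_range defaults to roughly the last hour):
# Fetch a few recent lines for the job
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="wealth-bot"}' \
  --data-urlencode 'limit=10'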
Logs not parsing correctly
- Check log format matches regex:
  # Example log line
  echo "2025-11-10T01:23:45.123456Z INFO wealth::strategy: Message" | \
    grep -oP '^\S+\s+\w+\s+[\w:]+:\s+.*$'
- View Promtail debug logs:
  docker compose logs promtail | grep -i error
Push Model (Loki Direct)
No logs appearing in Loki via push
- Check Loki is running:
  docker compose ps loki
  docker compose logs loki
  # Check Loki health
  curl http://localhost:3100/ready
- Verify Loki endpoint is reachable:
  # Test HTTP endpoint (note: the /loki/api/v1/push path is added by the library)
  curl -v http://localhost:3100/ready
- Check bot is configured correctly:
  # Verify environment variable is set
  echo $OTLP_ENDPOINT
  # Should see startup message when running bot:
  # "OpenTelemetry initialized with endpoint: http://localhost:4317"
- Check Loki logs for errors:
  docker compose logs loki | grep -i error
  docker compose logs loki | grep -i "push"
- Test OpenTelemetry Collector health (a direct push test follows this list):
  # Check OTLP receiver is responding
  curl http://localhost:13133/
  # Check metrics being exported to Prometheus
  curl http://localhost:8889/metrics | grep wealth
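To separate bot/collector problems from Loki problems, a test entry can be pushed straight to Loki's push API; a minimal sketch, with an arbitrary push-test job label:
# Push one test line with a nanosecond timestamp (GNU date)
curl -s -H "Content-Type: application/json" \
  -X POST http://localhost:3100/loki/api/v1/push \
  --data-raw "{\"streams\": [{\"stream\": {\"job\": \"push-test\"}, \"values\": [[\"$(date +%s%N)\", \"push test line\"]]}]}"
# The test label should then be queryable
curl -s http://localhost:3100/loki/api/v1/label/job/values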
Push connection timeouts
- Check network connectivity:
  # Test OTLP gRPC endpoint
  telnet localhost 4317
  # Or check if port is listening
  nc -zv localhost 4317
- Check Docker network:
  docker network inspect wealth_monitoring
- Check OpenTelemetry Collector configuration:
  # View collector logs for errors
  docker compose logs otel-collector
  # Verify collector config (in compose.yml)
  docker compose config | grep -A 20 otel-collector
Logs delayed or missing
- Check OTLP export is working:
  - OpenTelemetry batches logs before sending
  - Default batch timeout is 10 seconds
  - Check bot logs for OTLP export errors
- Monitor OpenTelemetry Collector:
  # Check collector is receiving telemetry
  docker compose logs otel-collector | grep -i "logs"
  # Check collector metrics
  curl http://localhost:8888/metrics | grep otelcol_receiver
- Verify labels are correct:
  # Check available labels in Loki
  curl http://localhost:3100/loki/api/v1/labels
  # Check values for 'service' label
  curl http://localhost:3100/loki/api/v1/label/service/values
General Issues
Performance issues
- Check Loki disk usage (an ingestion-rate check follows this list):
  docker compose exec loki df -h /loki
- Limit log retention in Loki config:
  - Edit Loki config to set retention period
  - Default: unlimited (until disk full)
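When disk usage looks fine but queries are slow, Loki's own /metrics endpoint exposes ingestion counters; a sketch assuming the default local port (metric names can differ slightly between Loki versions):
# Rough view of ingestion volume
curl -s http://localhost:3100/metrics | grep -E "loki_distributor_bytes_received_total|loki_distributor_lines_received_total"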
Advanced: JSON Logging
For better log parsing and indexing, JSON logging is supported. This is configured automatically when using OTLP export.
Update Promtail Config
In compose.yml, update the pipeline_stages:
pipeline_stages:
  - json:
      expressions:
        timestamp: timestamp
        level: level
        message: message
        module: target
        span: span
        correlation_id: fields.correlation_id
  - labels:
      level:
      module:
  - timestamp:
      source: timestamp
      format: RFC3339Nano
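For reference, a hypothetical log line that the expressions above would parse; the actual field names depend on the bot's JSON output, so treat this purely as an illustration:
# Inspect a sample JSON log line with jq
echo '{"timestamp":"2025-11-10T01:23:45.123456789Z","level":"INFO","target":"wealth::strategy","message":"Opportunity detected","span":"evaluate","fields":{"correlation_id":"abc123"}}' | jq .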
Log Retention
Loki stores logs with automatic compaction. Configure retention in compose.yml:
loki:
  command:
    - -config.file=/etc/loki/local-config.yaml
    - -config.expand-env=true
  environment:
    - LOKI_RETENTION_PERIOD=30d
Or create a custom Loki config with retention limits.
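After changing retention, Loki's runtime /config endpoint shows what actually took effect; a sketch assuming the default local port (key names vary between Loki versions):
# Inspect the effective retention configuration
curl -s http://localhost:3100/config | grep -i retention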
Best Practices
- Use Loki direct push for production - Lower latency, simpler setup than OTLP
- Keep file logging for debugging - Hybrid mode provides redundancy
- Use structured logging - Include correlation_id, operation, etc.
- Set appropriate log levels - Use DEBUG for development, INFO for production
- Create dashboards - Visualize key metrics from logs
- Set up alerts - Get notified of critical errors
- Index important fields - Add labels for common filters (level, module)
- Monitor Loki performance - Check ingestion rate and query latency
- Configure log retention - Balance storage costs with retention needs
- Use correlation IDs - Automatically included in logs for tracing
Comparison: Pull vs Push
| Aspect | Pull (Promtail) | Push (Loki Direct) |
|---|---|---|
| Setup Complexity | Simple | Simpler (no Promtail needed) |
| Latency | 5-10 seconds | < 1 second |
| Disk I/O | Required (log files) | Optional |
| Network Efficiency | Lower (file polling) | Higher (batched HTTP) |
| Reliability | File-based buffering | In-memory buffering |
| Scalability | One agent per host | Direct to Loki |
| Dependencies | Promtail service | None (built into bot) |
| Production Ready | ✓ | ✓✓ (recommended) |
Migration Path: Pull → Push
- Phase 1: Enable OpenTelemetry OTLP export
  # Keep existing file logging if desired
  export WEALTH_LOG_FILE=/tmp/wealth.log
  # Add OTLP endpoint
  export OTLP_ENDPOINT=http://localhost:4317
  wealth run
- Phase 2: Verify OTLP export in Grafana
  - Check logs appear in Loki via Grafana Explore
  - Verify metrics in Prometheus
  - Check traces in Tempo
  - Confirm correlation between logs/metrics/traces
- Phase 3: Disable file logging (optional)
  # Remove file logging for OTLP-only mode
  unset WEALTH_LOG_FILE
  # Keep OTLP export
  export OTLP_ENDPOINT=http://localhost:4317
  wealth run
- Phase 4: Production deployment
  # Ensure all observability services are running
  docker compose up -d
  # Configure bot for OTLP
  export OTLP_ENDPOINT=http://localhost:4317
  export OTEL_RESOURCE_ATTRIBUTES="service.name=wealth-bot,deployment.environment=production"
  wealth run
Related Documentation
- Monitoring Guide - Complete observability with OpenTelemetry, metrics, and dashboards
- Grafana Cloud Setup - Production Grafana setup
- Loki JSON Parsing - Complete event types reference
- Troubleshooting - Common issues