nomad-temporal-jobs

Temporal Workflows · Automated Backups · Trivy Scanning · Node Cleanup · Registry GC · Prometheus Metrics · OpenTelemetry
Temporal workflow workers for infrastructure automation
Three independent Temporal workers handle backup orchestration, container vulnerability scanning, and orphaned data cleanup across Nomad client nodes. The cleanup worker also reclaims Docker registry storage through a saga-style garbage collection.
- Automated nightly backups of Nomad, Consul, and PostgreSQL with S3 offsite replication and configurable retention
- Vulnerability scanning of all running container images with parallel batched Trivy scans and CVE persistence
- Orphaned data directory cleanup across Nomad nodes with dry-run safety and grace period filtering
- Docker registry garbage collection that scales the registry offline, runs GC, and always scales it back via saga compensation
Key Features
Nomad Raft, Consul Raft, and PostgreSQL snapshots with S3 upload and retention cleanup.
Discover images from Nomad, batch parallel scans via Trivy, persist CVE results to PostgreSQL.
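The batching step can be sketched roughly as follows. This is a minimal illustration in plain Go, not the project's code: `scanFunc`, `batchImages`, and `scanBatches` are hypothetical names, the image names are made up, and the scanner is a stub rather than a real Trivy invocation. In the actual worker each scan would presumably run as a Temporal activity.

```go
package main

import (
	"fmt"
	"sync"
)

// scanFunc is a hypothetical stand-in for one Trivy scan of an image.
// It returns the number of CVEs found.
type scanFunc func(image string) (int, error)

// batchImages splits images into consecutive batches of at most size items.
func batchImages(images []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(images); start += size {
		end := start + size
		if end > len(images) {
			end = len(images)
		}
		batches = append(batches, images[start:end])
	}
	return batches
}

// scanBatches processes batches one after another and scans the images
// inside each batch concurrently. A failed scan is recorded in errs and
// does not abort the other scans.
func scanBatches(batches [][]string, scan scanFunc) (map[string]int, map[string]error) {
	results := make(map[string]int)
	errs := make(map[string]error)
	var mu sync.Mutex // guards both maps
	for _, batch := range batches {
		var wg sync.WaitGroup
		for _, image := range batch {
			wg.Add(1)
			go func(image string) {
				defer wg.Done()
				count, err := scan(image)
				mu.Lock()
				defer mu.Unlock()
				if err != nil {
					errs[image] = err
					return
				}
				results[image] = count
			}(image)
		}
		wg.Wait() // finish this batch before starting the next
	}
	return results, errs
}

func main() {
	images := []string{"registry/app:1", "registry/db:2", "registry/web:3"}
	// Stub scanner: pretends the CVE count equals the image name length.
	stub := func(image string) (int, error) { return len(image), nil }
	results, errs := scanBatches(batchImages(images, 2), stub)
	fmt.Println(len(results), len(errs))
}
```

The batch size caps how many scans run at once, which is one simple way to keep parallel scans from overloading a node.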
SSH to each Nomad client node, identify orphaned directories, and remove stale data safely.
Reclaim Docker registry storage with a saga that never leaves the registry offline.
Scales the registry offline, runs garbage-collect over SSH, then scales back to 1. The scale-back is a deferred compensation on a disconnected context, so it always fires, even if GC fails or the workflow is cancelled. Reports blobs deleted and bytes reclaimed.
Every activity traced end-to-end with Tempo export and service graph edges.
Temporal SDK metrics via Tally-Prometheus bridge exposed on :9090/metrics.
JSON slog output with service identity for Alloy/Loki collection.