🗺️ Platform Deployment Map

last verified 2026-06-28
PROD CPU req 91% · mem 77% · STAGING 16 GB box
📡 Open Live Status → · 🧭 Product Roadmap →
Real-time service health & recent deploys at status.dloizides.com; the product roadmap (flowchart · gantt · kanban) lives here too. This page is the static topology companion.

What runs where · live topology

graph TB
    subgraph internet["🌐 Public Internet — *.dloizides.com (real LE TLS)"]
        users["End users / testers"]
    end

    subgraph prod["🟢 PROD — Hetzner prod (4 vCPU / 8 GB · k3s) — CPU req 94% / mem 83%"]
        ingress["Traefik ingress + LE certs<br/>ALL public URLs terminate here"]
        subgraph prodapis["Product APIs (+ BFF) → shared saas-db"]
            kefiapi["kefi-api / bff-kefi"]
            tenant["tenant-api"]
            quest["questioner-api"]
            onlinemenu["onlinemenu-api"]
            content["content-api"]
            payment["payment-api"]
            notif["notification-api (MT consumer)"]
        end
        subgraph prodweb["Web / SPA / static"]
            kefiweb["kefi-web SPA · kefi-marketing"]
            kefiland["kefi-landings (vestigial/crashloop)"]
            erevna["erevna-web · katalogos-web"]
            statics["~35 static apps/games · HQ"]
        end
        kc["keycloak + keycloak-db"]
        subgraph prodstate["Stateful infra (hibernate-able → 0/0)"]
            saasdb[("saas-db (shared; *-db aliases)")]
            rmq["rabbitmq"]
            sw["seaweedfs (S3)"]
            maddy["maddy (mail)"]
            pouenidb[("poueni-postgres")]
            dynalux[("dynalux-db")]
        end
        amlmkt["aml-marketing (static landing — stays)"]
        promagent["prometheus AGENT (remote-write)"]
        pubjob["kefi publish Job: build-site→kaniko→rollout<br/>CPU req 500m→200m (now schedules)"]
        gproxy["selector-less Service + EndpointSlice<br/>grafana.* · analytics.* · aml-* → staging NodePort"]
    end

    subgraph staging["🟡 STAGING — staging (WireGuard-only · 8 vCPU / 16 GB)"]
        registry["Docker registry (staging)"]
        graf["Grafana + Prometheus(store) + Loki + Alertmanager"]
        umami["Umami + umami-db"]
        amlstg["AML (migrated 2026-06-28): aml-screening+aml-postgres ·<br/>aml-identity+identity-postgres · aml-ner — NodePort"]
        swstg["seaweedfs (E2E report store)"]
        subgraph e2e["E2E runners (CronJobs) — run HERE, TARGET prod"]
            e2ekefi["kefi-lifecycle 03:00 (deadline 1800)"]
            e2epoueni["poueni-reset 03:40"]
            e2efull["full suite 04:00"]
        end
    end

    users -->|HTTPS| ingress
    ingress --> prodapis
    ingress --> prodweb
    ingress --> kc
    ingress -->|analytics.* · grafana.* · aml-*| gproxy
    gproxy -.->|WireGuard ~100ms| umami
    gproxy -.->|WireGuard ~100ms| graf
    gproxy -.->|WireGuard ~100ms| amlstg
    promagent -.->|remote-write WG| graf
    kefiapi -->|create Job| pubjob
    pubjob -->|push image| registry
    e2e -.->|test over public DNS| ingress
    e2e -->|reports| swstg
    prod -.->|pull images| registry

    classDef off fill:#3d1414,stroke:#f85149,color:#ffd7d5;
    classDef fix fill:#13331b,stroke:#3fb950,color:#aff5b4;
    class kefiland off;
    class pubjob fix;
    class amlstg fix;
Legend: live (prod) · runs on staging, public URL proxied from prod · hibernate-able / scheduled sleep · off by design

Deployment inventory · status by host

🟢 PROD — Hetzner prod (public)

Hetzner prod · 4 vCPU / 8 GB · real LE TLS
  • tenant · kefi · questioner · onlinemenu · content · payment · notification — product APIs + BFFs → shared saas-db
  • keycloak (+ keycloak-db) — auth; all realms
  • kefi-web · erevna-web · katalogos-web — product SPAs
  • ~35 static apps / games · hq.dloizides.com
  • saas-db · rabbitmq · seaweedfs · maddy · poueni-postgres · dynalux-db — stateful; scaled to 0/0 when hibernated → APIs go "Running but not Ready"
  • aml-marketing — static AML landing (stays on prod)
  • aml-screening.* · aml-identity.* — AML stack MIGRATED to staging 2026-06-28; public URLs proxied from prod
  • kefi-landings — vestigial; SPA replaced it (empty nginx root → crashloop)
  • kefi publish build Jobs — kaniko on Publish click; CPU req 200m (was 500m → Pending)
  • prometheus AGENT — scrapes prod, remote-writes to staging
  • grafana.* · analytics.* — public URL here, app on staging (WG)
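
The proxy pattern behind grafana.*, analytics.* and aml-* (a selector-less Service plus an EndpointSlice pointing at a staging NodePort) can be sketched as a pair of manifests. This is a minimal sketch, not the manifests actually deployed: the function name, the Service name, the staging address 10.0.0.2 and NodePort 30080 are all hypothetical placeholders. Only the `dloizides` namespace comes from this page. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def proxy_manifests(name, namespace, staging_ip, node_port, port=80):
    """Build a selector-less Service and the EndpointSlice that backs it.

    Because the Service has no selector, Kubernetes does not manage its
    endpoints; the EndpointSlice supplies the backend address by hand.
    """
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # Deliberately no "selector" key.
        "spec": {
            "ports": [{"name": "http", "port": port, "targetPort": node_port}]
        },
    }
    endpoint_slice = {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": f"{name}-staging",
            "namespace": namespace,
            # This label is what ties the slice to the Service.
            "labels": {"kubernetes.io/service-name": name},
        },
        "addressType": "IPv4",
        # Port name matches the Service port name.
        "ports": [{"name": "http", "port": node_port, "protocol": "TCP"}],
        "endpoints": [{"addresses": [staging_ip]}],
    }
    return service, endpoint_slice


if __name__ == "__main__":
    # Hypothetical values: the real staging IP and NodePort are not on this page.
    for manifest in proxy_manifests("analytics-proxy", "dloizides", "10.0.0.2", 30080):
        print(json.dumps(manifest, indent=2))
```

The ingress route then targets the Service as if it were local, and traffic leaves prod over WireGuard to the staging node.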

🟡 STAGING — staging (WireGuard-only)

WireGuard-only · 8 vCPU / 16 GB · no public TLS
  • Docker registry (staging) — prod pulls images from here
  • Grafana · Prometheus (store/query) · Loki · Alertmanager — prod agent remote-writes here; grafana.dloizides.com proxied from prod
  • Umami + umami-db — migrated 2026-06-27; analytics.dloizides.com proxied from prod
  • AML stack — migrated 2026-06-28: aml-screening+aml-postgres, aml-identity+identity-postgres, aml-ner; aml-screening/identity.dloizides.com proxied from prod (via NodePort)
  • seaweedfs — E2E canary report store
  • E2E CronJobs (target prod over public DNS) — kefi-lifecycle 03:00 · poueni-reset 03:40 · full suite 04:00 · all share one canary lock
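
Since all three runners share one canary lock, the spacing only works if each run is guaranteed to end before the next one starts. The sketch below checks that for kefi-lifecycle, the only job whose deadline appears on this page (1800 s). It assumes the deadline is a hard cap on run time and that all times fall on the same day; the function names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%H:%M"


def latest_finish(start, deadline_seconds):
    """Latest wall-clock time a job can still be running, as HH:MM."""
    end = datetime.strptime(start, FMT) + timedelta(seconds=deadline_seconds)
    return end.strftime(FMT)


def fits_before(start, deadline_seconds, next_start):
    """True if the job's deadline expires no later than the next job's start."""
    end = datetime.strptime(start, FMT) + timedelta(seconds=deadline_seconds)
    return end <= datetime.strptime(next_start, FMT)


if __name__ == "__main__":
    # kefi-lifecycle 03:00 with deadline 1800 s, followed by poueni-reset 03:40.
    print(latest_finish("03:00", 1800))
    print(fits_before("03:00", 1800, "03:40"))
```

With these numbers kefi-lifecycle must finish by 03:30, leaving a 10-minute gap before poueni-reset takes the lock.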

Status & recent changes · 2026-06-28

✅ kefi self-serve publish — FIXED

Publish build pods needed 500m CPU but only ~250m free → stuck Pending → sites never built. Lowered build-site+kaniko requests to 200m + restarted kefi-api. Verified: build Completes in 39s, lifecycle E2E passes.
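
The arithmetic behind the fix, as a rough sketch: a pod is only scheduled if its CPU request fits into the node's unrequested CPU. The figures are the ones quoted above (500m, 200m, roughly 250m free); treating the build pod as a single request number is a simplification, and the function names are illustrative.

```python
def schedulable(pod_request_m, node_free_m):
    """A pod fits only if its CPU request (millicores) is within free capacity."""
    return pod_request_m <= node_free_m


def margin(pod_request_m, node_free_m):
    """Millicores left over after placing the pod (negative means Pending)."""
    return node_free_m - pod_request_m


if __name__ == "__main__":
    free = 250  # approximate free CPU requests on prod, in millicores
    for request in (500, 200):
        print(request, schedulable(request, free), margin(request, free))
```

At 200m the build schedules with only about 50m to spare, so any other workload that raises its requests by that much would send the publish Job back to Pending.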

✅ prod-from-staging E2E runners — REPAIRED

Re-spaced schedules (canary-lock races), kefi deadline 900→1800s, image pins, imagePullPolicy: IfNotPresent. Full prod suite running now for a complete green/red.

⚠️ prod is CPU-request saturated (94%)

CPU requests are now the binding constraint on prod, as memory was before. The 200m publish fix only buys margin: if CPU tightens further, publish breaks again. Strategic lever: offload AML→staging (same proxy pattern as Umami), gated on the test window.

⚠️ Hibernation gotcha

Products "Running but not Ready" = a stateful workload at 0/0. First check: kubectl get statefulset -n dloizides → scale the DB/broker to 1 (DB-first).
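
The DB-first wake order can be sketched as a small helper that turns the replica counts from `kubectl get statefulset -n dloizides` into ordered scale commands. This is an illustration under assumptions: the set of names treated as databases is a guess based on the inventory above, and it assumes each of those workloads really is a StatefulSet.

```python
# Assumption: which stateful workloads count as databases (woken first).
DATABASES = {"saas-db", "poueni-postgres", "dynalux-db"}


def wake_commands(replicas, namespace="dloizides"):
    """Return kubectl scale commands for workloads at 0 replicas, databases first.

    `replicas` maps statefulset name -> current replica count.
    """
    asleep = [name for name, count in replicas.items() if count == 0]
    # sorted() is stable: databases (key False) come before everything else.
    ordered = sorted(asleep, key=lambda name: name not in DATABASES)
    return [
        f"kubectl scale statefulset {name} --replicas=1 -n {namespace}"
        for name in ordered
    ]


if __name__ == "__main__":
    state = {"rabbitmq": 0, "saas-db": 0, "keycloak-db": 1}
    for command in wake_commands(state):
        print(command)
```

Workloads already at 1 replica are skipped, so the helper is safe to run against a partially woken cluster.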