statevault 1421d7cb10 chore(ci): pin shared-ci-shared:v1, consolidate on shared-go-base:v1
Three changes bundled:

1. Float shared-ci-shared from :v1.0.9 (the last surviving exact-pin
   after the morning's "keep last 3 versions" prune) to :v1. Same
   pattern as the other consumers — survives future prunes.

2. Swap vet / test from docker.io/library/golang:1.25-alpine to
   loco/shared-go-base:v1.

3. Swap helm-lint / helm-publish from alpine/helm:latest to
   loco/shared-go-base:v1. helm v3.16.4 is baked into shared-go-base,
   so we replace two upstream pulls with one local-registry pull.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:20:15 +02:00
.woodpecker chore(ci): pin shared-ci-shared:v1, consolidate on shared-go-base:v1 2026-05-13 12:20:15 +02:00
chart/tofu-state chore(release): v0.8.5 [skip ci] 2026-05-13 08:21:47 +00:00
.gitignore chore(gitignore): exclude tofu-state-*.tgz build artifact 2026-05-04 12:17:51 +02:00
bao.go fix(handler): forward bao 4xx status codes (e.g., 403 → 403, not 500) 2026-05-01 19:07:16 +02:00
bao.yml chore: fill bao.yml + bump chart to 0.8.0 + fix image registry default 2026-05-10 07:31:24 +02:00
CHANGELOG.md chore(release): v0.8.5 [skip ci] 2026-05-13 08:21:47 +00:00
Containerfile Add OpenTelemetry tracing with Loki correlation 2026-04-03 11:36:14 +02:00
coverage.out Fix Bao namespace: use X-Vault-Namespace header instead of URL path 2026-04-03 20:32:48 +02:00
go.mod Add Prometheus metrics and switch to logfmt logging 2026-04-04 00:30:39 +02:00
go.sum Add Prometheus metrics and switch to logfmt logging 2026-04-04 00:30:39 +02:00
handler.go fix(handler): forward bao 4xx status codes (e.g., 403 → 403, not 500) 2026-05-01 19:07:16 +02:00
handler_test.go fix(handler): forward bao 4xx status codes (e.g., 403 → 403, not 500) 2026-05-01 19:07:16 +02:00
main.go Remove CI trigger comment 2026-04-06 03:09:17 +02:00
Makefile Initial: OpenTofu HTTP state backend with OpenBao KV v2 2026-04-02 23:43:30 +02:00
metrics.go feat(metrics): per-type resource gauges, size, info, managed/data split; fix lock metric bug 2026-04-12 10:54:09 +02:00
otel.go Fix SonarCloud issues: extract constants, reduce complexity 2026-04-03 20:16:16 +02:00
README.md docs(readme): infra/qubes-incus migrated; remove legacy-keyspace note 2026-05-10 07:34:49 +02:00
session.go Add session traces: link LOCK→GET→POST→UNLOCK into one Tempo trace 2026-04-03 20:11:15 +02:00
sonar-project.properties Fix SonarCloud issues: extract types, constants, reduce complexity 2026-04-02 23:58:34 +02:00

tofu-state

OpenTofu HTTP state backend backed by OpenBao KV v2.

Deployments. Two instances run:

  • Cluster-internal on the k8s openbao qube at 10.90.0.12:8223 (local mesh IP) — used by the k8s-build-env deploy pipeline and other cluster-side consumers.
  • Workstation-local at localhost:8223 via a qrexec tunnel to the same openbao qube — used by tofu apply from the admin workstation.

The talos-hcloud-cluster (thc) platform stack and k8s-build-env both use this backend for their root modules (platform/talos-stack, platform/talos-platform, platform/talos-bootstrap, identity/k8s-build-env).

What it does

Implements the OpenTofu HTTP backend protocol: GET, POST, and DELETE for state, plus LOCK and UNLOCK backed by check-and-set (CAS) writes. State is stored in OpenBao's states/ KV v2 mount. Each lock is stored alongside its state as a {path}.lock entry.
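As a rough illustration of the verb set above, the following Go sketch maps each method to the operation it stands for. LOCK and UNLOCK are the HTTP backend's default lock and unlock methods. The function name route and the operation labels are hypothetical and not taken from handler.go, and answering unknown methods with 405 is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// route maps an HTTP method to the backend operation it stands for.
// The operation names are illustrative labels, not identifiers from handler.go.
func route(method string) (string, bool) {
	switch method {
	case http.MethodGet:
		return "read state", true
	case http.MethodPost:
		return "write state", true
	case http.MethodDelete:
		return "delete state", true
	case "LOCK": // non-standard verb, default lock method of the HTTP backend
		return "acquire lock", true
	case "UNLOCK": // non-standard verb, default unlock method of the HTTP backend
		return "release lock", true
	default:
		return "", false
	}
}

func main() {
	for _, m := range []string{"GET", "POST", "DELETE", "LOCK", "UNLOCK", "PATCH"} {
		if op, ok := route(m); ok {
			fmt.Printf("%-6s -> %s\n", m, op)
		} else {
			// Assumption: anything outside the protocol is rejected with 405.
			fmt.Printf("%-6s -> %d\n", m, http.StatusMethodNotAllowed)
		}
	}
}
```

Go's net/http passes custom methods such as LOCK through to the handler unchanged, so a plain switch on the request method is enough to dispatch them.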

Architecture

tofu plan/apply
  -> HTTP backend (GET/POST/LOCK/UNLOCK)
    -> tofu-state service
         localhost:8223      (workstation, via qrexec to openbao qube)
         10.90.0.12:8223     (cluster, via WG mesh)
      -> OpenBao KV v2 (states/ mount, root namespace)
  • Service token: authenticates to OpenBao with a scoped orphan token (BAO_TOKEN)
  • API token: callers authenticate to the service with a shared token (API_TOKEN) via Bearer or Basic auth
  • CAS locking: locks are created with cas=0, so the write fails with 409 if a lock already exists and the response returns the current lock holder's info
  • Max 2-level paths: <group>/<project> enforced in handler.go — depth >2 returns 400. Convention is <bao-scope-group>/<project>, e.g. platform/talos-stack, mesh/service-machine.
  • Tracing: full OTEL instrumentation, traces link LOCK->GET->POST->UNLOCK into one Tempo trace
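The CAS locking rule in the list above can be sketched with an in-memory stand-in for the KV v2 mount. This is a minimal model, not the service's code: memKV, LockInfo, acquire, and remove are hypothetical names, and only the "cas=0 means create-only" behaviour and the {path}.lock key layout come from this README.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// LockInfo is a hypothetical subset of the lock metadata a client sends.
type LockInfo struct {
	ID  string
	Who string
}

// ErrLocked signals that a create-only write hit an existing entry.
var ErrLocked = errors.New("lock already held")

// memKV is an in-memory stand-in for a KV v2 mount. It models only the
// rule that a write with cas=0 succeeds when the key does not exist yet.
type memKV struct {
	mu   sync.Mutex
	data map[string]LockInfo
}

func newMemKV() *memKV { return &memKV{data: make(map[string]LockInfo)} }

// createOnly mimics a cas=0 write and returns the current holder on conflict.
func (kv *memKV) createOnly(key string, v LockInfo) (LockInfo, error) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if cur, ok := kv.data[key]; ok {
		return cur, ErrLocked
	}
	kv.data[key] = v
	return v, nil
}

// remove deletes an entry, as an UNLOCK would.
func (kv *memKV) remove(key string) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	delete(kv.data, key)
}

// acquire stores the lock at {path}.lock and maps the result to an HTTP status.
func acquire(kv *memKV, statePath string, want LockInfo) (int, LockInfo) {
	holder, err := kv.createOnly(statePath+".lock", want)
	if errors.Is(err, ErrLocked) {
		return 409, holder // conflict: report who holds the lock
	}
	return 200, holder
}

func main() {
	kv := newMemKV()
	code, _ := acquire(kv, "platform/talos-stack", LockInfo{ID: "1", Who: "alice"})
	fmt.Println("first LOCK:", code)
	code, holder := acquire(kv, "platform/talos-stack", LockInfo{ID: "2", Who: "bob"})
	fmt.Println("second LOCK:", code, "held by", holder.Who)
}
```

Because the existence check and the write happen in one create-only operation, two concurrent callers cannot both believe they hold the lock.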

Active States

Path                      Project
platform/talos-platform   THC platform (Hetzner, Talos, service-machine)
platform/talos-stack      THC stack (operators + cluster-scoped services)
platform/talos-bootstrap  THC vault bootstrap (one-time + DR re-runs)
identity/k8s-build-env    Forgejo + Woodpecker on the cluster
mesh/service-machine      Mesh hub (HAProxy, Unbound, transit, Zot)
images/qubes-incus        distrobuilder LXC images (zot, devpi, odoo, scan)
workloads/opencloud       OCIS at cloud.loop-coop.net

All paths follow the canonical <group>/<project> convention. Past migrations (e.g. infra/qubes-incus → images/qubes-incus on 2026-05-10) used the API-level copy/delete dance documented in ~/aienv/docs/runbooks/tofu-state-ops.md §F.
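The path convention can be checked with a few lines of Go. This is a sketch under stated assumptions rather than the check in handler.go: validatePath is a hypothetical name, and accepting a single-segment path is an assumption, since the README only states that a depth greater than 2 returns 400.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// validatePath returns 200 for paths of at most two non-empty segments
// and 400 otherwise. Accepting one segment is an assumption of this sketch.
func validatePath(urlPath string) int {
	trimmed := strings.Trim(urlPath, "/")
	if trimmed == "" {
		return http.StatusBadRequest
	}
	parts := strings.Split(trimmed, "/")
	if len(parts) > 2 {
		return http.StatusBadRequest // deeper than <group>/<project>
	}
	for _, p := range parts {
		if p == "" {
			return http.StatusBadRequest // empty segment, e.g. "a//b"
		}
	}
	return http.StatusOK
}

func main() {
	for _, p := range []string{
		"/platform/talos-stack",
		"/mesh/service-machine",
		"/platform/talos-stack/extra",
		"/",
	} {
		fmt.Printf("%-30s -> %d\n", p, validatePath(p))
	}
}
```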

Configuration

Env var                      Required  Default                Description
BAO_ADDR                     yes       http://localhost:8200  OpenBao API address
BAO_NAMESPACE                no        "" (root)              OpenBao namespace (leave empty)
BAO_TOKEN                    yes       -                      Service token for KV access
API_TOKEN                    yes       -                      Shared token for caller auth
LISTEN_ADDR                  no        :8080                  Listen address
OTEL_EXPORTER_OTLP_ENDPOINT  no        -                      OTLP gRPC endpoint for traces
OpenTofu Backend Config

Workstation (qrexec tunnel to the openbao qube):

backend "http" {
  address        = "http://localhost:8223/<group>/<project>"
  lock_address   = "http://localhost:8223/<group>/<project>"
  unlock_address = "http://localhost:8223/<group>/<project>"
}

Cluster-internal (WG mesh):

backend "http" {
  address        = "http://10.90.0.12:8223/<group>/<project>"
  lock_address   = "http://10.90.0.12:8223/<group>/<project>"
  unlock_address = "http://10.90.0.12:8223/<group>/<project>"
}

Authentication is supplied through the environment: TF_HTTP_USERNAME=tofu and TF_HTTP_PASSWORD=<API_TOKEN>. On the workstation, the tofu-state bao-scope renders both variables; run bao-exec tofu-state -- tofu apply (the bare tofu shim does this automatically).
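On the service side, accepting the shared token through either Basic or Bearer auth could look like the sketch below. The names authorized and basicHeader are hypothetical. Ignoring the Basic username and using a constant-time comparison are assumptions of this sketch, not documented behaviour of the service.

```go
package main

import (
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

// authorized reports whether an Authorization header value carries apiToken,
// either as the Basic password or as a Bearer token.
func authorized(authHeader, apiToken string) bool {
	// Reuse net/http's Basic parsing by wrapping the header in a request.
	r := &http.Request{Header: http.Header{}}
	r.Header.Set("Authorization", authHeader)

	var presented string
	if _, pass, ok := r.BasicAuth(); ok {
		presented = pass // assumption: the username is not checked
	} else if strings.HasPrefix(authHeader, "Bearer ") {
		presented = strings.TrimPrefix(authHeader, "Bearer ")
	}
	if presented == "" || apiToken == "" {
		return false
	}
	// Constant-time comparison avoids leaking the token through timing.
	return subtle.ConstantTimeCompare([]byte(presented), []byte(apiToken)) == 1
}

// basicHeader builds the header value a client sends for
// TF_HTTP_USERNAME=user and TF_HTTP_PASSWORD=pass.
func basicHeader(user, pass string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+pass))
}

func main() {
	token := "example-token" // placeholder, not a real credential
	fmt.Println(authorized(basicHeader("tofu", token), token))
	fmt.Println(authorized("Bearer "+token, token))
	fmt.Println(authorized("Bearer wrong", token))
}
```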

Helm Chart

The chart lives under chart/tofu-state/ and is published to Forgejo's OCI registry by CI on every push to main (semver bump driven by bump.sh from shared-ci-shared).

helm install tofu-state \
  oci://git.loop-coop.net/projects/tofu-state \
  --version <chart-version>

Key values (chart/tofu-state/values.yaml):

  • image.repository — defaults to harbor.loop-coop.net/projects/tofu-state; override to git.loop-coop.net/loco/platform-tofu-state for the actual published image.
  • config.baoAddr — default http://openbao-active.openbao.svc:8200 (cluster service).
  • config.otelEndpoint — default http://alloy.monitoring.svc:4317.
  • externalSecret.enabled=true — pull BAO_TOKEN (infra/tofu-state/service) and API_TOKEN (infra/tofu-state/api) from OpenBao via ESO.
  • ingress.enabled — chart supports it (host: state.example.com placeholder), but production uses qrexec + WG mesh, so leave disabled.
  • serviceMonitor.enabled=true — Prometheus metrics scraping.

CI/CD

A single pipeline at .woodpecker/ci.yml runs on push and pull_request events. Steps: vet → test → helm-lint → audit (govulncheck) → bao-checks → sbom + cve-scan (advisory) → bump-prepare → build-image (kaniko) → helm-package → release-gate.

The release path is gated on event=push and branch=main: a single push to main triggers bump.sh prepare (semver derived from Conventional Commit subjects), the tag, the container build, the helm package, and the Forgejo release. There is no separate release.yml, and tags are not the trigger.

Artifact    Location
Container   git.loop-coop.net/loco/platform-tofu-state:<tag>
Helm chart  oci://git.loop-coop.net/projects/tofu-state:<version>
Release     git.loop-coop.net/loco/platform-tofu-state/releases

Development

go build .          # build
go test ./...       # 37 tests
go vet ./...        # lint
helm lint chart/tofu-state/