statevault 5c62686e83 docs(opencloud): record 2026-05-15 incident + new admin/persistence model
- Admin role now granted via libregraph appRoleAssignments (not
  OC_ADMIN_USER_ID — that env collides with basic-auth claim
  synthesis).
- New § IDM persistence: explains the decomposeds3 split-brain
  (blobs in S3, metadata on local PVC), why dropping the data PVC
  on 2026-05-12 wiped users + orphaned 43 MB of S3 content, and the
  fix (5 Gi PVC + Longhorn backup-critical labels on both PVCs via
  commit 99f62f5 in platform).
- Stefan's current internal id captured (32c3b690…) since it's now
  stable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 18:45:46 +02:00
docs docs: AIenv space folder conventions 2026-05-09 18:33:48 +02:00
scripts feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
values fix(opencloud): re-enable data PVC — IDM boltdb lives there 2026-05-15 18:02:43 +02:00
vendor/opencloud-helm fix(opencloud): disable importer web extension (Trivy CRIT) 2026-05-03 18:55:43 +02:00
.gitignore feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
bao.yml feat(bao): add bao.yml manifest at repo root 2026-05-01 18:46:34 +02:00
CLAUDE.md docs(opencloud): record 2026-05-15 incident + new admin/persistence model 2026-05-15 18:45:46 +02:00
data.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
database.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
dns.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
ingress.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
locals.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
minio.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
namespace.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
oidc.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
opencloud.tf chore(storage): drop opencloud-data PVC, shrink config to 2Gi 2026-05-12 06:49:32 +02:00
outputs.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
providers.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
README.md docs(opencloud): reflect autoprovision-on + seed-admin UUID semantics 2026-05-15 16:29:22 +02:00
secrets.tf feat: scaffold OpenCloud deployment on Talos 2026-04-29 07:55:50 +02:00
variables.tf chore(storage): drop opencloud-data PVC, shrink config to 2Gi 2026-05-12 06:49:32 +02:00
versions.tf chore(state): migrate backend infra/opencloud → workloads/opencloud 2026-05-01 17:49:23 +02:00
web-extensions.tf fix(opencloud): bump 2.1.0→6.1.0 + codify external-sites + disable importer 2026-05-04 11:46:49 +02:00

OpenCloud on Talos / Hetzner

Self-contained OpenTofu project that deploys OpenCloud (Go ocis successor) at cloud.loop-coop.net on the Talos k8s cluster, with user data on the service-machine MinIO and SSO through Loop Portal.

The shape of this project mirrors ~/work/.attic/legacy/nextcloud-decommissioned-2026-04-29/ so the two deployments can be reasoned about together. Where OpenCloud's architecture forces a divergence (no Postgres in the core path, no external Redis surface, public-client OIDC), CLAUDE.md calls it out.

Stack

        ┌────────────────────────────────────┐
        │  cloud.loop-coop.net  (TLS @ ingress-nginx, cert-manager letsencrypt)
        └────────────────────────────────────┘
                          │
                          ▼
            ┌─────────────────────────┐
            │  OpenCloud StatefulSet  │  pinned to loco-wo-big-0
            │   image: opencloudeu/   │  config PVC + data PVC (Longhorn)
            │       opencloud-rolling │
            └─────────────────────────┘
              │       │           │
              │       │           └────► Loop Portal OIDC (id.loop-coop.net)
              │       │                    public client + PKCE
              │       │
              │       └────► NATS internal (chart-bundled, in-pod)
              │
              ▼
   ┌──────────────────────┐    ┌─────────────────────────┐
   │  service-machine     │    │   shared-pg (CNPG)      │
   │     MinIO            │    │   `opencloud` database  │
   │     bucket: opencloud│    │   (reserved; consumed   │
   │                      │    │    by OnlyOffice when   │
   │                      │    │    enable_onlyoffice=1) │
   └──────────────────────┘    └─────────────────────────┘

Apply

./scripts/vendor-chart.sh   # first time only — clones the chart at the pinned SHA
./scripts/apply.sh init
./scripts/apply.sh plan
./scripts/apply.sh apply

The apply wrapper stacks the bao scopes (tofu-state, tofu-hcloud-privileged, k8s-bao) and reads MinIO root creds from the qubes bao. Don't run tofu directly.

The OpenCloud chart isn't published to a public helm repo (the advertised OCI path on GHCR returns 403 unauthenticated). We vendor it locally — see CLAUDE.md for the bump procedure.

Pre-requisites

All applied as of 2026-04-29 — listed here for fresh deploy / DR reference:

  1. OIDC client registered in Loop Portal vault (talos-hcloud-cluster/stack/bootstrap, PR #57). vault_identity_oidc_client.opencloud with three SPA redirect URIs (/, /oidc-callback.html, /oidc-silent-redirect.html) and public: true set in portal_oidc.clients_json so the token endpoint accepts PKCE without a client_secret.

  2. Local-bao mirror: tofu apply in ~/aienv/openbao/tofu/ (or targeted on vault_kv_secret_v2.portal_mirror["oidc"]) pushes the new clients_json to local bao; mesh-agent on service-machine re-renders within ~60s and reload-or-restarts loop-portal.

  3. Loop Portal v0.13.0+ deployed on service-machine (qubeos/service-machine images/loop-portal.yaml LOOP_PORTAL_VERSION, image rebuilt + LXC recreated). Older builds (≤v0.12) reject SPA token requests with invalid_client.

  4. HAProxy CORS rule on Loop Portal OIDC endpoints (qubeos/service-machine d5f8a93). The SPA fetches /.well-known/openid-configuration, /.well-known/jwks.json, /oauth2/token, and /oauth2/userinfo directly; without an Access-Control-Allow-Origin (ACAO) response header, Firefox blocks them.
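The CORS prerequisite in item 4 can be spot-checked from a shell. This is a minimal sketch, assuming the portal answers at id.loop-coop.net (as in the stack diagram) and that the HAProxy rule reacts to an Origin of https://cloud.loop-coop.net; has_acao is a hypothetical helper name, not something shipped in this repo.

```shell
# has_acao: succeeds if the HTTP response headers read from stdin contain an
# Access-Control-Allow-Origin header (matched case-insensitively).
has_acao() {
  grep -qi '^access-control-allow-origin:'
}

# Usage (assumed host and Origin, not executed here):
#   curl -sS -o /dev/null -D - \
#     -H 'Origin: https://cloud.loop-coop.net' \
#     https://id.loop-coop.net/.well-known/openid-configuration \
#     | has_acao && echo "ACAO present"
```

Repeating the call for /.well-known/jwks.json, /oauth2/token, and /oauth2/userinfo covers the four endpoints the SPA fetches directly.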

Verify

# DNS
dig +short cloud.loop-coop.net

# TLS
echo | openssl s_client -connect cloud.loop-coop.net:443 -servername cloud.loop-coop.net 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

# App
curl -fsS https://cloud.loop-coop.net/ocs/v1.php/cloud/capabilities -H "OCS-APIRequest: true" | head -20

# Pod state
kubectl -n opencloud get pod,svc,ingress,pvc

# Object store activity
mc ls service-machine/opencloud --recursive | head

Toggles

| Variable | Default | Effect |
|---|---|---|
| enable_tika | true | Apache Tika sidecar for full-text search |
| enable_web_extensions | true | Master switch for the chart's web-app extensions (drawio, external-sites, importer, json-viewer, progress-bars, unzip, pastebin). |
| enable_pastebin | true | Pastebin web extension (vendor patch; see CLAUDE.md). Requires enable_web_extensions=true. Stores snippets in each user's .space/pastebin/ WebDAV folder. |
| external_sites | 4-entry default | Sites surfaced by the External Sites extension; see Web extensions below. Empty list = upstream default (no sites). |
| enable_onlyoffice | false | OnlyOffice documentserver. Uses shared-pg for coAuthoring. |
| enable_collabora | false | Collabora CODE alternative to OnlyOffice. |

Set these in a *.auto.tfvars file (gitignored; .gitignore also excludes *.tfplan). If the toggles should live in the repository, commit a tracked enable.tfvars instead.
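As an illustration, a toggle file could be generated like this. It is only a sketch: the file name toggles.auto.tfvars and the helper name write_toggles are hypothetical, and the values shown simply keep the documented defaults while switching on OnlyOffice.

```shell
# write_toggles: write a toggle file for this project.
# The default name toggles.auto.tfvars is an assumption; OpenTofu auto-loads
# any *.auto.tfvars file in the working directory.
write_toggles() {
  out="${1:-toggles.auto.tfvars}"
  cat > "$out" <<'EOF'
# Documented defaults, except enable_onlyoffice (default: false).
enable_tika           = true
enable_web_extensions = true
enable_pastebin       = true
enable_onlyoffice     = true
enable_collabora      = false
EOF
}

# Usage (not executed here):
#   write_toggles && ./scripts/apply.sh plan
```

Leaving external_sites out of the file keeps its 4-entry default.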

Web extensions

OpenCloud surfaces extra web apps from the opencloud-eu/web-extensions repo. The chart handles seven of them; everything is enabled by default while enable_web_extensions = true.

| App | What it does | Backend |
|---|---|---|
| Draw.io | Diagram editor (preferred over mermaid/ASCII for diagrams in shared spaces) | Client-side |
| External Sites | Adds external URLs to the navigation menu (configured via var.external_sites; see below) | Client-side |
| Importer | Bulk file/folder upload | Native (WebDAV) |
| JSON Viewer | Pretty-printed JSON preview | Client-side |
| Progress Bars | Upload/download progress indicator | Client-side |
| Unzip | Extract .zip archives in place | Native (WebDAV) |
| Pastebin | Text-snippet sharing via public links; files land in .space/pastebin/ | Native (WebDAV), no extra service |

To add or remove a single extension without disabling the whole set, edit the webExtensions.extensions.<name>.enabled block in values/opencloud.yaml.tftpl.

External Sites

The external_sites variable renders into a ConfigMap (opencloud-external-sites-config) that the OpenCloud pod's init-web-extensions step overlays onto the upstream-shipped config.json. Schema follows web-app-external-sites:

external_sites = [
  {
    name     = "Forgejo"
    url      = "https://git.loop-coop.net"
    target   = "external"            # "external" (new tab) or "embedded" (iframe)
    color    = "#FB923C"             # icon background hex
    icon     = "git-branch"          # Remix Icon name
    priority = 10                    # menu ordering (lower = higher in list)
  },
  # ...
]

After editing var.external_sites and running tofu apply, the ConfigMap updates but the pod doesn't auto-roll. Force a re-init:

kubectl -n opencloud rollout restart statefulset/opencloud

Setting external_sites = [] removes the ConfigMap mount and falls back to the upstream-shipped (empty) config.json — the External Sites extension stays installed but the menu is empty.
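The inspect-then-restart sequence can be wrapped in one helper. This is a sketch under assumptions: reroll_external_sites is a hypothetical name, the KUBECTL override exists only so the calls can be stubbed, and the 5m timeout is an arbitrary choice.

```shell
# reroll_external_sites: show the rendered ConfigMap, restart the StatefulSet,
# then wait for the rollout to finish. Stops at the first failing step.
# KUBECTL may point at a stub command for dry runs (assumption, not repo tooling).
reroll_external_sites() {
  k="${KUBECTL:-kubectl}"
  "$k" -n opencloud get configmap opencloud-external-sites-config -o yaml &&
    "$k" -n opencloud rollout restart statefulset/opencloud &&
    "$k" -n opencloud rollout status statefulset/opencloud --timeout=5m
}

# Usage (not executed here):
#   ./scripts/apply.sh apply && reroll_external_sites
```

With external_sites = [] the first step is expected to fail because the ConfigMap is gone; in that case run the rollout restart on its own.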

Admin

Stefan is the platform admin, designated via OC_ADMIN_USER_ID in the chart's env override (opencloud.env in the values template). The settings service re-applies the admin role on each pod start; promote or demote by editing the env value and rolling the StatefulSet.

App store, admin settings, and other admin-only views are gated on this role.

The current OC_ADMIN_USER_ID value is the chart's seed-admin id (admin@example.org, basic-auth via the opencloud-admin k8s Secret). Stefan's internal UUID is regenerated on first OIDC login when PROXY_AUTOPROVISION_ACCOUNTS=true (the default since 2026-05-15) and the IDM's opencloud-opencloud-config PVC is fresh. To grant Stefan admin, bump the env to the new UUID once it surfaces in the proxy logs (opaque_id:"<UUID>" on the first auth call). See CLAUDE.md § Admin user for the full recovery sequence.
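Pulling the UUID out of the proxy logs can be scripted. A minimal sketch, assuming the log lines contain the literal opaque_id:"<UUID>" form quoted above and that the logs are reachable through the StatefulSet; extract_opaque_ids is a hypothetical helper name.

```shell
# extract_opaque_ids: read log lines from stdin and print each distinct value
# found inside opaque_id:"..." (one per line, sorted).
extract_opaque_ids() {
  grep -o 'opaque_id:"[^"]*"' | sed 's/^opaque_id:"//; s/"$//' | sort -u
}

# Usage (not executed here; the exact container and log layout are assumptions):
#   kubectl -n opencloud logs statefulset/opencloud | extract_opaque_ids
```

If more than one id is printed, match it against the first auth call after the login before changing the env value.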

Backup

The opencloud-pg-backup MinIO bucket and opencloud-pg-backup service account are pre-provisioned. Wire them into the shared-pg CNPG cluster's Cluster.spec.backup.barmanObjectStore (additional endpoint) when the opencloud database has real data — the moment OnlyOffice is enabled or future features land.

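For the OpenCloud user-data MinIO bucket, the backup story is the same as Nextcloud's: rely on the cluster's existing offsite MinIO replication. That covers blobs only. With decomposeds3 the metadata lives on the local PVC and the IDM boltdb lives on the data PVC; dropping the data PVC on 2026-05-12 wiped users and orphaned 43 MB of S3 content. Both PVCs therefore carry Longhorn backup-critical labels (commit 99f62f5 in platform); see CLAUDE.md § IDM persistence.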

  • ~/work/.attic/legacy/nextcloud-decommissioned-2026-04-29/ — sibling project; same plumbing pattern.
  • ~/work/platform/talos-hcloud-cluster/ — cluster + Loop Portal + OIDC client registration.
  • ~/aienv/openbao/tofu/ — OpenBao policies and qube-mesh PKI.