  • Python 81.5%
  • HTML 14.2%
  • Shell 1.6%
  • HCL 1.1%
  • Dockerfile 0.8%
  • Other 0.7%
claude 03a33f508c chore(deps): bump caddy 2.10 → 2.11.3-alpine in qubes-mesh-proxy
Dependency-Track flagged caddy:2.10-alpine with 46 HIGH severity
findings in the base image; 2.11.3-alpine cuts those to ~16. Drop-in
replacement — Caddyfile + mTLS-proxy behaviour unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 15:30:32 +02:00
.woodpecker chore(ci): swap python:3.13-slim → shared-python-base-3.13:v1 2026-05-13 13:20:02 +02:00
datasette feat: rebrand to Die Bingerin + ratsinfo.bingerinnen.de + hcloud DNS 2026-05-11 07:49:57 +02:00
docker fix(entrypoint): route bekanntmachungen + enrich-niederschrift CLI commands 2026-05-14 15:25:10 +02:00
docs feat(oparl): read-only OParl 1.1 emitter at /oparl/v1.1 2026-05-14 16:44:12 +02:00
helm/bingen-ratsinfo chore(deps): bump caddy 2.10 → 2.11.3-alpine in qubes-mesh-proxy 2026-05-15 15:30:32 +02:00
migrations feat(verwaltung): politische Verantwortung — Dezernat → Amt 2026-05-14 19:27:49 +02:00
public feat: rebrand to Die Bingerin + ratsinfo.bingerinnen.de + hcloud DNS 2026-05-11 07:49:57 +02:00
scripts refactor(helm): drive cronjob schedules from the Source registry 2026-05-14 18:01:42 +02:00
src/bingen_ratsinfo fix(pdf): leave a note where a table got dropped + tighten filter 2026-05-15 06:06:59 +02:00
tests fix(pdf): leave a note where a table got dropped + tighten filter 2026-05-15 06:06:59 +02:00
tofu feat: rebrand to Die Bingerin + ratsinfo.bingerinnen.de + hcloud DNS 2026-05-11 07:49:57 +02:00
.dockerignore init: bingen-ratsinfo skeleton 2026-05-10 21:09:55 +02:00
.gitignore feat(web): build Tailwind statically, eliminate runtime CDN behaviour 2026-05-11 20:02:21 +02:00
alembic.ini init: bingen-ratsinfo skeleton 2026-05-10 21:09:55 +02:00
bao.yml fix(bao): states paths must carry the 'states/' prefix 2026-05-10 21:46:51 +02:00
CHANGELOG.md chore(release): v2.19.5 [skip ci] 2026-05-15 04:09:48 +00:00
Dockerfile feat(web): build Tailwind statically, eliminate runtime CDN behaviour 2026-05-11 20:02:21 +02:00
pyproject.toml feat(web): service-hardening Wave 4 — psycopg pool, shared httpx, statement timeout 2026-05-11 18:43:28 +02:00
README.md feat!: drop /presse + /haushalt features and their tables 2026-05-13 08:46:39 +02:00
tailwind.config.js feat(web): build Tailwind statically, eliminate runtime CDN behaviour 2026-05-11 20:02:21 +02:00

bingen-ratsinfo

Status (2026-05-13): re-installed. Live at https://ratsinfo.bingerinnen.de on chart v1.6.2, web-only — all crawl/ingest CronJobs are intentionally enabled: false in values-prod.yaml (flip individually when ready to resume ingestion). CI builds + publishes chart and image; no auto-deploy.

Reactivation recipe (after a prior uninstall):

kubectl create ns bingen-ratsinfo                        # NB: ns name = bingen-ratsinfo
kubectl -n bingen-ratsinfo create secret docker-registry forgejo-pull \
  --docker-server=git.loop-coop.net \
  --docker-username=claude \
  --docker-password="$(bao-get forgejo FORGEJO_TOKEN)" \
  --docker-email=ratsinfo@bingerin.de
helm install ratsinfo oci://git.loop-coop.net/loco/charts/bingen-ratsinfo \
  --version <latest> -n bingen-ratsinfo -f helm/bingen-ratsinfo/values-prod.yaml

The OpenBao k8s-auth role is ns-bingen-ratsinfo and binds the namespace name bingen-ratsinfo — installing into any other ns fails with "namespace not authorized" on SecretStore. The forgejo-pull docker-registry secret is not templated by the chart and disappears with the namespace on uninstall, so it must be recreated before pods can pull the image.

DB bingen_ratsinfo on shared-pg and the vault tree secret/workloads/bingen-ratsinfo/* survive uninstalls untouched.

Scope shrink (2026-05-13): the VRM press feed (/presse) and the bingen.de Haushaltspläne (/haushalt) features were removed in migration 0013 — UI routes, CronJobs (press, haushalt), parsers, CLI subcommands, and the press_article / budget_document / budget_table tables are gone. Re-introducing them would require reverting 0013, restoring data from a pre-0013 backup, and re-implementing the deleted modules.

Scraper, structured store, FastAPI web frontend and analytics pipeline for the Bingen am Rhein municipal council information system (ALLRIS net 3.9.5, https://www.sitzungsdienst-bingen.de/bi/).

Operated by Die Bingerin — a non-profit civic initiative for municipal cooperation. Public site: https://ratsinfo.bingerinnen.de.

Why

Bingen has not enabled an OParl module on its ALLRIS instance, so the only structured access to council data is HTML scraping plus PDF text extraction. We do that politely (identifying User-Agent, per-host throttle, content-addressed mirror for official works) and re-expose the result via a FastAPI site for citizens and a structured database for analytical work.
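The per-host throttle mentioned above can be sketched as follows. This is a minimal illustration, not the code in crawl.py: the class name HostThrottle and its injectable clock and sleep parameters are assumptions made here so that the behaviour can be tested without real waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(
        self,
        delay: float = 5.0,  # matches the documented BINGEN_DELAY default
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of the last request

    def wait(self, url: str) -> float:
        """Block until the host of `url` may be contacted again; return seconds slept."""
        host = urlsplit(url).hostname or ""
        last = self._last.get(host)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.delay - (self._clock() - last))
            if pause > 0:
                self._sleep(pause)
        # Record the time after sleeping, so the next call measures from here.
        self._last[host] = self._clock()
        return pause
```

A caller would invoke wait(url) immediately before each HTTP request. The first request to a host passes straight through, and later ones are spaced at least `delay` seconds apart.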

What's in the repo

src/bingen_ratsinfo/
├── crawl.py             ALLRIS calendar + details + Vorlage + Beschluss extractors
├── parsers/             HTML / iCal / Beschluss-PDF parsers
├── db.py                psycopg3 helpers + pool for the web process
├── pdf.py               media-api integration (Tika + OCR fallback)
├── blob.py              S3 mirror for mirrorable PDF types
├── notify.py            Atom feed + Markdown digest generators
├── store.py             SQL upsert helpers (Person, Membership, …)
├── schema.py            OParl-shape dataclasses
└── web/                 FastAPI app, OIDC auth, security headers, OTel
migrations/              alembic; canonical + `enrich` schema
helm/bingen-ratsinfo/    chart: Deployment + CronJobs + ingresses + NP
docs/                    architecture, hardening plan, operations runbook

Status (2026-05-11)

The app is live at https://ratsinfo.bingerinnen.de and https://ratsinfo.dev.loop-coop.net (mesh-internal, mTLS) on the THC cluster.

Working today:

  • Calendar crawl (iCal + HTML monthly backfill)
  • TOPs + Vorlagen + file ingest via to010/to020/vo020
  • PDF full-text extraction via media-api (Tika → OCR fallback)
  • People (Stadtrat / Ausschüsse / Fraktionen) with period-aware memberships
  • Beschlussvorschlag extraction from Vorlage PDFs (mines the "Beschlussempfehlung:" section; enacted Beschlüsse are not published by ALLRIS-Bingen)
  • FastAPI web frontend with full operational posture: OIDC (loop-portal), JSON logs, OTel tracing, tight CSP (no 'unsafe-inline') plus HSTS and the rest of the modern header set, rate limiting on hot routes, an SSRF guard on the file proxy, audit log, idle-timeout enforcement, a psycopg pool with a 5 s statement timeout, and a shared httpx.AsyncClient. See docs/service-hardening-plan.md for the per-gap map.
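To illustrate the idea behind an SSRF guard on a file proxy, here is a minimal sketch. It is not the check implemented in web/. The function name is_safe_upstream and the single-host allow-list are assumptions, based only on the ALLRIS host named in this README.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: the proxy only ever needs to reach the ALLRIS host from this README.
ALLOWED_HOSTS = frozenset({"www.sitzungsdienst-bingen.de"})


def is_safe_upstream(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs on an explicitly allowed hostname."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Reject empty hosts and userinfo tricks such as https://allowed@evil.example/
    if not host or parts.username or parts.password:
        return False
    if port not in (None, 443):
        return False
    try:
        ipaddress.ip_address(host)
        return False  # IP literals are never accepted, even before the allow-list
    except ValueError:
        pass  # not an IP literal, continue with the hostname check
    return host in allowed_hosts
```

A string check like this does not cover DNS rebinding or redirects. A production guard would also need to validate the resolved address and any redirect target.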

Enrichment effort paused (2026-05-11): the L1 party-website crawler shipped, ran, and was removed. The Bingen Fraktion websites don't link member subpages and produce mostly meta-description noise, so the data wasn't worth the maintenance. The architecture in docs/enrichment-architecture.md is kept as a forward-looking design record. L4 (voting alignment) is independently blocked because ALLRIS-Bingen does not publish Niederschriften for Stadtrat / Ausschüsse meetings.

Local development

scripts/dev.sh port-forwards the cluster's shared-pg Postgres and serves the FastAPI app at http://127.0.0.1:8765 with auto-reload. Reads hit production data; writes are technically possible but flagged in the script's header, so treat the session as read-mostly.

scripts/dev.sh up        # port-forward + uvicorn --reload
scripts/dev.sh status    # show pf + uvicorn state
scripts/dev.sh logs      # tail uvicorn output
scripts/dev.sh psql      # psql shell against the port-forwarded DB
scripts/dev.sh migrate   # alembic upgrade head (use with care — prod)
scripts/dev.sh down      # stop pf + uvicorn

First run creates .venv/ and installs -e '.[dev,web,postgres,pdf]'; subsequent runs reuse it. Code changes hot-reload.

Auth is off by default in dev because the script doesn't set BINGEN_OIDC_CLIENT_ID; every request flows through as the anonymous user.

CLI

The package installs a bingen-ratsinfo entry point. The full subcommand list (each is also wired as a CronJob in the chart):

bingen-ratsinfo crawl                       # iCal forward window
bingen-ratsinfo crawl --year 2025           # HTML backfill for one year
bingen-ratsinfo details --since 2025-01-01  # TOPs + Vorlagen + Beschluss
bingen-ratsinfo extract                     # PDFs → full_text via media-api
bingen-ratsinfo mirror                      # re-download mirrored PDFs
bingen-ratsinfo people                      # Stadtrat + Ausschüsse + Fraktionen
bingen-ratsinfo discover --term "Klima"     # find missing Vorlagen via search
bingen-ratsinfo feed --out public/feed.atom
bingen-ratsinfo digest --since 2026-05-01 --out public/digest.md
bingen-ratsinfo topics --topic traffic_mobility
bingen-ratsinfo list --upcoming

DB connection: BINGEN_DB_DSN (psycopg DSN). Throttle: BINGEN_DELAY=N seconds between ALLRIS requests (default 5). media-api: BINGEN_MEDIA_API_URL (default http://localhost:8096).
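The three environment variables above could be read as in the sketch below. The variable names and defaults come from this README. The Settings dataclass and load_settings function are hypothetical names used for illustration and may not match how the package actually loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_dsn: Optional[str]  # psycopg DSN; None if BINGEN_DB_DSN is unset
    delay_seconds: float   # pause between ALLRIS requests
    media_api_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    delay = float(env.get("BINGEN_DELAY", "5"))
    if delay < 0:
        raise ValueError("BINGEN_DELAY must not be negative")
    return Settings(
        db_dsn=env.get("BINGEN_DB_DSN"),
        delay_seconds=delay,
        media_api_url=env.get("BINGEN_MEDIA_API_URL", "http://localhost:8096"),
    )
```

Passing the environment as a parameter keeps the function easy to test: load_settings({"BINGEN_DELAY": "10"}) yields a 10-second delay without touching the real process environment.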

Data model (Postgres)

Table                              Contents
meeting                            meetings (Stadtrat + Ausschüsse + Beiräte)
agenda_item                        TOPs per meeting + Beschluss text + kind (vorschlag/unbekannt) + vote counts
paper                              Vorlagen / Anträge / Beschlussvorlagen with Aktenzeichen
file                               mirrored PDFs (Vorlage, Beschluss, Niederschrift, Bekanntmachung, …) + sha256 + S3 path + full_text
person, organization, membership   OParl-shape people + groups + period-aware memberships
legislative_period                 Wahlperioden seeded (2014-19, 2019-24, 2024-29)
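"Period-aware" memberships mean that a person's seat is valid only for a date range. A minimal sketch of the lookup follows. The Membership fields and the active_on helper are assumptions for illustration and are not the dataclasses in schema.py.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Membership:
    person: str
    organization: str
    start: date
    end: Optional[date] = None  # None means the membership is still open


def active_on(memberships: Iterable[Membership], day: date) -> List[Membership]:
    """Return the memberships whose [start, end] range contains `day` (inclusive)."""
    return [
        m for m in memberships
        if m.start <= day and (m.end is None or day <= m.end)
    ]
```

With a lookup like this, a query for a past meeting date returns the committee composition as it was on that day, not the current one.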

Topic definitions live in src/bingen_ratsinfo/topics.yaml.

Architecture

ALLRIS (HTML + PDF)
   │
   │  httpx — rate-limited 5s/host, identifying UA, iso-8859-1 decode
   ▼
┌─────────────────────────────────────────────────────────────┐
│  Canonical schema  (Postgres / CNPG shared-pg)              │
│  meetings · agenda_items · papers · files · people · …      │
│  + Beschlussempfehlung mined from Vorlage PDFs              │
└─────────────────────────────────────────────────────────────┘
   │                                       ▲
   │ Tika / OCR (media-api)                │
   ▼                                       │
file.full_text  ─────────────────▶  agenda_item.beschluss_text
   │
   ▼
FastAPI web service  ────▶  https://ratsinfo.bingerinnen.de
   │                         (OIDC via loop-portal · OTel · CSP · rate-limit)
   ▼
events.jsonl + Atom feed + Markdown digest

OParl vocabulary mapping: docs/oparl-mapping.md. Enrichment layer model: docs/enrichment-architecture.md. Web service hardening: docs/service-hardening-plan.md. Cluster operations: docs/operations.md.
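The step from file.full_text to agenda_item.beschluss_text in the diagram can be pictured with the sketch below. It assumes the extracted text has a "Beschlussempfehlung:" heading followed by the recommendation, ending at the next blank line plus a heading ending in a colon. That layout and the function name are assumptions; the real parser may handle more variants.

```python
import re
from typing import Optional

# Capture everything after the heading up to the next "blank line + Heading:" or end of text.
_SECTION = re.compile(
    r"Beschlussempfehlung:\s*(?P<body>.*?)(?=\n[ \t]*\n[^\n:]{1,60}:[ \t]*\n|\Z)",
    re.DOTALL,
)


def extract_beschlussempfehlung(full_text: str) -> Optional[str]:
    """Return the recommendation text with whitespace collapsed, or None if absent."""
    match = _SECTION.search(full_text)
    if match is None:
        return None
    body = " ".join(match.group("body").split())
    return body or None
```

Collapsing whitespace undoes the hard line breaks that PDF text extraction tends to produce, so the stored text reads as a single paragraph.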

Etiquette

  • Identifying User-Agent (with contact email).
  • Global throttle: 5 s between ALLRIS requests (override via BINGEN_DELAY=N). ALLRIS-Bingen is a small municipal site — we're guests.
  • iCal > HTML where possible: one iCal request returns ~6-12 months of forward window.
  • Incremental: SHA-based content-hash diff avoids re-fetching unchanged Vorlagen.
  • Cron schedules avoid 08:00-18:00 local where practical.
  • Only mirror official works (§ 5 UrhG): Beschlüsse, Niederschriften, Vorlagen, Bekanntmachungen, Einladungen, Tagesordnungen. Anlagen are linked, not mirrored.
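Two of the rules above, the content-hash diff and the restriction of mirroring to official works, can be sketched together. The list of mirrorable kinds is taken from the bullet above. The function names and the object-key layout are assumptions and need not match blob.py.

```python
import hashlib
from typing import Optional

# Document kinds listed above as official works; "Anlage" is deliberately absent.
MIRRORABLE = frozenset({
    "Beschluss", "Niederschrift", "Vorlage",
    "Bekanntmachung", "Einladung", "Tagesordnung",
})


def content_hash(payload: bytes) -> str:
    """Hex SHA-256 digest of the raw document bytes."""
    return hashlib.sha256(payload).hexdigest()


def has_changed(previous_hash: Optional[str], payload: bytes) -> bool:
    """True if the payload differs from what was stored last time (or is new)."""
    return previous_hash != content_hash(payload)


def mirror_key(kind: str, payload: bytes) -> Optional[str]:
    """Return a content-addressed object key, or None if the kind is link-only."""
    if kind not in MIRRORABLE:
        return None
    digest = content_hash(payload)
    # Hypothetical layout: a two-character prefix directory, then the full digest.
    return f"{digest[:2]}/{digest}.pdf"
```

Because the key is derived from the content, storing the same PDF twice resolves to the same object, and an unchanged hash lets the pipeline skip re-processing.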

License

AGPL-3.0-or-later.