bingen-ratsinfo
Status (2026-05-13): re-installed. Live at https://ratsinfo.bingerinnen.de on chart v1.6.2, web-only: all crawl/ingest CronJobs are intentionally `enabled: false` in `values-prod.yaml` (flip them individually when ready to resume ingestion). CI builds and publishes the chart and image; there is no auto-deploy.

Reactivation recipe (after a prior uninstall):

```shell
kubectl create ns bingen-ratsinfo   # NB: ns name = bingen-ratsinfo
kubectl -n bingen-ratsinfo create secret docker-registry forgejo-pull \
  --docker-server=git.loop-coop.net \
  --docker-username=claude \
  --docker-password="$(bao-get forgejo FORGEJO_TOKEN)" \
  --docker-email=ratsinfo@bingerin.de
helm install ratsinfo oci://git.loop-coop.net/loco/charts/bingen-ratsinfo \
  --version <latest> -n bingen-ratsinfo -f helm/bingen-ratsinfo/values-prod.yaml
```

The OpenBao k8s-auth role is `ns-bingen-ratsinfo` and binds the namespace name `bingen-ratsinfo`; installing into any other namespace fails with "namespace not authorized" on the SecretStore. The `forgejo-pull` docker-registry secret is not templated by the chart and disappears with the namespace on uninstall, so it must be recreated before pods can pull the image. The DB `bingen_ratsinfo` on shared-pg and the vault tree `secret/workloads/bingen-ratsinfo/*` survive uninstalls untouched.

Scope shrink (2026-05-13): the VRM press feed (`/presse`) and the bingen.de Haushaltspläne (`/haushalt`) features were removed in migration 0013. UI routes, CronJobs (`press`, `haushalt`), parsers, CLI subcommands, and the `press_article` / `budget_document` / `budget_table` tables are gone. Re-introducing them would require reverting 0013, restoring data from a pre-0013 backup, and re-implementing the deleted modules.
Scraper, structured store, FastAPI web frontend and analytics pipeline for the Bingen am Rhein municipal council information system (ALLRIS net 3.9.5, https://www.sitzungsdienst-bingen.de/bi/).
Operated by Die Bingerin — a non-profit civic initiative for municipal cooperation. Public site: https://ratsinfo.bingerinnen.de.
Why
Bingen has not enabled an OParl module on its ALLRIS instance, so the only structured access to council data is HTML scraping plus PDF text extraction. We do that politely (identifying User-Agent, per-host throttle, content-addressed mirror for official works) and re-expose the result via a FastAPI site for citizens and a structured database for analytical work.
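The content-addressed mirror can be pictured like this: a PDF's SHA-256 digest is its identity, so re-downloading identical bytes maps to the same object and needs no second upload. This is a minimal sketch; the `files/<aa>/<digest>.pdf` key layout and the helper names are hypothetical and not necessarily what `blob.py` does.

```python
import hashlib
from pathlib import PurePosixPath
from typing import AbstractSet


def content_address(data: bytes) -> str:
    """Hex SHA-256 of the raw bytes: identical PDFs get identical addresses."""
    return hashlib.sha256(data).hexdigest()


def mirror_key(data: bytes, prefix: str = "files") -> str:
    """Hypothetical S3 object key: <prefix>/<first two hex chars>/<digest>.pdf."""
    digest = content_address(data)
    return str(PurePosixPath(prefix, digest[:2], f"{digest}.pdf"))


def needs_upload(data: bytes, known_digests: AbstractSet[str]) -> bool:
    """Skip the upload when these exact bytes are already mirrored."""
    return content_address(data) not in known_digests
```

The two-character fan-out directory only keeps bucket listings manageable; any deterministic function of the digest gives the same deduplication property.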
What's in the repo
```
src/bingen_ratsinfo/
├── crawl.py     ALLRIS calendar + details + Vorlage + Beschluss extractors
├── parsers/     HTML / iCal / Beschluss-PDF parsers
├── db.py        psycopg3 helpers + pool for the web process
├── pdf.py       media-api integration (Tika + OCR fallback)
├── blob.py      S3 mirror for mirrorable PDF types
├── notify.py    Atom feed + Markdown digest generators
├── store.py     SQL upsert helpers (Person, Membership, …)
├── schema.py    OParl-shape dataclasses
└── web/         FastAPI app, OIDC auth, security headers, OTel
migrations/           alembic; canonical + `enrich` schema
helm/bingen-ratsinfo/ chart: Deployment + CronJobs + ingresses + NP
docs/                 architecture, hardening plan, operations runbook
```
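`notify.py` is listed as the Atom feed generator. The sketch below shows what a stdlib-only Atom generator can look like; the `Event` fields, the function signature and the choice of the permalink as entry id are assumptions for illustration, not the module's actual interface.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default xmlns


@dataclass(frozen=True)
class Event:
    url: str           # hypothetical: permalink, also used as the entry id
    title: str
    updated: datetime  # must be timezone-aware for a valid RFC 3339 stamp


def _el(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a namespaced Atom child element, optionally with text."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}")
    if text is not None:
        node.text = text
    return node


def render_atom(events: Iterable[Event], feed_id: str, title: str, author: str) -> str:
    ordered = sorted(events, key=lambda e: e.updated, reverse=True)  # newest first
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _el(feed, "id", feed_id)
    _el(feed, "title", title)
    # Feed-level <updated> is the newest entry's timestamp, or "now" if empty.
    newest = ordered[0].updated if ordered else datetime.now(timezone.utc)
    _el(feed, "updated", newest.isoformat())
    _el(_el(feed, "author"), "name", author)
    for e in ordered:
        entry = _el(feed, "entry")
        _el(entry, "id", e.url)
        _el(entry, "title", e.title)
        _el(entry, "updated", e.updated.isoformat())
        # Entries without <content> need an alternate link (RFC 4287).
        link = _el(entry, "link")
        link.set("rel", "alternate")
        link.set("href", e.url)
    return ET.tostring(feed, encoding="unicode")
```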
Status (2026-05-11)
The app is live at https://ratsinfo.bingerinnen.de and https://ratsinfo.dev.loop-coop.net (mesh-internal, mTLS) on the THC cluster.
Working today:
- Calendar crawl (iCal + HTML monthly backfill)
- TOPs + Vorlagen + file ingest via to010/to020/vo020
- PDF full-text extraction via media-api (Tika → OCR fallback)
- People (Stadtrat / Ausschüsse / Fraktionen) with period-aware memberships
- Beschlussvorschlag extraction from Vorlage PDFs (`Beschlussempfehlung:` mining; enacted Beschlüsse are not published by ALLRIS-Bingen)
- FastAPI web frontend with OIDC (loop-portal), JSON logs, OTel tracing, tight CSP (no `'unsafe-inline'`) + HSTS + the rest of the modern header set, rate limiting on hot routes, SSRF guard on the file proxy, audit log, idle-timeout enforcement, psycopg pool with 5 s statement timeout, and a shared `httpx.AsyncClient`. The full operational posture is mapped gap by gap in `docs/service-hardening-plan.md`.
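The SSRF guard on the file proxy is only named above. One common shape for such a guard is an allow-list check on the parsed upstream URL, sketched here under the assumption that the proxy should only ever fetch from the ALLRIS host over https. `ALLOWED_HOSTS` and `is_allowed_upstream` are hypothetical names, and a production guard may additionally disable redirects or pin resolved IPs.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the proxy may only fetch from the ALLRIS host.
ALLOWED_HOSTS = frozenset({"www.sitzungsdienst-bingen.de"})


def is_allowed_upstream(url: str) -> bool:
    """Reject anything that is not an https URL on an allow-listed host.

    Comparing the parsed hostname (not a substring of the URL) blocks tricks
    like https://www.sitzungsdienst-bingen.de@127.0.0.1/ (userinfo) or
    https://www.sitzungsdienst-bingen.de.evil.example/ (suffix).
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:
        return False
    try:
        port = parts.port
    except ValueError:  # malformed port such as ":abc"
        return False
    if port not in (None, 443):
        return False
    # urlsplit lowercases the hostname, so the comparison is case-insensitive.
    return (parts.hostname or "") in ALLOWED_HOSTS
```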
Enrichment effort paused (2026-05-11): the L1 party-website
crawler shipped, ran, and was removed. The Bingen Fraktion (council
group) websites don't link member subpages and yielded mostly
meta-description noise, so the data wasn't worth the maintenance.
The architecture in docs/enrichment-architecture.md is kept as a
forward-looking design record. L4 (voting alignment) is independently
blocked because ALLRIS-Bingen does not publish Niederschriften for
Stadtrat / Ausschüsse meetings.
Local development
scripts/dev.sh port-forwards the cluster's shared-pg Postgres and
serves the FastAPI app at http://127.0.0.1:8765 with auto-reload.
Reads hit prod data. Writes are technically possible (the script's
header warns about this), so treat the session as read-mostly.
```shell
scripts/dev.sh up        # port-forward + uvicorn --reload
scripts/dev.sh status    # show pf + uvicorn state
scripts/dev.sh logs      # tail uvicorn output
scripts/dev.sh psql      # psql shell against the port-forwarded DB
scripts/dev.sh migrate   # alembic upgrade head (use with care — prod)
scripts/dev.sh down      # stop pf + uvicorn
```
First run creates .venv/ and installs -e '.[dev,web,postgres,pdf]';
subsequent runs reuse it. Code changes hot-reload.
Auth is off by default in dev because the script doesn't set
BINGEN_OIDC_CLIENT_ID; every request flows through as the
anonymous user.
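The dev-mode behaviour amounts to a toggle on one environment variable. A hypothetical sketch follows: the `User` type and `resolve_user` are illustrative names, and a real app would redirect to the OIDC login instead of raising.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class User:
    sub: str
    anonymous: bool


ANONYMOUS = User(sub="anonymous", anonymous=True)


def oidc_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Auth is on only when a non-blank BINGEN_OIDC_CLIENT_ID is configured."""
    return bool(env.get("BINGEN_OIDC_CLIENT_ID", "").strip())


def resolve_user(claims: Optional[Mapping[str, str]],
                 env: Mapping[str, str] = os.environ) -> User:
    if not oidc_enabled(env):
        return ANONYMOUS  # dev mode: every request is the anonymous user
    if not claims or "sub" not in claims:
        # Hypothetical: stands in for "redirect to the OIDC login".
        raise PermissionError("login required")
    return User(sub=claims["sub"], anonymous=False)
```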
CLI
The package installs a bingen-ratsinfo entry point. The full
subcommand list (each is also wired as a CronJob in the chart):
```shell
bingen-ratsinfo crawl                          # iCal forward window
bingen-ratsinfo crawl --year 2025              # HTML backfill for one year
bingen-ratsinfo details --since 2025-01-01     # TOPs + Vorlagen + Beschluss
bingen-ratsinfo extract                        # PDFs → full_text via media-api
bingen-ratsinfo mirror                         # re-download mirrored PDFs
bingen-ratsinfo people                         # Stadtrat + Ausschüsse + Fraktionen
bingen-ratsinfo discover --term "Klima"        # find missing Vorlagen via search
bingen-ratsinfo feed --out public/feed.atom
bingen-ratsinfo digest --since 2026-05-01 --out public/digest.md
bingen-ratsinfo topics --topic traffic_mobility
bingen-ratsinfo list --upcoming
```
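For orientation, a few of the documented invocations map onto an argument parser like the one below. This is a sketch using the standard library's `argparse`; the real entry point may use a different CLI framework, only a subset of subcommands is shown, and whether any option is mandatory is not stated in this README, so all are optional here.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse layout mirroring some documented subcommands."""
    p = argparse.ArgumentParser(prog="bingen-ratsinfo")
    sub = p.add_subparsers(dest="command", required=True)

    crawl = sub.add_parser("crawl", help="iCal forward window, or HTML backfill with --year")
    crawl.add_argument("--year", type=int, help="HTML backfill for one year")

    details = sub.add_parser("details", help="TOPs + Vorlagen + Beschluss")
    # date.fromisoformat raises ValueError on bad input; argparse reports it.
    details.add_argument("--since", type=date.fromisoformat)

    discover = sub.add_parser("discover", help="find missing Vorlagen via search")
    discover.add_argument("--term")

    feed = sub.add_parser("feed", help="write the Atom feed")
    feed.add_argument("--out")

    lst = sub.add_parser("list", help="list meetings")
    lst.add_argument("--upcoming", action="store_true")
    return p
```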
DB connection: `BINGEN_DB_DSN` (psycopg DSN). Throttle: `BINGEN_DELAY=N`
seconds between ALLRIS requests (default 5). media-api:
`BINGEN_MEDIA_API_URL` (default `http://localhost:8096`).
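The three variables and their documented defaults can be read into one settings object. A sketch only: the `Settings` dataclass, `load_settings`, and the fail-fast behaviour on a missing DSN or negative delay are assumptions, not the package's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_dsn: str
    delay_s: float
    media_api_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    dsn = env.get("BINGEN_DB_DSN")
    if not dsn:
        # Assumption: fail fast rather than guess a DSN.
        raise RuntimeError("BINGEN_DB_DSN is not set")
    delay = float(env.get("BINGEN_DELAY", "5"))  # non-numeric -> ValueError
    if delay < 0:
        raise ValueError("BINGEN_DELAY must be >= 0")
    return Settings(
        db_dsn=dsn,
        delay_s=delay,
        media_api_url=env.get("BINGEN_MEDIA_API_URL", "http://localhost:8096"),
    )
```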
Data model (Postgres)
| Table | Contents |
|---|---|
| `meeting` | meetings (Stadtrat + Ausschüsse + Beiräte) |
| `agenda_item` | TOPs per meeting + Beschluss text + kind (vorschlag/unbekannt) + vote counts |
| `paper` | Vorlagen / Anträge / Beschlussvorlagen with Aktenzeichen |
| `file` | mirrored PDFs (Vorlage, Beschluss, Niederschrift, Bekanntmachung, …) + sha256 + S3 path + full_text |
| `person`, `organization`, `membership` | OParl-shape people + groups + period-aware memberships |
| `legislative_period` | Wahlperioden seeded (2014–19, 2019–24, 2024–29) |
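"Period-aware" memberships boil down to date-bounded rows. A minimal sketch, assuming OParl-style `startDate` / `endDate` semantics mapped to snake_case fields, where an open bound means "start unknown" or "still ongoing"; the class and helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Membership:
    person: str
    organization: str
    start_date: Optional[date] = None  # None = start unknown
    end_date: Optional[date] = None    # None = still ongoing

    def active_on(self, day: date) -> bool:
        """Inclusive on both ends; an open bound does not restrict."""
        if self.start_date is not None and day < self.start_date:
            return False
        if self.end_date is not None and day > self.end_date:
            return False
        return True


def members_on(memberships: Iterable[Membership], organization: str, day: date) -> List[str]:
    """Sorted, de-duplicated names of everyone in `organization` on `day`."""
    return sorted({m.person for m in memberships
                   if m.organization == organization and m.active_on(day)})
```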
Topic definitions live in src/bingen_ratsinfo/topics.yaml.
Architecture
```
                      ALLRIS (HTML + PDF)
                              │
                              │  httpx · rate-limited 5s/host · identifying UA · iso-8859-1 decode
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Canonical schema (Postgres / CNPG shared-pg)               │
│  meetings · agenda_items · papers · files · people · …      │
│  + Beschlussempfehlung mined from Vorlage PDFs              │
└─────────────────────────────────────────────────────────────┘
        │                            ▲
        │  Tika / OCR (media-api)    │
        ▼                            │
  file.full_text ─────────────────▶ agenda_item.beschluss_text
        │
        ▼
  FastAPI web service ────▶ https://ratsinfo.bingerinnen.de
        │   (OIDC via loop-portal · OTel · CSP · rate-limit)
        ▼
  events.jsonl + Atom feed + Markdown digest
```
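The `file.full_text → agenda_item.beschluss_text` edge is where the Beschlussempfehlung gets mined. A hedged sketch of one plausible heuristic: take the paragraph after a `Beschlussempfehlung:` label and unwrap the PDF's line breaks. The function name and the paragraph-break rule are assumptions; the project's parser in `parsers/` may work differently.

```python
import re
from typing import Optional

# Assumed heuristic: the recommendation follows a "Beschlussempfehlung:" label
# and runs until the next blank line (paragraph break) or the end of the text.
_LABEL = re.compile(r"Beschlussempfehlung\s*:\s*", re.IGNORECASE)
_PARA_END = re.compile(r"\n\s*\n")


def extract_beschlussempfehlung(full_text: str) -> Optional[str]:
    m = _LABEL.search(full_text)
    if not m:
        return None
    rest = full_text[m.end():]
    end = _PARA_END.search(rest)
    chunk = rest[: end.start()] if end else rest
    # Collapse hard line wraps and runs of whitespace into single spaces.
    text = " ".join(chunk.split())
    return text or None
```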
OParl vocabulary mapping: docs/oparl-mapping.md.
Enrichment layer model: docs/enrichment-architecture.md.
Web service hardening: docs/service-hardening-plan.md.
Cluster operations: docs/operations.md.
Etiquette
- Identifying User-Agent (with contact email).
- Global throttle: 5 s between ALLRIS requests (override via
  BINGEN_DELAY=N). ALLRIS-Bingen is a small municipal site; we're guests.
- iCal > HTML where possible: one iCal request returns ~6-12 months of forward window.
- Incremental: SHA-based content-hash diff avoids re-fetching unchanged Vorlagen.
- Cron schedules avoid 08:00-18:00 local where practical.
- Only mirror official works (§ 5 UrhG): Beschlüsse, Niederschriften, Vorlagen, Bekanntmachungen, Einladungen, Tagesordnungen. Anlagen are linked, not mirrored.
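The throttle rule can be sketched as a small per-host gate. This is illustrative, not the crawler's code: it is synchronous and single-threaded (an async crawler would use `asyncio.sleep` behind a lock), and because every request goes to the one ALLRIS host, per-host and global throttling coincide. `HostThrottle` is a hypothetical name; clock and sleep are injectable so the behaviour can be tested without real waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between consecutive requests to the same host."""

    def __init__(self, delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay_s = delay_s
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Block until the URL's host may be hit again; return seconds slept."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        slept = 0.0
        if last is not None:
            slept = max(0.0, last + self.delay_s - now)
            if slept:
                self._sleep(slept)
        self._last[host] = now + slept  # the request goes out after the sleep
        return slept
```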
License
AGPL-3.0-or-later.