  • Python 99%
  • Dockerfile 1%
statevault 1ac6e65fc4 chore: re-trigger CI after shared-python-base:v1 publish
Previous pipeline (with the shared-python-base:v1 swap) failed at
the lint step because the base image didn't exist yet — ci-shared #6
hit a Woodpecker dedup race during the three-tag burst and got
unstuck only via base-python-v1.0.1 (commit ea55fbd).

The base image is now in the registry; retry should be clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:22:29 +02:00
.woodpecker chore(ci): pin shared-ci-shared:v1, swap to shared-python-base:v1 2026-05-13 12:03:54 +02:00
docs feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
tests feat(search): add web_image_search MCP tool + REST route 2026-05-10 15:29:00 +02:00
web_api feat(search): add web_image_search MCP tool + REST route 2026-05-10 15:29:00 +02:00
.gitignore feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
bao.yml feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
CHANGELOG.md chore(release): v0.3.0 [skip ci] 2026-05-10 13:35:11 +00:00
CLAUDE.md feat(search): add web_image_search MCP tool + REST route 2026-05-10 15:29:00 +02:00
Containerfile feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
pytest.ini feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
README.md feat(search): add web_image_search MCP tool + REST route 2026-05-10 15:29:00 +02:00
requirements-dev.txt feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00
requirements.txt feat: initial web-api — search + fetch + extract coordinator 2026-05-09 21:01:34 +02:00

web-api

Local search + fetch + extract coordinator for the aienv stack. Replaces the legacy ollama-qube web-fetch service (port :8081, retired).

claude/flex qube ──qrexec tunnel──▶ api qube :8097 (web-api)
                                        │
                                        ├── http://localhost:8098  SearXNG (loopback)
                                        │       └── Google / Brave / DDG / Bing / Wikipedia
                                        │
                                        └── outbound HTTPS via sys-wg → page fetch + Trafilatura
  • Federated search through a local SearXNG container (no provider API keys needed in v1).
  • Boilerplate-stripped Markdown via Trafilatura.
  • SQLite cache for both searches and pages (TTLs configurable). This keeps re-runs of the same research cheap and stops Claude from re-hitting upstreams on retry.
  • One-shot research endpoint: search → fetch top-N → bundle.
  • Image search through SearXNG categories=images — preserves img_src / thumbnail_src / resolution / img_format.
  • MCP server mounted at /mcp — four tools: web_search, web_image_search, web_fetch, web_research.
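
The cache bullet above can be illustrated with a minimal sketch of a TTL-checked SQLite cache. This is not the project's actual schema or code: the table layout and the names open_cache, cache_put and cache_get are hypothetical, and treating fresh as "skip the cache" is an assumption about what the fresh= parameter does.

```python
import json
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) a one-table cache. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
    )
    return conn


def cache_put(conn, key, value, now=None):
    """Store a JSON-serialisable value, overwriting any older entry."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, payload, stored_at) VALUES (?, ?, ?)",
        (key, json.dumps(value), now),
    )
    conn.commit()


def cache_get(conn, key, ttl_seconds, now=None, fresh=False):
    """Return the cached value, or None if missing, expired, or bypassed."""
    if fresh:  # assumption: fresh means "ignore the cache for this call"
        return None
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, stored_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > ttl_seconds:
        return None
    return json.loads(row[0])
```

Passing now explicitly keeps the expiry logic testable without sleeping; a real service would more likely rely on the wall clock.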

Not a web crawler. No JS execution (yet — Playwright fallback is a v2 candidate). No write surface. Only http(s) URLs accepted.
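
As an illustration of the "only http(s) URLs" rule, a scheme check could look roughly like the sketch below. The function name is_fetchable_url and the extra requirement that a hostname be present are assumptions, not taken from the web_api source.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_fetchable_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(host)
```

urlsplit lowercases the scheme, so HTTP://Example.org passes while file://, javascript: and scheme-less inputs are rejected.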

Endpoints

Route              Method  Notes
/health            GET     version + searxng probe + cache row counts
/v1/search         GET     ?q=&engines=&count=&language=&safesearch=&fresh=
/v1/images/search  GET     ?q=&engines=&count=&language=&safesearch=&fresh= (image-shaped results)
/v1/fetch          GET     ?url=&max_chars=&fresh=
/v1/research       POST    JSON body, see web_api/app.py
/v1/cache/stats    GET     row counts
/mcp/              *       FastMCP Streamable HTTP
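
For the GET routes listed above, request URLs can be assembled with the standard library. The sketch below is hedged: the base URL assumes the service is reachable on localhost at port 8097 (the port comes from the diagram), the helper name build_url is hypothetical, and the accepted value formats for each parameter are not specified here.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8097"  # assumption: local access to the api qube port


def build_url(route, **params):
    """Join a route with its query string, dropping parameters left as None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{route}?{query}" if query else f"{BASE_URL}{route}"


search_url = build_url("/v1/search", q="rust async", count=5, engines=None)
fetch_url = build_url("/v1/fetch", url="https://example.org/a?b=1", max_chars=4000)
health_url = build_url("/health")
```

urlencode percent-encodes the nested url= value, so a target URL with its own query string survives the round trip.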

Full design: docs/design.md.
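
The one-shot research flow (search, then fetch top-N, then bundle) can be sketched as follows. This is an assumption-laden outline, not the code in web_api/app.py: the function name research, the injected search and fetch callables, and the shape of the returned bundle are all hypothetical.

```python
def research(query, search, fetch, top_n=3):
    """Run one search, fetch the first top_n hits, and bundle the results.

    search(query) is assumed to return a list of dicts with a "url" key;
    fetch(url) is assumed to return extracted Markdown as a string.
    """
    hits = search(query)[:top_n]
    pages = []
    for hit in hits:
        entry = {"url": hit["url"], "title": hit.get("title", "")}
        try:
            entry["markdown"] = fetch(hit["url"])
        except Exception as exc:  # one failed page should not sink the bundle
            entry["error"] = str(exc)
        pages.append(entry)
    return {"query": query, "pages": pages}
```

Injecting search and fetch as callables keeps the sketch independent of SearXNG and Trafilatura, and makes it easy to exercise with fakes.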

Run locally

pip install -r requirements.txt
SEARXNG_URL=http://localhost:8098 CACHE_DB=/tmp/web-api.sqlite \
  python3 -m web_api
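
The two environment variables in the command above could be read along these lines. This is only a sketch: the Settings class and load_settings helper are hypothetical, and the fallback values simply reuse the ones from the example command rather than the service's real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    cache_db: str


def load_settings(env=None):
    """Build Settings from a mapping (os.environ unless one is passed in)."""
    env = os.environ if env is None else env
    return Settings(
        # Fallbacks mirror the example command; the real defaults may differ.
        searxng_url=env.get("SEARXNG_URL", "http://localhost:8098").rstrip("/"),
        cache_db=env.get("CACHE_DB", "/tmp/web-api.sqlite"),
    )
```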

Tests:

pip install -r requirements.txt pytest pytest-asyncio respx
pytest tests/

Deploy

Quadlet at ~/aienv/api/services/web-api/web-api.container pulls git.loop-coop.net/loco/workloads-web-api:latest and depends on searxng.service. SearXNG settings live alongside in ~/aienv/api/services/searxng/.

Versioning + release

Same flow as media-api / mem0-api-server — Conventional Commits, push to main, bump.yml tags, release.yml builds + pushes to the local Forgejo registry, api qube auto-pulls via Pull=newer.
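
To make the Conventional Commits step concrete, the sketch below derives a version bump from commit messages using the usual convention (feat gives a minor bump, fix a patch bump, a "!" marker or BREAKING CHANGE footer a major bump). It is not the logic in bump.yml, which is not shown here; bump_for and next_version are hypothetical names, and real tooling may treat 0.x versions differently.

```python
import re

_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")
_ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message):
    """Map one commit message to "major", "minor", "patch", or None."""
    header = message.splitlines()[0] if message else ""
    match = _HEADER.match(header)
    if match is None:
        return None  # not a Conventional Commit header
    if match.group("bang") or "BREAKING CHANGE:" in message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None  # chore, docs, ci, ... do not trigger a release


def next_version(current, messages):
    """Apply the largest bump found in messages to a MAJOR.MINOR.PATCH string."""
    level = max((bump_for(m) for m in messages), key=_ORDER.__getitem__, default=None)
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```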