DecentWeb HTTP Bridge — Gap Analysis and Design
Version: 0.3
Status: Draft
Purpose: Review the current decentweb implementation as a foundation for an
HTTP bridge that any browser can use to read and publish content on the DecentWeb
network. Identify gaps, misalignments, and concrete changes needed to produce a
usable browser-accessible node.
Companion document: decentweb-design.md (protocol goals and layer definitions).
1. What the Current Implementation Provides
https://github.com/diatribes/decentweb
The implementation is a DHT node daemon (dwd) and a CLI tool (dw), written in C
with Bazel. It covers the protocol stack reasonably well at the low level:
| What works | Where |
|---|---|
| Ed25519 keypair generation and storage | dw keygen, src/identity |
| btpk magnet address generation and parsing | dw_identity_magnet, dw_identity_parse_magnet |
| BEP-44 mutable item publish and resolve | src/feed, dwd |
| DHT node with routing-table persistence | dwd, libdht |
| Content bundle packing, hashing (SHA1), parsing | src/content/bundle.c |
| Feed manifest packing and parsing | src/content/manifest.c |
| Hash-verified fetch from a content cache | src/content/transport_http.c |
| Full write path: dw publish | apps/dw |
| Full read path: dw get | apps/dw, dwd get mode |
| Unix control socket IPC (RESOLVE, PUBLISH) | dwd |
| Feed liveness: periodic BEP-44 re-announce | dwd re-publish loop |
What the implementation does not provide is any HTTP interface that a browser
can talk to. srv serves raw files out of a docroot by hash — it is the local
content cache half of the bridge — but there is no request handler that takes a
btpk magnet from the URL, resolves it over the DHT, downloads the bundle, and
returns sanitized HTML to a browser. There is no feed reader view, no subscription
management, and no HTML sanitization pipeline. The gap between “protocol node plus
local cache” and “HTTP bridge” is that connecting request-handling layer.
2. The Intended Request Flow
A browser request through the bridge looks like this:
| |
The bridge is a local caching proxy between the browser and the DecentWeb network. It fetches and verifies content on behalf of the browser, stores it locally in its docroot, and serves sanitized HTML. The browser only ever talks to the bridge over HTTP; it never touches the DHT or the content network.
Content that has already been fetched is served directly from the local docroot without hitting the network again. The docroot is the bridge’s durable content store: it survives restarts and grows as new content is fetched.
Publishers have no HTTP server requirement in this model. A publisher runs dwd
to participate in the DHT and push BEP-44 updates. Readers’ bridges discover and
fetch the content. The publisher’s hosting obligation ends at DHT participation.
3. Design Goal Alignment
The following table assesses each non-negotiable design goal from decentweb-design.md
against the current state and flags where the bridge must take care.
3.1 Core Properties
| Goal | Current state | Bridge concern |
|---|---|---|
| Zero outbound requests from content | Not enforced. Bundles are written to disk as-is; raw HTML may contain external src/href. | The bridge must sanitize all HTML before serving it to a browser. Without this the goal fails the moment a browser renders a page. |
| No hosting cost for publishers | Met by the model: publishers only need DHT access; the reader’s bridge caches content locally. | The current fetch transport requires an existing HTTP cache to retrieve content from. Until direct DHT/BitTorrent piece transfer is implemented, a bootstrap mirror is still needed. This is an interim implementation gap, not a design flaw. |
| No central index | Met at the protocol layer (DHT). | The bridge’s local SQLite index is per-instance and not shared — this is correct. |
| Content integrity without CAs | Met: each fetch is hash-verified; the feed is Ed25519-signed. | The bridge should surface this to the browser (e.g. a visible “signature verified” indicator on each post). |
| Natural content expiry | Met by the model: DHT liveness and swarm seeder attrition handle expiry. Locally cached content in docroot persists until the operator clears it. | The bridge cache is the operator’s own machine; operator controls retention. This is the right place for the decision. |
| Format simplicity | Bundle packing accepts any file names; no format enforcement at pack or serve time. | The bridge must refuse to serve or must transcode any file outside the allowed set (HTML, CSS, WebP, WebM, WOFF2). |
| No required old-web dependency | Partially violated in the current implementation: dw get requires an HTTP mirror URL to be passed on the CLI. The btpk magnet alone is not sufficient to fetch content today. | Direct DHT/BitTorrent download is the goal. The mirror bootstrap is a known interim step, not a permanent dependency. |
3.2 Secondary Goals
| Goal | Current state | Bridge concern |
|---|---|---|
| Author identity portability | Keypair is a local file; no server-side custody. | A bridge with a publishing flow needs encrypted server-side key storage and a key-export mechanism. |
| Reader anonymity by default | No accounts needed to read. | The bridge server itself observes what content the user fetches. This is inherent to the bridge model (the bridge is the swarm peer, not the browser). It must not log reading behaviour unnecessarily, and this limitation should be surfaced in the UI. |
| Graceful degradation | A feed that goes offline simply fails to resolve. No broken-link equivalent yet. | The bridge should show the last-known feed state with a “last fetched at X” indicator rather than a bare error. |
4. Gaps in the Current Implementation
The following are missing from the current implementation entirely and are required for a usable browser-facing bridge.
4.1 No request handler connecting the browser to the DHT
srv serves files from docroot by hash. dwd resolves and publishes BEP-44
items. Nothing connects them in response to a browser request. The bridge needs a
request handler that:
- Accepts a btpk magnet or bundle hash in the URL path.
- Checks the local docroot cache.
- On a cache miss, asks dwd to resolve the feed, fetches the manifest and bundle, verifies hashes, and stores the results in docroot.
- Runs the content sanitization pass.
- Returns rendered HTML to the browser.
This is the bridge’s core missing component. Everything else builds on top of it.
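As a minimal sketch of the handler's first step, the path token can be classified before the cache is consulted. The function name, the enum, and the choice of a 40-character hex encoding for bundle hashes in the URL are assumptions made for illustration; only the magnet:?xs=urn:btpk: prefix and the 20-byte SHA1 width come from this document.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for the bridge URL path. */
enum bridge_token_kind {
    BRIDGE_TOKEN_INVALID = 0,
    BRIDGE_TOKEN_BUNDLE_HASH,   /* 40 hex characters (20-byte SHA1) */
    BRIDGE_TOKEN_BTPK_MAGNET    /* magnet:?xs=urn:btpk:<key> */
};

/* Classify the token a browser placed in the URL path. */
enum bridge_token_kind bridge_classify_token(const char *s)
{
    static const char prefix[] = "magnet:?xs=urn:btpk:";
    size_t plen = sizeof prefix - 1;
    size_t len = strlen(s);

    /* A magnet must carry a non-empty key after the prefix. */
    if (strncmp(s, prefix, plen) == 0)
        return len > plen ? BRIDGE_TOKEN_BTPK_MAGNET : BRIDGE_TOKEN_INVALID;

    /* Anything else must be exactly 40 hex digits. */
    if (len != 40)
        return BRIDGE_TOKEN_INVALID;
    for (size_t i = 0; i < len; i++)
        if (!isxdigit((unsigned char)s[i]))
            return BRIDGE_TOKEN_INVALID;
    return BRIDGE_TOKEN_BUNDLE_HASH;
}
```

Rejecting everything that is not one of the two known shapes also keeps path-traversal strings out of the docroot lookup. Key validation (length and encoding of the btpk key) is deliberately left out of this sketch.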
4.2 Single post per feed
The current manifest schema:
| |
A feed has exactly one content bundle. There is no post history, no title, no
timestamp, no follows list. A feed reader model — a list of posts per feed, unread
counts, a post view for each — cannot be implemented on top of this schema.
This is the most fundamental schema gap. Nothing above the protocol layer works as a feed reader until a feed can hold multiple posts.
4.3 No manifest metadata
The manifest has no author-provided metadata: no feed title, no post titles, no publication timestamps, no description. The bridge would have no content to display in a feed list or post list beyond raw hashes.
4.4 No subscription persistence or background refresh
There is no subscription list and no background polling loop. dw get is a one-shot
command. A browser user has no way to see new content appear without manually running
CLI commands. The bridge needs a background thread that periodically resolves each
followed feed’s BEP-44 item, fetches new bundles, and updates the local index.
4.5 No local index or search
Layer 4 (discovery) is a stub. There is no SQLite database, no FTS5 index, and no subscription store. Search across fetched content and the feed/post list views both require a persistent local store.
4.6 No content sanitization pipeline
Before serving any bundle HTML to a browser, the bridge must:
- Strip all <script> tags and event handler attributes (on*)
- Strip or rewrite all src and href attributes pointing to remote URLs
- Validate CSS against the permitted property set
- Reject or strip file types outside the permitted set (HTML, CSS, WebP, WebM, WOFF2)
None of this exists. dw_bundle_extract writes files to disk unchanged. Serving
raw bundle HTML to a browser would allow any external src attribute in the content
to make outbound requests, directly violating the core design guarantee.
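The file-type rule is the simplest of these checks to illustrate. The sketch below tests a bundle file name against the permitted set by extension only; the function name is hypothetical, and a real implementation would likely also inspect file contents rather than trust the name.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Returns 1 if the file name ends in a permitted extension, else 0. */
int bridge_ext_allowed(const char *name)
{
    static const char *const allowed[] = {
        ".html", ".css", ".webp", ".webm", ".woff2"
    };
    const char *dot = strrchr(name, '.');

    if (dot == NULL)
        return 0;                      /* no extension at all */
    if (strchr(dot, '/') != NULL)
        return 0;                      /* the dot belonged to a directory name */

    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (strcasecmp(dot, allowed[i]) == 0)
            return 1;
    return 0;
}
```

Only the final extension is considered, so a name such as page.html.exe is rejected.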
4.7 No key management for the publishing flow
Key management is entirely manual: generate a file, keep it, pass its path on the CLI. For a bridge with a browser-based publish UI, this is not workable. The bridge needs encrypted server-side key storage (a key derived from a user passphrase protecting the Ed25519 seed in SQLite) and a key-export flow so users can back up and migrate their identity.
4.8 No btpk address display or QR code
The current implementation prints the magnet string on stdout. The bridge needs to present the user’s btpk address as a QR code and a copyable link, and to accept a pasted or scanned magnet on a subscribe/discover page. QR generation does not require JavaScript; a server-side generator can emit an inline SVG.
4.9 No multi-mirror fallback in the read path
dw get takes exactly one mirror URL on the command line. The manifest’s mirrors
list is used only as a fallback in a limited sense. The bridge should try each mirror
in the list in order before reporting failure. When the bridge publishes a feed it
should add its own accessible URL to the manifest’s mirrors list so other bridge
instances can retrieve cached content from it.
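The in-order fallback can be sketched as a loop over the mirrors list. All names here are hypothetical: bridge_fetch_fn stands in for whatever hash-verified transport the bridge uses, and demo_fetch is a stub that exists only to make the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport callback: returns 0 on a hash-verified fetch. */
typedef int (*bridge_fetch_fn)(const char *mirror_url, const char *hash, void *ctx);

/* Try each mirror in list order; return the index that succeeded, or -1. */
int bridge_fetch_any(const char *const *mirrors, size_t n,
                     const char *hash, bridge_fetch_fn fetch, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (fetch(mirrors[i], hash, ctx) == 0)
            return (int)i;
    return -1;  /* every mirror failed */
}

/* Stand-in transport for illustration: succeeds only when the URL contains
 * "up", and counts attempts through ctx when ctx is non-NULL. */
int demo_fetch(const char *mirror_url, const char *hash, void *ctx)
{
    int *attempts = ctx;
    (void)hash;
    if (attempts != NULL)
        (*attempts)++;
    return strstr(mirror_url, "up") != NULL ? 0 : -1;
}
```

Returning the index rather than a boolean lets the caller record which mirror served the content, which may be useful for ordering mirrors on later fetches.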
5. Implementation Choices That Do Not Align With the Design Goals
5.1 btpk encoding: hex vs. base32
identity.h (lines 3–8) documents the tension explicitly:
The DecentWeb article says “base32” but also claims BEP-46 compatibility, and BEP-46 uses hex, so we use hex.
decentweb-design.md (§3, §7) uses magnet:?xs=urn:btpk:<base32-encoded-public-key>.
BEP 46 (the draft that introduced btpk) specifies base32. Standard BitTorrent
clients (libtorrent-rasterbar and derivatives) produce and expect base32 btpk
magnets. A 64-character hex key and a 52-character base32 key are not the same
URL — a bridge using hex cannot exchange addresses with any client that follows
BEP 46 as written.
The implementation should switch to base32. The comment in identity.h should
become the resolved decision record, not an open note.
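To make the length difference concrete, the sketch below encodes a 32-byte public key as unpadded RFC 4648 base32, which yields the 52 characters mentioned above. The function names are hypothetical, and the upper-case alphabet and the absence of padding are assumptions; the spec documents should fix both.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 4648 base32 without padding.
 * out must hold (len * 8 + 4) / 5 + 1 bytes. Returns characters written. */
size_t base32_encode(const uint8_t *in, size_t len, char *out)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    uint32_t acc = 0;   /* bit accumulator; only the low `bits` bits matter */
    int bits = 0;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 5) {
            out[n++] = alphabet[(acc >> (bits - 5)) & 31];
            bits -= 5;
        }
    }
    if (bits > 0)       /* left-align the final partial group */
        out[n++] = alphabet[(acc << (5 - bits)) & 31];
    out[n] = '\0';
    return n;
}

/* Build magnet:?xs=urn:btpk:<base32 key>. Returns 0 on success, -1 if
 * the output buffer is too small. */
int btpk_magnet_base32(const uint8_t pubkey[32], char *out, size_t outsz)
{
    static const char prefix[] = "magnet:?xs=urn:btpk:";
    size_t plen = strlen(prefix);

    if (outsz < plen + 52 + 1)
        return -1;
    memcpy(out, prefix, plen);
    base32_encode(pubkey, 32, out + plen);
    return 0;
}
```

A 32-byte key is 256 bits, which is 51 full 5-bit groups plus one trailing bit, hence 52 characters.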
5.2 Content fetch requires an existing HTTP cache
The current transport:
| |
The cache server (srv) is the bridge’s own docroot server, so there is no publisher
hosting burden here. But the bridge has no mechanism for the first fetch of content
that is not yet in any accessible cache: it cannot retrieve content directly from the
DHT/swarm when no HTTP cache has it. This is the gap that BitTorrent piece transfer
fills. Until then, the first fetch of any new content depends on an accessible peer
that is already serving it over HTTP.
The transport interface is pluggable (dw_content_fetch_http is one implementation).
Adding a BitTorrent transport behind the same seam is the correct next step and is
the only path to making the “no required old-web dependency” goal fully true.
5.3 SHA1 as the content hash
Bundle and manifest hashes are SHA1 (20 bytes), consistent with BitTorrent’s infohash width and BEP-44’s value field. This is not a correctness problem today, but practical SHA1 collision attacks exist. Because the network’s trust model rests entirely on hash-equality verification, it is worth moving to a collision-resistant hash (SHA-256 or BLAKE3) before the wire format is finalised. This is a breaking change and needs coordination with the spec documents. Truncated SHA-256 (first 20 bytes) keeps the current 20-byte hash width and so fits the BEP-44 value size constraint.
5.4 The Unix control socket protocol is too narrow
The current IPC protocol accepts two commands:
- RESOLVE <pubkey-hex> → OK <hash> / NONE / ERR
- PUBLISH <keyfile> <hash> → OK / ERR
A bridge request handler needs substantially more from dwd:
- Subscribe to a feed (persist it, begin polling)
- Unsubscribe
- List subscriptions with status
- Fetch a specific bundle hash from the network into docroot
- Trigger an immediate feed refresh
- Report DHT ready state, peer count, uptime
The current protocol is a single line in each direction into a fixed buffer — not framed, not versioned, no structured data. It would need to be replaced with newline-delimited JSON or a small HTTP/1.1 API over the Unix socket before a bridge request handler can drive it.
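If newline-delimited JSON is chosen, framing reduces to "one JSON object per line". The sketch below shows one possible request encoder and a line splitter for the receive buffer. The field names v, cmd, and arg, and both function names, are assumptions and not part of the current protocol.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Accept only [A-Za-z0-9_] so no JSON escaping is needed in this sketch. */
static int is_token(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}

/* Write one versioned request line. arg may be NULL for commands that
 * take no argument. Returns bytes written, or -1 on error. */
int ipc_format_request(char *out, size_t outsz, const char *cmd, const char *arg)
{
    int n;

    if (!is_token(cmd) || (arg != NULL && !is_token(arg)))
        return -1;
    if (arg != NULL)
        n = snprintf(out, outsz, "{\"v\":1,\"cmd\":\"%s\",\"arg\":\"%s\"}\n", cmd, arg);
    else
        n = snprintf(out, outsz, "{\"v\":1,\"cmd\":\"%s\"}\n", cmd);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}

/* Find the first complete line in buf[0..len). Returns the line length
 * (excluding the newline) and sets *consumed to the bytes to drop from
 * the buffer, or returns -1 if no full line has arrived yet. */
long ndjson_next_line(const char *buf, size_t len, size_t *consumed)
{
    const char *nl = memchr(buf, '\n', len);

    if (nl == NULL)
        return -1;
    *consumed = (size_t)(nl - buf) + 1;
    return (long)(nl - buf);
}
```

Because a reader only acts on complete lines, a partial read from the socket no longer corrupts a message, which is the main weakness of the fixed-buffer protocol described above.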
5.5 Maintained-feeds list is not persisted
dwd keeps a linked list of feeds to re-publish (g_feeds) in memory only. A
restart loses it, and any feed that was maintained expires from the DHT roughly two
hours later. A bridge serving real users must survive restarts without silently
expiring its published feeds from the network.
5.6 Whole bundles held in memory
dw_bundle_parse and dw_manifest_parse read the entire payload into memory and
hold it as a bencode tree. For CLI use this is fine. For a bridge serving concurrent
requests, large bundles will exhaust memory. The bridge should enforce a maximum
bundle size, reject manifests where the referenced hash arrives with a
Content-Length exceeding the limit, and stream to disk rather than buffering fully
before verification.
5.7 IPv4-only DHT
dwd creates an IPv4-only UDP socket and skips IPv6 peers. IPv6 DHT participation
(BEP-32) is standard on modern public BitTorrent networks. A bridge node that cannot
participate in the IPv6 DHT is a second-class peer with reduced reach.
6. HTTP Bridge Architecture
Given the current implementation, the bridge adds one new component (dwh, or an
embedded HTTP handler in dwd) between srv/docroot and the browser:
| |
dwh is a new process that owns the browser-facing HTTP server, SQLite, and the
sanitization pipeline. It drives dwd via an extended IPC protocol. dwd remains
the DHT peer and is responsible for fetching content into the shared docroot. srv
continues to serve the docroot by hash — used both by dwd (as a peer-facing
mirror for other bridge instances) and by dwh (as the local cache read path).
7. Prioritised Recommendations
Changes ordered by impact on a browser user’s experience. Items in the same tier can be done in parallel.
Tier 1 — Prerequisites: nothing useful works without these
1a. Extend the manifest schema for multiple posts
The manifest must support a list of posts before a feed reader can exist. Proposed
minimal extension (backward-compatible: a manifest without posts falls back to the
current single-content behaviour):
| |
The content field should be kept for v1 read compatibility and deprecated in v2.
follows enables the trust-graph discovery path (Layer 4).
1b. Decide and commit to a btpk encoding
Pick base32 (spec-correct, BT-client-interoperable) or hex (current) and change
everything to match. The comment in identity.h should become the canonical decision
record. Recommendation: base32, for interoperability with any client that implements
BEP 46 as written.
1c. Add a content sanitization pass
Before serving any HTML to a browser, the bridge must:
- Parse the HTML (a minimal recursive-descent parser over the permitted element set is sufficient — a full DOM is not needed).
- Strip <script>, <iframe>, <object>, <embed>, remote <link rel="stylesheet">, and all on* attributes.
- Rewrite <a href="..."> for external destinations to open in a new context, with a visible “external link” marker.
- Strip src="http..." attributes on <img>, <video>, <source>.
- Reject files with extensions outside {.html, .css, .webp, .webm, .woff2}.
This is the enforcement mechanism for the “zero outbound requests from content” guarantee.
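Two of the predicates the pass needs can be sketched in isolation. Both function names are hypothetical, and the URL test is deliberately stricter than the src="http..." rule listed above: it treats any value with a URL scheme, or a protocol-relative //host prefix, as non-local. That strictness is an assumption of this sketch, not a decision recorded in this document.

```c
#include <ctype.h>
#include <stddef.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* 1 if the attribute name is an event handler such as onclick or ONLOAD. */
int attr_is_event_handler(const char *name)
{
    return strncasecmp(name, "on", 2) == 0 && name[2] != '\0';
}

/* 1 if the attribute value could reach outside the bundle:
 * "scheme:..." (http:, https:, javascript:, data:, ...) or "//host/...". */
int url_is_remote(const char *value)
{
    size_t i = 1;

    while (isspace((unsigned char)*value))
        value++;                                  /* skip leading whitespace */
    if (value[0] == '/' && value[1] == '/')
        return 1;                                 /* protocol-relative URL */
    if (!isalpha((unsigned char)value[0]))
        return 0;                                 /* a scheme starts with a letter */
    while (isalnum((unsigned char)value[i]) ||
           value[i] == '+' || value[i] == '-' || value[i] == '.')
        i++;
    return value[i] == ':';                       /* scheme terminator found */
}
```

Relative paths such as img/photo.webp pass, since the scan stops at the slash before finding a colon. Browser quirks such as backslash-prefixed URLs are not handled here and would need separate treatment.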
Tier 2 — Core UX: a usable feed reader
2a. Add SQLite for state
The bridge needs at minimum: feeds (pubkey, display title, last-fetched), posts
(feed key, bundle hash, title, timestamp), fts_index (FTS5 virtual table over post
content), and keypairs (for publishing users, encrypted with a passphrase-derived
key).
2b. Background feed refresh loop
A thread in dwh that:
- Reads the subscribed feed list from SQLite.
- On a configurable interval (default 15 minutes), resolves each feed’s BEP-44 item.
- Fetches the manifest and any new bundle hashes not yet in the local post store.
- Updates the SQLite index.
Until this exists, the bridge shows a static snapshot that never updates.
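The scheduling decision inside that loop is small enough to sketch on its own. The struct and function below are hypothetical; in the design above the feed list would come from SQLite, and a last_fetched value of 0 is assumed here to mean "never fetched".

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory view of one subscribed feed. */
struct feed_state {
    const char *pubkey;     /* feed public key, encoding left open */
    time_t last_fetched;    /* 0 means the feed has never been fetched */
};

/* Write the indexes of feeds that are due for a refresh into due[]
 * (which must have room for n entries) and return how many were written. */
size_t feeds_due(const struct feed_state *feeds, size_t n,
                 time_t now, double interval_sec, size_t *due)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (feeds[i].last_fetched == 0 ||
            difftime(now, feeds[i].last_fetched) >= interval_sec)
            due[count++] = i;
    }
    return count;
}
```

With the default 15-minute interval, interval_sec would be 900. Keeping the selection separate from the network calls makes the loop easy to test without a DHT.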
2c. Persist the maintained-feeds list in dwd
dwd should write its maintained-feeds list to dwd.state alongside the routing
table and restore it on startup. A restart must not silently expire published feeds.
2d. Extend the IPC protocol
Replace the current single-line protocol with newline-delimited JSON. New commands the bridge needs:
| Command | Purpose |
|---|---|
| SUBSCRIBE <pubkey> | Add feed to polling list |
| UNSUBSCRIBE <pubkey> | Remove feed |
| LIST_FEEDS | Return subscribed feeds with last-resolve status |
| FETCH_BUNDLE <hash> | Fetch a bundle into docroot by hash |
| STATUS | Return DHT ready state, peer count, uptime |
Tier 3 — Quality and completeness
3a. Serve the feed reader as clean, no-JS HTML
All bridge pages must be complete server-rendered HTML. No JavaScript dependency for any core function. Minimum views:
- Feed list (sidebar): followed feeds with unread post count
- Post list: most recent posts from selected feed or all feeds
- Post view: sanitized bundle HTML, served inline
- Search results: FTS5 query across indexed post content
- Discover: paste or type a btpk magnet to subscribe
- Publish: compose a post under a managed keypair
- Profile: user’s btpk address as QR code and copyable link
3b. btpk address as QR code
A server-side QR generator outputs an inline SVG or WebP. The profile page shows the
user’s btpk magnet as a QR code and as selectable text. The discover page accepts a
pasted or typed magnet and begins a subscription.
3c. Enforce a bundle size limit
Define a maximum bundle size (suggested: 50 MB) and enforce it at both fetch time
(reject before reading the body if Content-Length exceeds the limit) and pack time
(dw bundle). This prevents memory exhaustion in both dwd and dwh.
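A minimal sketch of the fetch-time check follows. The names are hypothetical, and 50 MB is interpreted here as 50 × 1024 × 1024 bytes, which is an assumption. Since a Content-Length header can be absent or wrong, the sketch also includes a running byte budget to apply while streaming the body.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Suggested limit from this document, read as 50 MiB (assumption). */
#define BRIDGE_MAX_BUNDLE_BYTES (50ULL * 1024 * 1024)

/* Returns 1 if value is a plain decimal length no larger than max. */
int content_length_ok(const char *value, uint64_t max)
{
    char *end = NULL;
    unsigned long long n;

    if (value == NULL || !isdigit((unsigned char)value[0]))
        return 0;                       /* rejects "", "-1", " 5" */
    errno = 0;
    n = strtoull(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return 0;                       /* overflow or trailing junk */
    return n <= max;
}

/* Account for one received chunk. Returns 1 and updates *received if the
 * total stays within max; returns 0 and leaves *received unchanged otherwise.
 * Callers must keep *received <= max, which this function preserves. */
int within_budget(uint64_t *received, size_t chunk, uint64_t max)
{
    if (chunk > max - *received)
        return 0;
    *received += chunk;
    return 1;
}
```

The header check rejects oversized bundles before any body bytes are read, and the budget check covers servers that omit or misreport the length.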
3d. Full multi-mirror fallback
When fetching a bundle or manifest, try every mirror in the list before reporting
failure. When the bridge publishes a feed, add its own accessible URL to the
manifest’s mirrors list so other bridge instances can retrieve cached content
from it.
Tier 4 — Correctness and reach
4a. BitTorrent piece transfer transport
The HTTP transport is an interim measure that requires a peer already serving the
content over HTTP. Direct BitTorrent download removes this dependency and completes
the “no required old-web dependency” goal. The pluggable transport seam in
content.h already exists for this purpose.
4b. Add IPv6 DHT support
Pass AF_INET6 alongside AF_INET to the libdht node. Create a second UDP socket
for IPv6 and run both in the select loop. Necessary for full DHT reach.
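The socket side of this change can be sketched with plain POSIX calls. The function names are hypothetical, and how the second socket is handed to libdht is not shown because that interface is not described in this document. Setting IPV6_V6ONLY keeps the IPv6 socket from also claiming IPv4 traffic, so the existing IPv4 socket can stay bound to the same port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket bound to the IPv6 wildcard address on `port`
 * (0 lets the OS pick). Returns the descriptor, or -1 on failure. */
int open_dht_socket6(uint16_t port)
{
    struct sockaddr_in6 addr;
    int on = 1;
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    /* IPv6 only: IPv4 traffic stays on the separate AF_INET socket. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sin6_family = AF_INET6;
    addr.sin6_port = htons(port);
    addr.sin6_addr = in6addr_any;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait until either socket is readable. Negative descriptors are skipped,
 * so a node without IPv6 still works. Returns select()'s result. */
int wait_readable(int fd4, int fd6, int timeout_sec)
{
    struct timeval tv;
    fd_set rfds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    if (fd4 >= 0) { FD_SET(fd4, &rfds); maxfd = fd4; }
    if (fd6 >= 0) { FD_SET(fd6, &rfds); if (fd6 > maxfd) maxfd = fd6; }
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    return select(maxfd + 1, &rfds, NULL, NULL, &tv);
}
```

Treating a failed IPv6 socket as "skip" rather than "abort" lets the same binary run on hosts with no IPv6 connectivity.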
4c. Re-evaluate SHA1 as the content hash
Before the manifest schema is finalised, consider moving bundle and manifest hashes to SHA-256 or BLAKE3. This is a breaking wire change and needs coordination across the spec documents.
8. What the Bridge Does Not Change
The following constraints from decentweb-design.md are architectural. The bridge
can honour or violate them but cannot alter them:
- Read-time privacy is a bridge-side concern. The bridge is the swarm peer; it knows what the user fetches. This is inherent to the proxy model. The bridge must not log reading behaviour unnecessarily, and this limitation should be visible in the UI.
- Publisher identity is non-transferable. The bridge may manage key material on behalf of a user, but the key remains the canonical identity. Key backup and export are required features, not optional.
- Published is public. Once a bundle hash appears in a signed manifest and reaches the DHT, it cannot be retracted. The publish flow in the bridge UI should make this explicit before submission.
- No first-contact guarantee. When a user subscribes to a btpk address for the first time, the bridge cannot verify who controls it. The UI should show key continuity (first-seen date, unbroken signature chain) as the available trust signal, not a false “verified” indicator.
9. Open Questions
Language stack: resolved. The bridge is C99 throughout. Lua (via LuaJIT) is the templating layer for dynamic HTML responses — Lua server pages handle feed reader views, post rendering, and form responses, while C handles the protocol, DHT, content fetching, sanitization, and SQLite. No other language is in scope.
Single binary vs. two processes. The bridge could be dwd extended to embed an HTTP server, or a separate dwh that talks to dwd over the socket. Two processes allows independent restarts but adds deployment complexity. Single-process is simpler to deploy but couples the HTTP server to the DHT node’s restart cycle.
Multi-user vs. single-user bridge. For a self-hosted instance, single-user simplifies key storage and removes isolation concerns between users. The SQLite schema should accommodate multi-user from the start (a users table with FKs into keypairs and subscriptions) even if the first implementation only creates one user.
Markdown vs. HTML authoring. Recommendation: convert Markdown to HTML at publish time so stored bundles are always canonical HTML. Optionally preserve the Markdown source as source.md in the bundle for the bridge’s edit flow.
Key backup prompt timing. When should the bridge prompt the user to download their key backup: on first login, on first publish, or both? A user who never backs up and loses access to the bridge loses their publishing identity permanently.
10. Version History
| Version | Date | Notes |
|---|---|---|
| 0.1 | 2026-06-04 | Initial draft. Gap analysis against design doc; bridge architecture; prioritised recommendations. |
| 0.2 | 2026-06-04 | Removed SaaS stack references; corrected transport framing (bridge cache is reader-side; no publisher hosting burden); added explicit request flow diagram; reframed design goal alignment accordingly. |
| 0.3 | 2026-06-04 | Language stack resolved: C99 core, LuaJIT server pages for HTML templating. No C# or ASP.NET. |