Hosted on gabo.es via the Hypermedia Protocol

/debug/network 2026-05-26 23:24

Discovery/sync performance is still poor, but PutMany is not the cause in this snapshot.

Key signals from http://localhost:58001/debug/network:

Main problem: connected_sync is very slow

Discovery latency / connected_sync
p50  7.95s
p90  17.36s
p99  21.18s
count 181

This is the user-visible “click document → wait for peers” phase. It waits for many peer syncs.

End-to-end discovery is also slow:

connected p50 17.43s, p99 34.62s
notfound  p50 6.36s,  p99 20.64s

Main per-peer issue: dial failures / cancellations

Sync outcomes:

ok                 1808
protocol_mismatch  206
dial_failed        2209
rpc_error           186
preempted          1190
putmany_failed       0

Fewer than a third of sync attempts (1808 of 5599) end in ok; the rest fail to dial, mismatch, error out, or get canceled. The daemon is spending most of its time on peers that don’t produce useful results.
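The outcome counts above can be turned into shares with a few lines of Python. This is only arithmetic on the numbers from this snapshot; the `outcomes` dict and the `share` helper are names made up for the example, not anything the daemon exposes.

```python
# Sync outcome counts copied from the snapshot above.
outcomes = {
    "ok": 1808,
    "protocol_mismatch": 206,
    "dial_failed": 2209,
    "rpc_error": 186,
    "preempted": 1190,
    "putmany_failed": 0,
}

total = sum(outcomes.values())


def share(name: str) -> float:
    """Return the percentage of all sync attempts that ended with this outcome."""
    return 100.0 * outcomes[name] / total


if __name__ == "__main__":
    print(f"total attempts: {total}")
    # Largest outcome first, so the dominant failure mode is at the top.
    for name, count in sorted(outcomes.items(), key=lambda kv: -kv[1]):
        print(f"{name:<18} {count:>5}  {share(name):5.1f}%")
```

dial_failed is the largest single bucket, ahead of ok, and dial_failed plus preempted together make up roughly 60% of all attempts.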

Dial tail is bad

dial p99 12.57s

So some peer connection attempts are very slow.

Reconcile RPC tail is also bad

reconcile_rpc p99 9.62s

Interesting detail:

new_conn p99     317ms
reused_conn p99  9.79s

That suggests the long tail is not about opening new connections: reused connections have a far worse p99 (9.79s) than new ones (317ms). It may be remote peer/gateway response delay, stream-layer behavior, overloaded peers, or requests waiting behind other work.

Not much actual blob fetching

Bitswap:

no fetches yet
putmany 0
putmany_failed 0

So the app is not slow in this session because it is downloading or writing lots of blobs. The time goes mostly to discovery, reconcile, and dial overhead.

Wantlists are mostly tiny

Wantlist size:
<=1: 1766
<=10: 25
<=100: 17
mean=0

So RBSR usually concludes there is little/no missing content. That means we are spending seconds asking peers mostly to discover “nothing new”.
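A quick check of how lopsided the wantlist histogram is, assuming the three lines are per-bucket counts rather than cumulative ones (an assumption; the debug page does not say). The variable names are made up for the example.

```python
# Wantlist size buckets copied from the snapshot above.
# Assumption: each value counts only the syncs that fell into that bucket.
wantlist_buckets = {"<=1": 1766, "<=10": 25, "<=100": 17}

total_syncs = sum(wantlist_buckets.values())
tiny_share = 100.0 * wantlist_buckets["<=1"] / total_syncs

if __name__ == "__main__":
    print(f"syncs with a wantlist: {total_syncs}")
    print(f"wantlists of size <=1: {tiny_share:.1f}%")
```

Under that assumption the buckets sum to 1808, the same as the number of ok syncs, and nearly 98% of successful syncs ask for at most one blob.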

Server-side local load is okay-ish right now

Inbound reconcile server:

load_store p50 8ms, p99 216ms
total handler p99 86ms
limiter in_flight 0, waiting 0, rejected 0

So our daemon is not currently overloaded serving inbound reconcile. The slow side is outbound discovery against peers.

Main suspected product-level issue

Discovery fanout is probably too high / too wasteful:

targetDiscoveryPeers = 30

If many peers are slow/unhelpful, the user waits for a lot of useless work.
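One way to avoid waiting on the whole fanout is to return as soon as a few peers have produced something useful, or a deadline passes, and cancel the rest. Below is a minimal sketch of that idea in Python with asyncio. It is not the daemon's code: `discover`, `sync_peer`, `enough`, `deadline_s`, and `fake_sync_peer` are all hypothetical names, and the thresholds are placeholders, not tuned values.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional

# Hypothetical per-peer sync: returns a useful result, or None if the peer had nothing new.
SyncPeer = Callable[[str], Awaitable[Optional[str]]]


async def discover(peers: Iterable[str], sync_peer: SyncPeer,
                   enough: int = 3, deadline_s: float = 2.0) -> List[str]:
    """Sync with all peers concurrently; stop once `enough` useful results
    have arrived or `deadline_s` seconds have passed, then cancel the rest."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + deadline_s
    pending = {asyncio.ensure_future(sync_peer(p)) for p in peers}
    useful: List[str] = []
    try:
        while pending and len(useful) < enough:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break  # overall deadline reached
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # Failed or canceled syncs (dial errors, RPC errors) are skipped.
                if task.cancelled() or task.exception() is not None:
                    continue
                result = task.result()
                if result is not None:
                    useful.append(result)
    finally:
        # Do not leave slow peers running after the caller has what it needs.
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
    return useful


async def fake_sync_peer(peer: str) -> Optional[str]:
    """Stand-in for a real peer sync, used only for the demo below."""
    if peer == "unreachable":
        raise ConnectionError("dial failed")
    delay, result = {"fast": (0.01, "blob-a"),
                     "slow": (5.0, "blob-b"),
                     "empty": (0.01, None)}[peer]
    await asyncio.sleep(delay)
    return result


if __name__ == "__main__":
    peers = ["unreachable", "fast", "slow", "empty"]
    print(asyncio.run(discover(peers, fake_sync_peer, enough=1, deadline_s=1.0)))
```

In the demo the slow peer never delays the caller: the call returns once the fast peer answers, and the 5-second sync is canceled. Whether canceled syncs should keep running in the background to warm caches is a separate design choice this sketch does not address.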

Recap

Current app performance issue: discovery is slow because connected_sync waits on many peer sync attempts, with many dial_failed and preempted outcomes.

  • Not a current PutMany/SQLite-write bottleneck: bitswap and putmany are zero in this snapshot.

  • Most waste: asking many peers and getting no useful blobs.

  • Best fixes: reduce/dedupe discovery fanout, prioritize good peers/gateways, stop earlier, avoid waiting for all peers, and cache/dedupe concurrent discoveries.
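The "cache/dedupe concurrent discoveries" item could look roughly like the single-flight pattern below: concurrent requests for the same document share one in-flight discovery instead of each starting its own fanout. This is a sketch in Python, not the daemon's implementation; `DiscoveryDeduper`, `run_discovery`, and `doc_id` are hypothetical names, and it only dedupes in-flight work (no result caching or expiry).

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class DiscoveryDeduper:
    """Share one in-flight discovery per document among concurrent callers."""

    def __init__(self, run_discovery: Callable[[str], Awaitable[Any]]):
        # run_discovery is a hypothetical coroutine that does the real fanout.
        self._run = run_discovery
        self._in_flight: Dict[str, "asyncio.Future[Any]"] = {}

    async def discover(self, doc_id: str) -> Any:
        task = self._in_flight.get(doc_id)
        if task is None:
            # First caller for this document starts the discovery.
            task = asyncio.ensure_future(self._run(doc_id))
            self._in_flight[doc_id] = task
            # Forget the entry once finished so later calls start a fresh run.
            task.add_done_callback(lambda _t: self._in_flight.pop(doc_id, None))
        # shield: one caller being canceled must not cancel the shared work.
        return await asyncio.shield(task)


if __name__ == "__main__":
    calls = []

    async def slow_discovery(doc_id: str) -> str:
        calls.append(doc_id)
        await asyncio.sleep(0.01)
        return f"result:{doc_id}"

    async def main() -> None:
        deduper = DiscoveryDeduper(slow_discovery)
        results = await asyncio.gather(
            deduper.discover("doc-1"),
            deduper.discover("doc-1"),
            deduper.discover("doc-2"),
        )
        print(results)
        print("discoveries started:", calls)

    asyncio.run(main())
```

Three requests arrive in the demo, but only two discoveries are started, since the second request for doc-1 joins the one already running.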
