Performance is still poor on discovery/sync, but not because of PutMany right now.
Key signals from http://localhost:58001/debug/network:
Main problem: connected_sync is very slow
Discovery latency / connected_sync
p50 7.95s
p90 17.36s
p99 21.18s
count 181
This is the user-visible “click document → wait for peers” phase. It waits for many peer syncs.
End-to-end discovery is also slow:
connected p50 17.43s, p99 34.62s
notfound p50 6.36s, p99 20.64s
Main per-peer issue: dial failures / cancellations
Sync outcomes:
ok 1808
protocol_mismatch 206
dial_failed 2209
rpc_error 186
preempted 1190
putmany_failed 0
That means many peer sync attempts are wasted or canceled. The daemon is spending time trying peers that don’t produce useful results.
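To put numbers on that, here is a small sketch that turns the counters above into shares of all sync attempts. It assumes the six outcomes are mutually exclusive and together cover every attempt, which the snapshot does not state explicitly.

```python
# Sync outcome counters copied from the snapshot above.
# Assumption: each attempt ends in exactly one of these outcomes.
outcomes = {
    "ok": 1808,
    "protocol_mismatch": 206,
    "dial_failed": 2209,
    "rpc_error": 186,
    "preempted": 1190,
    "putmany_failed": 0,
}

total = sum(outcomes.values())
not_ok = total - outcomes["ok"]


def share(name: str) -> float:
    """Fraction of all sync attempts that ended with the given outcome."""
    return outcomes[name] / total


print(f"total attempts: {total}")
for name, count in outcomes.items():
    print(f"{name:18s} {count:5d}  {share(name):6.1%}")
print(f"not ok: {not_ok} ({not_ok / total:.1%})")
```

Under that assumption only about a third of attempts end in ok, and dial_failed alone accounts for more attempts than ok does.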
Dial tail is bad
dial p99 12.57s
So some peer connection attempts are very slow.
Reconcile RPC tail is also bad
reconcile_rpc p99 9.62s
Interesting detail:
new_conn p99 317ms
reused_conn p99 9.79s
That suggests the long tail is not just opening new connections. It may be remote peer/gateway response delay, stream layer behavior, overloaded peers, or requests waiting behind other work.
Not much actual blob fetching
Bitswap:
no fetches yet
putmany 0
putmany_failed 0
So current slow app performance is not “we are downloading/writing tons of blobs” in this session. It is mostly discovery/reconcile/dial overhead.
Wantlists are mostly tiny
Wantlist size:
<=1: 1766
<=10: 25
<=100: 17
mean=0
So RBSR usually concludes there is little/no missing content. That means we are spending seconds asking peers mostly to discover “nothing new”.
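A quick check of the histogram, as a sketch: it assumes the buckets are non-cumulative (so "<=10" means sizes 2 to 10), which fits the decreasing counts but is not confirmed by the snapshot. The bucket counts add up to 1808, the same as the ok count, which suggests (again unconfirmed) that one wantlist size is recorded per successful sync.

```python
# Wantlist histogram copied from the snapshot: (bucket upper bound, count).
# Assumption: buckets are non-cumulative, i.e. "<=10" covers sizes 2..10.
buckets = [(1, 1766), (10, 25), (100, 17)]

total = sum(count for _, count in buckets)
tiny_share = buckets[0][1] / total

# Bound the mean wantlist size by placing every sample at its bucket's
# smallest possible value (0, 2, 11) or at its largest (1, 10, 100).
lower_edges = [0, 2, 11]
mean_low = sum(lo * count for lo, (_, count) in zip(lower_edges, buckets)) / total
mean_high = sum(hi * count for hi, count in buckets) / total

print(f"syncs with a recorded wantlist: {total}")
print(f"share with <=1 entry: {tiny_share:.1%}")
print(f"mean size between {mean_low:.2f} and {mean_high:.2f}")
```

The reported mean=0 only fits inside those bounds if the value is rounded or truncated and most of the "<=1" bucket is actually empty wantlists, which matches the "nothing new" reading.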
Server-side local load is okay-ish right now
Inbound reconcile server:
load_store p50 8ms, p99 216ms
total handler p99 86ms
limiter in_flight 0, waiting 0, rejected 0
So our daemon is not currently overloaded serving inbound reconcile. The slow side is outbound discovery against peers.
Big suspicious product-level issue
Discovery fanout is probably too high / too wasteful:
targetDiscoveryPeers = 30
If many peers are slow/unhelpful, the user waits for a lot of useless work.
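One way to keep a high fanout without making the user wait for all of it is to return as soon as enough peers have answered usefully, or a deadline passes, and cancel the rest. The following is a minimal Python sketch of that idea, not the daemon's actual code: discover, fake_sync, enough and deadline_s are hypothetical names, and the simulated peers stand in for real dials and reconcile RPCs.

```python
import asyncio


async def discover(peers, sync, enough, deadline_s):
    """Sync with all peers concurrently, but return as soon as `enough`
    syncs succeeded or `deadline_s` seconds passed. Leftovers are cancelled."""
    loop = asyncio.get_running_loop()
    stop_at = loop.time() + deadline_s
    pending = {asyncio.ensure_future(sync(p)) for p in peers}
    successes = 0
    try:
        while pending and successes < enough:
            remaining = stop_at - loop.time()
            if remaining <= 0:
                break  # deadline hit: stop waiting on slow peers
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                # Dial failures and RPC errors only count as "not useful".
                if not task.cancelled() and task.exception() is None and task.result():
                    successes += 1
    finally:
        for task in pending:
            task.cancel()
        # Wait for the cancellations to settle so no task is left dangling.
        await asyncio.gather(*pending, return_exceptions=True)
    return successes


async def fake_sync(peer):
    """Hypothetical peer: higher ids are slower, only every third one is useful."""
    await asyncio.sleep(peer * 0.01)
    if peer % 3 != 0:
        raise ConnectionError(f"dial failed for peer {peer}")
    return True


if __name__ == "__main__":
    # 30 candidate peers (the fanout quoted above), stop after 3 useful syncs.
    print(asyncio.run(discover(range(30), fake_sync, enough=3, deadline_s=1.0)))
```

With this shape the user-visible wait is bounded by the fastest few useful peers or by the deadline, rather than by the slowest dial in the set. The right values for enough and deadline_s would have to come from measurements like the ones in this snapshot.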
Recap
Current app performance issue: discovery is slow because connected_sync waits on many peer sync attempts, with many dial_failed and preempted outcomes.
Not a current PutMany/SQLite-write bottleneck: bitswap and putmany are zero in this snapshot.
Most waste: asking many peers and getting no useful blobs.
Best fixes: reduce/dedupe discovery fanout, prioritize good peers/gateways, stop earlier, avoid waiting for all peers, and cache/dedupe concurrent discoveries.
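The "dedupe concurrent discoveries" item can be sketched as a single-flight wrapper: if a discovery for the same document is already in flight, later callers wait for its result instead of starting another round of dials. This is an illustrative Python sketch under assumed names (SingleFlight, do, discover_doc and the "doc-1" key are hypothetical), not the daemon's implementation.

```python
import asyncio


class SingleFlight:
    """Collapse concurrent calls that share a key into one in-flight call."""

    def __init__(self):
        self._inflight = {}  # key -> asyncio.Task running the shared call

    async def do(self, key, fn):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the real work.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Forget the key once the call finishes, so later calls run again.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # shield: one caller being cancelled must not cancel the shared call.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def discover_doc():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stands in for the slow peer fanout
        return "result-for-doc-1"

    sf = SingleFlight()
    results = await asyncio.gather(*(sf.do("doc-1", discover_doc) for _ in range(5)))
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

This only removes duplicate work while a discovery is running. Caching finished results for some time is a separate decision, because it trades freshness for latency.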