go: keepalive, exponential backoff, chain_id metadata, durability guarantees

Three related fixes that turn the go template into a client that
survives the full matrix of server restart, client restart, network
blip, half-open TCP, and long outages (hours → months) — without the
user writing a line of reconnect logic in process.go.

1. gRPC keepalive: Time=10s, Timeout=3s, PermitWithoutStream=true.
   Half-open TCP (silent server restart, resumed laptop, NAT drop)
   is detected within ~13s (Time + Timeout). Previously the OS TCP
   keepalive took ~2h to notice, leaving the client as a ghost stream
   while prime logged "no active gRPC connection" for every skipped
   transaction.

2. Exponential backoff with jitter on reconnect. Effective delay =
   min(max_backoff_seconds, reconnect_delay_seconds * 2^attempts)
   + random(0, reconnect_delay_seconds). The attempts counter resets
   after any session that runs healthy for 60+ seconds. Jitter
   desynchronises clients so a server restart doesn't trigger a
   thundering herd. New max_backoff_seconds config field, default 120.

3. Unified error signalling: the sender goroutine now tears down the
   stream's context when it hits a Send error. Previously only Recv
   errors triggered a reconnect — a stale stream where only Send was
   broken could sit there indefinitely.

Also: chain_id is a required config field now and goes in the
x-chain-id gRPC metadata header alongside x-api-key and
x-smart-contract-id. Prime rejects streams without it with "missing
chain ID", which was silently breaking every template-based client
until users discovered it the hard way. README documents the
durability contract so contract authors know they don't have to
reimplement any of it.
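
A hedged sketch of how a client might enforce the required chain_id and assemble the three metadata headers. The Config struct and streamHeaders function are assumptions for illustration; only the header names and the chain_id requirement come from this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds only the fields relevant here. The template's real struct
// is not shown in this commit, so the field names are assumptions.
type Config struct {
	ChainID         string
	SmartContractID string
	APIKey          string
}

// streamHeaders returns the metadata key/value pairs for the stream and
// fails fast when chain_id is missing, instead of letting the server
// reject the stream later.
func streamHeaders(c Config) (map[string]string, error) {
	if c.ChainID == "" {
		return nil, errors.New("config: chain_id is required")
	}
	return map[string]string{
		"x-api-key":           c.APIKey,
		"x-smart-contract-id": c.SmartContractID,
		"x-chain-id":          c.ChainID,
	}, nil
}

func main() {
	_, err := streamHeaders(Config{SmartContractID: "sc", APIKey: "key"})
	fmt.Println(err)

	h, _ := streamHeaders(Config{ChainID: "chain", SmartContractID: "sc", APIKey: "key"})
	fmt.Println(h["x-chain-id"])
}
```

With grpc-go, a map like this can be turned into outgoing metadata via metadata.New and metadata.NewOutgoingContext before the stream is opened.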
2026-04-19 21:23:47 -04:00
parent 0634e66469
commit 2bc57c073d
3 changed files with 173 additions and 30 deletions


@@ -4,6 +4,11 @@
# The gRPC server address to connect to
server_address: "localhost:50051"
+# The public chain id on which this smart contract is registered.
+# Sent as the x-chain-id gRPC metadata header — prime rejects streams
+# without it.
+chain_id: "your-chain-public-id"
# Your smart contract ID (provided by Dragonchain)
smart_contract_id: "your-smart-contract-id"
@@ -19,6 +24,12 @@ use_tls: false
# Number of worker goroutines for processing transactions concurrently
num_workers: 10
-# Reconnect settings
-reconnect_delay_seconds: 5
+# Reconnect settings. The client uses exponential backoff with jitter:
+# effective delay = min(max_backoff_seconds, reconnect_delay_seconds * 2^attempts) + random(0, reconnect_delay_seconds).
+# Keep max_reconnect_attempts at 0 (infinite) unless you have a specific
+# reason to stop — the client is designed to survive arbitrarily long
+# outages and resume processing from the prime-side queue when the
+# server returns.
+reconnect_delay_seconds: 3
+max_backoff_seconds: 120
max_reconnect_attempts: 0 # 0 = infinite retries