# gRPC & Concurrency Tuning Guide

## Architecture Overview

Deep dives: Retry & Fault Tolerance · Admission Queue · Concurrency Model
```text
┌────────────────────────────────────────────────┐
│ Layer 1: gRPC Server (accepts RPCs)            │
│   DIGITALKIN_MAX_CONCURRENT_RPCS               │
│   DIGITALKIN_THREAD_POOL_WORKERS               │
│   Server channel options (message size, pings) │
├────────────────────────────────────────────────┤
│ Layer 2: Task Manager (executes work)          │
│   DIGITALKIN_MAX_CONCURRENT_TASKS              │
│   DIGITALKIN_MAX_QUEUED_TASKS                  │
│   DIGITALKIN_ADMISSION_TIMEOUT                 │
│   DIGITALKIN_BACKPRESSURE_STRATEGY / _TIMEOUT  │
├────────────────────────────────────────────────┤
│ Layer 3: Lifecycle (completion & cleanup)      │
│   DIGITALKIN_COMPLETION_TIMEOUT                │
│   DIGITALKIN_STREAM_DRAIN_TIMEOUT              │
│   DIGITALKIN_SETUP_CACHE_MAX                   │
├────────────────────────────────────────────────┤
│ Layer 4: Signal I/O (gRPC client calls out)    │
│   DIGITALKIN_GRPC_TIMEOUT                      │
│   DIGITALKIN_SIGNAL_* (batching, polling)      │
│   DIGITALKIN_GRPC_QUERY_MAX_RETRIES / _BACKOFF │
│   DIGITALKIN_TOOL_RESOLVE_TIMEOUT              │
│   DIGITALKIN_CONFIG_SETUP_TIMEOUT              │
│   Client channel options (keepalive, retry,    │
│     DNS), DIGITALKIN_GRPC_RETRY_* (channel     │
│     retry policy), DIGITALKIN_GRPC_*_MS        │
└────────────────────────────────────────────────┘
```
Request flow:

```text
Request arrives
  → MAX_CONCURRENT_RPCS gate (gRPC layer)
  → system_gate semaphore (MAX_CONCURRENT_TASKS + MAX_QUEUED_TASKS)
  → task_slot semaphore (MAX_CONCURRENT_TASKS)
  → actual execution
```
## Layer 1: gRPC Async Server

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `DIGITALKIN_MAX_CONCURRENT_RPCS` | cpu × 200 | How many RPCs the server accepts simultaneously: the front door. Each StartModule is a server-streaming RPC that stays open for the entire task duration (potentially minutes). Set this >= `MAX_CONCURRENT_TASKS + MAX_QUEUED_TASKS` so the server never rejects at the gRPC layer before the task manager can queue the request. |
| `DIGITALKIN_THREAD_POOL_WORKERS` | min(4, cpu) | Migration thread pool for the async server, used to run synchronous callbacks. For pure-async modules, 1-2 workers is enough; increase only if you have blocking sync code. |
### Server Channel Options (hardcoded in `ServerConfig`)

| Option | Value | Notes |
|---|---|---|
| `grpc.max_receive_message_length` | 100 MB | Increase only for huge payloads. |
| `grpc.max_send_message_length` | 100 MB | Same as above. |
| `grpc.keepalive_permit_without_calls` | `True` | Allows client pings even when idle. Leave on. |
| `grpc.http2.min_ping_interval_without_data_ms` | 10000 | Minimum 10 s between client pings; prevents GOAWAY. |
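As an illustrative sketch of how these knobs map onto `grpc.aio.server`: the helper below is hypothetical (the SDK's real `ServerConfig` is internal), but the keyword arguments `maximum_concurrent_rpcs` and `migration_thread_pool` are the standard grpcio async-server parameters, and the option values mirror the ones documented here.

```python
import os

def layer1_server_kwargs(env):
    """Map the Layer 1 env vars onto grpc.aio.server() arguments.

    Illustrative helper, not part of the SDK: shows how the documented
    defaults and hardcoded channel options could be assembled.
    """
    cpu = os.cpu_count() or 1
    return {
        "maximum_concurrent_rpcs": int(env.get("DIGITALKIN_MAX_CONCURRENT_RPCS", cpu * 200)),
        "thread_pool_workers": int(env.get("DIGITALKIN_THREAD_POOL_WORKERS", min(4, cpu))),
        "options": [
            # Hardcoded in ServerConfig per the server channel options above.
            ("grpc.max_receive_message_length", 100 * 1024 * 1024),
            ("grpc.max_send_message_length", 100 * 1024 * 1024),
            ("grpc.keepalive_permit_without_calls", 1),
            ("grpc.http2.min_ping_interval_without_data_ms", 10_000),
        ],
    }

# Usage (requires grpcio):
#   import grpc
#   from concurrent.futures import ThreadPoolExecutor
#   kw = layer1_server_kwargs(dict(os.environ))
#   server = grpc.aio.server(
#       migration_thread_pool=ThreadPoolExecutor(kw["thread_pool_workers"]),
#       options=kw["options"],
#       maximum_concurrent_rpcs=kw["maximum_concurrent_rpcs"],
#   )
```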
## Layer 2: Task Manager (Concurrency Control)

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `DIGITALKIN_MAX_CONCURRENT_TASKS` | 100 | How many tasks actually execute simultaneously. Each task consumes event loop time, gRPC client connections, and memory. Too high → event loop starvation; too low → underutilized hardware. |
| `DIGITALKIN_MAX_QUEUED_TASKS` | 0 | How many tasks wait in line. When > 0, enables the admission queue: excess tasks queue patiently instead of being rejected. Set this to absorb your expected burst size. |
| `DIGITALKIN_ADMISSION_TIMEOUT` | 5.0 | Fast-fail timeout (seconds) when both running and queued slots are full. Keep short (3-5 s): if the queue is full, waiting longer won't help. |
| `DIGITALKIN_TASK_WAIT_TIMEOUT` | 30 | Legacy: timeout waiting for a slot when the queue is disabled (`MAX_QUEUED_TASKS=0`). Ignored when the queue is enabled. |
### How the Admission Queue Works

Full documentation: `architecture/admission-queue.md` (problem statement, two-phase flow diagrams, capacity planning, log analysis findings).

When `DIGITALKIN_MAX_QUEUED_TASKS > 0`, admission is two-phase:

1. **Phase 1: enter the system gate.** Fast reject within `ADMISSION_TIMEOUT` seconds if running + queued >= total capacity.
2. **Phase 2: wait for an execution slot.** Patient wait with no timeout; slots are freed by completing tasks.

When `DIGITALKIN_MAX_QUEUED_TASKS = 0` (the default), the legacy single-semaphore behavior applies, governed by `TASK_WAIT_TIMEOUT`.
## Layers 3 & 4: Lifecycle & Signal I/O (Client-Side gRPC)
### Outbound Signals (SendSignals Batching)

| Variable | Default | Description |
|---|---|---|
| `DIGITALKIN_GRPC_TIMEOUT` | 30 | Per-RPC timeout (seconds) for SendSignals. Under burst load the services provider slows down; increase to 60 s for safety under high concurrency. |
| `DIGITALKIN_SIGNAL_MAX_BATCH_SIZE` | 50 | Flush trigger. When this many signals accumulate, send immediately. Larger = fewer RPCs but bigger payloads and higher per-signal latency. |
| `DIGITALKIN_SIGNAL_FLUSH_INTERVAL` | 0.1 | Timer trigger (seconds). If the batch doesn't fill in time, flush anyway. Lower = less latency; higher = more batching efficiency. |
| `DIGITALKIN_SIGNAL_SEND_RETRIES` | 3 | Retry count for failed batch RPCs. With exponential backoff: 100 ms → 200 ms → 400 ms. |
| `DIGITALKIN_SIGNAL_SEND_BACKOFF_MS` | 100 | Base backoff (ms) for retries. Doubles each attempt. |
### Inbound Signals (GetSignals Polling)

| Variable | Default | Description |
|---|---|---|
| `DIGITALKIN_POLL_TIMEOUT` | 1 | Per-RPC timeout (seconds) for GetSignals. Short because polling is frequent. |
| `DIGITALKIN_SIGNAL_POLL_INTERVAL` | 1.0 | Max interval (seconds) between polls (ceiling). Exponential backoff caps here. |
| `DIGITALKIN_SIGNAL_INITIAL_POLL_INTERVAL` | 0.1 | Starting interval. Doubles after each empty poll until hitting the ceiling; resets when signals arrive. |
| `DIGITALKIN_SIGNAL_QUEUE_SIZE` | 512 | Per-task signal buffer. If a task is slow to consume, signals queue here. |
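The interval schedule described above (double on empty polls, reset on data) reduces to a small pure function; the function name is illustrative, but the defaults mirror `DIGITALKIN_SIGNAL_INITIAL_POLL_INTERVAL` and `DIGITALKIN_SIGNAL_POLL_INTERVAL`.

```python
def next_poll_interval(current, ceiling=1.0, got_signals=False, initial=0.1):
    """Exponential poll backoff sketch: double on empty polls, reset on data."""
    if got_signals:
        return initial                    # activity: snap back to the fast interval
    return min(current * 2, ceiling)      # empty poll: double, capped at the ceiling

# Empty polls starting from the 0.1 s default: 0.2 → 0.4 → 0.8 → 1.0 → 1.0 …
```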
### Servicer & Lifecycle

| Variable | Default | Description |
|---|---|---|
| `DIGITALKIN_SETUP_CACHE_MAX` | 100 | Max cached setup configurations per module servicer. Avoids redundant GetSetup RPCs. |
| `DIGITALKIN_COMPLETION_TIMEOUT` | 300.0 | Timeout (seconds) waiting for a job to complete after streaming ends. If exceeded, the session is force-cleaned with the TIMEOUT cancellation reason. |
| `DIGITALKIN_STREAM_DRAIN_TIMEOUT` | 300.0 | Timeout (seconds) waiting for a task's output stream to fully drain before cleanup. Prevents stale sessions when clients disconnect mid-stream. |
| `DIGITALKIN_BACKPRESSURE_STRATEGY` | block | What to do when all running slots are occupied: `block` (wait up to `BACKPRESSURE_TIMEOUT`) or `reject` (immediate failure). |
| `DIGITALKIN_BACKPRESSURE_TIMEOUT` | 300.0 | Max wait (seconds) when `BACKPRESSURE_STRATEGY=block`. After this, the request is rejected. |
### Client Channel Options (env-configurable in `ClientConfig`)

#### Keepalive

| Env Var | Option | Default | Why it matters |
|---|---|---|---|
| `DIGITALKIN_GRPC_KEEPALIVE_TIME_MS` | `grpc.keepalive_time_ms` | 60000 | Ping interval to detect dead connections. Lower for faster detection. |
| `DIGITALKIN_GRPC_KEEPALIVE_TIMEOUT_MS` | `grpc.keepalive_timeout_ms` | 20000 | Pong wait time. No response → connection dead → reconnect. |
| — | `grpc.keepalive_permit_without_calls` | `True` | Keep pinging even when no RPCs are in flight. Not configurable. |
| `DIGITALKIN_GRPC_MIN_PING_INTERVAL_MS` | `grpc.http2.min_time_between_pings_ms` | 30000 | Min interval between HTTP/2 pings. Must be >= the server's 10 s minimum. |
#### Reconnection

| Env Var | Option | Default | Why it matters |
|---|---|---|---|
| `DIGITALKIN_GRPC_DNS_RESOLUTION_MS` | `grpc.dns_min_time_between_resolutions_ms` | 500 | Critical for Railway/containers. Re-resolve DNS after this interval. |
| `DIGITALKIN_GRPC_INITIAL_RECONNECT_MS` | `grpc.initial_reconnect_backoff_ms` | 1000 | First reconnect delay. |
| `DIGITALKIN_GRPC_MAX_RECONNECT_MS` | `grpc.max_reconnect_backoff_ms` | 10000 | Cap between reconnect attempts. |
| `DIGITALKIN_GRPC_MIN_RECONNECT_MS` | `grpc.min_reconnect_backoff_ms` | 500 | Floor for reconnect backoff. |
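Taken together, the keepalive and reconnection settings become a single channel-options list. The helper name is hypothetical, but the option keys are the standard gRPC channel arguments and the defaults mirror the tables above; the result is what you would pass as `options=` to `grpc.aio.insecure_channel`.

```python
def client_channel_options(env):
    """Assemble the env-configurable client channel options (illustrative helper)."""
    def ms(name, default):
        return int(env.get(name, default))

    return [
        # Keepalive
        ("grpc.keepalive_time_ms", ms("DIGITALKIN_GRPC_KEEPALIVE_TIME_MS", 60000)),
        ("grpc.keepalive_timeout_ms", ms("DIGITALKIN_GRPC_KEEPALIVE_TIMEOUT_MS", 20000)),
        ("grpc.keepalive_permit_without_calls", 1),  # not env-configurable
        ("grpc.http2.min_time_between_pings_ms", ms("DIGITALKIN_GRPC_MIN_PING_INTERVAL_MS", 30000)),
        # Reconnection
        ("grpc.dns_min_time_between_resolutions_ms", ms("DIGITALKIN_GRPC_DNS_RESOLUTION_MS", 500)),
        ("grpc.initial_reconnect_backoff_ms", ms("DIGITALKIN_GRPC_INITIAL_RECONNECT_MS", 1000)),
        ("grpc.max_reconnect_backoff_ms", ms("DIGITALKIN_GRPC_MAX_RECONNECT_MS", 10000)),
        ("grpc.min_reconnect_backoff_ms", ms("DIGITALKIN_GRPC_MIN_RECONNECT_MS", 500)),
    ]

# Usage (requires grpcio):
#   import grpc, os
#   channel = grpc.aio.insecure_channel(target, options=client_channel_options(dict(os.environ)))
```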
#### Retry (Channel-Level)

| Env Var | Option | Default | Description |
|---|---|---|---|
| — | `grpc.enable_retries` | 1 | Enables gRPC-native retry via service config. Not configurable. |
| `DIGITALKIN_GRPC_RETRY_MAX_ATTEMPTS` | `RetryPolicy.max_attempts` | 5 | Channel-level retry attempts. |
| `DIGITALKIN_GRPC_RETRY_INITIAL_BACKOFF` | `RetryPolicy.initial_backoff` | 0.1s | Initial retry backoff. |
| `DIGITALKIN_GRPC_RETRY_MAX_BACKOFF` | `RetryPolicy.max_backoff` | 10s | Max retry backoff. |
| `DIGITALKIN_GRPC_RETRY_BACKOFF_MULTIPLIER` | `RetryPolicy.backoff_multiplier` | 2.0 | Backoff multiplier. |
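Channel-level retry is delivered to gRPC as a JSON service config. The sketch below builds one from these settings; the builder function is hypothetical, but the `methodConfig`/`retryPolicy` shape is the standard gRPC service-config format, and the retryable status codes follow the retry-architecture notes in this guide.

```python
import json

def retry_service_config(max_attempts=5, initial_backoff="0.1s",
                         max_backoff="10s", multiplier=2.0):
    """Build a gRPC service config carrying the channel-level retry policy (sketch)."""
    return json.dumps({
        "methodConfig": [{
            "name": [{}],  # empty selector: apply to every method on the channel
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": initial_backoff,
                "maxBackoff": max_backoff,
                "backoffMultiplier": multiplier,
                "retryableStatusCodes": [
                    "UNAVAILABLE", "RESOURCE_EXHAUSTED", "DEADLINE_EXCEEDED",
                ],
            },
        }]
    })

# Usage: pass alongside grpc.enable_retries as channel options:
#   options = [("grpc.enable_retries", 1),
#              ("grpc.service_config", retry_service_config())]
```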
### App-Level Retry (`exec_grpc_query`)

| Env Var | Default | Description |
|---|---|---|
| `DIGITALKIN_GRPC_QUERY_MAX_RETRIES` | 2 | App-level retry count for all gRPC client calls. |
| `DIGITALKIN_GRPC_QUERY_BACKOFF_BASE_MS` | 50 | Base backoff (ms); doubles per attempt. |
### Tool & Config Timeouts

| Env Var | Default | Description |
|---|---|---|
| `DIGITALKIN_TOOL_RESOLVE_TIMEOUT` | 10.0 | Per-tool resolution timeout (seconds). |
| `DIGITALKIN_CONFIG_SETUP_TIMEOUT` | 30.0 | Config setup response wait (seconds). |
### Retry Architecture (Three Independent Layers)

Full documentation: `architecture/resilience.md` (problem statement, sequence diagrams, retryable vs non-retryable errors, before/after comparison).

```text
RPC call
  → Layer A: gRPC service config retry (channel level, transparent)
      retryableStatusCodes: UNAVAILABLE, RESOURCE_EXHAUSTED, DEADLINE_EXCEEDED
      maxAttempts: DIGITALKIN_GRPC_RETRY_MAX_ATTEMPTS (default 5)
      backoff: DIGITALKIN_GRPC_RETRY_INITIAL_BACKOFF → DIGITALKIN_GRPC_RETRY_MAX_BACKOFF
  → Layer B: exec_grpc_query() app-level retry
      retryable: UNAVAILABLE, INTERNAL, DEADLINE_EXCEEDED
      max_retries: DIGITALKIN_GRPC_QUERY_MAX_RETRIES (default 2 retries, 3 attempts total)
      backoff: DIGITALKIN_GRPC_QUERY_BACKOFF_BASE_MS (default 50 ms, doubles per attempt)
  → Layer C: SendSignals _flush() retry (batch-specific)
      retryable: DEADLINE_EXCEEDED, UNAVAILABLE, INTERNAL
      max_retries: 3 (4 attempts total), backoff: 100 ms → 200 ms → 400 ms
```

Layer A retries transparently inside the channel. Layer B catches what A doesn't handle. Layer C is specific to the batched SendSignals path.
## Recommended Configurations

### Golden Rules

- **`MAX_CONCURRENT_RPCS >= MAX_CONCURRENT_TASKS + MAX_QUEUED_TASKS`.** Otherwise gRPC rejects RPCs at the HTTP/2 layer before the task manager even sees them.
- **Keep `MAX_CONCURRENT_TASKS` an order of magnitude (or more) below `MAX_CONCURRENT_RPCS`.** The server can hold thousands of open RPCs (cheap HTTP/2 streams), but each executing task consumes event loop cycles, gRPC client connections, DNS lookups, and LLM API calls.
- **Prefer a moderate task limit with a deep queue.** 200 concurrent tasks with a 3000-slot queue will typically outperform 800 concurrent tasks with no queue: less concurrency means less event loop contention, faster individual tasks, and higher sustained throughput.
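The first two rules reduce to simple arithmetic, sketched here as a hypothetical helper (not part of the SDK). The 10× ratio check is an assumption reflecting the task/RPC spread used by the profiles below.

```python
def check_golden_rules(max_rpcs, max_tasks, max_queued, min_ratio=10):
    """Sanity-check a profile against the golden rules (illustrative helper)."""
    problems = []
    # Rule 1: the gRPC front door must cover every running + queued task.
    if max_rpcs < max_tasks + max_queued:
        problems.append(
            "MAX_CONCURRENT_RPCS (%d) < MAX_CONCURRENT_TASKS + MAX_QUEUED_TASKS (%d)"
            % (max_rpcs, max_tasks + max_queued)
        )
    # Rule 2: executing tasks should sit well below RPC capacity.
    if max_rpcs < min_ratio * max_tasks:
        problems.append(
            "MAX_CONCURRENT_TASKS (%d) is high relative to MAX_CONCURRENT_RPCS (%d)"
            % (max_tasks, max_rpcs)
        )
    return problems

print(check_golden_rules(1500, 100, 1000))  # [] — both rules satisfied
print(check_golden_rules(1000, 100, 1000))  # flags rule 1: 1000 < 1100
```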
### Small Instance (4 vCPU, 8 GB)

```bash
# Server
DIGITALKIN_MAX_CONCURRENT_RPCS=1500
DIGITALKIN_THREAD_POOL_WORKERS=2
# Task management
DIGITALKIN_MAX_CONCURRENT_TASKS=100
DIGITALKIN_MAX_QUEUED_TASKS=1000
DIGITALKIN_ADMISSION_TIMEOUT=5.0
# Lifecycle
DIGITALKIN_COMPLETION_TIMEOUT=300.0
DIGITALKIN_STREAM_DRAIN_TIMEOUT=300.0
# Signals
DIGITALKIN_GRPC_TIMEOUT=30
DIGITALKIN_SIGNAL_MAX_BATCH_SIZE=50
DIGITALKIN_SIGNAL_FLUSH_INTERVAL=0.1
DIGITALKIN_SIGNAL_QUEUE_SIZE=512
DIGITALKIN_SETUP_CACHE_MAX=200
```
### Medium Instance (8 vCPU, 32 GB)

```bash
# Server
DIGITALKIN_MAX_CONCURRENT_RPCS=3500
DIGITALKIN_THREAD_POOL_WORKERS=4
# Task management
DIGITALKIN_MAX_CONCURRENT_TASKS=200
DIGITALKIN_MAX_QUEUED_TASKS=3000
DIGITALKIN_ADMISSION_TIMEOUT=5.0
# Lifecycle
DIGITALKIN_COMPLETION_TIMEOUT=600.0
DIGITALKIN_STREAM_DRAIN_TIMEOUT=600.0
# Signals
DIGITALKIN_GRPC_TIMEOUT=60
DIGITALKIN_SIGNAL_MAX_BATCH_SIZE=100
DIGITALKIN_SIGNAL_FLUSH_INTERVAL=0.2
DIGITALKIN_SIGNAL_QUEUE_SIZE=1024
DIGITALKIN_SETUP_CACHE_MAX=500
```
### Large Instance (32 vCPU, 64 GB)

```bash
# Server
DIGITALKIN_MAX_CONCURRENT_RPCS=5500
DIGITALKIN_THREAD_POOL_WORKERS=4
# Task management
DIGITALKIN_MAX_CONCURRENT_TASKS=400
DIGITALKIN_MAX_QUEUED_TASKS=5000
DIGITALKIN_ADMISSION_TIMEOUT=5.0
# Lifecycle
DIGITALKIN_COMPLETION_TIMEOUT=900.0
DIGITALKIN_STREAM_DRAIN_TIMEOUT=600.0
# Signals
DIGITALKIN_GRPC_TIMEOUT=60
DIGITALKIN_SIGNAL_MAX_BATCH_SIZE=200
DIGITALKIN_SIGNAL_FLUSH_INTERVAL=0.3
DIGITALKIN_SIGNAL_QUEUE_SIZE=2048
DIGITALKIN_SETUP_CACHE_MAX=1000
```
### Railway (Container PaaS)

Railway instances have ephemeral IPs and may restart under memory pressure. Optimize for fast recovery and moderate concurrency.

```bash
# Server — Railway instances typically have 1-8 vCPU
DIGITALKIN_MAX_CONCURRENT_RPCS=1500
DIGITALKIN_THREAD_POOL_WORKERS=2
# Task management — conservative to avoid OOM kills
DIGITALKIN_MAX_CONCURRENT_TASKS=50
DIGITALKIN_MAX_QUEUED_TASKS=500
DIGITALKIN_ADMISSION_TIMEOUT=5.0
DIGITALKIN_BACKPRESSURE_STRATEGY=block
DIGITALKIN_BACKPRESSURE_TIMEOUT=120.0
# Lifecycle — shorter timeouts to release resources faster on restart
DIGITALKIN_COMPLETION_TIMEOUT=180.0
DIGITALKIN_STREAM_DRAIN_TIMEOUT=120.0
# Signals — tighter batching for lower memory footprint
DIGITALKIN_GRPC_TIMEOUT=30
DIGITALKIN_SIGNAL_MAX_BATCH_SIZE=50
DIGITALKIN_SIGNAL_FLUSH_INTERVAL=0.1
DIGITALKIN_SIGNAL_QUEUE_SIZE=256
DIGITALKIN_SIGNAL_SEND_RETRIES=5
DIGITALKIN_SIGNAL_SEND_BACKOFF_MS=200
DIGITALKIN_SETUP_CACHE_MAX=100
# I/O timing — fail fast on unreachable services
DIGITALKIN_GRPC_QUERY_MAX_RETRIES=1
DIGITALKIN_GRPC_QUERY_BACKOFF_BASE_MS=25
DIGITALKIN_TOOL_RESOLVE_TIMEOUT=3.0
DIGITALKIN_CONFIG_SETUP_TIMEOUT=15.0
DIGITALKIN_GRPC_RETRY_MAX_ATTEMPTS=3
DIGITALKIN_GRPC_RETRY_MAX_BACKOFF=3s
DIGITALKIN_GRPC_KEEPALIVE_TIME_MS=30000
DIGITALKIN_GRPC_MAX_RECONNECT_MS=5000
# Module
DIGITALKIN_CHAT_HISTORY_FLUSH_THRESHOLD=5
DIGITALKIN_LOG_DIR=/app/logs
DIGITALKIN_TIMEZONE=Europe/Paris
```
Railway-specific notes:

- DNS re-resolution is configurable via `DIGITALKIN_GRPC_DNS_RESOLUTION_MS` (default 500 ms), critical when services restart with new IPs.
- A lower `SIGNAL_QUEUE_SIZE` (256 vs 512) reduces per-task memory; Railway charges by memory usage.
- A higher `SIGNAL_SEND_RETRIES` (5 vs 3) with longer backoff absorbs brief connectivity gaps during Railway deploys.
- Shorter lifecycle timeouts prevent orphaned sessions from consuming memory after Railway restarts.
- Set `DIGITALKIN_MODULE_ID` per service if running multiple modules in the same Railway project.
## Complete Environment Variable Reference

| Variable | Type | Default | Layer | Purpose |
|---|---|---|---|---|
| `DIGITALKIN_MAX_CONCURRENT_RPCS` | int | cpu × 200 | Server | Async server concurrent RPCs |
| `DIGITALKIN_THREAD_POOL_WORKERS` | int | min(4, cpu) | Server | Migration thread pool size |
| `DIGITALKIN_MAX_CONCURRENT_TASKS` | int | 100 | Task Mgr | Concurrent task execution limit |
| `DIGITALKIN_MAX_QUEUED_TASKS` | int | 0 | Task Mgr | Admission queue depth (0 = disabled) |
| `DIGITALKIN_ADMISSION_TIMEOUT` | float | 5.0 s | Task Mgr | Fast-fail when queue full |
| `DIGITALKIN_TASK_WAIT_TIMEOUT` | float | 30 s | Task Mgr | Legacy slot wait timeout (queue disabled) |
| `DIGITALKIN_BACKPRESSURE_STRATEGY` | str | block | Task Mgr | `block` or `reject` when slots full |
| `DIGITALKIN_BACKPRESSURE_TIMEOUT` | float | 300.0 s | Task Mgr | Max wait when strategy=block |
| `DIGITALKIN_COMPLETION_TIMEOUT` | float | 300.0 s | Lifecycle | Wait for job completion after stream ends |
| `DIGITALKIN_STREAM_DRAIN_TIMEOUT` | float | 300.0 s | Lifecycle | Wait for output stream to drain before cleanup |
| `DIGITALKIN_GRPC_TIMEOUT` | float | 30 s | Signal I/O | SendSignals RPC timeout |
| `DIGITALKIN_POLL_TIMEOUT` | float | 1 s | Signal I/O | GetSignals RPC timeout |
| `DIGITALKIN_SIGNAL_POLL_INTERVAL` | float | 1.0 s | Signal I/O | Max poll interval (ceiling) |
| `DIGITALKIN_SIGNAL_INITIAL_POLL_INTERVAL` | float | 0.1 s | Signal I/O | Initial poll interval |
| `DIGITALKIN_SIGNAL_QUEUE_SIZE` | int | 512 | Signal I/O | Per-task signal buffer |
| `DIGITALKIN_SIGNAL_FLUSH_INTERVAL` | float | 0.1 s | Signal I/O | Batch flush timer |
| `DIGITALKIN_SIGNAL_MAX_BATCH_SIZE` | int | 50 | Signal I/O | Batch flush trigger |
| `DIGITALKIN_SIGNAL_SEND_RETRIES` | int | 3 | Signal I/O | Batch send retry attempts |
| `DIGITALKIN_SIGNAL_SEND_BACKOFF_MS` | float | 100 ms | Signal I/O | Retry backoff base |
| `DIGITALKIN_GRPC_QUERY_MAX_RETRIES` | int | 2 | App retry | App-level retry count for gRPC client calls |
| `DIGITALKIN_GRPC_QUERY_BACKOFF_BASE_MS` | float | 50 | App retry | Base backoff (ms), doubles per attempt |
| `DIGITALKIN_TOOL_RESOLVE_TIMEOUT` | float | 10.0 s | Tool init | Per-tool resolution timeout |
| `DIGITALKIN_CONFIG_SETUP_TIMEOUT` | float | 30.0 s | Job Mgr | Config setup response wait |
| `DIGITALKIN_GRPC_RETRY_MAX_ATTEMPTS` | int | 5 | Channel retry | Channel-level retry attempts |
| `DIGITALKIN_GRPC_RETRY_INITIAL_BACKOFF` | str | 0.1s | Channel retry | Channel retry initial backoff |
| `DIGITALKIN_GRPC_RETRY_MAX_BACKOFF` | str | 10s | Channel retry | Channel retry max backoff |
| `DIGITALKIN_GRPC_RETRY_BACKOFF_MULTIPLIER` | float | 2.0 | Channel retry | Backoff multiplier |
| `DIGITALKIN_GRPC_DNS_RESOLUTION_MS` | int | 500 | Channel opts | DNS re-resolve interval |
| `DIGITALKIN_GRPC_INITIAL_RECONNECT_MS` | int | 1000 | Channel opts | First reconnect delay |
| `DIGITALKIN_GRPC_MAX_RECONNECT_MS` | int | 10000 | Channel opts | Max reconnect backoff |
| `DIGITALKIN_GRPC_MIN_RECONNECT_MS` | int | 500 | Channel opts | Min reconnect backoff |
| `DIGITALKIN_GRPC_KEEPALIVE_TIME_MS` | int | 60000 | Channel opts | Keepalive ping interval |
| `DIGITALKIN_GRPC_KEEPALIVE_TIMEOUT_MS` | int | 20000 | Channel opts | Keepalive pong timeout |
| `DIGITALKIN_GRPC_MIN_PING_INTERVAL_MS` | int | 30000 | Channel opts | Min HTTP/2 ping interval |
| `DIGITALKIN_SETUP_CACHE_MAX` | int | 100 | Module | Setup config cache size |
| `DIGITALKIN_MODULE_ID` | str | (from metadata) | Module | Override module identity at runtime |
| `DIGITALKIN_CHAT_HISTORY_FLUSH_THRESHOLD` | int | 10 | Module | Messages buffered before storage write |
| `DIGITALKIN_TIMEZONE` | str | Europe/Paris | Module | Default timezone (IANA zone name) |
| `DIGITALKIN_LOG_DIR` | str | /app/logs | Module | Rotating JSON log file directory |
| `DIGITALKIN_ASYNCIO_INSPECTOR` | bool | false | Debug | Enable asyncio event loop monitoring |
| `DIGITALKIN_ASYNCIO_INSPECTOR_PORT` | int | 8765 | Debug | Asyncio inspector port |
| `DIGITALKIN_PROFILER` | str | none | Debug | Profiler: none, pyinstrument, viztracer, yappi |
| `DIGITALKIN_PROFILE_OUTPUT_DIR` | str | ./profiles | Debug | Profiler output directory |