scaling/stateless

lash is a durable-object / actor-per-session architecture: the durable truth lives behind async store traits, and the in-process runtime holds only a rehydratable working copy. Running lash as a horizontally-scaling microservice is now a concrete deployment shape: Postgres for runtime/process/trigger state, S3-compatible attachment bytes, Restate for durable execution and failover, and session-agnostic ingress for external signals.

No Durable State In Process Memory

LashRuntime.state (RuntimeSessionState) is a rehydratable working copy, not a source of truth. It is loaded via RuntimePersistence::load_session and refreshed via refresh_session_graph_from_store; Residency controls whether a load rehydrates the full graph or just the active path.

The other in-memory fields on LashRuntime are either ephemeral per-turn accumulators (shared_token_ledger, drained into state at commit) or per-worker coordination (managed_sessions and managed_turns, which track what this instance is currently running). None of it is durable truth.

Durable truth lives behind a single trait, RuntimePersistence (crates/lash-core/src/store/mod.rs), whose surface is exactly a durable-object store:

Any backend that satisfies this trait (and the ProcessRegistry / LashlangArtifactStore traits) is a valid store. The backend-agnostic conformance suite (lash_core::testing::conformance) is what guarantees a new backend behaves identically.
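As a rough illustration of what "backend-agnostic conformance" means, the checks can be written once against a trait and run unchanged against each backend. The sketch below uses a deliberately tiny, hypothetical HeadStore trait with an in-memory backend; it is not the real RuntimePersistence surface or the real lash_core::testing::conformance API, and every name in it is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical miniature store seam. NOT the real RuntimePersistence trait.
pub trait HeadStore {
    fn load_head(&self, session_id: &str) -> Option<u64>;
    /// Write `next` only if the stored head still equals `expected`.
    fn cas_head(&mut self, session_id: &str, expected: Option<u64>, next: u64) -> bool;
}

/// In-memory backend used only to exercise the checks below.
#[derive(Default)]
pub struct MemHeadStore {
    heads: HashMap<String, u64>,
}

impl HeadStore for MemHeadStore {
    fn load_head(&self, session_id: &str) -> Option<u64> {
        self.heads.get(session_id).copied()
    }

    fn cas_head(&mut self, session_id: &str, expected: Option<u64>, next: u64) -> bool {
        if self.load_head(session_id) != expected {
            return false;
        }
        self.heads.insert(session_id.to_string(), next);
        true
    }
}

/// Checks written once against the trait; a new backend passes or fails the same list.
pub fn conformance<S: HeadStore>(store: &mut S) -> Result<(), &'static str> {
    if store.load_head("s").is_some() {
        return Err("fresh store must be empty");
    }
    if !store.cas_head("s", None, 1) {
        return Err("first write must win");
    }
    if store.cas_head("s", None, 2) {
        return Err("stale expected head must lose");
    }
    if store.load_head("s") != Some(1) {
        return Err("losing CAS must not write");
    }
    Ok(())
}
```

A second backend would implement the same trait and be passed to the same conformance function, which is the property the real suite relies on.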

The Model: A Durable Object Per Session

A request for session_id routes to any worker. The worker first claims the session execution lease, then rehydrates the session from the store, runs the turn (pure lash-sansio compute + replayed effects), and commits with the lease fence plus head CAS. Workers are identical and hold no durable state, so the fleet scales by adding replicas.

Horizontal topology
flowchart TB
  Clients["clients · external signals"] --> LB["load balancer<br/>route by session_id"]
  LB --> WA["worker A"]
  LB --> WB["worker B"]
  LB --> WN["worker N<br/>(identical, stateless)"]
  WA --> P["shared session store<br/>session execution leases · head CAS · durable work queue"]
  WB --> P
  WN --> P
  WA -.-> H["networked TriggerStore<br/>occurrences · subscriptions · deliveries"]
  WB -.-> H
  WN -.-> H
  R["Restate<br/>durable execution + dispatch"] -.-> WA
  R -.-> WB
  R -.-> WN

The store box is shared by the whole fleet, so it must be a networked backend every worker can reach. lash-postgres-store supplies the distributed RuntimePersistence, ProcessRegistry, TriggerStore, and LashlangArtifactStore; lash-s3-store supplies the shared attachment byte store. The local-file SQLite store remains right for embedded and single-host deployments.

Concurrent writers are prevented before effects: the session execution lease lets exactly one runtime owner execute mutating work for a session at a time. commit_runtime_state still performs head CAS in the same transaction as lease-fence verification, so stale writers fail safely if a lease is lost or a backend violates the single-lane contract.
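The interaction between the lease fence and head CAS can be sketched with an in-memory stand-in. Everything here (MemStore, CommitError, the integer tokens) is hypothetical and only illustrates the check order; the real commit_runtime_state runs inside a store transaction and its signature is not shown in this document. As a simplification, claim_lease always succeeds, as if the previous lease had already expired.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    /// The caller's lease token is no longer the current one.
    StaleFence,
    /// Another commit advanced the session head first.
    HeadMoved,
}

#[derive(Default)]
struct SessionRow {
    head: u64,        // session head revision
    lease_token: u64, // fencing token of the current lease holder
}

#[derive(Default)]
pub struct MemStore {
    sessions: HashMap<String, SessionRow>,
}

impl MemStore {
    /// Grant the lease and return a strictly larger fencing token.
    /// Simplification: always succeeds, modelling a takeover after expiry.
    pub fn claim_lease(&mut self, session_id: &str) -> u64 {
        let row = self.sessions.entry(session_id.to_string()).or_default();
        row.lease_token += 1;
        row.lease_token
    }

    /// Verify the fence and the expected head together, then advance the head.
    pub fn commit(
        &mut self,
        session_id: &str,
        token: u64,
        expected_head: u64,
    ) -> Result<u64, CommitError> {
        let row = self.sessions.entry(session_id.to_string()).or_default();
        if row.lease_token != token {
            return Err(CommitError::StaleFence);
        }
        if row.head != expected_head {
            return Err(CommitError::HeadMoved);
        }
        row.head += 1;
        Ok(row.head)
    }
}
```

In this sketch a writer that lost its lease fails on the fence check, and a writer holding the current lease but a stale view of the head fails on CAS. Neither case mutates the row, which is the "fail safely" behavior described above.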

By default every opened runtime gets a fresh owner id and incarnation id. A durable workflow engine that already serializes retries for one invocation can set SessionBuilder::session_execution_owner(LeaseOwnerIdentity::opaque(...)) with a stable owner plus incarnation when it intentionally wants reentry. A new incarnation must not reenter by owner id alone; local-process liveness permits fenced fast reclaim only when the previous holder is definitely dead. Choosing owner identity and lease timings for failover latency is host policy; see running in production.
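One way to read the reentry rule is as a small decision function over the current lease and the claimant's identity. The types below (Owner, Lease, Claim, decide) are hypothetical and are not the LeaseOwnerIdentity API; granting on expiry is also an assumption of this sketch, since actual lease timings are host policy.

```rust
/// Hypothetical identity pair for this sketch (not LeaseOwnerIdentity).
#[derive(Clone, Debug, PartialEq)]
pub struct Owner {
    pub owner_id: String,
    pub incarnation_id: String,
}

impl Owner {
    pub fn new(owner_id: &str, incarnation_id: &str) -> Self {
        Owner {
            owner_id: owner_id.to_string(),
            incarnation_id: incarnation_id.to_string(),
        }
    }
}

pub struct Lease {
    pub holder: Owner,
    pub expires_at_ms: u64,
}

#[derive(Debug, PartialEq)]
pub enum Claim {
    Granted,   // no live lease stands in the way
    Reentered, // same owner AND same incarnation
    Reclaimed, // fast reclaim: previous holder is known to be dead
    Busy,      // another incarnation still holds a live lease
}

pub fn decide(
    current: Option<&Lease>,
    claimant: &Owner,
    now_ms: u64,
    holder_definitely_dead: bool,
) -> Claim {
    match current {
        None => Claim::Granted,
        // Reentry requires the full identity, never the owner id alone.
        Some(l) if l.holder == *claimant => Claim::Reentered,
        // Assumption: an expired lease can simply be granted again.
        Some(l) if l.expires_at_ms <= now_ms => Claim::Granted,
        Some(_) if holder_definitely_dead => Claim::Reclaimed,
        Some(_) => Claim::Busy,
    }
}
```

The point of the sketch is the second arm: a claimant with the same owner_id but a different incarnation_id falls through to the expiry and liveness checks instead of reentering.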

Per-session request lifecycle
flowchart LR
  Req["request for<br/>session_id"] --> Lease["claim session<br/>execution lease"]
  Lease --> Load["load_session<br/>rehydrate working copy"]
  Load --> Turn["run turn<br/>sans-io compute + replayed effects"]
  Turn --> Commit{"commit_runtime_state<br/>lease fence + CAS"}
  Commit -->|wins| Done["committed<br/>release lease"]
  Commit -->|stale fence / loses CAS| Stop["runtime error<br/>reload before retry"]

You scale across sessions (shard/route by session_id), not within one (see Inherent Limits).
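Routing by session_id can be as simple as a stable hash modulo the worker count. The sketch below is an assumption about how a host might do it, not something lash ships; it uses FNV-1a because the function is easy to reproduce identically in every process, and route is a hypothetical helper name.

```rust
/// FNV-1a, 64-bit. Deterministic across processes and builds.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Pick a worker index for a session, or None if the fleet is empty.
pub fn route(session_id: &str, workers: usize) -> Option<usize> {
    if workers == 0 {
        return None;
    }
    Some((fnv1a64(session_id.as_bytes()) % workers as u64) as usize)
}
```

Plain modulo routing reshuffles most sessions when the fleet is resized. Because workers hold no durable state, a reshuffled session costs a reload rather than correctness, and the session execution lease still serializes any overlap.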

What Already Supports This

The seams a stateless fleet needs are already present and deliberate. The sans-IO / store / effect-replay split was built with exactly this shape in mind.

| Primitive | Role | Where |
| --- | --- | --- |
| lash-sansio pure TurnMachine | compute is deterministic (state, input) → (state, effects) | crates/lash-sansio |
| RuntimePersistence | the single durable store seam; backend-swappable | store/mod.rs, conformance suite |
| session execution lease | single durable execution lane for every mutating session path | store/mod.rs, session_execution_leases |
| commit_runtime_state fence + CAS on session_head | atomic session-head write and stale-writer backstop | store/mod.rs |
| queue with claim + fencing tokens | exactly-once work pickup by any lease-holding session runner | queued_work_batches |
| effect replay keys + lash-restate | re-run a turn on another worker, replay effects | crates/lash-restate |
| lash-postgres-store + lash-s3-store | networked runtime/process/trigger/artifact state plus content-addressed attachment bytes | crates/lash-postgres-store, crates/lash-s3-store |
| rehydratable state + Residency | no durable-only memory; reload anywhere | runtime/session_ops.rs |

Deployment Choices

The networked storage and durable execution seams exist. What remains is ordinary production packaging and policy: how the fleet is routed, sized, observed, and operated.

| Choice | What it needs |
| --- | --- |
| Networked store | Use PostgresStorage for sessions, pending turn inputs, queued work, process registry, triggers, and Lashlang artifacts; use S3AttachmentStore for attachment bytes. The backend-agnostic conformance suites run against Postgres. |
| Turn durability across workers | Deploy turns under Restate with RestateRuntimeEffectController; a crash reruns the handler and replays effect outcomes before the final session commit. |
| Fleet work dispatch | Restate can push handlers for turn and process workflows, while shared queue and process rows retain claim/fencing semantics for recoverable work. The distributed worker E2E exercises two workers behind an h2c proxy. |
| Session-agnostic ingress | External signals must route to whichever worker holds the session, so ingress must not be bolted to one session. See Triggers below. |
| Affinity vs pure reload | A routing policy choice: session_id affinity (warm working-copy cache) vs reload-per-request (simpler, more store reads). The session execution lease supports either; Residency tunes reload cost. |
| Operations envelope | Choose Postgres and object-store sizing, Restate retention/ingress topology, trace collection, idempotency-key policy, and replay observability for the host product. |

Inherent Limits: Domain, Not Implementation

Some boundaries are properties of agent conversations, not of lash.

Single-writer-per-session

A session is a serial conversation; the session execution lease enforces one active mutating runner. You scale across many sessions, not within one. That is ideal for multi-tenant workloads, but a single session's throughput is domain-bounded.

The store is the ceiling

Whatever networked backend is chosen must sustain lease claims, renewals, fenced commits, and queue claims at fleet scale. The store choice is the scaling story.

Long LLM turns

A long turn holds or migrates a worker; a durable effect host is the answer, at the cost of replay on recovery. Restate is the first-party adapter, not the only possible workflow runtime.
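The replay cost mentioned here can be pictured with a keyed journal: on recovery the handler reruns from the top, but any effect whose outcome was already recorded is answered from the journal instead of being executed again. EffectJournal and run_or_replay are hypothetical names for this sketch, and the journal is an in-memory map here; under Restate the recorded outcomes live in durable storage.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory journal: replay key -> recorded outcome.
#[derive(Default)]
pub struct EffectJournal {
    outcomes: HashMap<String, String>,
}

impl EffectJournal {
    /// Run `effect` the first time a key is seen; afterwards return the
    /// recorded outcome without invoking the closure.
    pub fn run_or_replay<F>(&mut self, replay_key: &str, effect: F) -> String
    where
        F: FnOnce() -> String,
    {
        if let Some(recorded) = self.outcomes.get(replay_key) {
            return recorded.clone();
        }
        let outcome = effect();
        self.outcomes.insert(replay_key.to_string(), outcome.clone());
        outcome
    }
}
```

A rerun on another worker that reaches the same replay key gets the original outcome back, so an expensive LLM call is paid for once even though the surrounding turn code executes twice.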

Not a blocker: tokio

Statelessness is about where durable truth lives, not the async runtime. lash-core staying on tokio is orthogonal: each worker is a tokio service, and the durable truth is already entirely external to the process.

Triggers: Ingress

Triggers are the session-agnostic ingress that completes the stateless model. A trigger occurrence is not attached to a session: it is recorded to its own backend-swappable seam and routed to whichever worker holds the interested session/process. Only the resulting wake is session-coupled.

Ingress: emit → match → idempotent delivery → wake
flowchart LR
  Sig["external signal<br/>button · mail · cron · webhook"] --> Emit["router.emit<br/>record occurrence"]
  Emit --> Match["reserve_matching_deliveries<br/>match by source_type + source_key"]
  Match --> Deliver["idempotent effect delivery<br/>deterministic occurrence / delivery id"]
  Deliver --> Proc["start trigger process<br/>with stored registrant + env_ref"]
  Proc --> Wake["wake → queued work<br/>EarliestSafeBoundary"]

Net for the fleet: route external signals to a runtime-level emit → match → idempotent effect-delivery → wake. Trigger delivery reuses the existing claim and replay machinery; wake processing then enters the target session through the session execution lease. The Postgres-backed implementation is exercised by the distributed Restate/Postgres/MinIO E2E, but the contract is the TriggerStore, session-store, and effect-host boundary.
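The match and idempotent-delivery steps can be sketched as follows: a delivery id is derived deterministically from the occurrence and the subscription, and reserving an id that already exists is a no-op. Subscription, DeliveryLog, delivery_id, and reserve_matching are hypothetical names for this example and do not reflect the real TriggerStore signatures; the in-memory set stands in for a durable table.

```rust
use std::collections::HashSet;

/// Hypothetical subscription row for this sketch.
pub struct Subscription {
    pub id: String,
    pub source_type: String,
    pub source_key: String,
}

/// Deterministic id: the same occurrence and subscription always yield the
/// same string. The length prefix keeps ids unambiguous if parts contain ':'.
pub fn delivery_id(occurrence_id: &str, subscription_id: &str) -> String {
    format!("{}:{}:{}", occurrence_id.len(), occurrence_id, subscription_id)
}

#[derive(Default)]
pub struct DeliveryLog {
    reserved: HashSet<String>,
}

impl DeliveryLog {
    /// Match subscriptions by source_type + source_key and return only the
    /// delivery ids that were newly reserved by this call.
    pub fn reserve_matching(
        &mut self,
        subs: &[Subscription],
        occurrence_id: &str,
        source_type: &str,
        source_key: &str,
    ) -> Vec<String> {
        subs.iter()
            .filter(|s| s.source_type == source_type && s.source_key == source_key)
            .map(|s| delivery_id(occurrence_id, &s.id))
            // HashSet::insert returns false for an id that is already reserved.
            .filter(|id| self.reserved.insert(id.clone()))
            .collect()
    }
}
```

If two workers, or one worker retrying after a crash, emit the same occurrence, only the first reservation produces deliveries, so the trigger process starts once.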
