lash · an embeddable agent runtime for Rust

§ 01

the primitives

one runtime boundary, the whole stack

An agent is a loop around a model, but a useful one needs more: durable state, code execution, subagents, background work. Assemble those from a workflow engine, a sandbox, and an orchestrator, and you end up running several systems that never share a boundary.

lash is that whole stack in one runtime. Providers feed the turn loop, tools expose callable host work, plugins add lifecycle behavior, every effect crosses one typed boundary, and the turn commits atomically.

host app owns: Auth, transport, product state, storage backends, and workflow/effect hosts. The host decides which providers, stores, resources, and authorities exist.
lash owns: The runtime records inside that boundary: session graph, turn commit stamps, committed checkpoints, queued work, process records, attachment manifest, semantic events, and plugin execution.

§ 02

composition

compose the runtime edges

The runtime is assembled from explicit edges, not one extension bucket. Model providers feed LLM calls, ToolProviders expose callable work, plugins contribute prompts, hooks, state, and catalog policy, and stores/effect hosts provide durability. A host composes only what it embeds.
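A minimal sketch of what "explicit edges" can look like, using only the standard library. The traits, the builder, and the toy implementations below are hypothetical stand-ins for illustration and are not lash's real types; the point is only that each edge is a separate, narrow contract rather than one extension bucket.

```rust
// Hypothetical traits for illustration; these are not lash's real types.
trait ModelProvider {
    fn complete(&self, prompt: &str) -> String;
}

trait ToolProvider {
    fn tool_names(&self) -> Vec<String>;
}

trait Plugin {
    /// Prompt text this plugin contributes, if any.
    fn prompt_fragment(&self) -> Option<String>;
}

/// Each edge is its own field, so the host sees exactly what it embedded.
struct Runtime {
    model: Box<dyn ModelProvider>,
    tools: Vec<Box<dyn ToolProvider>>,
    plugins: Vec<Box<dyn Plugin>>,
}

#[derive(Default)]
struct RuntimeBuilder {
    model: Option<Box<dyn ModelProvider>>,
    tools: Vec<Box<dyn ToolProvider>>,
    plugins: Vec<Box<dyn Plugin>>,
}

impl RuntimeBuilder {
    fn model(mut self, model: impl ModelProvider + 'static) -> Self {
        self.model = Some(Box::new(model));
        self
    }

    fn tools(mut self, tools: impl ToolProvider + 'static) -> Self {
        self.tools.push(Box::new(tools));
        self
    }

    fn plugin(mut self, plugin: impl Plugin + 'static) -> Self {
        self.plugins.push(Box::new(plugin));
        self
    }

    /// A model provider is the only required edge in this sketch.
    fn build(self) -> Result<Runtime, String> {
        let model = self
            .model
            .ok_or_else(|| "a model provider is required".to_string())?;
        Ok(Runtime { model, tools: self.tools, plugins: self.plugins })
    }
}

impl Runtime {
    /// The tool catalog is the union of what every ToolProvider exposes.
    fn catalog(&self) -> Vec<String> {
        self.tools.iter().flat_map(|t| t.tool_names()).collect()
    }

    /// Plugins contribute prompt text; the model provider handles the call.
    fn run_turn(&self, input: &str) -> String {
        let system: Vec<String> =
            self.plugins.iter().filter_map(|p| p.prompt_fragment()).collect();
        self.model.complete(&format!("{}\n{}", system.join("\n"), input))
    }
}

// Toy implementations so the sketch runs without a network.
struct EchoModel;
impl ModelProvider for EchoModel {
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

struct StaticTools(Vec<&'static str>);
impl ToolProvider for StaticTools {
    fn tool_names(&self) -> Vec<String> {
        self.0.iter().map(|name| name.to_string()).collect()
    }
}

struct Tone(&'static str);
impl Plugin for Tone {
    fn prompt_fragment(&self) -> Option<String> {
        Some(self.0.to_string())
    }
}
```

Building with `RuntimeBuilder::default().model(EchoModel).tools(StaticTools(vec!["clock.now"])).plugin(Tone("Be brief."))` yields a runtime whose catalog holds only `clock.now`; leaving out the model makes `build()` return an error instead of a half-wired runtime.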

The execution mode is a plugin too. standard and rlm ship as separate protocol crates and install like anything else, so the loop's protocol is swappable.

Provider crates (Anthropic, OpenAI, Gemini, any OpenAI-compatible endpoint), MCP servers, and observability sinks (JSONL traces, Lashlang execution graphs, OpenTelemetry) each attach through their own narrow contract.

§ 03

rlm

the model writes programs

In rlm mode the model's action is a whole program, not one tool call. It writes lashlang, a typed DSL the runtime runs deterministically. This is the CodeAct pattern, except the language has no IO of its own: every call crosses a host-granted ability. Choosing a small controlled language over Python is deliberate.

// the model emits this; the runtime runs it deterministically.
// every call is a host-granted ability, the VM has no IO of its own.
hits = await web.search({ query: "rust async runtimes" })?

notes = []
for h in hits.results {
  page = await files.read({ path: h.path })?   // large output, read lazily
  notes = notes + [page.summary]
}

finish notes

Loops, branches, and many tool calls settle in a single turn. Large outputs bind lazily, so reading one field never pulls the whole blob into context.
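One way to picture lazy binding, as a sketch under stated assumptions: the program holds a small handle, the payload stays host-side, and only the field that is actually read gets copied into context. `Handle`, `Context`, and the byte counter are hypothetical names invented for this illustration and say nothing about how lash implements it.

```rust
use std::collections::HashMap;

/// What the program holds after a tool call: an index, not the payload.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

/// Hypothetical context that tracks how many bytes the model has seen.
#[derive(Default)]
struct Context {
    blobs: Vec<HashMap<String, String>>, // host-side storage for large outputs
    bytes_in_context: usize,
}

impl Context {
    /// Binding a result stores it host-side and costs no context bytes.
    fn bind(&mut self, fields: HashMap<String, String>) -> Handle {
        self.blobs.push(fields);
        Handle(self.blobs.len() - 1)
    }

    /// Reading one field copies only that field into context.
    fn read_field(&mut self, handle: Handle, name: &str) -> Option<String> {
        let value = self.blobs.get(handle.0)?.get(name)?.clone();
        self.bytes_in_context += value.len();
        Some(value)
    }
}
```

With a page that carries a five-byte `summary` and a megabyte `body`, reading `summary` raises `bytes_in_context` by five and leaves the body untouched on the host side.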

The same language spawns subagents, picks an identity, and runs durable background work:

// a subagent with its own task and a typed result
finding = await agents.spawn({ task: "audit auth.rs", output: Finding })?

// identity is the path the model names: gmail.work, never gmail.personal
process notify(mail: Gmail, finding: Finding) { await mail.send({ body: finding })? finish true }
start notify(mail: gmail.work, finding: finding)

agents.spawn runs a typed subagent on the side. Identity is the resource path the model names, and the host decides which paths exist. start launches a durable background process the turn can await, signal, or cancel. The process captures its execution environment at start and runs as a self-contained runtime entity; arguments are the only state handover, and it outlives the session that started it.
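The identity rule can be sketched as a host-owned table of granted paths: the model can only name a path, and the name resolves only if the host registered it. `Grants`, `Resource`, and the example account address are hypothetical and exist only for this illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Resource {
    kind: &'static str,
    account: String,
}

/// Hypothetical host-side registry: the host decides which paths exist.
#[derive(Default)]
struct Grants {
    paths: HashMap<String, Resource>,
}

impl Grants {
    /// Called by the host at setup time, never by the model.
    fn grant(&mut self, path: &str, kind: &'static str, account: &str) {
        self.paths.insert(
            path.to_string(),
            Resource { kind, account: account.to_string() },
        );
    }

    /// Resolve a path the model named, checking it against the declared type.
    fn resolve(&self, path: &str, expected_kind: &str) -> Result<&Resource, String> {
        let resource = self
            .paths
            .get(path)
            .ok_or_else(|| format!("no such resource path: {path}"))?;
        if resource.kind != expected_kind {
            return Err(format!(
                "{path} is a {}, expected {expected_kind}",
                resource.kind
            ));
        }
        Ok(resource)
    }
}
```

If the host grants only `gmail.work`, a program that names `gmail.personal` fails at resolution; the model has no way to mint an identity the host never registered.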

§ 04

durability

durable by responsibility

Durability is split into two separate contracts: storing the completed turn, and replaying the individual effects inside a turn that crashes.

completed turns: The turn is the unit a host observes. It settles as one atomic commit: the session graph, head revision, and usage applied together. How the turn was executed stays out of this contract, so the settled state is coherent no matter what ran underneath.
in-flight effects: A durable effect host handles the narrower question: what if the host crashes while one nondeterministic effect is in flight? Each named effect (an LLM call, a tool call, process admin, a retry sleep) crosses a scoped controller so a workflow host such as Temporal, Restate, or another engine can replay it from host history.
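Replay from host history is, in general terms, a journal lookup in front of each named effect. The sketch below shows that generic technique with hypothetical types (`Journal`, `EffectController`, the `llm:0` / `tool:0` keys); it is not the interface lash or any workflow engine exposes.

```rust
use std::collections::HashMap;

/// Host history: the recorded result of every effect that already finished.
#[derive(Default)]
struct Journal {
    recorded: HashMap<String, String>,
}

/// Hypothetical controller every named effect has to pass through.
struct EffectController<'a> {
    journal: &'a mut Journal,
    executed: usize, // effects that really ran during this attempt
}

impl<'a> EffectController<'a> {
    fn new(journal: &'a mut Journal) -> Self {
        Self { journal, executed: 0 }
    }

    /// Return the recorded result if there is one; otherwise run and record.
    fn run(&mut self, name: &str, effect: impl FnOnce() -> String) -> String {
        if let Some(value) = self.journal.recorded.get(name) {
            return value.clone();
        }
        let value = effect();
        self.executed += 1;
        self.journal.recorded.insert(name.to_string(), value.clone());
        value
    }
}

/// A turn with two effects; stand-ins for an LLM call and a tool call.
fn turn(ctl: &mut EffectController<'_>) -> String {
    let plan = ctl.run("llm:0", || "search".to_string());
    ctl.run("tool:0", || format!("{plan} -> 3 hits"))
}
```

If the host crashes after `llm:0` was recorded, rerunning `turn` against the same journal skips the model call, executes only `tool:0`, and arrives at the same result.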

In-flight replay lives in host wiring. The inline host is local and reopens only the last committed state after a crash. Durable workflow engines own effect replay and timers, while lash retries the same final commit through the store's turn-commit stamp. Restate is the shipped adapter, not a requirement of the interface.
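Retrying the same final commit is safe when the store treats the stamp as an idempotency key. A minimal in-memory sketch of that idea follows; `Store`, `TurnCommit`, and the field names are hypothetical, and a real backend would apply the same steps inside one database transaction.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct TurnCommit {
    messages: Vec<String>,
    tokens_used: u64,
}

/// Hypothetical store; a real one would wrap commit() in a transaction.
#[derive(Default)]
struct Store {
    head_revision: u64,
    graph: Vec<String>,
    usage: u64,
    applied: HashMap<String, u64>, // commit stamp -> revision it produced
}

impl Store {
    /// Apply graph, usage, and head revision together, at most once per stamp.
    fn commit(&mut self, stamp: &str, turn: &TurnCommit) -> u64 {
        if let Some(&revision) = self.applied.get(stamp) {
            return revision; // a retry of an already-applied commit is a no-op
        }
        self.graph.extend(turn.messages.iter().cloned());
        self.usage += turn.tokens_used;
        self.head_revision += 1;
        self.applied.insert(stamp.to_string(), self.head_revision);
        self.head_revision
    }
}
```

Committing twice with the same stamp returns the same revision and counts usage once, so a workflow engine can re-drive the final step after a crash without double-applying the turn.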

§ 05

embed it

The shortest path to a working turn: add the crate, pick a provider, open a session, run one turn. Tools, modes, persistence, and tracing layer on after that.

The facade ships on crates.io as lash-runtime and is imported as use lash::…. During the alpha series the versions carry an -alpha.N suffix, so the dependency has to pin the exact pre-release version.

[dependencies]
lash-runtime         = "=0.1.0-alpha.84"
lash-provider-openai = "=0.1.0-alpha.84"
anyhow               = "1"
tokio                = { version = "1", features = ["full"] }

One LashCore per app, cloned freely; one LashSession per chat or task; .run() for a single collected turn, .stream_to(&sink) for live events.

use std::sync::Arc;

use lash::{LashCore, ModelSpec, TurnInput, provider::ProviderHandle};
use lash_provider_openai::{OPENROUTER_BASE_URL, OpenAiCompatibleProvider};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let api_key = std::env::var("OPENROUTER_API_KEY")?;
    let provider = ProviderHandle::new(
        OpenAiCompatibleProvider::new(api_key, OPENROUTER_BASE_URL).into_components(),
    );

    let model = ModelSpec::from_token_limits("anthropic/claude-sonnet-4.6", None, 200_000, None)
        .map_err(anyhow::Error::msg)?;

    // one LashCore per app, cloned freely.
    let core = LashCore::standard_builder()
        .provider(provider)
        .model(model)
        .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
        .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
        .build()?;

    // one session per chat / task; run one turn; read settled prose.
    let session = core.session("hello-1").open().await?;
    let result = session
        .turn(TurnInput::text("Say hi in one short sentence."))
        .run()
        .await?;

    println!("{}", result.assistant_message().unwrap_or_default());
    Ok(())
}

From there the runtime gives a backend more to work with: session.enqueue(...) and session.queued_turn() drive app-owned queues, turn.provider(…) / turn.model(…) override the model per turn, and durable workflow hosts run turns with .turn_id(...).effects(&controller).


§ 06

status

lash is usable today, but still in the alpha series. Embed it when you want the runtime boundary now; pin versions or commits while the API settles.

works today: Standard turns, RLM turns, plugin composition, trait-backed persistence with first-party SQLite/Postgres adapters, scoped effect-controller turns, background processes, provider crates, MCP wiring, JSONL traces, Lashlang execution graphs, OpenTelemetry export, and the example apps.
still moving: Facade details, docs, and some advanced host seams are still changing quickly. The runtime contracts are being hardened in public rather than hidden behind compatibility shims.

§ 07

read on

Seven doc tracks, grouped by the job you are trying to do.