rust · embeddable agent runtime
lash.
durable runtime for your agents.
Run model calls, tools, RLM typed programs, subagents, and background processes as one resumable turn in Rust.
§ 01
the primitives
An agent is a loop around a model, but a useful one needs more: durable state, code execution, subagents, background work. Assemble those from a workflow engine, a sandbox, and an orchestrator and you run several systems that never share a boundary.
lash is that whole stack in one runtime. Providers feed the turn loop, tools expose callable host work, plugins add lifecycle behavior, every effect crosses one typed boundary, and the turn commits atomically.
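A minimal sketch of what "one typed boundary" and "commits atomically" can mean in practice, using only the standard library. Every name here (Effect, EffectHost, Session, run_turn, ToyHost) is hypothetical and chosen for illustration; none of it is lash's API.

```rust
// Hypothetical sketch: none of these names are lash's API.

/// The closed set of things a turn may ask the host to do.
#[derive(Debug, Clone, PartialEq)]
enum Effect {
    ModelCall { prompt: String },
    ToolCall { name: String, arg: String },
}

/// The single boundary every effect crosses.
trait EffectHost {
    fn perform(&mut self, effect: &Effect) -> Result<String, String>;
}

/// Committed session state; only `run_turn` appends to it.
#[derive(Default)]
struct Session {
    transcript: Vec<String>,
}

/// Runs all effects, staging their outputs. The transcript changes only if
/// every effect succeeds, so a failed turn leaves no partial state behind.
fn run_turn(
    session: &mut Session,
    host: &mut dyn EffectHost,
    effects: &[Effect],
) -> Result<(), String> {
    let mut staged = Vec::with_capacity(effects.len());
    for effect in effects {
        staged.push(host.perform(effect)?);
    }
    session.transcript.extend(staged); // the single commit point
    Ok(())
}

/// Toy host: answers model calls and knows exactly one tool.
struct ToyHost;

impl EffectHost for ToyHost {
    fn perform(&mut self, effect: &Effect) -> Result<String, String> {
        match effect {
            Effect::ModelCall { prompt } => Ok(format!("model({prompt})")),
            Effect::ToolCall { name, arg } if name.as_str() == "upper" => Ok(arg.to_uppercase()),
            Effect::ToolCall { name, .. } => Err(format!("no such tool: {name}")),
        }
    }
}
```

A turn whose second effect fails returns an error and leaves the transcript empty; the same turn with a valid tool appends both outputs at once.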
§ 02
composition
The runtime is assembled from explicit edges, not one extension bucket. Model providers feed LLM calls, ToolProviders expose callable work, plugins contribute prompts, hooks, state, and catalog policy, and stores/effect hosts provide durability. A host composes only what it embeds.
The execution mode is a plugin too. standard and rlm ship as separate protocol crates and install like anything else, so the loop's protocol is swappable.
Provider crates (Anthropic, OpenAI, Gemini, any OpenAI-compatible endpoint), MCP servers, and observability sinks (JSONL traces, lashlang execution graphs, OpenTelemetry) each attach through their own narrow contract.
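To make "explicit edges" concrete, here is a small sketch of a host assembled from one trait per edge. The traits and the Host builder below are assumptions invented for this example (including the shape of ToolProvider); lash's real contracts are richer and may look quite different.

```rust
// Hypothetical sketch: illustrative traits, not lash's real contracts.

/// Edge 1: anything that can answer a prompt.
trait ModelProvider {
    fn complete(&self, prompt: &str) -> String;
}

/// Edge 2: callable host work.
trait ToolProvider {
    fn names(&self) -> Vec<String>;
    fn call(&self, name: &str, arg: &str) -> Result<String, String>;
}

/// Edge 3: lifecycle behavior; here only a prompt contribution.
trait Plugin {
    fn system_prompt(&self) -> Option<String> {
        None
    }
}

struct EchoModel;
impl ModelProvider for EchoModel {
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

struct MathTools;
impl ToolProvider for MathTools {
    fn names(&self) -> Vec<String> {
        vec!["double".to_string()]
    }
    fn call(&self, name: &str, arg: &str) -> Result<String, String> {
        match name {
            "double" => arg
                .parse::<i64>()
                .map(|n| (n * 2).to_string())
                .map_err(|e| e.to_string()),
            other => Err(format!("unknown tool: {other}")),
        }
    }
}

struct Persona(&'static str);
impl Plugin for Persona {
    fn system_prompt(&self) -> Option<String> {
        Some(self.0.to_string())
    }
}

/// One slot per edge; the host composes only what it embeds.
struct Host {
    model: Box<dyn ModelProvider>,
    tools: Vec<Box<dyn ToolProvider>>,
    plugins: Vec<Box<dyn Plugin>>,
}

impl Host {
    fn new(model: Box<dyn ModelProvider>) -> Self {
        Host { model, tools: Vec::new(), plugins: Vec::new() }
    }
    fn tool_provider(mut self, t: Box<dyn ToolProvider>) -> Self {
        self.tools.push(t);
        self
    }
    fn plugin(mut self, p: Box<dyn Plugin>) -> Self {
        self.plugins.push(p);
        self
    }
    /// Every tool name the installed providers expose.
    fn catalog(&self) -> Vec<String> {
        self.tools.iter().flat_map(|t| t.names()).collect()
    }
    /// Routes a call to the first provider that lists the tool.
    fn call_tool(&self, name: &str, arg: &str) -> Result<String, String> {
        self.tools
            .iter()
            .find(|t| t.names().iter().any(|n| n == name))
            .ok_or_else(|| format!("unknown tool: {name}"))?
            .call(name, arg)
    }
    /// Plugin prompts, in installation order, followed by the user text.
    fn ask(&self, user: &str) -> String {
        let system: Vec<String> = self.plugins.iter().filter_map(|p| p.system_prompt()).collect();
        self.model.complete(&format!("{}\n{}", system.join("\n"), user))
    }
}

fn demo_host() -> Host {
    Host::new(Box::new(EchoModel))
        .tool_provider(Box::new(MathTools))
        .plugin(Box::new(Persona("Be brief.")))
}
```

Because each edge is its own trait, dropping the plugin or swapping the model touches one builder call and nothing else.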
§ 03
rlm
In rlm mode the model's action is a whole program, not one tool call. It writes lashlang, a typed DSL the runtime runs deterministically. This is the CodeAct pattern, except every call crosses a host-granted ability, so the language has no IO of its own. A small controlled language, rather than Python, is a deliberate design choice.
// the model emits this; the runtime runs it deterministically.
// every call is a host-granted ability, the VM has no IO of its own.
hits = await web.search({ query: "rust async runtimes" })?
notes = []
for h in hits.results {
  page = await files.read({ path: h.path })? // large output, read lazily
  notes = notes + [page.summary]
}
finish notes
Loops, branches, and many tool calls settle in a single turn. Large outputs bind lazily, so reading one field never pulls the whole blob into context.
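One way to picture lazy binding, sketched in plain Rust: the program holds a small handle, and the host loads a field only when it is read. BlobStore, LazyValue, and the byte counter are hypothetical stand-ins for this illustration; how lash actually stores and resolves large outputs is not shown here.

```rust
use std::cell::Cell;
use std::collections::HashMap;

// Hypothetical sketch: a stand-in for host-side storage of large outputs.
struct BlobStore {
    blobs: HashMap<u32, HashMap<String, String>>,
    /// Bytes actually handed to the program so far.
    bytes_loaded: Cell<usize>,
}

/// What the program binds: a small handle, not the payload.
#[derive(Clone, Copy)]
struct LazyValue {
    id: u32,
}

impl BlobStore {
    fn new() -> Self {
        BlobStore { blobs: HashMap::new(), bytes_loaded: Cell::new(0) }
    }

    /// Stores a large output and returns only a handle to it.
    fn put(&mut self, id: u32, fields: Vec<(&str, String)>) -> LazyValue {
        let map = fields.into_iter().map(|(k, v)| (k.to_string(), v)).collect();
        self.blobs.insert(id, map);
        LazyValue { id }
    }

    /// Loads a single field; the rest of the blob stays in the store.
    fn field(&self, value: LazyValue, name: &str) -> Option<&str> {
        let text = self.blobs.get(&value.id)?.get(name)?;
        self.bytes_loaded.set(self.bytes_loaded.get() + text.len());
        Some(text.as_str())
    }
}

/// Reads only `summary` from a page whose body is 10,000 bytes.
fn demo_lazy_read() -> (String, usize) {
    let mut store = BlobStore::new();
    let page = store.put(
        1,
        vec![("summary", "short".to_string()), ("body", "x".repeat(10_000))],
    );
    let summary = store.field(page, "summary").unwrap_or_default().to_string();
    (summary, store.bytes_loaded.get())
}
```

In this sketch, reading the five-byte summary counts five bytes loaded; the 10,000-byte body is never touched.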
The same language spawns subagents, picks an identity, and runs durable background work:
// a subagent with its own task and a typed result
finding = await agents.spawn({ task: "audit auth.rs", output: Finding })?
// identity is the path the model names: gmail.work, never gmail.personal
process notify(mail: Gmail, finding: Finding) {
  await mail.send({ body: finding })?
  finish true
}
start notify(mail: gmail.work, finding: finding)
agents.spawn runs a typed subagent on the side. Identity is the resource path the model names, and the host decides which paths exist. start launches a durable background process the turn can await, signal, or cancel. The process captures its execution environment at start and runs as a self-contained runtime entity; arguments are the only state handover, and it outlives the session that started it.
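The identity rule can be sketched as a lookup the host controls: the program names a path, and resolution succeeds only if the host granted it. Grants, Identity, and send below are hypothetical names for this illustration, not lash types.

```rust
use std::collections::HashSet;

// Hypothetical sketch: the set of paths a host has chosen to expose.
struct Grants {
    paths: HashSet<String>,
}

/// A resolved identity, meant to be obtained through `Grants::resolve`.
#[derive(Debug, PartialEq)]
struct Identity {
    path: String,
}

impl Grants {
    fn new(paths: &[&str]) -> Self {
        Grants { paths: paths.iter().map(|p| p.to_string()).collect() }
    }

    /// The program names a path; the host decides whether it exists.
    fn resolve(&self, path: &str) -> Result<Identity, String> {
        if self.paths.contains(path) {
            Ok(Identity { path: path.to_string() })
        } else {
            Err(format!("path not granted: {path}"))
        }
    }
}

/// Work done under an identity carries the path it was resolved from.
fn send(identity: &Identity, body: &str) -> String {
    format!("[{}] {}", identity.path, body)
}
```

With only gmail.work and files granted, a program that names gmail.personal gets an error instead of a mailbox.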
§ 04
durability
Durability is split into two separate contracts: storing the completed turn, and replaying the individual effects inside a turn that crashes.
In-flight replay lives in host wiring. The inline host is local: after a crash it reopens only the last committed state. Durable workflow engines own effect replay and timers, while lash retries the same final commit through the store's turn-commit stamp. Restate is the shipped adapter, not a requirement of the interface.
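The two contracts can be sketched side by side with in-memory stand-ins: a journal that replays effects already performed, and a store that treats a repeated commit stamp as a no-op. Journal, Attempt, Store, and CommitOutcome are hypothetical names; in a real deployment the journal would belong to the workflow engine and the stamp check to the persistent store.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch: in-memory stand-ins for both durability contracts.

/// Results of effects already performed in this turn, keyed by step number.
#[derive(Default)]
struct Journal {
    recorded: HashMap<u64, String>,
}

/// One attempt at a turn; `executed` counts effects that really ran.
struct Attempt<'a> {
    journal: &'a mut Journal,
    next_step: u64,
    executed: u32,
}

impl<'a> Attempt<'a> {
    fn new(journal: &'a mut Journal) -> Self {
        Attempt { journal, next_step: 0, executed: 0 }
    }

    /// Replays a recorded result if one exists, otherwise runs and records.
    fn effect(&mut self, run: impl FnOnce() -> String) -> String {
        let step = self.next_step;
        self.next_step += 1;
        if let Some(done) = self.journal.recorded.get(&step) {
            return done.clone();
        }
        let result = run();
        self.executed += 1;
        self.journal.recorded.insert(step, result.clone());
        result
    }
}

#[derive(Debug, PartialEq)]
enum CommitOutcome {
    Applied,
    AlreadyCommitted,
}

/// Completed turns plus the stamps that have already been committed.
#[derive(Default)]
struct Store {
    turns: Vec<String>,
    stamps: HashSet<String>,
}

impl Store {
    /// Retrying with the same stamp never writes the turn twice.
    fn commit(&mut self, stamp: &str, turn: String) -> CommitOutcome {
        if !self.stamps.insert(stamp.to_string()) {
            return CommitOutcome::AlreadyCommitted;
        }
        self.turns.push(turn);
        CommitOutcome::Applied
    }
}

/// A two-effect turn that can be made to fail between its effects.
fn run_turn(journal: &mut Journal, crash_after_first: bool) -> Result<(String, u32), String> {
    let mut attempt = Attempt::new(journal);
    let hits = attempt.effect(|| "search-result".to_string());
    if crash_after_first {
        return Err("crashed mid-turn".to_string());
    }
    let summary = attempt.effect(|| format!("summary of {hits}"));
    Ok((summary, attempt.executed))
}
```

After a simulated crash, the second attempt replays the first effect from the journal and executes only the second; committing the result twice under one stamp stores one turn.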
§ 05
embed it
The shortest path to a working turn: add the crate, pick a provider, open a session, run one turn. Tools, modes, persistence, and tracing layer on after that.
The facade ships on crates.io as lash-runtime and is imported as use lash::…. During the alpha series every version carries an -alpha.N suffix, so the dependency must name the pre-release version explicitly.
[dependencies]
lash-runtime = "=0.1.0-alpha.84"
lash-provider-openai = "=0.1.0-alpha.84"
anyhow = "1"
tokio = { version = "1", features = ["full"] }
One LashCore per app, cloned freely; one LashSession per chat or task; .run() for a single collected turn, .stream_to(&sink) for live events.
use std::sync::Arc;

use lash::{LashCore, ModelSpec, TurnInput, provider::ProviderHandle};
use lash_provider_openai::{OPENROUTER_BASE_URL, OpenAiCompatibleProvider};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // the API key comes from the environment; a missing variable is an error.
    let api_key = std::env::var("OPENROUTER_API_KEY")?;
    let provider = ProviderHandle::new(
        OpenAiCompatibleProvider::new(api_key, OPENROUTER_BASE_URL).into_components(),
    );
    let model = ModelSpec::from_token_limits("anthropic/claude-sonnet-4.6", None, 200_000, None)
        .map_err(anyhow::Error::msg)?;

    // one LashCore per app, cloned freely.
    let core = LashCore::standard_builder()
        .provider(provider)
        .model(model)
        .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
        .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
        .build()?;

    // one session per chat / task; run one turn; read settled prose.
    let session = core.session("hello-1").open().await?;
    let result = session
        .turn(TurnInput::text("Say hi in one short sentence."))
        .run()
        .await?;
    println!("{}", result.assistant_message().unwrap_or_default());
    Ok(())
}
From there the runtime hands a backend more: session.enqueue(...) and session.queued_turn() drive app-owned queues, turn.provider(…) / turn.model(…) override the model per turn, and durable workflow hosts run turns with .turn_id(...).effects(&controller).
§ 06
status
lash is usable today, but still in the alpha series. Embed it when you want the runtime boundary now; pin versions or commits while the API settles.
§ 07
read on
Seven doc tracks, grouped by the job you are trying to do.