- Durable session graph: conversation records, tool events, mode events, and plugin nodes live in one SessionGraph per session. SessionReadView and ChronologicalProjection read it; nothing else owns persistence.
- Per-turn atomic commit: RuntimeCommit writes the graph delta, checkpoint blobs, usage deltas, and head revision in one SQLite transaction with optimistic CAS on expected_head_revision. A partial turn means no commit.
- Typed plugin capabilities: tools see a named ToolContext surface — tasks(), sessions(), direct_completion, tool_catalog, snapshot reads — not a general host handle. The surface exposes no ToolContext::host() escape hatch.
- Sandboxed code execution: RLM mode runs model-emitted Lashlang programs in a VM with no filesystem, process, or network surface. Every effect crosses ToolHost. Use it when the model should compose multiple tool calls per turn instead of one.
- Subagent capability boundaries: Capability::resolve(parent_policy) returns a constrained SessionSpec for the child. Interactive-only tools are stripped from every subagent surface regardless of capability.
- Semantic event stream: identity-bearing TurnActivity items — assistant prose deltas, reasoning deltas, tool start/complete pairs with correlation ids, code-block start/complete, terminal SubmittedValue/ToolValue, per-call and rolling usage. Assistant prose deltas are live previews; the settled transcript comes from TurnResult / SessionReadView after the runtime commit.
- Tracing as a first-class sink: every provider call across every session emits TraceRecords through TraceSink implementations. JSONL by default, OTel optional.
- Snapshot and restore seams: plugins, tool state, and the Lashlang VM each persist through versioned snapshot writers, so a parked session resumes intact across process restarts.
What sits outside this crate, on purpose: tenancy boundaries, retention and lifecycle for long-lived artifacts, discovery of agent-authored procedures, a shared coherent image across many sessions. lash is the kernel and runtime; the platform-shaped pieces belong to whatever embeds it.
lash vs lash-core boundary
use std::sync::Arc;
use lash::{tools::*, LashCore, ModeId, ModePreset, PluginStack, TurnEvent, TurnInput};
let store_factory = Arc::new(lash_sqlite_store::SqliteSessionStoreFactory::new("lash-sessions"));
let core = LashCore::builder()
.install_mode(ModePreset::standard())
.install_mode(ModePreset::rlm())
.default_mode(ModeId::rlm())
.provider(provider)
.model("anthropic/claude-sonnet-4.6", None)
.max_context_tokens(200_000)
.plugins(PluginStack::runtime())
.tools(Arc::new(AppTools) as Arc<dyn ToolProvider>)
.store_factory(store_factory)
.build()?;
let session = core.session("chat-123").rlm().open().await?;
let result = session.turn(TurnInput::text("Use the app tools.")).run().await?;
let assistant_text: String = result.activities.iter().filter_map(|activity| match &activity.event {
TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
_ => None,
}).collect();
println!("{assistant_text}");
LashCore
- Cloneable shared configuration: provider, model, installed modes, tool providers, plugin factories, store factory, attachment store, and tracing. Runtime internals such as residency and termination policy live behind .advanced().
ModePreset
- Installs execution modes. Use ModePreset::standard() for native provider tool calls, ModePreset::rlm() for Lashlang-driven RLM turns, or install both and choose the default with default_mode.
SessionSpec
- Reusable public configuration overlay for provider, model/variant, execution mode, max context tokens, max turns, and prompt layer. Root cores use SessionSpec::new(); child sessions and subagents usually use SessionSpec::inherit().
PluginStack
- Ordered plugin factory list. LashCore::standard() and LashCore::rlm() include PluginStack::runtime(); a raw LashCore::builder() stays explicit and needs modes and plugins installed by the host.
LashSession
- One app conversation or task. Sessions wrap a parked/resumable runtime, can use a per-session store, and expose turn(TurnInput), run(TurnInput), read_view(), and lower-level control groups through control().
TurnBuilder
- Per-turn configuration: cancellation, mode options, typed plugin input, and RLM-projected bindings. Call .stream(&sink) for live events or .run() for a collected ordered activity log.
Root defaults
use lash::{plugins::PluginFactory, LashCore, PluginStack, SessionSpec};
let root_spec = SessionSpec::new()
.provider(provider)
.model("gpt-5.4", None)
.max_context_tokens(200_000);
let core = LashCore::rlm()
.session_spec(root_spec)
.configure_plugins(|plugins| {
plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
})
.build()?;
Explicit stacks
let plugins = PluginStack::runtime().configure(|plugins| {
plugins.replace(Arc::new(CustomBudgetPlugin) as Arc<dyn PluginFactory>);
plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
});
let core = LashCore::builder()
.install_mode(ModePreset::rlm())
.default_mode(ModeId::rlm())
.session_spec(root_spec)
.plugins(plugins)
.build()?;
.plugin(...) appends one factory to the current stack, .plugins(...) replaces the full stack, and .configure_plugins(...) mutates the current stack in place. That gives hosts a default runtime set while still allowing precise removal, replacement, or insertion.
Collected result
let collected = session
.turn(TurnInput::text("Summarize this task."))
.run()
.await?;
let live_preview: String = collected.activities.iter().filter_map(|activity| match &activity.event {
TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
_ => None,
}).collect();
let settled_answer = collected.assistant_message().unwrap_or_default();
let parent_usage = collected.result.usage; // parent's own LLM tokens
let children = &collected.result.children_usage; // per-(source, model) child entries
let total = collected.result.total_usage(); // parent + children
let outcome = collected.result.outcome;
Live stream
let ui_sink = Arc::new(AppEvents::new(tx));
let turn = session
.turn(TurnInput::text(user_text))
.stream(ui_sink.as_ref())
.await?;
persist(turn.assistant_message().unwrap_or_default(), turn.total_usage())?;
Apps own their live projection policy. Fold TurnActivity directly for partial UI state, terminal-value previews, tool summaries, or timelines. At turn completion, replace the current-turn display from the returned TurnResult.state.read_view() or the latest SessionReadView; do not append a separate final assistant event after streamed prose. Treat TurnActivity.correlation_id as the stable identity for multi-phase UI rows: start events insert rows, completion events update those same rows.
TurnOutcome shape
pub enum TurnOutcome {
Finished(TurnFinish),
Handoff { session_id: String },
Stopped(TurnStop),
}
pub enum TurnFinish {
AssistantMessage { text: String },
SubmittedValue { value: serde_json::Value },
ToolValue { tool_name: String, value: serde_json::Value },
}
Finished means the turn completed cleanly with a terminal value. AssistantMessage is the default — a free-form prose answer — and is the only terminal semantic result for prose in standard and RLM modes. Live prose may already have streamed as AssistantProseDelta, but there is no second "final message" stream event; the shared runtime commit materializes the assistant transcript message exactly once. SubmittedValue and ToolValue come from RLM control flow when the program ends with submit or a tool-authored terminal value; the same JSON value already arrived as a TurnEvent::SubmittedValue / TurnEvent::ToolValue in the stream.
Handoff means a mode plugin or tool returned control to a fresh session — usually a context-compaction successor or a continue-as redirect. The new session is persisted before this returns; resume against session_id to continue.
Stopped means the turn was aborted before producing a terminal value. TurnStop has ten variants:
| Variant | Class | Cause |
| --- | --- | --- |
| Cancelled | Host-driven | The cancellation token passed to TurnBuilder::cancellation(...) fired. |
| InvalidInput | Host-driven | Input normalization rejected the TurnInput (e.g. missing attachment, malformed image ref). |
| Incomplete | Provider | The LLM hit its output limit mid-message without finishing. |
| ProviderError | Provider | The provider returned an error, a content-filter refusal, or a context-overflow terminal reason. |
| MaxTurns | Runtime | The mode loop hit session_spec.max_turns and the final forced-reply turn finished without a terminal. |
| ToolFailure | Runtime | A tool returned ToolResult::err(...) and turn assembly flagged the turn as failed. |
| PluginAbort | Runtime | A plugin's turn-preparation or checkpoint hook returned an abort directive. |
| RuntimeError | Fatal | Internal runtime failure or a strict-termination policy with no Done event. Treat as a bug or environmental fault. |
| SubmittedError { value } | RLM | An RLM program ended with submit_error(...); value is the model-authored payload. |
| ToolError { tool_name, value } | RLM | A mode plugin signalled a tool-authored error terminal; value is the payload. |
Host-driven stops are recoverable on the next request. Provider stops usually warrant a retry with a different model or a shorter context. Runtime stops mean a configuration / authoring problem the user needs to know about. RuntimeError is the only category where re-running with identical input is unlikely to help.
Optimistic-concurrency commit conflicts on the session graph are not represented as a Stopped variant — they surface as Err from the run()/stream() call itself, with no partial commit. See Persistence → The Contract for the retry shape.
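When a conflict does surface, nothing was committed, so the simplest handler re-runs the turn. A minimal sketch, assuming a hypothetical is_commit_conflict predicate on the error; the real conflict test is defined by the persistence contract:
// Hedged sketch: re-run after an optimistic-concurrency conflict.
// `is_commit_conflict` is illustrative, not a real lash API.
let mut attempts = 0;
let result = loop {
    match session.turn(TurnInput::text(user_text.clone())).run().await {
        Ok(result) => break result,
        Err(err) if is_commit_conflict(&err) && attempts < 3 => {
            attempts += 1; // no partial commit happened; safe to retry
        }
        Err(err) => return Err(err.into()),
    }
};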
match result.outcome {
lash::TurnOutcome::Finished(finish) => persist_terminal(finish)?,
lash::TurnOutcome::Handoff { session_id } => reopen(session_id).await?,
lash::TurnOutcome::Stopped(stop) => match stop {
lash::TurnStop::Cancelled
| lash::TurnStop::InvalidInput => report_user_visible(stop),
lash::TurnStop::ProviderError
| lash::TurnStop::Incomplete => offer_retry(stop),
lash::TurnStop::MaxTurns => suggest_higher_max_turns(),
other => record_for_diagnosis(other),
},
}
Three event types — only one is for your app.
Apps consume TurnActivity: an identity-bearing wrapper that carries a fresh id, a stable correlation_id, and a TurnEvent payload. The payload-shape enum is TurnEvent — what kind of event happened. SessionEvent sits one layer deeper as a runtime/debug enum used inside the session graph; hosts should not subscribe to it, and there is no app-facing SessionEvent::Message { kind: "final" } contract. If you're writing a TurnActivitySink, you only ever match on TurnEvent variants — that's the app surface.
TurnActivity sink
use async_trait::async_trait;
use lash::{TurnActivity, TurnActivitySink, TurnEvent};
struct AppEvents {
tx: AppUiTx,
turn_state: std::sync::Mutex<TurnUiState>,
}
#[derive(Default)]
struct TurnUiState {
reasoning: Option<UiRowId>,
tools: std::collections::HashMap<String, UiRowId>,
code: Option<UiRowId>,
}
#[async_trait]
impl TurnActivitySink for AppEvents {
async fn emit(&self, activity: TurnActivity) {
let correlation_id = activity.correlation_id.0.clone();
match activity.event {
TurnEvent::AssistantProseDelta { text } => {
append_live_text(text).await;
}
TurnEvent::ReasoningDelta { text } => {
let row = self.turn_state.lock().unwrap().reasoning.clone();
let row = upsert_reasoning_row(row, text).await;
self.turn_state.lock().unwrap().reasoning = Some(row);
}
TurnEvent::ToolCallStarted { name, args, .. } => {
let row = insert_tool_row(name, args).await;
self.turn_state
.lock()
.unwrap()
.tools
.insert(correlation_id, row);
}
TurnEvent::ToolCallCompleted { name, output, .. } => {
let row = self.turn_state.lock().unwrap().tools.remove(&correlation_id);
update_or_insert_tool_row(row, name, output).await;
}
TurnEvent::CodeBlockStarted { language, code } => {
let row = insert_code_row(language, code).await;
self.turn_state.lock().unwrap().code = Some(row);
}
TurnEvent::CodeBlockCompleted { language, output, error, success, .. } => {
let row = self.turn_state.lock().unwrap().code.take();
update_or_insert_code_row(row, language, output, error, success).await;
}
TurnEvent::SubmittedValue { value } => {
append_live_text(render_terminal_value(&value)).await;
}
TurnEvent::ToolValue { tool_name, value } => {
append_live_text(render_terminal_value(&value)).await;
record_terminal_tool(tool_name).await;
}
TurnEvent::Usage { usage, cumulative, .. } => {
update_usage(usage, cumulative).await;
}
TurnEvent::ChildUsage { source, usage, cumulative, .. } => {
update_child_usage(source, usage, cumulative).await;
}
TurnEvent::Error { .. } => {}
_ => {}
}
}
}
fn render_terminal_value(value: &serde_json::Value) -> String {
match value {
serde_json::Value::Null => String::new(),
serde_json::Value::String(text) => text.clone(),
other => serde_json::to_string_pretty(other).unwrap_or_else(|_| other.to_string()),
}
}
Use event identity, not duplicate detection.
ToolCallStarted and ToolCallCompleted describe the same logical row when their correlation_id matches; code-block events work the same way. TurnEvent::SubmittedValue and TurnEvent::ToolValue mean “a new terminal value was authored by a control path.” They are not emitted for a normal assistant prose finish because that prose already streamed as AssistantProseDelta and the settled message is reconstructed from the read view.
The conventional pattern is a thin sink that pushes onto an async mpsc channel and returns immediately, with a separate consumer task draining the channel into UI state. tokio::sync::mpsc::Sender::send only awaits when the channel is full, so the channel bound becomes your backpressure dial:
use tokio::sync::mpsc;
struct ChannelSink { tx: mpsc::Sender<TurnActivity> }
#[async_trait::async_trait]
impl TurnActivitySink for ChannelSink {
async fn emit(&self, activity: TurnActivity) {
// send().await yields when the channel is full — the turn
// will pause here if your UI consumer falls behind.
let _ = self.tx.send(activity).await;
}
}
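The consumer side is an ordinary task draining the receiver into UI state. A minimal sketch; apply_to_ui stands in for your own fold:
// Hedged sketch: pair ChannelSink with a draining consumer task.
let (tx, mut rx) = tokio::sync::mpsc::channel::<TurnActivity>(256); // bound = backpressure dial
let sink = ChannelSink { tx };
tokio::spawn(async move {
    while let Some(activity) = rx.recv().await {
        apply_to_ui(activity).await; // illustrative: fold into UI state
    }
});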
If a sink panics or fails to deliver, the runtime keeps going — emit failures are observed but do not abort the turn. Persistence and the turn's own activity log are independent of the sink: a dropped sink does not lose the activity, because TurnResult.activities still holds the canonical ordered log when the turn completes.
TraceSink
- Every provider call across every session in the runtime. Right for billing, audit, offline analysis. Heavier than necessary for plain totals.
TurnEvent::Usage / TurnEvent::ChildUsage
- Live during a turn, one event per LLM iteration. Usage is the parent's own model call; ChildUsage carries session_id and source so a UI can group child traffic (subagent, compaction, observer). Right for live counters.
TurnResult.usage / TurnResult.children_usage
- Per-turn snapshot at completion. usage is parent-only; children_usage is a per-(source, model) breakdown for any child sessions that ran during the turn. TurnResult::total_usage() sums both.
session.usage_report() → SessionUsageReport
- Aggregate across the whole session, broken down by source × model. Right for dashboards and "session so far."
The full re-export and well-known source label constants live in lash::usage.
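For dashboards, a minimal sketch of folding the report; the entries() accessor and row field names are assumptions about SessionUsageReport's shape, not confirmed API:
// Hedged sketch: session-so-far usage, grouped by (source, model).
let report = session.usage_report().await?;
for row in report.entries() {
    // `source`, `model`, and `usage` field names are illustrative.
    println!("{} / {}: {} tokens", row.source, row.model, row.usage.total_tokens);
}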
Two strategies manage what's actually passed to the model across many turns. Pick one per session (or per core) via the standard mode's context-approach setting:
RollingHistory (default)
Every turn includes the full ordered conversation history, trimmed only as needed to satisfy max_context_tokens. Best for short-to-medium sessions where the model genuinely benefits from seeing prior turns verbatim.
ObservationalMemory
Older turns are replaced with rolling observations — short summaries authored by the runtime as the session grows. Configurable: observation_message_tokens (default 30 000), observation_buffer_tokens (default 6 000), and reflection thresholds. Best for long-running agents that should remember themes without quoting old messages.
use lash_core::{ObservationalMemoryConfig, StandardContextApproach};
let session = core
.session("long-running-thread")
.standard_context_approach(StandardContextApproach::ObservationalMemory(
ObservationalMemoryConfig::default(),
))
.open()
.await?;
RLM mode has its own context-approach knob with the same shape; both modes share the same enum so a core that installs both modes can default to one and override per session.
Per-turn RLM options
let submitted = session
.turn(TurnInput::text("Move on the board."))
.require_submit()?
.stream(&sink)
.await?;
let prose_or_submit = session
.turn(TurnInput::text("Answer directly if no code is needed."))
.allow_prose_or_submit()?
.run()
.await?;
Outcome shape
match result.outcome {
lash::TurnOutcome::Finished(
lash::TurnFinish::SubmittedValue { value }
) => {
// Same value already arrived as TurnEvent::SubmittedValue.
persist_typed_value(value)?;
}
lash::TurnOutcome::Finished(
lash::TurnFinish::AssistantMessage { text }
) => persist_text(text)?,
other => handle_other_outcome(other)?,
}
The mechanism is RLM-only. Two scopes:
Per turn
Attach bindings to the input of one turn via TurnInput::rlm_project(...). Right for values that change every turn — the user's current message, fresh board state, a request payload.
Per session
Apply a mode extension at session open so the same bindings appear in every turn of the session. Right for values that persist across turns — tenant id, an immutable task spec, a long-lived parent reference.
use lash::TurnInput;
use lash_mode_rlm::{RlmProjectedBindings, RlmTurnInputExt, rlm_session_projection_extension};
// Session-wide: applies to every turn the session runs.
session
.control()
.mode()
.apply_session_extension(rlm_session_projection_extension(
RlmProjectedBindings::new()
.bind_json("tenant_id", serde_json::json!("acme"))?
.bind_json("task", serde_json::to_value(&task)?)?,
))
.await?;
// Per-turn: layered on top of the session bindings for this turn only.
let input = TurnInput::text("Play one move.")
.rlm_project(
RlmProjectedBindings::new()
.bind_json("board", serde_json::to_value(&board)?)?,
)?;
let result = session.turn(input).run().await?;
The model sees these names directly in fenced lashlang code:
```lashlang
let move_idx = best_move(board.cells, board.turn)
submit({ "move": move_idx, "tenant": tenant_id })
```
Three ways to bind a value, in order of increasing power:
- .bind_json(name, value) — the common case. Pass any serde_json::Value; lashlang re-types it on the way in.
- .bind_value(name, FlowValue) — pass a native lashlang::Value directly, skipping the JSON round-trip. Use this when you already hold a lashlang value (e.g. from history).
- .bind_lazy(name, Arc<dyn ProjectedHostValue>) — for large host structures you don't want to clone. Implement get_field / get_index / len / render / materialize; the executor calls them on demand as the program reads fields.
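Side by side, assuming the three methods chain on RlmProjectedBindings (bind_json's Result shape matches the earlier examples; error handling on the other two is assumed):
// Hedged sketch: the three binding styles together. `prior_value` is a
// lashlang FlowValue already in hand; `big_table` is an illustrative host
// structure implementing ProjectedHostValue.
let bindings = RlmProjectedBindings::new()
    .bind_json("config", serde_json::json!({ "retries": 3 }))?
    .bind_value("prior", prior_value)
    .bind_lazy("table", std::sync::Arc::new(big_table) as std::sync::Arc<dyn ProjectedHostValue>);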
Reserved names and the read-only guard
Projected names are reserved. Attempting tenant_id = "other" inside lashlang raises the error "`tenant_id` is a read-only projected binding". The runtime also reserves history — every RLM turn already projects the conversation history through that name. A host that wants to override history should narrow it through its own projected handle rather than bind a new value over it.
Duplicate names between session-scope and turn-scope are rejected at session.turn(...).run() time with a protocol error — no silent override. Pick one scope per name; merge at the source if both layers want to contribute.
Seeds for handoffs and subagents
The RLM control tools continue_as and spawn_agent accept a seed: map whose entries are re-projected in the successor or child session. Pass input.prompt, a projected handle, or any other root-projected value as a seed entry and it stays projected on the other side. See Subagents and the Projected Host Bindings chapter for the wire format and the seed-classifier rules.
The deep reference — ProjectedBindings, ProjectedValue, ProjectedHostValue, the read-only assignment rule, and the RLM serialization marker — lives in Architecture → Lashlang → Projected Host Bindings.
Template layout is separate from slot content
use lash::{
    PromptBuiltin, PromptContribution, PromptSlot, PromptTemplate,
    PromptTemplateEntry, PromptTemplateSection, TurnInput,
};
let template = PromptTemplate::new(vec![
PromptTemplateSection::untitled(vec![
PromptTemplateEntry::builtin(PromptBuiltin::MainAgentIntro),
PromptTemplateEntry::slot(PromptSlot::Intro),
]),
PromptTemplateSection::titled(
"Guidance",
vec![PromptTemplateEntry::slot(PromptSlot::Guidance)],
),
]);
let core = lash::LashCore::standard()
.provider(provider)
.model("gpt-5.4", None)
.max_context_tokens(200_000)
.prompt_template(template)
.prompt_contribution(PromptContribution::guidance(
"App",
"Answer as the host application assistant.",
))
.build()?;
let session = core
.session("customer-42")
.replace_prompt_slot(
PromptSlot::Guidance,
[PromptContribution::guidance(
"Tenant",
"Use the tenant's support policy.",
)],
)
.open()
.await?;
let result = session
.turn(TurnInput::text("Draft the response."))
.prompt_contribution(PromptContribution::guidance(
"Turn",
"Keep this reply under 120 words.",
))
.run()
.await?;
App state
Own chat tables, account ids, frontend board state, request auth, and transport formats. The example app stores chat messages in SQLite and streams newline-delimited JSON to the browser.
Runtime state
Pass an explicit store factory such as lash_sqlite_store::SqliteSessionStoreFactory::new(...) to LashCoreBuilder::store_factory, or pass a concrete store to SessionBuilder::store, when sessions need durable runtime state across process restarts.
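A minimal sketch of the per-session override; the concrete store constructor here is an assumption (only the factory form appears above):
// Hedged sketch: one session on its own store, independent of the core's
// store factory. `SqliteSessionStore::open` is illustrative, not confirmed API.
let audit_store = lash_sqlite_store::SqliteSessionStore::open("audit-sessions.db")?;
let session = core
    .session("audit-77")
    .store(audit_store)
    .open()
    .await?;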
The runtime parks sessions automatically when the LashSession handle is dropped: in-flight state is flushed to the store, and the runtime keeps only the session id plus the store reference around. Long-running servers can hold thousands of session ids cheaply this way — work happens only while a handle exists. Residency policy controls how aggressively in-memory state is trimmed while a session is still active:
Residency::KeepAll (default)
Every node in the session graph stays resident. Fastest re-reads, highest memory. Right for interactive sessions that the user is actively driving.
Residency::ActivePathOnly
Only nodes on the active path are kept in memory; orphaned branches stay on disk. Lower memory at the cost of an occasional store read when crossing a forked branch. Right for long server processes managing many sessions.
use lash::Residency;
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.store_factory(store_factory)
.advanced()
.residency(Residency::ActivePathOnly)
.build()?;
Some plugin work is genuinely background — spawned tasks that the plugin tracks through ToolContext::tasks(), observation summaries that compute after the turn returns, MCP server warm-ups. By default a turn returns as soon as the model's terminal value is committed; background work continues until the plugin finishes it or the runtime is dropped. To force the runtime to drain pending background work before returning (handy for one-shot processes that need every observation persisted before exit), call:
session.control().state().await_background_work().await?;
The CLI exposes this as the --await-background-work flag on lash --print; embedders typically don't need it unless they're running short-lived processes that exit immediately after a turn.
The shape: TurnMachine::poll_effect() yields the next side effect (LlmCall, ToolCalls, ExecCode, Checkpoint, …), the workflow runtime invokes it as a durable activity keyed by the effect's EffectId, and TurnMachine::handle_response(...) feeds the result back. The whole machine round-trips through TurnMachine::checkpoint() / restore_from_checkpoint(...), and the EffectId counter is deterministic across replays — the n-th effect of a fresh turn is always EffectId(n) regardless of how many crashes and restarts happened in between, so workflow activities deduplicate naturally.
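A minimal sketch of the driver loop that shape implies; run_activity stands in for the host workflow engine's durable-activity call, and the TurnMachine signatures are simplified from the prose:
// Hedged sketch of the workflow loop. Signatures are simplified;
// `run_activity` and `save_checkpoint` belong to the embedding engine.
let mut machine = TurnMachine::restore_from_checkpoint(checkpoint)?;
while let Some(effect) = machine.poll_effect()? {
    // EffectId(n) is deterministic across replays, so keying the durable
    // activity on it deduplicates work after a crash.
    let response = run_activity(effect.effect_id(), &effect).await?;
    machine.handle_response(response)?;
    save_checkpoint(machine.checkpoint()?)?; // round-trip the machine each step
}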
See Architecture → Durability for the full contract: which effects exist, what each waiting state retains across restore, replay-safety guarantees, the per-test coverage matrix, and a worked example of the workflow loop. Embedders who don't need workflow-level orchestration can ignore this seam — the regular session-store path already handles in-process durability.
use std::sync::Arc;
use lash::{plugins::PluginFactory, SessionSpec};
use lash_subagents::{default_registry, SubagentsPluginFactory};
let registry = Arc::new(default_registry(&tier_models));
let host = Arc::new(AppSubagentHost::new(child_store_factory));
let subagents = SubagentsPluginFactory::new(registry, host)
.with_session_spec(SessionSpec::inherit().max_turns(8));
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.plugin(Arc::new(subagents) as Arc<dyn PluginFactory>)
.build()?;
Capability implementations return SessionSpec overlays. StaticCapability is for exact child authority, while TierCapability implements the built-in explore and peer model/mode selection. Tool authors should not construct SessionPolicy for child configuration; it remains the resolved runtime artifact.
Built-in tiers
The default registry ships two tiers, each a thin TierCapability:
explore
Read-only investigator. Runs in RLM mode by default and cannot recurse — explore subagents may not spawn further subagents. Use it for "scan / summarise / verify" tasks where the agent should never mutate state.
peer
Parallel self. Inherits the parent's execution mode and full tool authority, including the ability to spawn its own subagents. Use it for "go do a parallel branch of the same work in a fresh window."
Both tiers consult the host's tier_models map for an explicit model override; if absent, they fall back to the provider's default_agent_model(tier) selection, and finally the parent session's model. Pass overrides through default_registry(&tier_models).
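The overrides map is plain data. A minimal sketch; the key/value shape is inferred from default_registry(&tier_models), and the model ids are placeholders:
// Hedged sketch: per-tier model overrides. The exact map type and tier
// keys are inferred, not confirmed API.
let mut tier_models = std::collections::BTreeMap::new();
tier_models.insert("explore".to_string(), "provider/fast-model".to_string());
tier_models.insert("peer".to_string(), "provider/frontier-model".to_string());
let registry = std::sync::Arc::new(default_registry(&tier_models));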
Capability resolution
Capability::resolve(ctx) sees the live SessionPolicy of the parent and returns a SessionSpec. The runtime composes that spec against the parent's policy to produce the child's effective configuration; the spec layer is what's persisted, so a resumed subagent re-derives the same authority from the parent state.
Regardless of which capability resolves a spawn, three interactive-only tools are always stripped from the subagent's tool surface: ask, showcase, and plan_exit. Those tools assume a human at the keyboard; a subagent that called them would block indefinitely. Subagents see every other tool the parent has access to, modulo whatever the capability's spec narrowed.
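A minimal sketch of a custom capability under those rules; the trait surface shown is simplified from the prose (resolve sees the parent's policy and returns a spec), so names and signatures are assumptions:
// Hedged sketch: a narrowed child authority. `CapabilityContext` and the
// exact Capability trait shape are illustrative stand-ins.
struct ReadOnlyResearch;
impl Capability for ReadOnlyResearch {
    fn resolve(&self, _ctx: &CapabilityContext) -> SessionSpec {
        // Inherit from the parent, then narrow: fewer turns, smaller window.
        SessionSpec::inherit()
            .max_turns(6)
            .max_context_tokens(100_000)
    }
}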
Usage attribution
Child sessions emit their own TurnEvent::ChildUsage entries on the parent's stream, tagged with source (e.g. "subagent", "compaction", "observer") and the child session_id. TurnResult.children_usage rolls these up per (source, model) at turn end; TurnResult::total_usage() sums parent and children. Dashboards that want a finer split should fold ChildUsage directly off the stream rather than computing from totals.
use std::collections::BTreeMap;
use lash_plugin_mcp::{McpPluginFactory, McpServerConfig};
let mut servers = BTreeMap::new();
servers.insert(
"docs".to_string(),
McpServerConfig::stdio("uvx", vec!["mcp-server-docs".into()]),
);
servers.insert(
"web".to_string(),
McpServerConfig::streamable_http("https://mcp.example.com/rpc"),
);
let mcp = McpPluginFactory::new(servers).await?;
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.plugin(std::sync::Arc::new(mcp))
.build()?;
Tools are surfaced under mcp__<server>__<tool> names with their original input and output schemas preserved. The factory's attach_server / detach_server methods let hosts add or remove servers at runtime without rebuilding the core.
Lifecycle and failure isolation
Each configured server connects once when McpPluginFactory::new(...) resolves. The factory holds the Arc<McpConnectionPool>; every SessionPlugin the factory builds clones that Arc, so stdio servers spawn once per process regardless of how many sessions are live. The CLI and a multi-tenant embedder both pay one server startup cost, not one per chat.
Runtime mutation is async and operates on the live pool. After a successful attach_server, new tools become visible to sessions created from the same core; existing sessions pick them up on their next tool-surface refresh (typically the start of the next turn). detach_server shuts the server down and removes its tools — in-flight tool calls against that server fail rather than hang.
// Hot-swap a server at runtime.
mcp.attach_server(
"new-tool".to_string(),
McpServerConfig::stdio("uvx", vec!["mcp-server-new".into()]),
).await?;
mcp.detach_server("old-tool").await?;
One bad server does not cascade. A stdio child that exits, an HTTP endpoint that 500s, an SSE stream that drops — each surfaces as a per-call failure: the affected tool returns ToolResult::err(...) with the MCP error message; other servers, and other tools on the same server, keep working. If a server is unreachable at McpPluginFactory::new time, that single call fails — use McpPluginFactory::empty() plus runtime attach_server when initial connectivity is unreliable.
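A minimal sketch of that unreliable-connectivity path; whether empty() is synchronous is an assumption:
// Hedged sketch: start with no MCP servers, attach once reachable.
let mcp = McpPluginFactory::empty();
// ... later, after the endpoint passes a health check:
mcp.attach_server(
    "web".to_string(),
    McpServerConfig::streamable_http("https://mcp.example.com/rpc"),
)
.await?;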
let core = LashCore::rlm()
.provider(provider)
.model("anthropic/claude-sonnet-4.6", None)
.max_context_tokens(200_000)
.store_factory(store_factory)
.advanced()
.residency(Residency::ActivePathOnly)
.build()?;
Turn streaming is semantic by default: TurnBuilder::stream emits TurnActivity items and resolves with a rich TurnResult. Raw runtime telemetry belongs in tracing and lower-level runtime debugging, not the normal lash API surface.
OPENROUTER_API_KEY=... cargo run -p agent-service
# then open http://127.0.0.1:3000
Source: examples/agent-service. The dedicated walkthrough is Agent Service.