- Durable session graph: conversation records, tool events, mode events, and plugin nodes live in one SessionGraph per session. SessionReadView and ChronologicalProjection read it; nothing else owns persistence.
- Per-turn atomic commit: RuntimeCommit writes the graph delta, checkpoint blobs, usage deltas, and head revision in one SQLite transaction with optimistic CAS on expected_head_revision. A partial turn means no commit.
- Typed plugin capabilities: tools see a named ToolContext surface — tasks(), sessions(), direct_completion, tool_catalog, snapshot reads — not a general host handle. The surface exposes no ToolContext::host() escape hatch.
- Sandboxed code execution: RLM mode runs model-emitted Lashlang programs in a VM with no filesystem, process, or network surface. Every effect crosses ToolHost. Use it when the model should compose multiple tool calls per turn instead of one.
- Subagent capability boundaries: Capability::resolve(parent_policy) returns a constrained SessionSpec for the child. Interactive-only tools are stripped from every subagent surface regardless of capability.
- Semantic event stream: identity-bearing TurnActivity items — assistant prose deltas, reasoning deltas, tool start/complete pairs with correlation ids, code-block start/complete, terminal SubmittedValue/ToolValue, per-call and rolling usage. Assistant prose deltas are live previews; the settled transcript comes from TurnResult / SessionReadView after the runtime commit.
- Tracing as a first-class sink: every provider call across every session emits TraceRecords through TraceSink implementations. JSONL by default, OTel optional.
- Snapshot and restore seams: plugins, tool state, and the Lashlang VM each persist through versioned snapshot writers, so a parked session resumes intact across process restarts.
What sits outside this crate, on purpose: tenancy boundaries, retention and lifecycle for long-lived artifacts, discovery of agent-authored procedures, a shared coherent image across many sessions. lash is the kernel and runtime; the platform-shaped pieces belong to whatever embeds it.
lash vs lash-core boundary
use std::sync::Arc;
use lash::{tools::*, LashCore, ModeId, ModePreset, PluginStack, TurnEvent, TurnInput};
let store_factory = Arc::new(lash_sqlite_store::SqliteSessionStoreFactory::new("lash-sessions"));
let core = LashCore::builder()
.install_mode(ModePreset::standard())
.install_mode(ModePreset::rlm())
.default_mode(ModeId::rlm())
.provider(provider)
.model("anthropic/claude-sonnet-4.6", None)
.max_context_tokens(200_000)
.plugins(PluginStack::runtime())
.tools(Arc::new(AppTools) as Arc<dyn ToolProvider>)
.store_factory(store_factory)
.build()?;
let session = core.session("chat-123").rlm().open().await?;
let result = session.turn(TurnInput::text("Use the app tools.")).run().await?;
let assistant_text: String = result.activities.iter().filter_map(|activity| match &activity.event {
TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
_ => None,
}).collect();
println!("{assistant_text}");
LashCore
- Cloneable shared configuration: provider, model, installed modes, tool providers, plugin factories, store factory, attachment store, and tracing. Runtime internals such as residency and termination policy live behind .advanced().
ModePreset
- Installs execution modes. Use ModePreset::standard() for native provider tool calls, ModePreset::rlm() for Lashlang-driven RLM turns, or install both and choose the default with default_mode.
SessionSpec
- Reusable public configuration overlay for provider, model/variant, execution mode, max context tokens, max turns, and prompt layer. Root cores use SessionSpec::new(); child sessions and subagents usually use SessionSpec::inherit().
PluginStack
- Ordered plugin factory list. LashCore::standard() and LashCore::rlm() include PluginStack::runtime(); a raw LashCore::builder() stays explicit and needs modes and plugins installed by the host.
LashSession
- One app conversation or task. Sessions wrap a parked/resumable runtime, can use a per-session store, and expose turn(TurnInput), run(TurnInput), read_view(), and lower-level control groups through control().
TurnBuilder
- Per-turn configuration: cancellation, mode options, typed plugin input, and RLM-projected bindings. Call .stream(&sink) for live events or .run() for a collected ordered activity log.
Root defaults
use lash::{plugins::PluginFactory, LashCore, PluginStack, SessionSpec};
let root_spec = SessionSpec::new()
.provider(provider)
.model("gpt-5.4", None)
.max_context_tokens(200_000);
let core = LashCore::rlm()
.session_spec(root_spec)
.configure_plugins(|plugins| {
plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
})
.build()?;
Explicit stacks
let plugins = PluginStack::runtime().configure(|plugins| {
plugins.replace(Arc::new(CustomBudgetPlugin) as Arc<dyn PluginFactory>);
plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
});
let core = LashCore::builder()
.install_mode(ModePreset::rlm())
.default_mode(ModeId::rlm())
.session_spec(root_spec)
.plugins(plugins)
.build()?;
.plugin(...) appends one factory to the current stack, .plugins(...) replaces the full stack, and .configure_plugins(...) mutates the current stack in place. That gives hosts a default runtime set while still allowing precise removal, replacement, or insertion.
Collected result
let collected = session
.turn(TurnInput::text("Summarize this task."))
.run()
.await?;
let live_preview: String = collected.activities.iter().filter_map(|activity| match &activity.event {
TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
_ => None,
}).collect();
let settled_answer = collected.assistant_message().unwrap_or_default();
let parent_usage = collected.result.usage; // parent's own LLM tokens
let children = &collected.result.children_usage; // per-(source, model) child entries
let total = collected.result.total_usage(); // parent + children
let outcome = collected.result.outcome;
Live stream
let ui_sink = Arc::new(AppEvents::new(tx));
let turn = session
.turn(TurnInput::text(user_text))
.stream(ui_sink.as_ref())
.await?;
persist(turn.assistant_message().unwrap_or_default(), turn.total_usage())?;
Apps own their live projection policy. Fold TurnActivity directly for partial UI state, terminal-value previews, tool summaries, or timelines. At turn completion, replace the current-turn display from the returned TurnResult.state.read_view() or the latest SessionReadView; do not append a separate final assistant event after streamed prose. Treat TurnActivity.correlation_id as the stable identity for multi-phase UI rows: start events insert rows, completion events update those same rows.
TurnOutcome shape
pub enum TurnOutcome {
Finished(TurnFinish),
Handoff { session_id: String },
Stopped(TurnStop),
}
pub enum TurnFinish {
AssistantMessage { text: String },
SubmittedValue { value: serde_json::Value },
ToolValue { tool_name: String, value: serde_json::Value },
}
Finished means the turn completed cleanly with a terminal value. AssistantMessage is the default — a free-form prose answer — and is the only terminal semantic result for prose in standard and RLM modes. Live prose may already have streamed as AssistantProseDelta, but there is no second "final message" stream event; the shared runtime commit materializes the assistant transcript message exactly once. SubmittedValue and ToolValue come from RLM control flow when the program ends with submit or a tool-authored terminal value; the same JSON value already arrived as a TurnEvent::SubmittedValue / TurnEvent::ToolValue in the stream.
Handoff means a mode plugin or tool returned control to a fresh session — usually a context-compaction successor or a continue-as redirect. The new session is persisted before this returns; resume against session_id to continue.
Stopped means the turn was aborted before producing a terminal value. TurnStop has ten variants:
| Variant | Class | Cause |
| --- | --- | --- |
| Cancelled | Host-driven | The cancellation token passed to TurnBuilder::cancellation(...) fired. |
| InvalidInput | Host-driven | Input normalization rejected the TurnInput (e.g. missing attachment, malformed image ref). |
| Incomplete | Provider | The LLM hit its output limit mid-message without finishing. |
| ProviderError | Provider | The provider returned an error, a content-filter refusal, or a context-overflow terminal reason. |
| MaxTurns | Runtime | The mode loop hit session_spec.max_turns and the final forced-reply turn finished without a terminal. |
| ToolFailure | Runtime | A tool returned ToolResult::err(...) and turn assembly flagged the turn as failed. |
| PluginAbort | Runtime | A plugin's turn-preparation or checkpoint hook returned an abort directive. |
| RuntimeError | Fatal | Internal runtime failure or a strict-termination policy with no Done event. Treat as a bug or environmental fault. |
| SubmittedError { value } | RLM | An RLM program ended with submit_error(...); value is the model-authored payload. |
| ToolError { tool_name, value } | RLM | A mode plugin signalled a tool-authored error terminal; value is the payload. |
Host-driven stops are recoverable on the next request. Provider stops usually warrant a retry with a different model or a shorter context. Runtime stops mean a configuration / authoring problem the user needs to know about. RuntimeError is the only category where re-running with identical input is unlikely to help.
Optimistic-concurrency commit conflicts on the session graph are not represented as a Stopped variant — they surface as Err from the run()/stream() call itself, with no partial commit. See Persistence → The Contract for the retry shape.
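When a conflict does surface, nothing was committed, so the simplest handler re-runs the turn. A minimal sketch, assuming a hypothetical is_commit_conflict predicate on the error; the real conflict test is defined by the persistence contract:
// Hedged sketch: re-run after an optimistic-concurrency conflict.
// `is_commit_conflict` is illustrative, not a real lash API.
let mut attempts = 0;
let result = loop {
    match session.turn(TurnInput::text(user_text.clone())).run().await {
        Ok(result) => break result,
        Err(err) if is_commit_conflict(&err) && attempts < 3 => {
            attempts += 1; // no partial commit happened; safe to retry
        }
        Err(err) => return Err(err.into()),
    }
};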
match result.outcome {
lash::TurnOutcome::Finished(finish) => persist_terminal(finish)?,
lash::TurnOutcome::Handoff { session_id } => reopen(session_id).await?,
lash::TurnOutcome::Stopped(stop) => match stop {
lash::TurnStop::Cancelled
| lash::TurnStop::InvalidInput => report_user_visible(stop),
lash::TurnStop::ProviderError
| lash::TurnStop::Incomplete => offer_retry(stop),
lash::TurnStop::MaxTurns => suggest_higher_max_turns(),
other => record_for_diagnosis(other),
},
}
Three event types — only one is for your app.
Apps consume TurnActivity: an identity-bearing wrapper that carries a fresh id, a stable correlation_id, and a TurnEvent payload. The payload-shape enum is TurnEvent — what kind of event happened. SessionEvent sits one layer deeper as a runtime/debug enum used inside the session graph; hosts should not subscribe to it, and there is no app-facing SessionEvent::Message { kind: "final" } contract. If you're writing a TurnActivitySink, you only ever match on TurnEvent variants — that's the app surface.
TurnActivity sink
use async_trait::async_trait;
use lash::{TurnActivity, TurnActivitySink, TurnEvent};
struct AppEvents {
tx: AppUiTx,
turn_state: std::sync::Mutex<TurnUiState>,
}
#[derive(Default)]
struct TurnUiState {
reasoning: Option<UiRowId>,
tools: std::collections::HashMap<String, UiRowId>,
code: Option<UiRowId>,
}
#[async_trait]
impl TurnActivitySink for AppEvents {
async fn emit(&self, activity: TurnActivity) {
let correlation_id = activity.correlation_id.0.clone();
match activity.event {
TurnEvent::AssistantProseDelta { text } => {
append_live_text(text).await;
}
TurnEvent::ReasoningDelta { text } => {
let row = self.turn_state.lock().unwrap().reasoning.clone();
let row = upsert_reasoning_row(row, text).await;
self.turn_state.lock().unwrap().reasoning = Some(row);
}
TurnEvent::ToolCallStarted { name, args, .. } => {
let row = insert_tool_row(name, args).await;
self.turn_state
.lock()
.unwrap()
.tools
.insert(correlation_id, row);
}
TurnEvent::ToolCallCompleted { name, output, .. } => {
let row = self.turn_state.lock().unwrap().tools.remove(&correlation_id);
update_or_insert_tool_row(row, name, output).await;
}
TurnEvent::CodeBlockStarted { language, code } => {
let row = insert_code_row(language, code).await;
self.turn_state.lock().unwrap().code = Some(row);
}
TurnEvent::CodeBlockCompleted { language, output, error, success, .. } => {
let row = self.turn_state.lock().unwrap().code.take();
update_or_insert_code_row(row, language, output, error, success).await;
}
TurnEvent::SubmittedValue { value } => {
append_live_text(render_terminal_value(&value)).await;
}
TurnEvent::ToolValue { tool_name, value } => {
append_live_text(render_terminal_value(&value)).await;
record_terminal_tool(tool_name).await;
}
TurnEvent::Usage { usage, cumulative, .. } => {
update_usage(usage, cumulative).await;
}
TurnEvent::ChildUsage { source, usage, cumulative, .. } => {
update_child_usage(source, usage, cumulative).await;
}
TurnEvent::Error { .. } => {}
_ => {}
}
}
}
fn render_terminal_value(value: &serde_json::Value) -> String {
match value {
serde_json::Value::Null => String::new(),
serde_json::Value::String(text) => text.clone(),
other => serde_json::to_string_pretty(other).unwrap_or_else(|_| other.to_string()),
}
}
Use event identity, not duplicate detection.
ToolCallStarted and ToolCallCompleted describe the same logical row when their correlation_id matches; code-block events work the same way. TurnEvent::SubmittedValue and TurnEvent::ToolValue mean “a new terminal value was authored by a control path.” They are not emitted for a normal assistant prose finish because that prose already streamed as AssistantProseDelta and the settled message is reconstructed from the read view.
The conventional pattern is a thin sink that pushes onto an async mpsc channel and returns immediately, with a separate consumer task draining the channel into UI state. tokio::sync::mpsc::Sender::send only awaits when the channel is full, so the channel bound becomes your backpressure dial:
use tokio::sync::mpsc;
struct ChannelSink { tx: mpsc::Sender<TurnActivity> }
#[async_trait::async_trait]
impl TurnActivitySink for ChannelSink {
async fn emit(&self, activity: TurnActivity) {
// send().await yields when the channel is full — the turn
// will pause here if your UI consumer falls behind.
let _ = self.tx.send(activity).await;
}
}
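The consumer side is an ordinary task draining the receiver into UI state. A minimal sketch; apply_to_ui stands in for your own fold:
// Hedged sketch: pair ChannelSink with a draining consumer task.
let (tx, mut rx) = tokio::sync::mpsc::channel::<TurnActivity>(256); // bound = backpressure dial
let sink = ChannelSink { tx };
tokio::spawn(async move {
    while let Some(activity) = rx.recv().await {
        apply_to_ui(activity).await; // illustrative: fold into UI state
    }
});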
If a sink panics or fails to deliver, the runtime keeps going — emit failures are observed but do not abort the turn. Persistence and the turn's own activity log are independent of the sink: a dropped sink does not lose the activity, because TurnResult.activities still holds the canonical ordered log when the turn completes.
TraceSink
- Every provider call across every session in the runtime. Right for billing, audit, offline analysis. Heavier than necessary for plain totals.
TurnEvent::Usage / TurnEvent::ChildUsage
- Live during a turn, one event per LLM iteration. Usage is the parent's own model call; ChildUsage carries session_id and source so a UI can group child traffic (subagent, compaction, observer). Right for live counters.
TurnResult.usage / TurnResult.children_usage
- Per-turn snapshot at completion. usage is parent-only; children_usage is a per-(source, model) breakdown for any child sessions that ran during the turn. TurnResult::total_usage() sums both.
session.usage_report() → SessionUsageReport
- Aggregate across the whole session, broken down by source × model. Right for dashboards and "session so far."
The full re-export and well-known source label constants live in lash::usage.
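For dashboards, a minimal sketch of folding the report; the entries() accessor and row field names are assumptions about SessionUsageReport's shape, not confirmed API:
// Hedged sketch: session-so-far usage, grouped by (source, model).
let report = session.usage_report().await?;
for row in report.entries() {
    // `source`, `model`, and `usage` field names are illustrative.
    println!("{} / {}: {} tokens", row.source, row.model, row.usage.total_tokens);
}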
Two strategies manage what's actually passed to the model across many turns. Pick one per session (or per core) via the standard mode's context-approach setting:
RollingHistory (default)
Every turn includes the full ordered conversation history, trimmed only as needed to satisfy max_context_tokens. Best for short-to-medium sessions where the model genuinely benefits from seeing prior turns verbatim.
ObservationalMemory
Older turns are replaced with rolling observations — short summaries authored by the runtime as the session grows. Configurable: observation_message_tokens (default 30 000), observation_buffer_tokens (default 6 000), and reflection thresholds. Best for long-running agents that should remember themes without quoting old messages.
use lash_core::{ObservationalMemoryConfig, StandardContextApproach};
let session = core
.session("long-running-thread")
.standard_context_approach(StandardContextApproach::ObservationalMemory(
ObservationalMemoryConfig::default(),
))
.open()
.await?;
RLM mode has its own context-approach knob with the same shape; both modes share the same enum so a core that installs both modes can default to one and override per session.
Per-turn RLM options
let submitted = session
.turn(TurnInput::text("Move on the board."))
.require_submit()?
.stream(&sink)
.await?;
let prose_or_submit = session
.turn(TurnInput::text("Answer directly if no code is needed."))
.allow_prose_or_submit()?
.run()
.await?;
Outcome shape
match result.outcome {
lash::TurnOutcome::Finished(
lash::TurnFinish::SubmittedValue { value }
) => {
// Same value already arrived as TurnEvent::SubmittedValue.
persist_typed_value(value)?;
}
lash::TurnOutcome::Finished(
lash::TurnFinish::AssistantMessage { text }
) => persist_text(text)?,
other => handle_other_outcome(other)?,
}
The mechanism is RLM-only. Two scopes:
Per turn
Attach bindings to the input of one turn via TurnInput::rlm_project(...). Right for values that change every turn — the user's current message, fresh board state, a request payload.
Per session
Apply a mode extension at session open so the same bindings appear in every turn of the session. Right for values that persist across turns — tenant id, an immutable task spec, a long-lived parent reference.
use lash::TurnInput;
use lash_mode_rlm::{RlmProjectedBindings, RlmTurnInputExt, rlm_session_projection_extension};
// Session-wide: applies to every turn the session runs.
session
.control()
.mode()
.apply_session_extension(rlm_session_projection_extension(
RlmProjectedBindings::new()
.bind_json("tenant_id", serde_json::json!("acme"))?
.bind_json("task", serde_json::to_value(&task)?)?,
))
.await?;
// Per-turn: layered on top of the session bindings for this turn only.
let input = TurnInput::text("Play one move.")
.rlm_project(
RlmProjectedBindings::new()
.bind_json("board", serde_json::to_value(&board)?)?,
)?;
let result = session.turn(input).run().await?;
The model sees these names directly in fenced lashlang code:
```lashlang
let move_idx = best_move(board.cells, board.turn)
submit({ "move": move_idx, "tenant": tenant_id })
```
Three ways to bind a value, in order of increasing power:
- .bind_json(name, value) — the common case. Pass any serde_json::Value; lashlang re-types it on the way in.
- .bind_value(name, FlowValue) — pass a native lashlang::Value directly, skipping the JSON round-trip. Use this when you already hold a lashlang value (e.g. from history).
- .bind_lazy(name, Arc<dyn ProjectedHostValue>) — for large host structures you don't want to clone. Implement get_field / get_index / len / render / materialize; the executor calls them on demand as the program reads fields.
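Side by side, assuming the three methods chain on RlmProjectedBindings (bind_json's Result shape matches the earlier examples; error handling on the other two is assumed):
// Hedged sketch: the three binding styles together. `prior_value` is a
// lashlang FlowValue already in hand; `big_table` is an illustrative host
// structure implementing ProjectedHostValue.
let bindings = RlmProjectedBindings::new()
    .bind_json("config", serde_json::json!({ "retries": 3 }))?
    .bind_value("prior", prior_value)
    .bind_lazy("table", std::sync::Arc::new(big_table) as std::sync::Arc<dyn ProjectedHostValue>);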
Reserved names and the read-only guard
Projected names are reserved. Attempting tenant_id = "other" inside lashlang raises the error "`tenant_id` is a read-only projected binding". The runtime also reserves history — every RLM turn already projects the conversation history through that name. A host that wants to override history should narrow it through its own projected handle rather than bind a new value over it.
Duplicate names between session-scope and turn-scope are rejected at session.turn(...).run() time with a protocol error — no silent override. Pick one scope per name; merge at the source if both layers want to contribute.
Seeds for handoffs and subagents
The RLM control tools continue_as and spawn_agent accept a seed: map whose entries are re-projected in the successor or child session. Pass input.prompt, a projected handle, or any other root-projected value as a seed entry and it stays projected on the other side. See Subagents and the Projected Host Bindings chapter for the wire format and the seed-classifier rules.
The deep reference — ProjectedBindings, ProjectedValue, ProjectedHostValue, the read-only assignment rule, and the RLM serialization marker — lives in Architecture → Lashlang → Projected Host Bindings.
Template layout is separate from slot content
use lash::{
    PromptBuiltin, PromptContribution, PromptSlot, PromptTemplate,
    PromptTemplateEntry, PromptTemplateSection, TurnInput,
};
let template = PromptTemplate::new(vec![
PromptTemplateSection::untitled(vec![
PromptTemplateEntry::builtin(PromptBuiltin::MainAgentIntro),
PromptTemplateEntry::slot(PromptSlot::Intro),
]),
PromptTemplateSection::titled(
"Guidance",
vec![PromptTemplateEntry::slot(PromptSlot::Guidance)],
),
]);
let core = lash::LashCore::standard()
.provider(provider)
.model("gpt-5.4", None)
.max_context_tokens(200_000)
.prompt_template(template)
.prompt_contribution(PromptContribution::guidance(
"App",
"Answer as the host application assistant.",
))
.build()?;
let session = core
.session("customer-42")
.replace_prompt_slot(
PromptSlot::Guidance,
[PromptContribution::guidance(
"Tenant",
"Use the tenant's support policy.",
)],
)
.open()
.await?;
let result = session
.turn(TurnInput::text("Draft the response."))
.prompt_contribution(PromptContribution::guidance(
"Turn",
"Keep this reply under 120 words.",
))
.run()
.await?;
App state
Own chat tables, account ids, frontend board state, request auth, and transport formats. The example app stores chat messages in SQLite and streams newline-delimited JSON to the browser.
Runtime state
Pass an explicit store factory such as lash_sqlite_store::SqliteSessionStoreFactory::new(...) to LashCoreBuilder::store_factory, or pass a concrete store to SessionBuilder::store, when sessions need durable runtime state across process restarts.
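A minimal sketch of the per-session override; the concrete store constructor here is an assumption (only the factory form appears above):
// Hedged sketch: one session on its own store, independent of the core's
// store factory. `SqliteSessionStore::open` is illustrative, not confirmed API.
let audit_store = lash_sqlite_store::SqliteSessionStore::open("audit-sessions.db")?;
let session = core
    .session("audit-77")
    .store(audit_store)
    .open()
    .await?;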
The runtime parks sessions automatically when the LashSession handle is dropped: in-flight state is flushed to the store, and the runtime keeps only the session id plus the store reference around. Long-running servers can hold thousands of session ids cheaply this way — work happens only while a handle exists. Residency policy controls how aggressively in-memory state is trimmed while a session is still active:
Residency::KeepAll (default)
Every node in the session graph stays resident. Fastest re-reads, highest memory. Right for interactive sessions that the user is actively driving.
Residency::ActivePathOnly
Only nodes on the active path are kept in memory; orphaned branches stay on disk. Lower memory at the cost of an occasional store read when crossing a forked branch. Right for long server processes managing many sessions.
use lash::Residency;
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.store_factory(store_factory)
.advanced()
.residency(Residency::ActivePathOnly)
.build()?;
Some plugin work is genuinely background — spawned tasks that the plugin tracks through ToolContext::tasks(), observation summaries that compute after the turn returns, MCP server warm-ups. By default a turn returns as soon as the model's terminal value is committed; background work continues until the plugin finishes it or the runtime is dropped. To force the runtime to drain pending background work before returning (handy for one-shot processes that need every observation persisted before exit), call:
session.control().state().await_background_work().await?;
The CLI exposes this as the --await-background-work flag on lash --print; embedders typically don't need it unless they're running short-lived processes that exit immediately after a turn.
The shape: TurnMachine::poll_effect() yields the next side effect (LlmCall, ToolCalls, ExecCode, Checkpoint, …), the workflow runtime invokes it as a durable activity keyed by the effect's EffectId, and TurnMachine::handle_response(...) feeds the result back. The whole machine round-trips through TurnMachine::checkpoint() / restore_from_checkpoint(...), and the EffectId counter is deterministic across replays — the n-th effect of a fresh turn is always EffectId(n) regardless of how many crashes and restarts happened in between, so workflow activities deduplicate naturally.
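A minimal sketch of the driver loop that shape implies; run_activity stands in for the host workflow engine's durable-activity call, and the TurnMachine signatures are simplified from the prose:
// Hedged sketch of the workflow loop. Signatures are simplified;
// `run_activity` and `save_checkpoint` belong to the embedding engine.
let mut machine = TurnMachine::restore_from_checkpoint(checkpoint)?;
while let Some(effect) = machine.poll_effect()? {
    // EffectId(n) is deterministic across replays, so keying the durable
    // activity on it deduplicates work after a crash.
    let response = run_activity(effect.effect_id(), &effect).await?;
    machine.handle_response(response)?;
    save_checkpoint(machine.checkpoint()?)?; // round-trip the machine each step
}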
See Architecture → Durability for the full contract: which effects exist, what each waiting state retains across restore, replay-safety guarantees, the per-test coverage matrix, and a worked example of the workflow loop. Embedders who don't need workflow-level orchestration can ignore this seam — the regular session-store path already handles in-process durability.
use std::sync::Arc;
use lash::{plugins::PluginFactory, SessionSpec};
use lash_subagents::{default_registry, SubagentsPluginFactory};
let registry = Arc::new(default_registry(&tier_models));
let host = Arc::new(AppSubagentHost::new(child_store_factory));
let subagents = SubagentsPluginFactory::new(registry, host)
.with_session_spec(SessionSpec::inherit().max_turns(8));
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.plugin(Arc::new(subagents) as Arc<dyn PluginFactory>)
.build()?;
Capability implementations return SessionSpec overlays. StaticCapability is for exact child authority, while TierCapability implements the built-in explore and peer model/mode selection. Tool authors should not construct SessionPolicy for child configuration; it remains the resolved runtime artifact.
Built-in tiers
The default registry ships two tiers, each a thin TierCapability:
explore
Read-only investigator. Runs in RLM mode by default and cannot recurse — explore subagents may not spawn further subagents. Use it for "scan / summarise / verify" tasks where the agent should never mutate state.
peer
Parallel self. Inherits the parent's execution mode and full tool authority, including the ability to spawn its own subagents. Use it for "go do a parallel branch of the same work in a fresh window."
Both tiers consult the host's tier_models map for an explicit model override; if absent, they fall back to the provider's default_agent_model(tier) selection, and finally the parent session's model. Pass overrides through default_registry(&tier_models).
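The overrides map is plain data. A minimal sketch; the key/value shape is inferred from default_registry(&tier_models), and the model ids are placeholders:
// Hedged sketch: per-tier model overrides. The exact map type and tier
// keys are inferred, not confirmed API.
let mut tier_models = std::collections::BTreeMap::new();
tier_models.insert("explore".to_string(), "provider/fast-model".to_string());
tier_models.insert("peer".to_string(), "provider/frontier-model".to_string());
let registry = std::sync::Arc::new(default_registry(&tier_models));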
Capability resolution
Capability::resolve(ctx) sees the live SessionPolicy of the parent and returns a SessionSpec. The runtime composes that spec against the parent's policy to produce the child's effective configuration; the spec layer is what's persisted, so a resumed subagent re-derives the same authority from the parent state.
Regardless of which capability resolves a spawn, three interactive-only tools are always stripped from the subagent's tool surface: ask, showcase, and plan_exit. Those tools assume a human at the keyboard; a subagent that called them would block indefinitely. Subagents see every other tool the parent has access to, modulo whatever the capability's spec narrowed.
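A minimal sketch of a custom capability under those rules; the trait surface shown is simplified from the prose (resolve sees the parent's policy and returns a spec), so names and signatures are assumptions:
// Hedged sketch: a narrowed child authority. `CapabilityContext` and the
// exact Capability trait shape are illustrative stand-ins.
struct ReadOnlyResearch;
impl Capability for ReadOnlyResearch {
    fn resolve(&self, _ctx: &CapabilityContext) -> SessionSpec {
        // Inherit from the parent, then narrow: fewer turns, smaller window.
        SessionSpec::inherit()
            .max_turns(6)
            .max_context_tokens(100_000)
    }
}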
Usage attribution
Child sessions emit their own TurnEvent::ChildUsage entries on the parent's stream, tagged with source (e.g. "subagent", "compaction", "observer") and the child session_id. TurnResult.children_usage rolls these up per (source, model) at turn end; TurnResult::total_usage() sums parent and children. Dashboards that want a finer split should fold ChildUsage directly off the stream rather than computing from totals.
use std::collections::BTreeMap;
use lash_plugin_mcp::{McpPluginFactory, McpServerConfig};
let mut servers = BTreeMap::new();
servers.insert(
"docs".to_string(),
McpServerConfig::stdio("uvx", vec!["mcp-server-docs".into()]),
);
servers.insert(
"web".to_string(),
McpServerConfig::streamable_http("https://mcp.example.com/rpc"),
);
let mcp = McpPluginFactory::new(servers).await?;
let core = LashCore::rlm()
.provider(provider)
.model(model, None)
.max_context_tokens(200_000)
.plugin(std::sync::Arc::new(mcp))
.build()?;
Tools are surfaced under mcp__<server>__<tool> names with their original input and output schemas preserved. The factory's attach_server / detach_server methods let hosts add or remove servers at runtime without rebuilding the core.
Lifecycle and failure isolation
Each configured server connects once when McpPluginFactory::new(...) resolves. The factory holds the Arc<McpConnectionPool>; every SessionPlugin the factory builds clones that Arc, so stdio servers spawn once per process regardless of how many sessions are live. The CLI and a multi-tenant embedder both pay one server startup cost, not one per chat.
Runtime mutation is async and operates on the live pool. After a successful attach_server, new tools become visible to sessions created from the same core; existing sessions pick them up on their next tool-surface refresh (typically the start of the next turn). detach_server shuts the server down and removes its tools — in-flight tool calls against that server fail rather than hang.
// Hot-swap a server at runtime.
mcp.attach_server(
"new-tool".to_string(),
McpServerConfig::stdio("uvx", vec!["mcp-server-new".into()]),
).await?;
mcp.detach_server("old-tool").await?;
One bad server does not cascade. A stdio child that exits, an HTTP endpoint that 500s, an SSE stream that drops — each surfaces as a per-call failure: the affected tool returns ToolResult::err(...) with the MCP error message; other servers, and other tools on the same server, keep working. If a server is unreachable at McpPluginFactory::new time, that single call fails — use McpPluginFactory::empty() plus runtime attach_server when initial connectivity is unreliable.
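A minimal sketch of that unreliable-connectivity path; whether empty() is synchronous is an assumption:
// Hedged sketch: start with no MCP servers, attach once reachable.
let mcp = McpPluginFactory::empty();
// ... later, after the endpoint passes a health check:
mcp.attach_server(
    "web".to_string(),
    McpServerConfig::streamable_http("https://mcp.example.com/rpc"),
)
.await?;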
let core = LashCore::rlm()
.provider(provider)
.model("anthropic/claude-sonnet-4.6", None)
.max_context_tokens(200_000)
.store_factory(store_factory)
.advanced()
.residency(Residency::ActivePathOnly)
.build()?;
Turn streaming is semantic by default: TurnBuilder::stream emits TurnActivity items and resolves with a rich TurnResult. Raw runtime telemetry belongs in tracing and lower-level runtime debugging, not the normal lash API surface.
OPENROUTER_API_KEY=... cargo run -p agent-service
# then open http://127.0.0.1:3000
Source: examples/agent-service. The dedicated walkthrough is Agent Service.