This page covers context-window strategy, projected bindings into RLM scope, prompt template slots, and typed plugin input. For the full RLM prompt/request contract, start with the RLM protocol guide.
ModelSpec.limits.context_window_tokens caps every provider request. It is the model's prompt (input) budget, not the total input+output context, and it is the value the rolling-history / prompt-view layer prunes against. The CLI resolves it from a built-in limits table first and falls back to models.dev, then applies a provider-level clamp where the route's real ceiling is lower than the catalog value (e.g. the Codex OAuth route is clamped to ~256k). Overflow finishes as TurnStop::ProviderError; the runtime does not rewrite committed history. To recover, compact into a new Agent Frame, switch model, or switch frames.
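The resolution order described above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the CLI's code: resolve_prompt_budget, overflows, and their parameters are hypothetical names, and the three Option inputs stand in for the built-in limits table, the models.dev fallback, and the provider-level clamp.

```rust
/// Hypothetical sketch: pick a prompt (input) token budget for a model route.
/// `builtin` and `catalog` stand in for the built-in limits table and the
/// models.dev fallback; `route_clamp` is an optional provider-level ceiling.
fn resolve_prompt_budget(
    builtin: Option<u64>,
    catalog: Option<u64>,
    route_clamp: Option<u64>,
) -> Option<u64> {
    // The built-in table wins; the catalog is only a fallback.
    let base = builtin.or(catalog)?;
    // A clamp can only lower the budget, never raise it.
    Some(match route_clamp {
        Some(ceiling) => base.min(ceiling),
        None => base,
    })
}

/// True when an estimated prompt would exceed the budget. Per the text above,
/// this case ends the turn with a provider error instead of rewriting history.
fn overflows(prompt_tokens: u64, budget: u64) -> bool {
    prompt_tokens > budget
}
```

With these assumptions, a catalog value of 400 000 behind a route clamped to 256 000 resolves to 256 000, while a built-in value of 128 000 is left untouched by the same clamp.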
Two context strategies, picked via the mode's context-approach setting:
RollingHistory (default): Prompt View shaping each turn. Old images can be elided and the prompt can project a tail window. Durable session graph content remains append-only.
ObservationalMemory: Older context is represented through runtime-authored observations in the Prompt View. Configurable: observation_message_tokens (default 30 000), observation_buffer_tokens (6 000), reflection thresholds.
use std::sync::Arc;
use lash_standard_plugins::{
    ObservationalMemoryConfig, StandardContextApproach, StandardToolStackOptions,
    standard_tool_stack,
};
let core = lash::LashCore::standard_builder()
    .plugins(standard_tool_stack(StandardToolStackOptions {
        standard_context_approach: Some(StandardContextApproach::ObservationalMemory(
            ObservationalMemoryConfig::default(),
        )),
        ..Default::default()
    }))
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .build()?;
The CLI applies this only to the standard preset; RLM uses its protocol projection state and locked-down runtime stack instead. See RLM provider request shape for the model-visible request shape.
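The tail-window idea behind RollingHistory can be pictured with a small standalone sketch. It assumes a hypothetical Message record with a precomputed token estimate and a hypothetical tail_window helper; the real Prompt View layer is not shown in this guide and likely does more (image elision, for example).

```rust
/// Hypothetical message record: an id and an estimated token cost.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    id: u32,
    tokens: u64,
}

/// Returns the longest suffix of `history` whose token total fits `budget`.
/// The history is only borrowed, so the durable log itself is never modified.
fn tail_window(history: &[Message], budget: u64) -> &[Message] {
    let mut used = 0u64;
    let mut start = history.len();
    // Walk backwards from the newest message and stop before exceeding the budget.
    for (idx, msg) in history.iter().enumerate().rev() {
        if used + msg.tokens > budget {
            break;
        }
        used += msg.tokens;
        start = idx;
    }
    &history[start..]
}
```

Returning a borrowed slice mirrors the append-only rule: the projection is a view over the log, and pruning never deletes stored content.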
Projected bindings are named, typed, read-only values placed directly in lashlang scope. They are RLM-only. Use them for environment data the program should read without a tool call: current request, board state, tenant id, task spec, parent-session digest.
Two scopes:
TurnInput::rlm_project(...): values that change every turn.
Mode extension at session open: values that persist across turns.
use lash::TurnInput;
use lash::rlm::{RlmProjectedBindings, RlmTurnInputExt, rlm_session_projection_extension};
// Session-wide: applies to every turn the session runs.
session
    .admin()
    .protocol()
    .apply_session_extension(rlm_session_projection_extension(
        RlmProjectedBindings::new()
            .bind_json("tenant_id", serde_json::json!("acme"))?
            .bind_json("task", serde_json::to_value(&task)?)?,
    ))
    .await?;
// Per-turn: layered on top of the session bindings for this turn only.
let input = TurnInput::text("Play one move.").rlm_project(
    RlmProjectedBindings::new().bind_json("board", serde_json::to_value(&board)?)?,
)?;
let result = session.turn(input).run().await?;
The model reads these names directly inside a paired <lashlang> block:
<lashlang>
move_idx = best_move(board.cells, board.turn)
finish { move: move_idx, tenant: tenant_id }
</lashlang>
Three bind methods, in order of growing power:
.bind_json(name, value): common case. Any serde_json::Value; re-typed on entry.
.bind_value(name, FlowValue): pass a native lashlang value, skipping the JSON round-trip.
.bind_lazy(name, ProjectionRef): for large host structures. Register the host object with a ProjectionResolver, bind the stable ref, and let RLM resolve it immediately before Lashlang execution.
Lazy bindings no longer accept a raw Arc<dyn ProjectedHostDescriptor> through the public RLM API. Keep the object behind a resolver so snapshots, AgentFrame switches, and child seeds carry an opaque ProjectionRef instead of accidentally materializing the large value.
use std::sync::Arc;
use lash::rlm::{ProjectionRegistry, RlmProjectedBindings, RlmTurnInputExt};
use lash::{TurnInput, plugins::runtime_plugin_stack};
let registry = Arc::new(ProjectionRegistry::new());
let factory = lash::rlm::RlmProtocolPluginFactory::new(
    lash::rlm::RlmProtocolPluginConfig::default(),
    Arc::new(lash::persistence::InMemoryLashlangArtifactStore::new()),
)
.with_projection_resolver(registry.clone());
let core = lash::LashCore::rlm_builder(factory)
    .plugins(runtime_plugin_stack())
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .build()?;
// `my_docs_projection` implements `lashlang::ProjectedHostDescriptor`.
let docs_ref = registry.register_memory(Arc::new(my_docs_projection));
let input = TurnInput::text("Answer using docs only when needed.")
    .rlm_project(RlmProjectedBindings::new().bind_lazy("docs", docs_ref)?)?;
ProjectionRegistry::register_memory(...) returns refs that remain valid while that registry lives. After process loss, or if the RLM factory was built with a different resolver, execution fails with an unavailable-projection error rather than replacing the binding with null.
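The failure mode described above (an error rather than a silent null) can be modelled with a toy registry. Everything in this sketch is hypothetical: Ref, Registry, and ResolveError are illustrative names that do not mirror the lash types, and the stored values are plain strings.

```rust
use std::collections::HashMap;

/// Opaque handle that can be stored in snapshots without copying the value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Ref(u64);

#[derive(Debug, PartialEq)]
enum ResolveError {
    /// The ref is unknown to this registry (for example, it came from another process).
    Unavailable(Ref),
}

/// Hypothetical in-memory registry: values live only as long as the registry does.
#[derive(Default)]
struct Registry {
    next: u64,
    values: HashMap<Ref, String>,
}

impl Registry {
    /// Stores a value and hands back a stable, opaque ref to it.
    fn register(&mut self, value: String) -> Ref {
        let r = Ref(self.next);
        self.next += 1;
        self.values.insert(r, value);
        r
    }

    /// Resolves a ref or fails loudly; it never substitutes a null placeholder.
    fn resolve(&self, r: Ref) -> Result<&str, ResolveError> {
        self.values
            .get(&r)
            .map(String::as_str)
            .ok_or(ResolveError::Unavailable(r))
    }
}
```

Resolving a ref against a registry that never issued it yields ResolveError::Unavailable, which is the analogue of building the factory with a different resolver.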
Projected names are read-only; assignment inside lashlang raises a protocol error. The runtime reserves history; override by narrowing through your own projected handle, not by rebinding.
Duplicate names between session and turn scopes are rejected at run() time. Pick one scope; merge at the source.
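As a rough model of that rule, the sketch below layers turn bindings over session bindings and refuses names that occur in both. The Bindings alias and merge_scopes function are hypothetical, and values are plain strings here rather than lashlang values.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for projected bindings; the real runtime types differ.
type Bindings = BTreeMap<String, String>;

/// Layers turn bindings on top of session bindings, rejecting any name that
/// appears in both scopes instead of letting one silently shadow the other.
fn merge_scopes(session: &Bindings, turn: &Bindings) -> Result<Bindings, String> {
    let mut merged = session.clone();
    for (name, value) in turn {
        if merged.contains_key(name) {
            return Err(format!("duplicate projected binding: {name}"));
        }
        merged.insert(name.clone(), value.clone());
    }
    Ok(merged)
}
```

Rejecting the collision up front keeps the program's view unambiguous: a name always comes from exactly one scope.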
continue_as and spawn_agent accept a seed: map whose entries re-project in the AgentFrame or child session. Ref-backed projected values keep their ProjectionRef; value-backed projected values serialize as JSON. See Subagents and Projected Host Bindings.
Prompt-level contract: RLM variables and state. Deep reference: Architecture → Lashlang → Projected Host Bindings.
One template chooses the layout; slot contributions supply per-layer content. Core, session, and turn layers inherit; lower layers replace or clear single slots without rebuilding the template.
use std::sync::Arc;
use lash::prompt::{
    PromptBuiltin, PromptContribution, PromptSlot, PromptTemplate, PromptTemplateEntry,
    PromptTemplateSection,
};
use lash::{PromptLayerSink, TurnInput};
let template = PromptTemplate::new(vec![
    PromptTemplateSection::untitled(vec![
        PromptTemplateEntry::builtin(PromptBuiltin::MainAgentIntro),
        PromptTemplateEntry::slot(PromptSlot::Intro),
    ]),
    PromptTemplateSection::titled(
        "Guidance",
        vec![PromptTemplateEntry::slot(PromptSlot::Guidance)],
    ),
]);
let core = lash::LashCore::standard_builder()
    .provider(provider)
    .model(
        lash::ModelSpec::from_token_limits("gpt-5.4", None, 200_000, None)
            .expect("valid model metadata"),
    )
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .prompt_template(template)
    .prompt_contribution(PromptContribution::guidance(
        "App",
        "Answer as the host application assistant.",
    ))
    .build()?;
let session = core
    .session("customer-42")
    .replace_prompt_slot(
        PromptSlot::Guidance,
        [PromptContribution::guidance(
            "Tenant",
            "Use the tenant's support policy.",
        )],
    )
    .open()
    .await?;
let result = session
    .turn(TurnInput::text("Draft the response."))
    .prompt_contribution(PromptContribution::guidance(
        "Turn",
        "Keep this reply under 120 words.",
    ))
    .run()
    .await?;
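One way to picture how layers inherit, replace, or clear a single slot is the standalone model below. It is a sketch under assumptions: SlotOp, Layer, and resolve_slot are hypothetical names, slots are keyed by plain strings, and the append-versus-replace semantics are inferred from the method names in the example above rather than taken from the lash implementation.

```rust
use std::collections::HashMap;

/// What one layer says about a single slot (hypothetical model).
#[derive(Debug, Clone)]
enum SlotOp {
    /// Add to whatever the earlier layers contributed.
    Append(Vec<String>),
    /// Discard inherited content and use this instead.
    Replace(Vec<String>),
    /// Discard inherited content and contribute nothing.
    Clear,
}

type Layer = HashMap<&'static str, SlotOp>;

/// Resolves one slot by applying layers in order (core, then session, then turn).
fn resolve_slot(layers: &[Layer], slot: &str) -> Vec<String> {
    let mut content = Vec::new();
    for layer in layers {
        match layer.get(slot) {
            Some(SlotOp::Append(items)) => content.extend(items.iter().cloned()),
            Some(SlotOp::Replace(items)) => content = items.clone(),
            Some(SlotOp::Clear) => content.clear(),
            None => {} // slot untouched by this layer: inherit as-is
        }
    }
    content
}
```

In this model, a core layer that appends "App", a session layer that replaces with "Tenant", and a turn layer that appends "Turn" resolve the guidance slot to "Tenant" followed by "Turn". The template itself is never rebuilt.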
Typed plugin input gives a plugin strongly typed per-session configuration and per-turn input. The binding activates a core plugin factory; the session plugin registers prompt contributions, tools, and hooks through PluginRegistrar.
use std::sync::Arc;
// Per-session configuration for the plugin (no fields needed here).
#[derive(Clone, Debug)]
struct ToneConfig;
// Per-turn input supplied by the host.
#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
struct ToneTurnInput {
    tone: String,
}
#[derive(Clone, Debug)]
struct TonePlugin;
impl lash::PluginBinding for TonePlugin {
    const ID: &'static str = "tone";
    type SessionConfig = ToneConfig;
    type Input = ToneTurnInput;
    // TonePluginFactory is assumed to be defined elsewhere; it is not shown here.
    fn factory(_: &Self::SessionConfig) -> Arc<dyn lash::plugins::PluginFactory> {
        Arc::new(TonePluginFactory)
    }
    fn requires_turn_input(_: &Self::SessionConfig) -> bool {
        true
    }
}
impl lash::plugins::SessionPlugin for ToneSessionPlugin {
    fn id(&self) -> &'static str {
        TonePlugin::ID
    }
    fn register(
        &self,
        reg: &mut lash::plugins::PluginRegistrar,
    ) -> Result<(), lash::plugins::PluginError> {
        reg.prompt().contribute(Arc::new(|ctx| {
            Box::pin(async move {
                let Some(input) = ctx
                    .turn_context
                    .plugin_input::<ToneTurnInput>(TonePlugin::ID)
                else {
                    return Ok(Vec::new());
                };
                Ok(vec![lash::prompt::PromptContribution::environment(
                    "Tone",
                    format!("Use this response tone: {}", input.tone),
                )])
            })
        }));
        reg.tools().provider(Arc::new(ToneTools))
    }
}
#[async_trait::async_trait]
impl lash::tools::ToolProvider for ToneTools {
    fn tool_manifests(&self) -> Vec<lash::tools::ToolManifest> {
        tone_tool_definitions()
            .into_iter()
            .map(|definition| definition.manifest())
            .collect()
    }
    fn resolve_contract(&self, name: &str) -> Option<Arc<lash::tools::ToolContract>> {
        tone_tool_definitions()
            .into_iter()
            .find(|definition| definition.name() == name)
            .map(|definition| Arc::new(definition.contract()))
    }
    // Typed turn input is read at prepare time, where ToolPrepareContext
    // exposes plugin_input, then threaded into execute as the prepared payload.
    async fn prepare_tool_call(
        &self,
        call: lash::tools::ToolPrepareCall<'_>,
    ) -> Result<lash::tools::PreparedToolCall, lash::tools::ToolResult> {
        let Some(input) = call.context.plugin_input::<ToneTurnInput>(TonePlugin::ID) else {
            return Err(lash::tools::ToolResult::err_fmt("missing tone input"));
        };
        let prepared_payload = serde_json::to_value(input).map_err(|err| {
            lash::tools::ToolResult::err_fmt(format!("invalid tone input: {err}"))
        })?;
        Ok(lash::tools::PreparedToolCall::from_parts(
            call.pending.call_id,
            call.tool_id.clone(),
            call.pending.tool_name,
            call.pending.args,
            call.pending.replay,
            prepared_payload,
        ))
    }
    async fn execute(&self, call: lash::tools::ToolCall<'_>) -> lash::tools::ToolResult {
        let input = match call.context.decode_prepared_payload::<ToneTurnInput>() {
            Ok(input) => input,
            Err(err) => {
                return lash::tools::ToolResult::err_fmt(format!("missing tone input: {err}"));
            }
        };
        run_tone_tool(call.name, call.args, &input.tone)
    }
}
Each surface reads typed input where it is available. The prompt hook reads it from its TurnContext. The tool reads it in prepare_tool_call through ToolPrepareContext::plugin_input and carries it into execute as the prepared payload. The execute-time ToolContext has no plugin_input accessor by design.
Human input follows the same rule: model it as a host-supplied tool that waits inside its implementation and returns a normal tool result. There is no separate runtime prompt event.
Per-turn plugin input is live host state. It is useful for local, synchronous turns, but durable workflow integrations that may resume a turn later should keep canonical app state in a store and have prompt hooks or tools reload it by session id. The agent-service example uses that store-backed pattern for its board state.
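The store-backed pattern mentioned above might look roughly like this. It is a sketch only: BoardStore and its methods are hypothetical, the board is an opaque byte vector, and the in-memory map stands in for whatever durable store the host application actually uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical app-side store holding canonical state per session id.
/// A real integration would back this with a database, not process memory.
#[derive(Clone, Default)]
struct BoardStore {
    inner: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl BoardStore {
    /// Persists the latest board for a session, overwriting any previous value.
    fn save(&self, session_id: &str, board: Vec<u8>) {
        self.inner
            .lock()
            .unwrap()
            .insert(session_id.to_string(), board);
    }

    /// What a prompt hook or tool would call when a turn starts or resumes.
    fn load(&self, session_id: &str) -> Option<Vec<u8>> {
        self.inner.lock().unwrap().get(session_id).cloned()
    }
}
```

Because hooks and tools look state up by session id on every call, a turn that resumes later sees the stored value rather than whatever live input happened to be attached when the turn was first started.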
Plugin crates should export a domain extension trait wrapping the generic input primitive:
trait ToneTurnExt {
    fn with_tone(self, tone: impl Into<String>) -> Self;
}
impl ToneTurnExt for lash::TurnBuilder {
    fn with_tone(self, tone: impl Into<String>) -> Self {
        self.with_plugin_input::<TonePlugin>(ToneTurnInput { tone: tone.into() })
    }
}
Install on the session; use the extension method before each run:
let factory = lash::rlm::RlmProtocolPluginFactory::new(
    lash::rlm::RlmProtocolPluginConfig::default(),
    std::sync::Arc::new(lash::persistence::InMemoryLashlangArtifactStore::new()),
);
let core = lash::LashCore::rlm_builder(factory)
    .provider(provider)
    .model(
        lash::ModelSpec::from_token_limits(model.clone(), None, 200_000, None)
            .expect("valid model metadata"),
    )
    .effect_host(std::sync::Arc::new(
        lash::durability::InlineEffectHost::default(),
    ))
    .attachment_store(std::sync::Arc::new(
        lash::persistence::InMemoryAttachmentStore::new(),
    ))
    .build()?;
let session = core
    .session(chat_id)
    .plugin::<TonePlugin>(ToneConfig)
    .open()
    .await?;
use lash::rlm::RlmTurnBuilderExt as _;
let result = session
    .turn(TurnInput::text("Summarize this incident."))
    .with_tone("brief and factual")
    .require_finish()?
    .stream_to(&sink)
    .await?;