lash/prompts

Context-window strategy, projected bindings into RLM scope, prompt template slots, and typed plugin input. For the full RLM prompt/request contract, start with the RLM protocol guide.

Context Window And History

ModelSpec.limits.context_window_tokens caps every provider request. It is the model's prompt (input) budget, not the total input+output context, and it is the value the rolling-history / prompt-view layer prunes against. The CLI resolves it from a built-in limits table first and falls back to models.dev, applying a provider-level clamp where the route's real ceiling is lower than the catalog value (e.g. the Codex OAuth route is clamped to ~256k). A request that still overflows ends the turn with TurnStop::ProviderError; the runtime does not rewrite committed history. To recover, compact into a new Agent Frame, switch model, or switch frames.
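As a rough illustration of what pruning against a prompt budget means, the sketch below keeps the newest messages that fit under a token cap and leaves the stored history untouched. PromptMessage, tail_window, and the precomputed token counts are hypothetical names for this example, not lash APIs; the actual Prompt View layer may apply different rules.

```rust
/// Hypothetical stand-in for a prompt message with a precomputed token count.
#[derive(Clone, Debug, PartialEq)]
pub struct PromptMessage {
    pub text: String,
    pub tokens: u32,
}

/// Returns the longest suffix of `history` whose total token count fits within
/// `context_window_tokens`. The input slice is only borrowed, mirroring the
/// rule that committed history is never rewritten.
pub fn tail_window(history: &[PromptMessage], context_window_tokens: u32) -> &[PromptMessage] {
    let mut used: u32 = 0;
    let mut start = history.len();
    // Walk from newest to oldest and stop at the first message that would overflow.
    for (idx, msg) in history.iter().enumerate().rev() {
        match used.checked_add(msg.tokens) {
            Some(total) if total <= context_window_tokens => {
                used = total;
                start = idx;
            }
            _ => break,
        }
    }
    &history[start..]
}
```

If even the newest message exceeds the budget, the window is empty; in that situation the request could not fit no matter how much older history is dropped.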

Two context strategies, picked via the mode's context-approach setting:

RollingHistory (default)

Shapes the Prompt View each turn: old images can be elided and the prompt can project a tail window. Durable session graph content remains append-only.

ObservationalMemory

Older context is represented through runtime-authored observations in the Prompt View. Configurable: observation_message_tokens (default 30 000), observation_buffer_tokens (6 000), reflection thresholds.

use std::sync::Arc;

use lash_standard_plugins::{
    ObservationalMemoryConfig, StandardContextApproach, StandardToolStackOptions,
    standard_tool_stack,
};

let core = lash::LashCore::standard_builder()
    .plugins(standard_tool_stack(StandardToolStackOptions {
        standard_context_approach: Some(StandardContextApproach::ObservationalMemory(
            ObservationalMemoryConfig::default(),
        )),
        ..Default::default()
    }))
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .build()?;

The CLI applies this only to the standard preset; RLM uses its protocol projection state and locked-down runtime stack instead. See RLM provider request shape for the model-visible request shape.

Projected Bindings

Named, typed, read-only values placed directly in lashlang scope. RLM-only. Use for environment data the program should read without a tool call: current request, board state, tenant id, task spec, parent-session digest.

Two scopes:

Per turn

TurnInput::rlm_project(...): values that change every turn.

Per session

Mode extension at session open: values that persist across turns.

use lash::TurnInput;
use lash::rlm::{RlmProjectedBindings, RlmTurnInputExt, rlm_session_projection_extension};

// Session-wide: applies to every turn the session runs.
session
    .admin()
    .protocol()
    .apply_session_extension(rlm_session_projection_extension(
        RlmProjectedBindings::new()
            .bind_json("tenant_id", serde_json::json!("acme"))?
            .bind_json("task", serde_json::to_value(&task)?)?,
    ))
    .await?;

// Per-turn: layered on top of the session bindings for this turn only.
let input = TurnInput::text("Play one move.").rlm_project(
    RlmProjectedBindings::new().bind_json("board", serde_json::to_value(&board)?)?,
)?;

let result = session.turn(input).run().await?;

The model reads these names directly inside a paired <lashlang> block:

<lashlang>
move_idx = best_move(board.cells, board.turn)
finish { move: move_idx, tenant: tenant_id }
</lashlang>

Three bind methods, in order of growing power:

Lazy bindings no longer accept a raw Arc<dyn ProjectedHostDescriptor> through the public RLM API. Keep the object behind a resolver so snapshots, AgentFrame switches, and child seeds carry an opaque ProjectionRef instead of accidentally materializing the large value.

use std::sync::Arc;

use lash::rlm::{ProjectionRegistry, RlmProjectedBindings, RlmTurnInputExt};
use lash::{TurnInput, plugins::runtime_plugin_stack};

let registry = Arc::new(ProjectionRegistry::new());
let factory = lash::rlm::RlmProtocolPluginFactory::new(
    lash::rlm::RlmProtocolPluginConfig::default(),
    Arc::new(lash::persistence::InMemoryLashlangArtifactStore::new()),
)
.with_projection_resolver(registry.clone());
let core = lash::LashCore::rlm_builder(factory)
    .plugins(runtime_plugin_stack())
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .build()?;

// `my_docs_projection` implements `lashlang::ProjectedHostDescriptor`.
let docs_ref = registry.register_memory(Arc::new(my_docs_projection));
let input = TurnInput::text("Answer using docs only when needed.")
    .rlm_project(RlmProjectedBindings::new().bind_lazy("docs", docs_ref)?)?;

ProjectionRegistry::register_memory(...) returns refs that remain valid while that registry lives. After process loss, or if the RLM factory was built with a different resolver, execution fails with an unavailable-projection error rather than replacing the binding with null.
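The failure mode described above can be modelled with a small in-memory registry. This is a minimal sketch under stated assumptions: SketchRegistry, SketchRef, and ResolveError are hypothetical names, and the real ProjectionRegistry may be structured differently. It only shows that a ref resolves while its own registry is alive and yields an explicit error, never a null value, when a different registry is asked.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

// Process-wide counter so refs from different registries never collide.
static NEXT_REF: AtomicU64 = AtomicU64::new(0);

/// Hypothetical opaque handle; it carries no data from the projected value.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SketchRef(u64);

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// The registry that issued the ref is gone or is not the one being asked.
    Unavailable(SketchRef),
}

/// Hypothetical in-memory registry; a `String` stands in for the large value.
#[derive(Default)]
pub struct SketchRegistry {
    entries: Mutex<HashMap<SketchRef, Arc<String>>>,
}

impl SketchRegistry {
    /// Stores the value and hands back an opaque ref to it.
    pub fn register_memory(&self, value: Arc<String>) -> SketchRef {
        let handle = SketchRef(NEXT_REF.fetch_add(1, Ordering::Relaxed));
        self.entries.lock().unwrap().insert(handle, value);
        handle
    }

    /// Resolves a ref, or reports it as unavailable instead of returning a null.
    pub fn resolve(&self, handle: SketchRef) -> Result<Arc<String>, ResolveError> {
        let entries = self.entries.lock().unwrap();
        match entries.get(&handle) {
            Some(value) => Ok(Arc::clone(value)),
            None => Err(ResolveError::Unavailable(handle)),
        }
    }
}
```

Because the ref is just an integer, snapshots and seeds can carry it cheaply; only resolution touches the large value.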

Reserved names and the read-only guard

Projected names are read-only; assignment inside lashlang raises a protocol error. The runtime reserves the name history; to expose a narrower view of it, project your own handle under a different name rather than rebinding history.

Duplicate names between session and turn scopes are rejected at run() time. Pick one scope; merge at the source.
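The merge rule can be pictured as follows. This is a sketch, not the runtime's code: merge_scopes, BindingError, and the RESERVED list are hypothetical, values are kept as pre-serialized JSON strings to avoid extra dependencies, and rejecting a reserved name at merge time is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum BindingError {
    /// The name is owned by the runtime (assumed to be checked at merge time).
    Reserved(String),
    /// The same name appears in both the session scope and the turn scope.
    Duplicate(String),
}

/// Names the runtime keeps for itself; only `history` is named in the text above.
pub const RESERVED: &[&str] = &["history"];

/// Combines session-wide and per-turn bindings into one scope, rejecting
/// collisions instead of letting one scope silently shadow the other.
pub fn merge_scopes(
    session: &BTreeMap<String, String>,
    turn: &BTreeMap<String, String>,
) -> Result<BTreeMap<String, String>, BindingError> {
    let mut merged = BTreeMap::new();
    for (name, value) in session.iter().chain(turn.iter()) {
        if RESERVED.contains(&name.as_str()) {
            return Err(BindingError::Reserved(name.clone()));
        }
        // `insert` returns the previous value when the key was already present.
        if merged.insert(name.clone(), value.clone()).is_some() {
            return Err(BindingError::Duplicate(name.clone()));
        }
    }
    Ok(merged)
}
```

Failing loudly keeps the program's view unambiguous: a name such as tenant_id always comes from exactly one scope.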

Seeds for AgentFrame switches and subagents

continue_as and spawn_agent accept a seed: map whose entries re-project in the AgentFrame or child session. Ref-backed projected values keep their ProjectionRef; value-backed projected values serialize as JSON. See Subagents and Projected Host Bindings.

Prompt-level contract: RLM variables and state. Deep reference: Architecture → Lashlang → Projected Host Bindings.

Prompt Templates And Slots

One template chooses the layout; slot contributions supply per-layer content. Core, session, and turn layers inherit; lower layers replace or clear single slots without rebuilding the template.

Template layout is separate from slot content
use std::sync::Arc;

use lash::prompt::{
    PromptBuiltin, PromptContribution, PromptSlot, PromptTemplate, PromptTemplateEntry,
    PromptTemplateSection,
};
use lash::{PromptLayerSink, TurnInput};

let template = PromptTemplate::new(vec![
    PromptTemplateSection::untitled(vec![
        PromptTemplateEntry::builtin(PromptBuiltin::MainAgentIntro),
        PromptTemplateEntry::slot(PromptSlot::Intro),
    ]),
    PromptTemplateSection::titled(
        "Guidance",
        vec![PromptTemplateEntry::slot(PromptSlot::Guidance)],
    ),
]);

let core = lash::LashCore::standard_builder()
    .provider(provider)
    .model(
        lash::ModelSpec::from_token_limits("gpt-5.4", None, 200_000, None)
            .expect("valid model metadata"),
    )
    .effect_host(Arc::new(lash::durability::InlineEffectHost::default()))
    .attachment_store(Arc::new(lash::persistence::InMemoryAttachmentStore::new()))
    .prompt_template(template)
    .prompt_contribution(PromptContribution::guidance(
        "App",
        "Answer as the host application assistant.",
    ))
    .build()?;

let session = core
    .session("customer-42")
    .replace_prompt_slot(
        PromptSlot::Guidance,
        [PromptContribution::guidance(
            "Tenant",
            "Use the tenant's support policy.",
        )],
    )
    .open()
    .await?;

let result = session
    .turn(TurnInput::text("Draft the response."))
    .prompt_contribution(PromptContribution::guidance(
        "Turn",
        "Keep this reply under 120 words.",
    ))
    .run()
    .await?;
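One way to reason about the layering in the example above is as a fold over per-layer slot operations, applied in core, session, turn order. The sketch below is a simplified model: SlotOp and resolve_slot are hypothetical names, and the assumption that a plain contribution appends to whatever the earlier layers left in the slot is made for illustration rather than taken from the lash source.

```rust
/// Hypothetical operations a layer can apply to a single prompt slot.
#[derive(Clone, Debug, PartialEq)]
pub enum SlotOp {
    /// Add one contribution after whatever earlier layers left in the slot.
    Append(String),
    /// Discard inherited contributions and use these instead.
    Replace(Vec<String>),
    /// Discard inherited contributions and leave the slot empty.
    Clear,
}

/// Resolves one slot by applying each layer's operations in order
/// (core first, then session, then turn).
pub fn resolve_slot(layers: &[Vec<SlotOp>]) -> Vec<String> {
    let mut resolved = Vec::new();
    for layer in layers {
        for op in layer {
            match op {
                SlotOp::Append(text) => resolved.push(text.clone()),
                SlotOp::Replace(texts) => resolved = texts.clone(),
                SlotOp::Clear => resolved.clear(),
            }
        }
    }
    resolved
}
```

Under this model, the Guidance slot in the example would resolve to the "Tenant" contribution followed by the "Turn" contribution, with the core "App" contribution replaced at the session layer.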

Typed Plugin Input

Strongly-typed per-session configuration and per-turn input. The binding activates a core plugin factory; the session plugin registers prompt contributions, tools, and hooks through PluginRegistrar.

Plugin and tool pattern
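use std::sync::Arc;

// `TonePluginFactory`, `ToneSessionPlugin`, `ToneTools`, `tone_tool_definitions`,
// and `run_tone_tool` are defined elsewhere in the plugin crate and omitted here.
#[derive(Clone, Debug)]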
struct ToneConfig;

#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
struct ToneTurnInput {
    tone: String,
}

#[derive(Clone, Debug)]
struct TonePlugin;

impl lash::PluginBinding for TonePlugin {
    const ID: &'static str = "tone";
    type SessionConfig = ToneConfig;
    type Input = ToneTurnInput;

    fn factory(_: &Self::SessionConfig) -> Arc<dyn lash::plugins::PluginFactory> {
        Arc::new(TonePluginFactory)
    }

    fn requires_turn_input(_: &Self::SessionConfig) -> bool {
        true
    }
}

impl lash::plugins::SessionPlugin for ToneSessionPlugin {
    fn id(&self) -> &'static str {
        TonePlugin::ID
    }

    fn register(
        &self,
        reg: &mut lash::plugins::PluginRegistrar,
    ) -> Result<(), lash::plugins::PluginError> {
        reg.prompt().contribute(Arc::new(|ctx| {
            Box::pin(async move {
                let Some(input) = ctx
                    .turn_context
                    .plugin_input::<ToneTurnInput>(TonePlugin::ID)
                else {
                    return Ok(Vec::new());
                };
                Ok(vec![lash::prompt::PromptContribution::environment(
                    "Tone",
                    format!("Use this response tone: {}", input.tone),
                )])
            })
        }));
        reg.tools().provider(Arc::new(ToneTools))
    }
}

#[async_trait::async_trait]
impl lash::tools::ToolProvider for ToneTools {
    fn tool_manifests(&self) -> Vec<lash::tools::ToolManifest> {
        tone_tool_definitions()
            .into_iter()
            .map(|definition| definition.manifest())
            .collect()
    }

    fn resolve_contract(&self, name: &str) -> Option<Arc<lash::tools::ToolContract>> {
        tone_tool_definitions()
            .into_iter()
            .find(|definition| definition.name() == name)
            .map(|definition| Arc::new(definition.contract()))
    }

    // Typed turn input is read at prepare time, where ToolPrepareContext
    // exposes plugin_input, then threaded into execute as the prepared payload.
    async fn prepare_tool_call(
        &self,
        call: lash::tools::ToolPrepareCall<'_>,
    ) -> Result<lash::tools::PreparedToolCall, lash::tools::ToolResult> {
        let Some(input) = call.context.plugin_input::<ToneTurnInput>(TonePlugin::ID) else {
            return Err(lash::tools::ToolResult::err_fmt("missing tone input"));
        };
        let prepared_payload = serde_json::to_value(input).map_err(|err| {
            lash::tools::ToolResult::err_fmt(format!("invalid tone input: {err}"))
        })?;
        Ok(lash::tools::PreparedToolCall::from_parts(
            call.pending.call_id,
            call.tool_id.clone(),
            call.pending.tool_name,
            call.pending.args,
            call.pending.replay,
            prepared_payload,
        ))
    }

    async fn execute(&self, call: lash::tools::ToolCall<'_>) -> lash::tools::ToolResult {
        let input = match call.context.decode_prepared_payload::<ToneTurnInput>() {
            Ok(input) => input,
            Err(err) => {
                return lash::tools::ToolResult::err_fmt(format!("missing tone input: {err}"));
            }
        };
        run_tone_tool(call.name, call.args, &input.tone)
    }
}

Each surface reads typed input where it is available: the prompt hook reads it from its TurnContext, while the tool reads it in prepare_tool_call through ToolPrepareContext::plugin_input and carries it into execute as the prepared payload. The execute-time ToolContext has no plugin_input accessor by design.

Human input follows the same rule: model it as a host-supplied tool that waits inside its implementation and returns a normal tool result. There is no separate runtime prompt event.

Per-turn plugin input is live host state. It is useful for local, synchronous turns, but durable workflow integrations that may resume a turn later should keep canonical app state in a store and have prompt hooks or tools reload it by session id. The agent-service example uses that store-backed pattern for its board state.
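A minimal sketch of the store-backed pattern, assuming an in-process map in place of a real database: InMemoryToneStore and tone_guidance are hypothetical names and are not taken from the agent-service example. The point is that the hook or tool looks state up by session id on every call instead of holding a value captured when the turn started.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical canonical store keyed by session id. A durable deployment
/// would back this with a database rather than process memory.
#[derive(Default)]
pub struct InMemoryToneStore {
    tones: RwLock<HashMap<String, String>>,
}

impl InMemoryToneStore {
    /// Records the tone the host application wants for a session.
    pub fn save(&self, session_id: &str, tone: &str) {
        self.tones
            .write()
            .unwrap()
            .insert(session_id.to_string(), tone.to_string());
    }

    /// Reloads the current tone, if any, for a session.
    pub fn load(&self, session_id: &str) -> Option<String> {
        self.tones.read().unwrap().get(session_id).cloned()
    }
}

/// What a prompt hook or tool could call on each invocation: it reloads state
/// by session id, so a turn resumed later still sees the canonical value.
pub fn tone_guidance(store: &InMemoryToneStore, session_id: &str) -> Option<String> {
    store
        .load(session_id)
        .map(|tone| format!("Use this response tone: {tone}"))
}
```

A session with no stored tone yields None, which a hook can treat the same way the example above treats missing plugin input: contribute nothing.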

Plugin crates should export a domain extension trait wrapping the generic input primitive:

trait ToneTurnExt {
    fn with_tone(self, tone: impl Into<String>) -> Self;
}

impl ToneTurnExt for lash::TurnBuilder {
    fn with_tone(self, tone: impl Into<String>) -> Self {
        self.with_plugin_input::<TonePlugin>(ToneTurnInput { tone: tone.into() })
    }
}

Install on the session; use the extension method before each run:

let factory = lash::rlm::RlmProtocolPluginFactory::new(
    lash::rlm::RlmProtocolPluginConfig::default(),
    std::sync::Arc::new(lash::persistence::InMemoryLashlangArtifactStore::new()),
);
let core = lash::LashCore::rlm_builder(factory)
    .provider(provider)
    .model(
        lash::ModelSpec::from_token_limits(model.clone(), None, 200_000, None)
            .expect("valid model metadata"),
    )
    .effect_host(std::sync::Arc::new(
        lash::durability::InlineEffectHost::default(),
    ))
    .attachment_store(std::sync::Arc::new(
        lash::persistence::InMemoryAttachmentStore::new(),
    ))
    .build()?;

let session = core
    .session(chat_id)
    .plugin::<TonePlugin>(ToneConfig)
    .open()
    .await?;

use lash::TurnInput;
use lash::rlm::RlmTurnBuilderExt as _;

let result = session
    .turn(TurnInput::text("Summarize this incident."))
    .with_tone("brief and factual")
    .require_finish()?
    .stream_to(&sink)
    .await?;