There is a pattern that shows up in almost every team that starts working seriously with agents: the prompt keeps growing. It starts with a reasonable AGENTS.md, then the module docs, then three related files, then conversation history, then the output of a script, then “just in case” the full database schema. After a few weeks, the window is packed and the agent reasons worse than when it had half of that.

The natural reflex is to think “I need a bigger window.” You almost never do. The problem is not window size. It is that context is being treated as a pile when it should be treated as a budget. This post is about how to decide, per task, which slices actually change the agent’s output, and how to load only those.

Why piling context up degrades the output

When you dump everything into the prompt, three things happen at once, and none of them is good.

The first is dilution. The signal that matters for the specific task competes with hundreds of lines that do not. The model still “sees” everything, but the chance of it anchoring on the right passage drops.

The second is inconsistency across sessions. When half of the prompt is whatever someone pasted without a filter, the output stops being reproducible. Two runs of the same task can diverge simply because the noise was different.

The third is hidden cost. Tokens that do not affect the answer still cost latency, API spend, and human review time. Worse: when something goes wrong, you cannot tell which piece of context was responsible, because there is too much of it to inspect.

Context is a budget, not a pile

The mental shift that fixes this is treating context as a per-task budget. It is not “what fits in the window.” It is “what changes the output enough to justify the space it takes.”

If a file does not change the agent’s decision, it should not be there, regardless of whether it fits. If a doc excerpt only repeats what the project constitution already says, it is taking space twice. If a 2,000-line log has the useful information in five, only those five belong in the prompt.
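A minimal Python sketch of cutting a long log down to the lines that matter. It assumes the relevant lines can be found with a regular expression (the `ERROR|Traceback` pattern is a hypothetical example) plus a couple of surrounding lines. The `slice_log` helper is illustrative and not part of any tool.

```python
import re
from typing import List


def slice_log(lines: List[str], pattern: str, context: int = 2) -> List[str]:
    """Keep only lines matching `pattern`, plus `context` lines on each side."""
    regex = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            # Mark the match and its neighbours, clamped to the bounds of the log.
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Preserve the original order so the slice still reads like a log.
    return [lines[i] for i in sorted(keep)]


if __name__ == "__main__":
    log = [f"INFO step {n}" for n in range(2000)]
    log[1200] = "ERROR db timeout after 30s"
    for line in slice_log(log, r"ERROR|Traceback"):
        print(line)  # five lines: steps 1198-1199, the error, steps 1201-1202
```

Only the slice goes into the prompt; the full log can stay on disk as a link if it might be needed later.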

The operational question stops being “what can I include?” and becomes “what can I remove without making the answer worse?”

Three context layers

In practice, the context an agent sees falls into three layers that behave differently.

| Layer | What it is | Change frequency | Where it lives |
| --- | --- | --- | --- |
| Stable | Project constitution, conventions, commands, golden rules | Rarely changes | AGENTS.md, CLAUDE.md, scoped rule files |
| Task | Specs, ACs (acceptance criteria), target files, cited ADRs (architecture decision records), relevant schema excerpts | New per task | Story, tech-spec, explicit selection |
| Ephemeral | Command output, stack trace, diff, search result, log snippet | Within a single run | Session messages |

The stable layer is small and worth keeping in every session. The task layer should be selected with intent, not dragged along out of habit. The ephemeral layer is the most dangerous one: it is where the prompt quietly gets fat, because every command you run throws more tokens onto the pile.

The working rule is simple: the more ephemeral the context, the shorter its lifespan should be. Has the stack trace already led you to the fix? Drop it. Did the ls output help you pick a file? Only the file name needs to stick around.
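A minimal sketch of the “shorter lifespan” rule, assuming each context item is tagged with its layer, the task it was loaded for, and the session turn in which it arrived. The two-turn TTL, the field names, and the `prune` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Layer(Enum):
    STABLE = "stable"
    TASK = "task"
    EPHEMERAL = "ephemeral"


@dataclass
class ContextItem:
    layer: Layer
    content: str
    task_id: str = ""    # task the item was loaded for ("" for stable items)
    added_turn: int = 0  # session turn in which the item entered the context


def prune(items: List[ContextItem], current_task: str, current_turn: int,
          ephemeral_ttl: int = 2) -> List[ContextItem]:
    """Keep stable items, task items for the current task, and recent ephemeral items."""
    kept = []
    for item in items:
        if item.layer is Layer.STABLE:
            kept.append(item)  # constitution: worth keeping in every session
        elif item.task_id != current_task:
            continue  # task and ephemeral items die with the task they served
        elif item.layer is Layer.TASK:
            kept.append(item)
        elif current_turn - item.added_turn < ephemeral_ttl:
            kept.append(item)  # ephemeral: short lifespan, measured in turns
    return kept
```

This version simply drops expired items; a real setup would probably replace them with a one-line residue (the chosen file name, the root cause) rather than deleting them outright.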

This is where it gets operational. For every candidate piece of context, there are four possible destinations, not one.

| Destination | When to use | Examples |
| --- | --- | --- |
| Load in full | The agent needs the exact text to decide or edit | File about to be modified, AC of the current story, a cited ADR |
| Summarize | The information matters, but the original has too much noise | Long chat history, extensive doc where only three rules apply, meeting transcript |
| Link | Might be useful later, but does not change the current task | Adjacent docs, historical ADRs, reference manuals |
| Leave out | Does not change the output for this task | Code from unrelated modules, schema of untouched tables, logs from another feature |

The classic mistake is to treat everything as “load in full” just in case. The “just in case” is the signal that the piece probably did not need to be there.

An honest test: before including a block, answer in one sentence what changes in the output if this block is gone. If you cannot answer, the block probably does not belong.
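The four destinations and the one-sentence test can be written down as a small routing function. This is a sketch under stated assumptions: the boolean flags are hypothetical stand-ins for judgments a person makes about each candidate, and the ordering of the checks is one reasonable reading of the table, not a fixed rule.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    LOAD_IN_FULL = "load in full"
    SUMMARIZE = "summarize"
    LINK = "link"
    LEAVE_OUT = "leave out"


@dataclass
class Candidate:
    name: str
    what_changes_if_removed: str  # the one-sentence test; "" if you cannot answer it
    needs_exact_text: bool        # the agent must quote or edit the original text
    mostly_noise: bool            # the useful part is buried in a much larger original
    may_matter_later: bool        # not needed now, but plausibly useful later


def route(c: Candidate) -> Destination:
    if not c.what_changes_if_removed.strip():
        # Fails the one-sentence test: it does not change this task's output.
        return Destination.LINK if c.may_matter_later else Destination.LEAVE_OUT
    if c.needs_exact_text:
        return Destination.LOAD_IN_FULL
    if c.mostly_noise:
        return Destination.SUMMARIZE
    # Relevant and already compact: loading it as-is is cheap.
    return Destination.LOAD_IN_FULL
```

The point of forcing `what_changes_if_removed` to be written out is that an empty answer routes the piece away from the prompt by default.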

Common failures when context is poorly managed

Four patterns show up often enough to deserve their own names.

Bloat. The prompt grows session after session because nobody prunes. Each new task inherits trash from the previous ones. The symptom is slower, less focused answers, even on small tasks.

Stale context. Some snippet pasted days ago no longer reflects the current state of the code or the spec. The agent decides based on wrong information and produces code that “made sense” in the old version. The symptom is a conflict between what the agent claims and what the repository actually contains.
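One way to catch stale context mechanically is to fingerprint each file at the moment it is pasted and compare before the next run. A minimal sketch, assuming the context is built from files on disk; SHA-256 via Python's `hashlib` is a choice made here, and `take_snapshot` and `stale_files` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List


def _digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


@dataclass
class Snapshot:
    path: Path
    digest: str  # SHA-256 of the file content at the moment it entered the context


def take_snapshot(path: Path) -> Snapshot:
    return Snapshot(path, _digest(path))


def stale_files(snapshots: List[Snapshot]) -> List[Path]:
    """Return files that were deleted or changed since the agent last saw them."""
    return [s.path for s in snapshots
            if not s.path.is_file() or _digest(s.path) != s.digest]
```

Anything this returns should be reloaded or dropped before the agent runs again, not left in the prompt as it was.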

Lost-in-the-middle. When a prompt gets very long, information sitting in the middle tends to get less attention than the beginning and the end. Operationally, this means putting the critical rule in the middle of a huge block is roughly the same as not having it. The symptom is the agent ignoring a constraint that is “written down.”
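If the middle of a long prompt is the weakest spot, the critical rules should sit at the edges. A minimal sketch of that ordering; the section labels and the closing reminder that restates the rules are assumptions (a common mitigation), not something this post prescribes.

```python
from typing import List


def assemble_prompt(critical_rules: List[str], reference: List[str], task: str) -> str:
    """Place hard constraints at the edges of the prompt, bulky reference in the middle."""
    parts = ["CONSTRAINTS:"] + [f"- {rule}" for rule in critical_rules]
    parts += ["", "REFERENCE:"] + reference  # long material goes in the middle
    parts += ["", "TASK:", task]
    # Restate the constraints briefly at the end, next to the task itself.
    parts += ["", "REMINDER: " + "; ".join(critical_rules)]
    return "\n".join(parts)
```

Ordering helps only up to a point; the surer fix is still a shorter middle.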

Context poisoning. A wrong output from one iteration becomes the input of the next. A hallucination cited as fact contaminates every downstream decision. The symptom is the agent defending a claim with growing confidence while the output gets worse at each step.
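A simple guard against poisoning is to track where each claim came from and refuse to carry agent-produced claims forward until something outside the agent has confirmed them. A minimal sketch; the `source` values and the notion of “verified” (a passing test, a check against the repository) are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    text: str
    source: str             # "repo", "human", or "agent"
    verified: bool = False  # only consulted for agent claims, e.g. a passing test


def carry_forward(claims: List[Claim]) -> List[Claim]:
    """Let agent-produced claims into the next iteration only after verification."""
    return [c for c in claims if c.source != "agent" or c.verified]
```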

None of these problems is solved by a bigger window. All of them are solved by curation.

A flow to decide context before running the agent

Before kicking off the task, it pays to go through the flow below. It is deliberately short: the goal is to block the reflex of pasting everything.

```mermaid
flowchart TB
    accTitle: Context curation flow before running the agent
    accDescr: A flow that starts from the decision the agent needs to make, identifies which information changes that decision, distributes it into stable, task, and ephemeral layers, assembles the minimum context, and only runs after confirming there is no removable piece left.
    T["Task defined"] --> Q1{"What decision does the agent need to make?"}
    Q1 --> Q2{"Which information changes that decision?"}
    Q2 --> C1["Stable: already in the constitution?"]
    Q2 --> C2["Task: specs, target files, cited ADRs"]
    Q2 --> C3["Ephemeral: only the useful slice of output"]
    C1 --> R["Assemble minimum context"]
    C2 --> R
    C3 --> R
    R --> CHECK{"Can anything be removed without hurting the output?"}
    CHECK -->|Yes| R
    CHECK -->|No| RUN["Run the agent"]
```

In practice, most of the value comes from the first two questions. “What decision does the agent need to make?” filters by intent. “Which information changes that decision?” filters by relevance. The rest is pruning.
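The “Can anything be removed?” loop in the flow is essentially greedy leave-one-out pruning. A minimal sketch, applied to task and ephemeral candidates (the stable layer is loaded anyway). `output_ok` is a hypothetical callback standing for whatever check you trust, such as running the agent against the ACs; each call may be expensive. A single pass suffices if removing a piece never turns a failing context into a passing one.

```python
from typing import Callable, List


def prune_context(pieces: List[str],
                  output_ok: Callable[[List[str]], bool]) -> List[str]:
    """Greedy leave-one-out pruning: drop every piece the output does not need."""
    kept = list(pieces)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if output_ok(trial):
            kept = trial  # piece i was removable; the next piece now sits at index i
        else:
            i += 1        # piece i is load-bearing; keep it and move on
    return kept
```

In day-to-day use the “evaluator” is usually your own judgment applied to each piece, which is why the first two questions in the flow do most of the work.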

A short checklist before each task

If the flow above still feels abstract, this is the format that works well as a habit.

  • The project constitution (AGENTS.md, CLAUDE.md, or equivalent) is loaded and up to date.
  • The story or tech-spec for the task is explicit, with verifiable ACs.
  • Target files are named, not inferred.
  • Only ADRs actually cited by the decision are included; the rest stay out.
  • Only the useful slice of command output goes in, not the whole log.
  • Prior history is summarized or dropped if it does not inform the current task.
  • Nothing in the current context contradicts the real state of the repository.
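Several of the checklist items above can be checked mechanically before a run; others, such as whether a summary is good, still need a person. A minimal sketch covering the mechanical ones, where the `TaskContext` fields and the 50-line log threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskContext:
    constitution_loaded: bool
    acceptance_criteria: List[str]
    target_files: List[str]
    cited_adrs: List[str]      # ADRs the decision actually depends on
    included_adrs: List[str]   # ADRs that ended up in the prompt
    pasted_log_lines: int      # lines of raw command output in the prompt


def preflight(ctx: TaskContext, max_log_lines: int = 50) -> List[str]:
    """Return the checklist items that fail; an empty list means ready to run."""
    problems = []
    if not ctx.constitution_loaded:
        problems.append("constitution not loaded")
    if not ctx.acceptance_criteria:
        problems.append("no verifiable acceptance criteria")
    if not ctx.target_files:
        problems.append("target files not named")
    uncited = sorted(set(ctx.included_adrs) - set(ctx.cited_adrs))
    if uncited:
        problems.append(f"uncited ADRs included: {', '.join(uncited)}")
    if ctx.pasted_log_lines > max_log_lines:
        problems.append(f"{ctx.pasted_log_lines} log lines pasted; slice them first")
    return problems
```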

Five minutes here saves hours of review later, especially on long-running tasks.

Honest trade-offs

Context curation has a cost. The trade-off is worth stating outright:

  • Manual selection is more work per task than pasting everything.
  • Summarizing well requires knowing the material.
  • Links depend on the agent being able to reach them when needed, which is not always guaranteed.
  • On exploratory tasks, you actually want broader context, not less.

The answer is not “always minimize.” It is to match the budget to the nature of the task. An isolated bug fix calls for surgical context. Discovery in an unfamiliar module calls for broader context, even if it needs aggressive pruning afterwards.

Where to start today

If your team already works with agents and the prompt has been growing out of control, three moves solve most of the problem:

  1. Separate stable from ephemeral. Anything that is a project rule leaves the message and becomes a constitution file loaded once.
  2. Require explicit selection of target files. Before running, list the files going in. “Include the whole folder” is a sign that the task is not well defined yet.
  3. Prune ephemeral aggressively. Stack trace, output, diff: only the slice that changes the decision. The rest goes.
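Move 2 is easy to enforce with a small check before the run. A minimal sketch, assuming target files are listed as paths relative to the repository root; treating `*`, `?`, and `[` as glob markers is a simplification, and `validate_targets` is a hypothetical name.

```python
from pathlib import Path
from typing import List

GLOB_CHARS = set("*?[")


def validate_targets(targets: List[str], repo_root: Path) -> List[str]:
    """Return one error per entry that is not an explicitly named, existing file."""
    errors = []
    for target in targets:
        if GLOB_CHARS & set(target):
            errors.append(f"{target}: glob pattern, name the files instead")
            continue
        path = repo_root / target
        if path.is_dir():
            errors.append(f"{target}: whole folder, task probably not defined yet")
        elif not path.is_file():
            errors.append(f"{target}: file does not exist")
    return errors
```

An empty result means every target is a named file that exists; anything else is a prompt to sharpen the task before running the agent.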

The operational summary is short: treat context as a budget, not a pile. Ask what changes the output. Load a little, summarize well, link what can wait, leave out what does not matter. Smaller prompt, better reasoning, and failures you can actually reproduce and trace.