How to give agents context without turning everything into a giant prompt

There is a pattern that shows up in almost every team that starts working seriously with agents: the prompt keeps growing. It starts with a reasonable AGENTS.md, then the module docs, then three related files, then conversation history, then the output of a script, then “just in case” the full database schema. After a few weeks, the window is packed and the agent reasons worse than when it had half of that.

The natural reflex is to think “I need a bigger window.” That is almost never the fix. The problem is not window size. It is that context is being treated as a pile when it should be treated as a budget. This post is about how to decide, per task, which slices of context actually change the agent’s output, and load only those.

Why piling context up degrades the output

When you dump everything into the prompt, three things happen at once, and none of them is good.

The first is dilution. The signal that matters for the specific task competes with hundreds of lines that do not. The model still “sees” everything, but the chance of it anchoring on the right passage drops.

The second is inconsistency across sessions. When half of the prompt is whatever someone pasted without a filter, the output stops being reproducible. Two runs of the same task can diverge simply because the noise was different.

The third is hidden cost. Tokens that do not affect the answer still add latency, API cost, and human review time. Worse: when something goes wrong, you cannot tell which piece of context was responsible, because there is too much context to inspect.
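One way to make that attribution possible is to record, for every run, a small manifest of what went into the prompt. The sketch below uses a hypothetical `ContextPiece` record and approximates token counts with a rough four-characters-per-token heuristic. That heuristic is an assumption, so use your model's tokenizer if you have one. Diffing the manifests of two runs that diverged shows which pieces differed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPiece:
    name: str    # e.g. "AGENTS.md" or "pytest output (slice)"
    layer: str   # "stable", "task" or "ephemeral"
    text: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); an assumption, not a real tokenizer.
    return max(1, len(text) // 4)


def manifest(pieces: list[ContextPiece]) -> list[dict]:
    """One row per piece: what went in, roughly how big it was, and a short content hash."""
    return [
        {
            "name": p.name,
            "layer": p.layer,
            "tokens": estimate_tokens(p.text),
            "sha": hashlib.sha256(p.text.encode("utf-8")).hexdigest()[:12],
        }
        for p in pieces
    ]


def diff_manifests(a: list[dict], b: list[dict]) -> list[str]:
    """Names of pieces that were added, removed, or changed between two runs."""
    ha = {row["name"]: row["sha"] for row in a}
    hb = {row["name"]: row["sha"] for row in b}
    return sorted(n for n in ha.keys() | hb.keys() if ha.get(n) != hb.get(n))
```

If two runs of the same task diverge and the diff names only the log slice, that slice is the first suspect.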

Context is a budget, not a pile

The mental shift that fixes this is treating context as a per-task budget. It is not “what fits in the window.” It is “what changes the output enough to justify the space it takes.”

If a file does not change the agent’s decision, it should not be there, regardless of whether it fits. If a doc excerpt only repeats what the project constitution already says, it is taking space twice. If a 2,000-line log has the useful information in five, only those five belong in the prompt.
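Read literally, the budget rule is a selection step with two filters: no justification means no entry, and no room means no entry. Below is a minimal sketch. It assumes you annotate each candidate with a token estimate, a priority, and the one-sentence reason it changes the output. All of these fields are hypothetical and do not come from any tool's API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tokens: int
    # One sentence on what changes in the output without this piece; empty means "nothing".
    why: str
    priority: int  # lower = more important (a hypothetical ranking you assign per task)


def assemble(candidates: list[Candidate], budget: int) -> tuple[list[str], list[str]]:
    """Greedy per-task budget: reject unjustified pieces, then fill by priority."""
    chosen, dropped, used = [], [], 0
    for c in sorted(candidates, key=lambda c: c.priority):
        if not c.why.strip():
            dropped.append(c.name)   # does not change the output: out, even if it fits
        elif used + c.tokens > budget:
            dropped.append(c.name)   # justified, but the budget is already spent
        else:
            chosen.append(c.name)
            used += c.tokens
    return chosen, dropped
```

With a 2,000-token budget, a justified 800-token target file and a 50-token log slice go in. An unjustified schema dump and an oversized “might help” doc stay out.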

The operational question stops being “what can I include?” and becomes “what can I remove without making the answer worse?”

Three context layers

In practice, the context an agent sees falls into three layers that behave differently.

Layer	What it is	Change frequency	Where it lives
Stable	Project constitution, conventions, commands, golden rules	Rarely changes	`AGENTS.md`, `CLAUDE.md`, scoped rule files
Task	Specs, ACs (acceptance criteria), target files, cited ADRs (architecture decision records), relevant schema excerpts	New per task	Story, tech-spec, explicit selection
Ephemeral	Command output, stack trace, diff, search result, log snippet	Within a single run	Session messages

The stable layer is small and worth keeping in every session. The task layer should be selected with intent, not dragged along out of habit. The ephemeral layer is the most dangerous one: it is where the prompt quietly gets fat, because every command you run throws more tokens onto the pile.

The working rule is simple: the more ephemeral the context, the shorter its lifespan should be. Did the stack trace already lead you to the fix? Drop it. Did `ls` output help you pick a file? Only the file name needs to stick around.
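The lifespan rule can be made mechanical. At the end of each step, ephemeral items are dropped, and only the residue you chose to keep survives. Below is a sketch with hypothetical `Item` and `Session` types. Deciding which residue to keep is still your call.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    layer: str                # "stable", "task" or "ephemeral"
    keep_after_use: str = ""  # the residue worth keeping, e.g. just a file name


@dataclass
class Session:
    items: list[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        self.items.append(item)

    def end_step(self) -> None:
        """Stable and task items stay; ephemeral items collapse to their residue or vanish."""
        kept = []
        for it in self.items:
            if it.layer != "ephemeral":
                kept.append(it)
            elif it.keep_after_use:
                # The residue (e.g. the chosen target file) now belongs to the task layer.
                kept.append(Item(it.keep_after_use, "task"))
        self.items = kept
```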

What to load, what to summarize, what to link, what to leave out

This is where it gets operational. For every candidate piece of context, there are four possible destinations, not one.

Destination	When to use	Examples
Load in full	The agent needs the exact text to decide or edit	File about to be modified, AC of the current story, a cited ADR
Summarize	The information matters, but the original has too much noise	Long chat history, extensive doc where only three rules apply, meeting transcript
Link	Might be useful later, but does not change the current task	Adjacent docs, historical ADRs, reference manuals
Leave out	Does not change the output for this task	Code from unrelated modules, schema of untouched tables, logs from another feature

The classic mistake is to treat everything as “load in full” just in case. The “just in case” is the signal that the piece probably did not need to be there.

An honest test: before including a block, answer in one sentence what changes in the output if this block is gone. If you cannot answer, the block probably does not belong.
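The four destinations and the one-sentence test fit in a small decision function. The flags below (`useful_later`, `needs_exact_text`, `mostly_noise`) are hypothetical labels you would assign by hand for each task. What matters is the order of the checks, with the honest test applied first.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    LOAD = "load in full"
    SUMMARIZE = "summarize"
    LINK = "link"
    LEAVE_OUT = "leave out"


@dataclass
class Piece:
    name: str
    why: str                 # what changes in the output if this piece is gone
    useful_later: bool       # might matter in a later step or task
    needs_exact_text: bool   # the agent edits it or must follow it verbatim
    mostly_noise: bool       # the relevant part is a small fraction of the original


def route(p: Piece) -> Destination:
    # The honest test comes first: no answer means the piece is not loaded in any form.
    if not p.why.strip():
        return Destination.LINK if p.useful_later else Destination.LEAVE_OUT
    if p.needs_exact_text:
        return Destination.LOAD
    if p.mostly_noise:
        return Destination.SUMMARIZE
    return Destination.LOAD
```

A piece that fails the honest test can at most become a link. It never gets loaded.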

Common failures when context is poorly managed

Four patterns show up often enough to deserve their own names.

Bloat. The prompt grows session after session because nobody prunes. Each new task inherits trash from the previous ones. The symptom is slower, less focused answers, even on small tasks.

Stale context. Some snippet pasted days ago no longer reflects the current state of the code or the spec. The agent decides based on wrong information and produces code that “made sense” in the old version. The symptom is a conflict between what the agent claims and what the repository actually contains.
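Stale context is cheap to detect when the pasted material comes from files. Hash the content when it enters the context, then compare before each run. Below is a sketch that uses only the standard library. `snapshot` and `stale` are hypothetical helper names.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Snapshot:
    path: str
    sha: str  # hash of the file content at the moment it was pasted into context


def snapshot(path: str) -> Snapshot:
    return Snapshot(path, digest(Path(path).read_text(encoding="utf-8")))


def stale(snapshots: list[Snapshot]) -> list[str]:
    """Paths whose current content no longer matches what the agent was given."""
    out = []
    for s in snapshots:
        p = Path(s.path)
        if not p.exists() or digest(p.read_text(encoding="utf-8")) != s.sha:
            out.append(s.path)
    return out
```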

Lost-in-the-middle. When a prompt gets very long, information in the middle tends to get less attention than information at the beginning and the end. Operationally, burying a critical rule in the middle of a huge block can be almost as bad as leaving it out. The symptom is the agent ignoring a constraint that is “written down.”
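A common mitigation is to keep hard constraints at the edges: state them first, restate them last, and put the bulk material in between. This lowers the risk but does not remove it. The section labels in the sketch are arbitrary choices.

```python
def build_prompt(critical: list[str], body: list[str], task: str) -> str:
    """Hard constraints go first and are restated last, away from the crowded middle."""
    rules = "\n".join(f"- {r}" for r in critical)
    parts = [
        "Constraints:\n" + rules,
        *body,  # files, spec excerpts, output slices
        "Task:\n" + task,
        "Reminder, constraints that still apply:\n" + rules,
    ]
    return "\n\n".join(parts)
```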

Context poisoning. A wrong output from one iteration becomes the input of the next. A hallucination cited as fact contaminates every downstream decision. The symptom is the agent defending a claim with growing confidence while the work built on it gets worse at each step.
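One defense is to track where each claim came from, and to carry the agent’s own earlier conclusions forward only after someone has checked them. Below is a sketch. It assumes a hypothetical `Fact` record whose `source` label is `repo`, `tool`, or `model`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    source: str            # "repo", "tool", or "model" (the agent's own earlier output)
    verified: bool = False


def carry_forward(facts: list[Fact]) -> list[Fact]:
    """Repository and tool facts pass; the model's own claims pass only once verified."""
    return [f for f in facts if f.source != "model" or f.verified]
```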

None of these problems is solved by a bigger window. All of them are solved by curation.

A flow to decide context before running the agent

Before kicking off the task, it pays to go through the flow below. It is deliberately short: the goal is to block the reflex of pasting everything.

```mermaid
flowchart TB
    accTitle: Context curation flow before running the agent
    accDescr: A flow that starts from the decision the agent needs to make, identifies which information changes that decision, distributes it into stable, task, and ephemeral layers, assembles the minimum context, and only runs after confirming there is no removable piece left.
    T["Task defined"] --> Q1{"What decision does the agent need to make?"}
    Q1 --> Q2{"Which information changes that decision?"}
    Q2 --> C1["Stable: already in the constitution?"]
    Q2 --> C2["Task: specs, target files, cited ADRs"]
    Q2 --> C3["Ephemeral: only the useful slice of output"]
    C1 --> R["Assemble minimum context"]
    C2 --> R
    C3 --> R
    R --> CHECK{"Can anything be removed without hurting the output?"}
    CHECK -->|Yes| R
    CHECK -->|No| RUN["Run the agent"]
```

In practice, most of the value comes from the first two questions. “What decision does the agent need to make?” filters by intent. “Which information changes that decision?” filters by relevance. The rest is pruning.
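The pruning step at the end of the flow is a loop. Try removing each piece, keep the removal whenever the output is still acceptable, and stop when no single removal passes. In the sketch, `still_good` is a hypothetical callback that stands in for whatever check you trust, such as rerunning the agent against the ACs, a test suite, or a reviewer. Each call may be expensive, and in the worst case the loop makes on the order of n² calls.

```python
from typing import Callable


def prune(pieces: list[str], still_good: Callable[[list[str]], bool]) -> list[str]:
    """Drop pieces one at a time while the output stays acceptable; repeat until stable."""
    current = list(pieces)
    changed = True
    while changed:
        changed = False
        for piece in list(current):
            trial = [p for p in current if p != piece]
            if still_good(trial):
                current = trial   # removal did not hurt: keep it out
                changed = True
    return current
```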

A short checklist before each task

If the flow above still feels abstract, this is the format that works well as a habit.

The project constitution (AGENTS.md, CLAUDE.md, or equivalent) is loaded and up to date.
The story or tech-spec for the task is explicit, with verifiable ACs.
Target files are named, not inferred.
Only ADRs actually cited by the decision are included; the rest stay out.
Only the useful slice of command output goes in, not the whole log.
Prior history is summarized or dropped if it does not inform the current task.
Nothing in the current context contradicts the real state of the repository.
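Several of these items can be checked before the run instead of remembered. Below is a sketch over a hypothetical `TaskContext` record. The 200-line output threshold is an arbitrary placeholder. Items that need human judgment, such as whether prior history still informs the task, are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    constitution: str = ""                                   # AGENTS.md / CLAUDE.md content
    acceptance_criteria: list[str] = field(default_factory=list)
    target_files: list[str] = field(default_factory=list)
    adrs: list[str] = field(default_factory=list)            # ADR ids included in context
    cited_adrs: list[str] = field(default_factory=list)      # ADR ids the spec actually cites
    command_output_lines: int = 0


def preflight(ctx: TaskContext, max_output_lines: int = 200) -> list[str]:
    """Return the checklist items that fail; an empty list means ready to run."""
    problems = []
    if not ctx.constitution.strip():
        problems.append("constitution not loaded")
    if not ctx.acceptance_criteria:
        problems.append("no verifiable acceptance criteria")
    if not ctx.target_files:
        problems.append("target files not named")
    for f in ctx.target_files:
        if f.endswith("/") or any(ch in f for ch in "*?"):
            problems.append(f"target is a folder or glob, not a file: {f}")
    for adr in sorted(set(ctx.adrs) - set(ctx.cited_adrs)):
        problems.append(f"ADR not cited by the task: {adr}")
    if ctx.command_output_lines > max_output_lines:
        problems.append("command output not sliced")
    return problems
```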

Five minutes here saves hours of review later, especially on long-running tasks.

Honest trade-offs

Context curation has a cost. The trade-off is worth stating outright:

Manual selection is more work per task than pasting everything.
Summarizing well requires knowing the material.
Links depend on the agent being able to reach them when needed, which is not always guaranteed.
On exploratory tasks, you actually want broader context, not less.

The answer is not “always minimize.” It is to match the budget to the nature of the task. An isolated bug fix calls for surgical context. Discovery in an unfamiliar module calls for broader context, even if it needs aggressive pruning afterwards.
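Matching the budget to the task can be as simple as an explicit table keyed by task kind. The numbers below are illustrative placeholders, not recommendations. The structure is the point: exploratory work gets more room plus a flag to prune once the target is found, and unknown kinds fall back to the tightest budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_tokens: int
    prune_after: bool  # shrink the context once discovery has found its target


# Placeholder values only; calibrate against your own tasks and model.
BUDGETS = {
    "bugfix": Budget(max_tokens=8_000, prune_after=False),
    "feature": Budget(max_tokens=20_000, prune_after=False),
    "exploration": Budget(max_tokens=60_000, prune_after=True),
}


def budget_for(kind: str) -> Budget:
    # Unknown task kinds get the surgical budget, forcing an explicit choice to widen it.
    return BUDGETS.get(kind, BUDGETS["bugfix"])
```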

Where to start today

If your team already works with agents and the prompt has been growing out of control, three moves solve most of the problem:

Separate stable from ephemeral. Anything that is a project rule moves out of the message and into a constitution file that is loaded once.
Require explicit selection of target files. Before running, list the files going in. “Include the whole folder” is a sign that the task is not well defined yet.
Prune ephemeral aggressively. Stack trace, output, diff: only the slice that changes the decision. The rest goes.
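For logs and command output, “only the slice that changes the decision” often means the matching lines plus a little surrounding context. Below is a minimal sketch using Python’s `re` module. You choose the pattern and the window size for each task.

```python
import re


def slice_log(log: str, pattern: str, context: int = 2) -> str:
    """Keep only lines matching `pattern`, plus `context` lines before and after each match."""
    lines = log.splitlines()
    rx = re.compile(pattern)
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return "\n".join(lines[i] for i in sorted(keep))
```

Applied to a 2,000-line log with one failure, this returns a handful of lines instead of the whole file.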

The operational summary is short: treat context as a budget, not a pile. Ask what changes the output. Load a little, summarize well, link what can wait, leave out what does not matter. Smaller prompt, better reasoning, more reproducible failures.
