
LLM Council, With a Dash of Assess-Decide-Do – Dragos Roua

Last weekend I stumbled upon Andrej Karpathy’s LLM Council project. A Saturday hack, he called it—born from wanting to read books alongside multiple AI models simultaneously. The idea is simple: instead of asking one LLM your question, you ask four LLMs at the same time. Then you make them evaluate each other’s work. Then a “chairman” synthesizes everything into a conclusion.

What caught my attention wasn’t just the technical elegance. It was the underlying structure. Those stages looked suspiciously familiar.

How LLM Council Works

The system operates in three sequential phases:

Stage 1: First Opinions. Your query goes to all council members in parallel—GPT, Claude, Gemini, Grok, whoever you’ve configured. Each model responds independently. You can inspect all responses in tabs, side by side.

Stage 2: Peer Review. Here’s where it gets interesting. Each model receives all the other responses, but anonymized. “Response A, Response B, Response C.” No model names attached. Each evaluator must rank all responses by quality, without knowing whose work they’re judging.

Stage 3: Synthesis. A designated chairman—one of the models, or a different one—receives everything: the original responses, the rankings, the evaluations. It synthesizes a final answer that represents the council’s collective wisdom.

The anonymization in Stage 2 is pretty clever, because models can’t play favorites. They can’t defer to perceived authority. They evaluate purely on “merit”.
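Conceptually, the pipeline is easy to sketch. The snippet below is my own simplified illustration of the three stages, not the project’s actual code: query_model, models, and chairman_model are placeholders for however the repo really calls OpenRouter.

import asyncio
import random

async def run_council(query, models, chairman_model, query_model):
    # Stage 1: fan the query out to every council member in parallel.
    first_opinions = await asyncio.gather(
        *[query_model(m, query) for m in models]
    )

    # Stage 2: shuffle and relabel the responses ("Response A", "Response B", ...)
    # so evaluators never see model names. (This sketch glosses over
    # excluding a member's own answer from its review set.)
    shuffled = list(first_opinions)
    random.shuffle(shuffled)
    labeled = {chr(ord("A") + i): text for i, text in enumerate(shuffled)}
    review_prompt = "Rank these responses by quality:\n\n" + "\n\n".join(
        f"Response {label}:\n{text}" for label, text in labeled.items()
    )
    rankings = await asyncio.gather(
        *[query_model(m, review_prompt) for m in models]
    )

    # Stage 3: the chairman sees everything and writes the final answer.
    synthesis_prompt = (
        f"Original question: {query}\n\n"
        f"Anonymized responses: {labeled}\n\n"
        f"Peer rankings: {rankings}\n\n"
        "Synthesize a single, comprehensive answer."
    )
    return await query_model(chairman_model, synthesis_prompt)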

The Interwoven Assess-Decide-Do Pattern

If you’ve been following my work on the Assess-Decide-Do framework, the parallel should be obvious. The LLM Council isn’t just a technical architecture—it’s a cognitive process embedded in code.

Stage 1 is pure assessment. Gather information. Multiple perspectives. No judgment yet, just collection.

Stage 2 is decision-making. Weigh the options. Rank them. Make choices about what’s valuable and what isn’t. The anonymization forces honest evaluation—no shortcuts, no biases based on reputation.

Stage 3 is execution. Take the assessed information and the decisions made, produce the output. Do the work that matters based on what you now know.

I don’t think Karpathy was thinking about ADD when he built this; I’m not sure he even knows about the framework. He was solving a practical problem for himself: “I want to compare LLM outputs while reading books.” But the structure emerged anyway.

ADD Inside the Council

Recognizing the pattern was interesting. But it raised a question: what if we made it explicit?

The original LLM Council treats all queries the same way. Ask about quantum physics, ask about your dinner plans—same three-stage process. But human queries aren’t uniform. Sometimes we’re exploring (“what options do I have?”), sometimes we’re deciding (“which should I choose?”), sometimes we’re executing (“how do I implement this?”).

The ADD framework maps these cognitive modes:

  • Assess (exploration mode): “I’m thinking about,” “considering,” “what are the options”
  • Decide (choice mode): “should I,” “which one,” “comparing between”
  • Do (execution mode): “how do I,” “implementing,” “next steps for”
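In code, that mapping is little more than a lookup table. The phrases below come straight from the list above; whether the actual PR stores them this way is my assumption.

# Illustrative marker table built from the phrases listed above;
# the PR's real detection data may differ.
ADD_MARKERS = {
    "assess": ["i'm thinking about", "considering", "what are the options"],
    "decide": ["should i", "which one", "comparing between"],
    "do": ["how do i", "implementing", "next steps for"],
}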

What if the council could recognize which mode you’re in and respond accordingly?

I submitted a pull request that integrates the ADD framework directly into LLM Council. The implementation adds a configuration option with four modes:

  • "none" — baseline, no framework (original behavior)
  • "all" — all models use ADD cognitive scaffolding
  • "chairman_only" — only the synthesizing chairman applies the framework
  • "council_only" — council members use it, chairman doesn’t

The most effective configuration turned out to be chairman_only with the full megaprompt—66% improvement over the condensed version in my testing. The chairman receives the ADD framework and uses it to recognize what cognitive realm the user is operating in, then synthesizes accordingly.

Why Assess-Decide-Do Improves the Council

Language models are pattern-matching engines. They’re excellent at generating plausible text. But plausibility isn’t wisdom. A single model can confidently produce nonsense, and you’d never know unless you have something to compare against.

The council approach introduces deliberation. Multiple viewpoints, structured disagreement and forced synthesis. That’s already an improvement over single-model queries.

But the council still treats every query as a generic question needing a generic answer. ADD adds another layer: cognitive alignment. When the chairman knows you’re in assessment mode, it doesn’t push you toward decisions. When you’re ready to execute, it doesn’t keep exploring options. The framework matches the response to your actual mental state.

This matters because the best answer to “what are my options for X” is different from the best answer to “how do I implement X.” Without the framework, both get the same treatment. With it, the council adapts.

Looking at the Code

The core council logic lives in backend/council.py—about 300 lines of Python that orchestrate the three stages. The ADD integration adds a parallel module (council_add.py) that wraps the same stages with cognitive scaffolding.

The key function is stage3_synthesize_final(). In the original, the chairman prompt says:

Your task as Chairman is to synthesize all of this information
into a single, comprehensive, accurate answer to the user's
original question.

With ADD, the chairman first identifies which realm the user is in, then synthesizes with that context. The synthesis becomes realm-appropriate rather than generic.
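As a rough sketch of what “synthesizes with that context” could look like, here is one way to wrap the chairman prompt. The base instruction is the one quoted above; the realm preamble and the function name are illustrative rather than the PR’s exact wording.

def build_chairman_prompt(query, responses, rankings, realm=None):
    # Base instruction, as quoted above from the original stage3_synthesize_final().
    prompt = (
        "Your task as Chairman is to synthesize all of this information "
        "into a single, comprehensive, accurate answer to the user's "
        "original question.\n\n"
        f"Question: {query}\n\nResponses: {responses}\n\nRankings: {rankings}"
    )
    if realm is not None:
        # ADD variant: prepend the detected realm so the synthesis is
        # realm-appropriate rather than generic.
        prompt = (
            f"The user is operating in the '{realm}' realm of the "
            "Assess-Decide-Do framework. Shape your synthesis for that mode: "
            "map options when assessing, weigh trade-offs when deciding, "
            "give concrete steps when doing.\n\n" + prompt
        )
    return prompt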

The detection uses linguistic markers. Phrases like “I’m thinking about” or “considering” trigger assessment mode. “Should I” or “which one” trigger decision mode. “How do I” or “implementing” trigger execution mode. Simple pattern matching, but effective—it catches how people actually phrase questions differently depending on what they need.
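A detector along those lines can be a few lines of Python. This sketch reuses the illustrative ADD_MARKERS table from earlier, and the fallback to assessment mode is my assumption rather than the PR’s documented behavior.

def detect_realm(query: str, markers=ADD_MARKERS) -> str:
    # Return the first realm whose trigger phrases appear in the query.
    lowered = query.lower()
    for realm, phrases in markers.items():
        if any(phrase in lowered for phrase in phrases):
            return realm
    # Assumed fallback: treat unmatched queries as assessment.
    return "assess"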

Playing With It

Karpathy released LLM Council with a warning: “I’m not going to support it in any way. Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like.”

That’s refreshingly honest. It’s also an invitation. If you want to experiment:

  1. Clone the repo
  2. Get an OpenRouter API key
  3. Configure which models sit on your council
  4. Set ADD_FRAMEWORK_MODE to test different configurations
  5. Run the start script

Then try asking questions in different cognitive modes. Ask something exploratory: “What are the approaches to learning a new language?” Then something decisive: “Should I use Duolingo or a private tutor?” Then something executable: “How do I structure my first week of Spanish practice?”

Watch how the council responds differently when it knows which mode you’re in versus when it treats all queries identically.

What This Means

There are two ways to make AI think more structurally: you can prompt a single model to follow a framework, or you can embed the framework into a multi-model architecture.

Both work. They work better together.

A prompted framework (like ADD in a mega-prompt) makes one model more reflective. A council architecture makes multiple models more rigorous through external pressure—anonymized peer review that none can game. Combining them gives you structured multi-perspective reasoning that adapts to how you’re actually thinking.

LLMs are still pattern-matchers generating plausible outputs. But structured pattern-matching, like structured productivity, produces better results than unstructured generation.

Assess what you’re dealing with. Decide what matters. Do what needs doing. Whether that’s your Tuesday task list or an AI deliberation system, the rhythm is the same.


LLM Council is available on GitHub. The ADD integration PR is #89. The ADD Framework posts are collected on this blog in the Assess-Decide-Do Framework page. For the mega-prompt that applies ADD to Claude, see Supercharging Claude with the Assess-Decide-Do Framework.

