Most 'AI agents' don't need AI in the loop

Thiago Avelino


Posted on May 20, 2026

I burned through a Claude Max plan in under two hours. First and only time. I was not training a model, generating a dataset, or running an autonomous agent in a loop. I was running my morning schedule. Tech briefing, deploy summary, inbox classification. Five tasks that fit in a fifty-line shell script each.

The framework underneath was OpenClaw. Self-hosted, MCP-native, multi-provider, durable event log. Clean marketing, serious engineering. It worked exactly as promised. And it was still the wrong tool.

What an agent is, according to the people who built the models

The canonical paper is from 2022: Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, from Princeton and Google Brain, published at ICLR 2023. ReAct introduced the loop every framework has copied since. The model generates reasoning traces and actions in an interleaved manner. Thought, action, observation, repeat.
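The loop itself is short. Below is a minimal Python sketch of the thought, action, observation cycle. It assumes a `policy` callable that stands in for the model and a dict of named tools. `Step`, `react_loop`, and `scripted_policy` are hypothetical names for illustration, not code from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str   # the model's reasoning trace for this turn
    action: str    # a tool name, or "finish"
    argument: str  # tool input, or the final answer when action == "finish"


def react_loop(
    policy: Callable[[list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> tuple[Optional[str], list[str]]:
    """Interleave thought, action, and observation until the policy finishes."""
    trace = [f"Question: {question}"]
    for _ in range(max_steps):
        step = policy(trace)  # the model, not the code, picks the next action
        trace.append(f"Thought: {step.thought}")
        if step.action == "finish":
            trace.append(f"Answer: {step.argument}")
            return step.argument, trace
        trace.append(f"Action: {step.action}[{step.argument}]")
        trace.append(f"Observation: {tools[step.action](step.argument)}")
    return None, trace  # step budget exhausted without an answer


def scripted_policy(trace: list[str]) -> Step:
    # Stand-in for an LLM: search once, then answer from the last observation.
    observations = [line for line in trace if line.startswith("Observation: ")]
    if not observations:
        return Step("I should look this up.", "search", "ReAct venue")
    return Step("I have enough.", "finish", observations[-1].removeprefix("Observation: "))


answer, trace = react_loop(scripted_policy, {"search": lambda q: "ICLR 2023"}, "Where was ReAct published?")
```

Every iteration goes back to `policy`. In a real agent, each of those is a model call, and nothing decided in one run carries over to the next.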

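The paper's scope is explicit. ReAct was proposed for interactive environments where the path to the answer is not known. The benchmarks they validated against - HotpotQA, FEVER, ALFWorld, WebShop - are all settings where the model has to look things up or act before it can answer: question answering and fact checking against a Wikipedia API, a text-based household simulator, a simulated web shop. Yao et al. did not propose ReAct as a default architecture. They proposed it as a solution to a specific class of problem.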

Anthropic formalized the distinction in Building Effective Agents, December 2024:

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

Right after, the part the framework ecosystem pretends not to have read:

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.

OpenAI converged on the same position in A Practical Guide to Building Agents, April 2025. Their criteria for using an agent: complex decision-making with nuanced judgment, difficult-to-maintain rules, heavy reliance on unstructured data. Three narrow conditions. Outside them, the guide says, a deterministic solution may suffice.

The literature is unanimous. Agent is a specific architectural choice - the LLM controls the flow - for a narrow set of problems where ambiguity makes deterministic code expensive or impossible. Workflow is the other option. Workflow is what you need most of the time. The criterion is simple: does the model need to decide what comes next, or do you already know?

The framework ecosystem erased that question. OpenClaw, LangGraph, CrewAI, AutoGPT. The whole category sells agent as the default mode. The hidden premise: every task is ambiguous, every path is unknown, every step needs reasoning. That premise is false for most of what a CTO, an engineer, or anyone running a knowledge-work routine does day to day.

Where the bill ended up

Back to the Claude Max plan drained in two hours. The schedule on OpenClaw was small. Five tasks, a YAML manifest, scheduled to run every day at eight in the morning. I picked OpenClaw because the clean marketing got me, and because I wanted to test the thesis that a reasoning agent could handle a routine.

It cannot. And the paper explains why.

ReAct decides from scratch on every execution. No compiled artifact. No 'I already wrote this script yesterday'. Every morning the model read the manifest, reasoned about which tool to call first, generated a Python or shell snippet to do the fetch, observed the result, reasoned about the next step, generated another snippet. By the end of one morning run, the model had produced and discarded code it had produced and discarded the day before. And the day before that.

The schedule was identical every morning. The path was known. The model was solving the same problem from scratch every day, because that is what ReAct does.

Anthropic warned about exactly this in the passage quoted above: agentic systems trade latency and cost for better task performance. I was not getting better task performance. I was paying for a non-deterministic, expensive version of a cron job.

The realization was not 'OpenClaw is bad'. OpenClaw is a serious piece of engineering, built by serious people, solving a real problem. The realization was that I had used an agent (in the Anthropic sense) to run a workflow (in the Anthropic sense). The tool worked. The fit was wrong. And the fit was wrong because the entire framework category positions itself as 'the way to build AI agents', with no architectural taxonomy to help you notice you do not need one.

What is left when you admit it is a workflow

This is the part that took me longest to accept.

Workflow orchestration is a solved problem. It was solved thirty years ago.

Cron, 1975. Anacron, 1998. Daemontools, 1997. Monit, 2001. Runit, 2004. Supervisord, 2004. Launchd, 2005. Systemd timers, 2010. Alertmanager, 2016. Every piece of what an 'AI agent platform' claims to provide - scheduling, supervision, retries with backoff, health checks, notification - already exists. It has been hardened over decades. It runs on every server you have ever deployed to.
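To make the point concrete, retries with exponential backoff, one of the features on that list, fit in a few lines. A minimal Python sketch, assuming the task is a plain callable that raises on failure. `retry_with_backoff` is a hypothetical helper, and the `sleep` parameter exists only so the delays can be observed in a test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    task: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task; after failure n (counting from 0) wait base_delay * 2**n, capped at max_delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable: the loop returns or re-raises")
```

Full jitter (sleeping a random amount up to the computed delay) is a common refinement when many clients retry at once. For the supervised-restart half, systemd already offers `Restart=` and `RestartSec=` without writing any code.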

The 'AI agent platform' industry is reinventing infra older than most of its founders. Badly. And charging in tokens.

What the LLM actually adds is narrow and valuable. Ambiguous classification. Short synthesis. Draft generation. The exact spots where deterministic code does not reach. A call that asks 'is this email urgent?' is answered better by a model than by a regex. A call that asks 'summarize this Slack thread' is answered better by a model than by a template. These are tools. You call them when you need them, inside code that knows what it is doing.

The inversion the market does not sell: the supervisor is the skeleton, the LLM is muscle at specific points. Not the other way around.

Three of the agents I run every day make this concrete.

The first is a tech briefing that crawls Slack, Sentry, GitHub, Linear, and the Cloudflare dashboard every morning at eight. It joins the signals, prioritizes, and hands me a list of what to look at first. 95% of it is deterministic fetch. Authenticated HTTP client, GraphQL queries, timestamp filters, event dedup. Only the final step - the prioritization - calls a model. One call per execution. The whole briefing costs cents per day.
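A minimal Python sketch of that shape, assuming the fetch from Slack, Sentry, GitHub, Linear, and Cloudflare has already produced a flat list of events. `Event`, `build_briefing`, the 24-hour window, and the `prioritize` callable (standing in for the single model call) are all hypothetical. What matters is the ordering: filter and dedup are plain code, and the model only sees what survives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass(frozen=True)
class Event:
    source: str  # e.g. "sentry", "github", "linear"
    key: str     # source-specific id (issue id, PR number, ...) used for dedup
    title: str
    at: datetime


def build_briefing(
    events: list[Event],
    now: datetime,
    prioritize: Callable[[list[Event]], str],  # the one model call per run
    window: timedelta = timedelta(hours=24),
) -> str:
    # 1. Timestamp filter: keep only events inside the window.
    recent = [e for e in events if now - e.at <= window]
    # 2. Dedup: the same issue can show up in several polls; keep its latest copy.
    latest: dict[tuple[str, str], Event] = {}
    for e in recent:
        k = (e.source, e.key)
        if k not in latest or e.at > latest[k].at:
            latest[k] = e
    unique = sorted(latest.values(), key=lambda e: e.at)
    # 3. Only a short, deduplicated list reaches the model; an empty day costs nothing.
    return prioritize(unique) if unique else "Nothing to look at."
```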

The second tracks DORA metrics. Specifically, what is stuck. It cross-references open PRs, commits from the last seven days, and Sentry issues that appeared in post-release windows. It identifies what stopped. PR without review for three days. Branch without a commit for five. Release with an unassigned regression. Zero LLM. It is SQL and API. The intelligence is in the cross-reference I designed. PR crossed with Sentry crossed with commit log. The model has nothing to do here.
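That cross-reference can be sketched in Python with no model anywhere. This assumes the PRs, branches, releases, and Sentry issues have already been fetched into plain records. The record types, `find_stuck`, and the 24-hour post-release window are hypothetical; the three-day and five-day thresholds come from the rules above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    number: int
    opened_at: datetime
    reviewed: bool


@dataclass
class Branch:
    name: str
    last_commit_at: datetime


@dataclass
class Release:
    version: str
    deployed_at: datetime


@dataclass
class Issue:  # a Sentry issue
    issue_id: str
    first_seen: datetime
    assignee: Optional[str]


def find_stuck(
    prs: list[PullRequest],
    branches: list[Branch],
    releases: list[Release],
    issues: list[Issue],
    now: datetime,
    review_limit: timedelta = timedelta(days=3),
    idle_limit: timedelta = timedelta(days=5),
    regression_window: timedelta = timedelta(hours=24),
) -> list[str]:
    stuck = []
    for pr in prs:
        waiting = now - pr.opened_at
        if not pr.reviewed and waiting >= review_limit:
            stuck.append(f"PR #{pr.number} without review for {waiting.days} days")
    for b in branches:
        idle = now - b.last_commit_at
        if idle >= idle_limit:
            stuck.append(f"branch {b.name} without a commit for {idle.days} days")
    # Cross-reference: an issue first seen shortly after a deploy is a likely regression.
    for rel in releases:
        for issue in issues:
            after_deploy = issue.first_seen - rel.deployed_at
            if timedelta(0) <= after_deploy <= regression_window and issue.assignee is None:
                stuck.append(f"release {rel.version}: regression {issue.issue_id} unassigned")
    return stuck
```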

The third is the dumbest and the most used. Email labeling. Inbox falls into three buckets: reply, follow up, ignore. Here the LLM makes sense, because email classification is exactly the 'hard-to-maintain rules' case that OpenAI cites. But the way it shows up is specific. One model call per email, short prompt, structured output. It is not an agent deciding what to do with the inbox. It is a classification function called N times by a loop I wrote. The supervisor is me plus the systemd timer. The model is a tool.
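The shape of that loop, as a minimal Python sketch: one short prompt per email, a JSON answer, and validation in plain code. The `complete` callable stands in for whatever model client is used. `classify_email`, `label_inbox`, the prompt wording, and the fallback to `follow_up` on a malformed answer are assumptions for illustration.

```python
import json
from typing import Callable

LABELS = ("reply", "follow_up", "ignore")


def classify_email(subject: str, body: str, complete: Callable[[str], str]) -> str:
    """One model call: a short prompt in, one of LABELS out."""
    prompt = (
        'Classify this email as "reply", "follow_up", or "ignore". '
        'Answer only with JSON such as {"label": "reply"}.\n\n'
        f"Subject: {subject}\n\n{body[:2000]}"  # truncate to keep the prompt short
    )
    try:
        label = json.loads(complete(prompt)).get("label")
    except (json.JSONDecodeError, AttributeError):
        label = None  # not JSON, or JSON that is not an object
    # Deterministic guardrail: an unusable answer never silently drops an email.
    return label if label in LABELS else "follow_up"


def label_inbox(emails: list[tuple[str, str, str]], complete: Callable[[str], str]) -> dict[str, str]:
    """The loop is ordinary code; the model is called once per (id, subject, body)."""
    return {email_id: classify_email(subject, body, complete) for email_id, subject, body in emails}
```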

Of the three agents, two barely use the LLM. The third uses it in small, surgical amounts.

The first version of these agents was written in fish script. It worked for months. In May 2026 I wrote in my Roam: "our fish agents are growing a lot and getting kind of annoying to manage in fish script, right! let's look at what we've built so far and create a new project called dotagent". I rebuilt them in Rust as dotagent. It is not an AI framework. It is a scheduling and supervision framework for tasks that know what they are doing, written in any language, with the LLM showing up as a function call when it makes sense. And absent when it does not.
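This is not dotagent's code (dotagent is written in Rust). It is a hypothetical Python sketch of the core idea only: the supervisor runs each task as a child process, so the task can be written in any language, enforces a timeout, and notifies on failure. `run_task`, `TaskResult`, and the `notify` callable are illustrative names, not dotagent's API.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    name: str
    ok: bool
    output: str


def run_task(name: str, argv: list[str], timeout_s: float, notify: Callable[[str], None]) -> TaskResult:
    """Run one task as a child process; the supervisor does not care what language it is in."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        notify(f"{name}: timed out after {timeout_s}s")  # subprocess.run kills the child
        return TaskResult(name, False, "")
    except OSError as exc:  # e.g. the executable does not exist
        notify(f"{name}: could not start ({exc})")
        return TaskResult(name, False, "")
    if proc.returncode != 0:
        notify(f"{name}: exit {proc.returncode}: {proc.stderr.strip()[:200]}")
        return TaskResult(name, False, proc.stdout)
    return TaskResult(name, True, proc.stdout)


# Usage: any executable works as a task; here, a Python one-liner.
alerts: list[str] = []
hello = run_task("hello", [sys.executable, "-c", "print('hi')"], 10.0, alerts.append)
```

Scheduling stays outside the task: cron or a systemd timer decides when it runs.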

This architecture has no pitch. It has cron, retry, notify, plus a model call in three specific places. It works. It costs less than a dollar a week. And it is exactly what Anthropic and OpenAI tell you to do, if you read what they wrote instead of what the next agent startup is selling.

Read what Anthropic and OpenAI wrote about agents before you build one. They are telling you not to, most of the time.

If your morning schedule is the same set of steps every day, you do not have an agent. You have a cron job with a superiority complex.
