The Problems of AI Coding

The move from compilers to AI coding brings with it two main problems: non-determinism, and the loss of our mental models of software.

The first problem means that prompt outputs aren’t stable, and in long-term agentic work the compounding instability leads to chaotic systems with highly divergent outcomes.

As agent errors are often not mechanistically detectable, the system needs to act as a feedback loop with the agent being the detector and corrector, aided by mechanistic detection and correction, like compilers and tests, where possible. People have discussed this enough though, so I won’t linger here; I am more interested in the second and harder problem of no longer building mental models of software.

During the move from assembly to compilers, programmers lost the detailed mental model they had about how the software was structured and how it worked. This turned out to not matter much in practice because the new approach, the programming language and its compiler, encoded machine code in a new way that resulted in mental models better suited to the type of work humans care about, while guaranteeing those encodings will be translated in a well-defined way back to machine code.

Much of what was not encoded, like address offsets, proved not to be part of the valuable mental model but an artifact of the underlying physical implementation, not core to the mathematical object being created. The mathematical object here is the abstract intent we want to materialize into the world via software, like a feature to send emails, with all its goals and constraints.

The important part here isn’t the determinism or the specific encoding used by compilers, but rather the fact that by thinking about the wanted mathematical objects and by encoding those objects in software, the programmer builds a unified mental model of the mathematical object and of its encoding in software.

This mental model is what allows a programmer to reject a seemingly innocent change on the basis of it violating an important constraint. An outside observer wouldn’t find this constraint anywhere, not in documentation, product requirements, or code, yet the constraint is nevertheless real. Such critical constraints on how the program can be manipulated emerge from the interaction of the many components of a system, what it’s trying to represent (the intent), and the leftovers of past decisions, technical or otherwise.

The problem with coding completely via AI then is the near-complete loss of the mental model. Note that the mental model is not transferred from being owned by the human to being owned by the AI. The mental model is lost and nowhere to be found. After all, the human no longer writes code and, overwhelmed by the volume of AI code, rarely or never reads it, while the AI doesn’t learn from one session to the next. No one is left building and maintaining the mental model.

This single fact, the loss of the mental model, is a core reason for the commonly seen problems with AI-produced code like bad abstractions, duplicated code, unnecessary logic, inelegance, and the constant regressions where the AI, told to build one feature, breaks another.

All these are symptoms of a lacking, or in this case nonexistent, mental model. For example, if I notice that three different systems in my code all implement a certain operation, I can make a function that does the generalized form of that operation needed by the three systems, then call this function from those systems.

By doing this, I have deduplicated code, produced a good abstraction, made the code more elegant, and reduced the chance of diverging behavior or regressions, all in one action.

This action is only possible because I, the programmer, have built a mental model that allowed me to reason about the program holistically, as if it were one thing instead of many components, and therefore to take actions that improve the program globally rather than just locally.
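As a minimal sketch of the refactoring described above, assume the shared operation is normalizing an email address and the three systems are signup, billing, and a newsletter. All of these names, and the operation itself, are hypothetical and chosen only for illustration.

```python
# Hypothetical example: the systems, names, and operation are assumptions
# made for illustration, not taken from any real codebase.


def normalize_email(raw: str) -> str:
    """Generalized form of the operation that all three systems need."""
    return raw.strip().lower()


# System 1: account signup stores the normalized address.
def signup_record(raw_email: str) -> dict:
    return {"email": normalize_email(raw_email)}


# System 2: billing looks customers up by normalized address.
def billing_key(raw_email: str) -> str:
    return f"billing:{normalize_email(raw_email)}"


# System 3: the newsletter deduplicates subscribers by normalized address.
def unique_subscribers(raw_emails: list[str]) -> set[str]:
    return {normalize_email(e) for e in raw_emails}


if __name__ == "__main__":
    raw = "  Ada@Example.com "
    print(signup_record(raw))
    print(billing_key(raw))
    print(unique_subscribers([raw, "ada@example.com"]))
```

Because every system goes through the same function, a change to the normalization rule lands in one place, and the three systems cannot silently drift apart. Spotting that the three call sites are really the same operation is the part that requires seeing them together.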

AI coding makes the above nearly impossible because the AI doesn’t generally see these systems in the same context window, and any system it sees is quickly forgotten, directly causing massive, fragile, slow codebases that do too little for their size. The only way to keep building becomes to over-modularize software and to throw increasing amounts of AI compute at it with ever more complex tooling, which is exactly what we are seeing today.

If code is too low-level for the human and too large for the AI to handle all at once, and if normal prompts are too high-level to be used for building and maintaining production software, then perhaps we need an equivalent of high-level programming constructs to sit between requirements and code.

In the often-used assembly->compiler->AI analogy, there is a forgotten, implied piece: the programming language. The compiler is not foundational, but rather follows the language. The designer dreams up a programming language, its goals, constraints, and syntax, and then a compiler is built to bring it into existence, and if the language changes, the compiler changes to support it, but not vice versa.

Can a new type of representation, higher-level than code yet specifying it, more formal than natural language yet usable through it via AI, explicitly encoding constraints while being compact enough to represent production software in a context window, be a way forward? And if so, what would it look like?

Well, for one, it definitely wouldn’t look like what we have today. Programming languages brought into existence constructs, such as structs and if-else blocks, that do not exist in machine code but can be well-defined through it, and made those constructs the interface of interaction. You think of and through those constructs, and build larger structures using them, enabling software to be made just as reliably as before, but faster and in a more compact representation than machine code.

The history of programming provides us with inspiration for how to evolve software development with AI. If you stop coding without a replacement representation that allows you to build deep mental models of how your software works and of its constraints, then you and the AI and the users will all suffer.

Note though that the development of new software representations is slow both because the problem is hard, and because making the new paradigm reach its full potential requires updating, replacing, or even inventing new tooling specialized for this new paradigm. For example, programming languages brought with them IDEs, advanced syntax highlighting, LSPs, text-optimized source control, and so on.

Nevertheless, I think a new software representation is one of the most promising angles of attack I have come across so far for tackling the challenges of AI coding. If this is something you have been thinking about as well, please do reach out on X or send me an email at [email protected]. I would love to hear from you!

A Note on the Limits of Human Mental Models

At a certain level of scale, even humans fail to build complete mental models, which is why we break companies into departments and teams, and break systems into services with API contracts between them. The same thing is bound to happen to the mental model of humans and AIs coding together, no matter how good the representation and tooling are.

So the point isn’t to say a new representation allows an infinite mental model, but that the builders of a system require a mental model of their system much more detailed than what today’s AI coding or prompts allow, and that a new representation is an important step in fixing this.
