Friday, March 20, 2026

measuring code quality in an agentic world

My buddy Scott at Red Hat shared https://redmonk.com/sogrady/2026/02/10/besieged/ in the group chat.

It's a worthwhile summary of this point in development history.

Earlier I was appreciating the zeitgeist that maybe we were near the top of the S-shaped curve of what LLMs could do, that things might be settling down a bit. But recently I've been catching a different spin: that the model releases last November really pointed to a new level of capability - something beyond the intuitions programmers had developed experimenting with LLMs in the years before. (The future might be the dev as head chef, with a bot sous chef bossing around a small horde of underling agents.)

Scott mentioned Red Hat is pivoting to agentic workflow: "One interesting thing our CTO said was, we will not measure AI consumption, we will measure code coverage, quality, etc. Focus on outcomes."

Now, turning a blind eye to consumption reminds me of a tale from AOL/Millennial Media, when they thought pivoting to the cloud/AWS gave them a bunch of dev and QA servers "too cheap to meter" - an expensive lesson they had to course-correct on. And of course, there's the environmental impact (energy and water consumption - not to mention the "environment" of the crazy costs of GPUs and memory). And for all the democratization of coding that LLMs offer, the idea that you must shell out month after month for a service (or buy some ridiculously priced hardware) in order to be competitive threatens to undermine the self-starting developer paradigm that got its start when home computers became cheap in the 80s.

But as I try to figure out how I can maintain my role as mediator between Business Folk and Machines, I want to think about the second part of Scott's paraphrase: "code coverage, quality, etc."

Too often (in my heretical, anti-reductionist opinion) organizations quantify code quality via unit-test percent coverage. But AI is going to require a new sense of what "quality" is:

* ability for future gens of your AI hordes to pick things up and iterate on earlier work

* consistent behaviors (externally testable)

* human-friendly UX (when applicable)

* edge cases covered

* a layer of AI review looking for weirdass shortcuts another AI may have taken

* security, security, security

All of these have parallels in what traditional software development has to keep an eye on, but with humans not as deeply embedded in the core loops, we will need a more sophisticated way of quantifying how well things are going.
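Just as a thought experiment (every name, dimension score, and weight below is hypothetical - this is a sketch of the idea, not a real metric anyone has shipped), you could imagine rolling those dimensions up into a composite score instead of leaning on percent coverage alone:

```python
from dataclasses import dataclass

@dataclass
class QualityReport:
    """Hypothetical per-project scores, each normalized to 0.0-1.0."""
    iterability: float             # can future agent generations pick up the work?
    behavioral_consistency: float  # externally testable, stable behavior
    ux: float                      # human-friendly UX, where applicable
    edge_case_coverage: float      # edge cases exercised
    shortcut_review: float         # AI-on-AI review for weird shortcuts
    security: float                # security posture

# Illustrative weights only - any real org would fight about these.
WEIGHTS = {
    "iterability": 0.20,
    "behavioral_consistency": 0.20,
    "ux": 0.10,
    "edge_case_coverage": 0.15,
    "shortcut_review": 0.15,
    "security": 0.20,
}

def composite_score(report: QualityReport) -> float:
    """Weighted average of the individual quality dimensions."""
    return sum(getattr(report, name) * weight for name, weight in WEIGHTS.items())

report = QualityReport(0.8, 0.9, 0.7, 0.6, 0.5, 0.9)
print(f"{composite_score(report):.3f}")  # prints 0.755
```

The point isn't the arithmetic - it's that each dimension would need its own measurement pipeline, which is exactly the "more sophisticated way of quantifying" problem.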
