Rebuilding DOMAutopsy in four hours, and the five things AI couldn't decide
I spent a Sunday afternoon rebuilding DOMAutopsy, the visual locator harvester I maintain, from a PySide6 desktop tool into a FastAPI server with a live web UI. The work took roughly four hours. Claude Code wrote most of the ~4,000 lines that landed.
That number, on its own, says nothing. It’s the kind of figure people quote on LinkedIn to sell a service. What it actually means depends entirely on which decisions you let the model take, and which decisions you keep for yourself.
Five decisions stuck with me from that session. None of them were technically hard. All of them would have produced a noticeably worse codebase if I had accepted the model’s first suggestion.
1. The system did not need RAG
The agent’s first proposal for capturing context across a long browsing session was to embed the page snapshots, store them in a vector index, and retrieve top-k chunks when the LLM needed to reason about the trajectory. It is a reasonable suggestion in 2026. It would also have added a dependency on a vector database, an embeddings model, and a retrieval pipeline that nobody on the user side asked for.
The real need was simpler. The agent observes the DOM live, the listener captures the selectors, the report aggregates everything at the end. There is no retrieval problem. There is a context augmentation problem, and it is solved by a flat structured payload passed to the LLM at each step.
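A minimal sketch of what a flat structured payload could look like, assuming one small record per step. The `StepContext` fields and the `build_payload` helper are hypothetical names chosen for illustration, not DOMAutopsy's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepContext:
    """Flat snapshot handed to the LLM at one step (field names are hypothetical)."""
    step: int
    url: str
    action: str
    selectors: list[str] = field(default_factory=list)
    previous_actions: list[str] = field(default_factory=list)


def build_payload(ctx: StepContext, max_history: int = 5) -> str:
    """Serialize the context as JSON, keeping only the most recent history."""
    data = asdict(ctx)
    history = data["previous_actions"]
    # Guard against max_history == 0, where [-0:] would keep everything.
    data["previous_actions"] = history[-max_history:] if max_history > 0 else []
    return json.dumps(data, ensure_ascii=False, sort_keys=True)
```

There is no index, no embedding call, and no retrieval step here: the only policy is how much recent history to keep.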
The first architectural decision was to remove an entire layer that the model would have built without flinching.
2. CDP screencast, not noVNC
For the live web UI, two viable options surfaced: stream the headless Chromium instance over CDP screencast, or wrap it in a VNC server and serve it over noVNC. Both work. The model recommended noVNC because the material it was trained on presents it as the standard.
In practice, CDP screencast is lighter, runs in the same Playwright process, and skips an entire VNC server installation in the Docker image. The noVNC route would have added moving parts that nobody needs once Playwright already exposes everything via CDP.
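For illustration, here is roughly what the CDP route involves. This is a sketch, not DOMAutopsy's code: `start_screencast` and `FrameSink` are hypothetical names, and `session` is assumed to behave like a Playwright `CDPSession` (an `on(event, handler)` method plus an awaitable `send(method, params)`). The CDP methods themselves (`Page.startScreencast`, `Page.screencastFrame`, `Page.screencastFrameAck`) come from the Chrome DevTools Protocol.

```python
import asyncio
import base64
from typing import Any, Awaitable, Callable

# Hypothetical callback type: receives one decoded JPEG frame.
FrameSink = Callable[[bytes], Awaitable[None]]


async def start_screencast(session: Any, sink: FrameSink, quality: int = 60) -> None:
    """Start a CDP screencast and forward each decoded frame to `sink`."""
    pending: set = set()  # keep references so tasks are not garbage-collected

    async def handle(params: dict) -> None:
        # Frames arrive base64-encoded in the event's "data" field.
        await sink(base64.b64decode(params["data"]))
        # Chromium stops sending frames unless each one is acknowledged.
        await session.send(
            "Page.screencastFrameAck", {"sessionId": params["sessionId"]}
        )

    def on_frame(params: dict) -> None:
        task = asyncio.create_task(handle(params))
        pending.add(task)
        task.add_done_callback(pending.discard)

    session.on("Page.screencastFrame", on_frame)
    await session.send("Page.startScreencast", {"format": "jpeg", "quality": quality})
```

With Playwright's async API, the session would typically come from `await page.context.new_cdp_session(page)` on a Chromium page, and `sink` could push the bytes to a WebSocket that the web UI reads.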
This is the kind of call that takes thirty seconds of judgment and that the model has no way of making on its own. It optimizes for the average answer in its training set, not for the architecture you happen to be building today.
3. A seven-tier selector cascade beats a single LLM matcher
Several open-source projects from the last twelve months propose delegating selector generation entirely to an LLM at each interaction. The model handles the DOM, picks the most stable selector, returns the result. Elegant on paper.
DOMAutopsy uses a different approach: a deterministic seven-tier cascade in a 372-line JavaScript listener, with the LLM only intervening at cleanup time. Tier 1 is data-testid, id, name. Tier 7 is short CSS or text XPath. The LLM never sees a selector unless the cascade fails or produces ambiguous matches.
The reason is straightforward. A pure LLM-based matcher is non-deterministic, expensive at scale, and creates a hard dependency on a third party at every interaction. A static cascade is auditable, replayable offline, and cheap. The LLM adds value only where deterministic logic stops being effective.
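The shape of such a cascade can be sketched in a few lines. The real listener is JavaScript and has seven tiers; this Python sketch only reproduces the two tiers described above (tier 1: data-testid, id, name; tier 7: text XPath) and uses a single `aria-label` rule as a placeholder for the middle tiers, which is an assumption. `pick_selector` and `count_matches` are hypothetical names.

```python
from typing import Callable, Optional

# A rule maps an element's attributes to a candidate selector, or None.
Rule = Callable[[dict], Optional[str]]


def _attr(name: str) -> Rule:
    return lambda el: f'[{name}="{el[name]}"]' if el.get(name) else None


def _by_id(el: dict) -> Optional[str]:
    return f'#{el["id"]}' if el.get("id") else None


def _text_xpath(el: dict) -> Optional[str]:
    text = (el.get("text") or "").strip()
    if not text or '"' in text:  # quoting edge cases are out of scope here
        return None
    return f'//{el.get("tag", "*")}[normalize-space()="{text}"]'


# (tier number, rule), tried strictly in order.
TIERS: list = [
    (1, _attr("data-testid")),
    (1, _by_id),
    (1, _attr("name")),
    (4, _attr("aria-label")),  # placeholder for the middle tiers (assumption)
    (7, _text_xpath),
]


def pick_selector(
    el: dict, count_matches: Callable[[str], int]
) -> Optional[tuple]:
    """Return (tier, selector) for the first candidate matching exactly one node."""
    for tier, rule in TIERS:
        candidate = rule(el)
        if candidate and count_matches(candidate) == 1:
            return tier, candidate
    return None  # cascade failed or was ambiguous: only now involve the LLM
```

The same element always yields the same selector, and every rejection can be traced to a specific rule and match count, which is what makes the cascade auditable and replayable offline.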
This is a structural decision that no agent will make for you, because every example it has read describes the opposite approach.
4. Regex parser, not AST parser, for legacy scripts
DOMAutopsy can reverse-engineer existing test scripts (Katalon, Playwright, Cypress, Selenium) into structured tasks for the AI agent. The proposal from the model was to write a proper AST parser per language: Tree-sitter for Groovy, ts-morph for TypeScript, and so on.
For about 95 percent of real-world production test scripts, regex is enough. They are linear, mechanically generated, repetitive. AST parsing would have multiplied the implementation cost by ten and shifted the failure surface from “regex misses a syntax case” to “AST parser crashes on a malformed legacy file”.
I picked regex. I also added a clear note in the documentation: ~95 percent coverage, the rest goes to manual fallback. Honest constraint, manageable scope.
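To make the trade-off concrete, here is a minimal sketch of the regex approach for Playwright-style scripts. The pattern, the `Task` fields, and `parse_script` are illustrative assumptions, not the parser DOMAutopsy ships. Lines the pattern cannot handle are reported rather than guessed at, which is the manual fallback in miniature.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches linear calls such as page.click("sel") or page.fill("sel", "value").
CALL = re.compile(
    r"""^\s*(?:await\s+)?page\.(?P<action>click|fill|goto)\(\s*
        (?P<q1>["'])(?P<target>.*?)(?P=q1)
        (?:\s*,\s*(?P<q2>["'])(?P<value>.*?)(?P=q2))?
        \s*\)\s*;?\s*$""",
    re.VERBOSE,
)


@dataclass
class Task:
    line_no: int
    action: str
    target: str
    value: Optional[str] = None


def parse_script(source: str) -> tuple:
    """Return (tasks, unparsed_line_numbers) for a linear test script."""
    tasks, unparsed = [], []
    for n, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("//", "#")):
            continue  # blank lines and comments carry no task
        m = CALL.match(line)
        if m:
            tasks.append(Task(n, m["action"], m["target"], m["value"]))
        else:
            unparsed.append(n)  # left for manual fallback
    return tasks, unparsed
```

A loop, a helper call, or a multi-line statement lands in the unparsed list instead of crashing the run, so the failure mode stays visible and cheap.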
5. No SaaS lock-in, even when it would have been faster
At one point during the session, the model proposed integrating a third-party hosted screenshot-diff service to offload the visual comparison work. It would have shaved off some implementation time.
OculiX and DOMAutopsy both live in regulated environments. Banks, defense contractors, healthcare providers. Adding a third-party SaaS dependency without a conscious decision would close doors I would later have to pry back open. I declined, the model accepted, the screenshot diff happens locally.
This is a non-technical call dressed up as a technical one. The model does not know that adding external-api.com to the data flow will make the procurement file twice as long for a CISO in a regulated sector. I do.
What AI actually does well, and what it doesn’t
In four hours, AI produced clean, working, mostly idiomatic code. The throughput is real. The output is not magic. It is the result of structured prompting, a clear mental model on the architect side, and a continuous flow of small judgments that close off bad branches before they grow.
The interesting question is not “can AI replace senior architects”. It is: which decisions are you still going to take yourself, and which are you ready to delegate? The five above happen to be calls I would refuse to delegate even with another twenty years of experience. They are domain-specific, context-aware, and they shape the rest of the project for years.
The rest, AI shipped it.